GeneDx / pgr-tk

PGR-TK: Pangenome Research Tool Kit
MIT License

pgr-mdb - Segmentation Fault #22

Closed: mozack closed this issue 1 year ago

mozack commented 1 year ago

Hi,

I'm getting a segmentation fault when running pgr-mdb. I'm using v0.4.1, downloaded from the releases page, inside a Docker image built from: https://github.com/GeneDx/pgr-tk/blob/main/docker/Dockerfile.build_env-22.04

The following message is printed just before the core dump:

"Reading AGC file using the AGC library writting in C can cause segementation fault if wrong file type or corrupted AGC file is provided. If you see segenmentation fault, please make sure you have proper AGC files specifed as the input file."
Segmentation fault (core dumped)

I've also tried pgr-mdb 0.3.6 from the conda installation and get a segmentation fault there too, but without the message above.

The input AGC file was generated with AGC v3.0 from the human pangenome draft assemblies, following https://github.com/GeneDx/pgr-tk-notebooks/blob/main/00-1-create_pgr_index.ipynb closely, except that I used hg38 as the reference and did not include hg19. I can query the generated AGC file with AGC without issue and am not aware of any corruption.

Is there a specific version of AGC required to be used with pgr-tk? Any suggestions on how to troubleshoot?

Command line: docker run -v ${PWD}:${PWD} -i pgr-tk:0.1 sh -c "cd /mnt/efs/users/lmose/pangenome/pgr_tk/; target/release/pgr-mdb pgr_filelist.txt pangenome_draft1"

Filelist content:

cat pgr_filelist.txt
pangenome_draft1.agc

File listing:

ls -lh pangenome_draft1*
-rw-rw-r-- 1 ec2-user ec2-user 1.4G Jul  7 23:26 pangenome_draft1.agc
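Since the warning points at a wrong or corrupted input file, one cheap first check is to confirm that every path in the file list resolves to an existing, non-empty file before handing the list to pgr-mdb. The sketch below assumes, as in pgr_filelist.txt above, one AGC path per line; the helper name `check_filelist` is hypothetical, and relative paths are checked against the current working directory. It cannot detect an AGC format-version mismatch, only missing, empty, or oddly named entries.

```python
import os


def check_filelist(filelist_path):
    """Return a list of problems found in a pgr-mdb style file list.

    Hypothetical helper: assumes one AGC path per line and ignores
    blank lines. Relative paths are checked against the current
    working directory.
    """
    problems = []
    with open(filelist_path) as fh:
        for lineno, raw in enumerate(fh, start=1):
            path = raw.strip()
            if not path:
                continue  # skip blank lines
            if not os.path.isfile(path):
                problems.append(f"line {lineno}: {path!r} is not a regular file")
            elif os.path.getsize(path) == 0:
                problems.append(f"line {lineno}: {path!r} is empty")
            elif not path.endswith(".agc"):
                problems.append(f"line {lineno}: {path!r} lacks an .agc extension")
    return problems
```

Usage, from the directory pgr-mdb is run in: `for p in check_filelist("pgr_filelist.txt"): print(p)`. An empty result only means the paths look sane, not that the AGC files are compatible.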

mozack commented 1 year ago

It runs to completion when the AGC file is created with an older version of AGC.
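When testing which AGC release produces files that pgr-mdb can read, it helps to tell a segmentation fault apart from an ordinary error exit. A minimal sketch, assuming a POSIX system: Python's `subprocess` reports a child killed by a signal as a negative return code, while a shell wrapper or `docker run` (as in the command line above) typically reports it as 128 plus the signal number, i.e. 139 for SIGSEGV. The function names `classify_exit` and `run_and_classify` are hypothetical.

```python
import signal
import subprocess


def classify_exit(returncode):
    """Map a process exit status to 'ok', 'segfault', or 'error'."""
    if returncode == 0:
        return "ok"
    # subprocess reports death-by-signal as -signum on POSIX;
    # shells and `docker run` usually report it as 128 + signum.
    # Caveat: a program could also exit with 139 deliberately.
    if returncode in (-signal.SIGSEGV, 128 + signal.SIGSEGV):
        return "segfault"
    return "error"


def run_and_classify(cmd):
    """Run a command (list of arguments) and classify how it exited."""
    result = subprocess.run(cmd)
    return classify_exit(result.returncode)
```

For example, `run_and_classify(["pgr-mdb", "pgr_filelist.txt", "pangenome_draft1"])` could be repeated once per AGC file built with a different AGC version, recording which ones return `"segfault"`.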