DecodeGenetics / graphtyper

Population-scale genotyping using pangenome graphs
http://dx.doi.org/10.1038/ng.3964
MIT License

Segmentation fault Reads with name=xxx both have IS_FIRST_IN_PAIR=1 #101

Open jjfarrell opened 2 years ago

jjfarrell commented 2 years ago

Graphtyper reports the error below, which indicates a problem with a read pair in one of the CRAM files, and then crashes with a segmentation fault. The run covers 4000 TOPMed CRAMs, but the error message does not say which CRAM contains the read.

Any suggestions on how to deal with this? Could the problematic read pair be skipped somehow and generate a warning instead?

```
[2022-03-15 10:49:38.027] hts_parallel_reader.cpp:309 Reads with name=H7HMVCCXX150710:6:1109:25654:19065 both have IS_FIRST_IN_PAIR=1
/var/spool/sge/scc-wj4/job_scripts/3348494: line 49: 27173 Segmentation fault bin/graphtyper genotype_sv /restricted/projectnb/casa/ref/GRCh38_full_analysis_set_plus_decoy_hla.fa $PRIOR --sams=$CRAM_LIST --region=$REGION -O $WORKAREA --threads $NSLOTS --max_files_open=$NFILES --avg_cov_by_readlen $COV --verbose
```

jjfarrell commented 2 years ago

To troubleshoot this, I ran gatk ValidateSamFile on the 4000 files and found 4 problematic reads in 4 CRAMs. It would be nice if the error message included the CRAM file name, which would avoid this step. Even better, if possible, Graphtyper could just print a warning and skip these reads rather than raising the error and stopping the run.
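For anyone hitting the same error, a lighter-weight check than full validation might look like the Python sketch below. It is not part of GraphTyper or GATK; the function name `find_duplicate_first_mates` and the script name used afterwards are hypothetical. It assumes SAM text is piped in (for example from `samtools view`) and reports read names whose primary alignments carry the first-in-pair flag (0x40) more than once, which is the condition the error message describes.

```python
import sys
from collections import defaultdict

# FLAG bits from the SAM specification.
FIRST_IN_PAIR = 0x40
SECONDARY = 0x100
SUPPLEMENTARY = 0x800


def find_duplicate_first_mates(sam_lines):
    """Return sorted read names seen more than once as a primary first-in-pair record."""
    first_counts = defaultdict(int)
    for line in sam_lines:
        # Skip blank lines and header lines (headers start with "@").
        if not line.strip() or line.startswith("@"):
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 2:
            continue
        name, flag = fields[0], int(fields[1])
        # Secondary and supplementary records legitimately repeat the mate flag.
        if flag & (SECONDARY | SUPPLEMENTARY):
            continue
        if flag & FIRST_IN_PAIR:
            first_counts[name] += 1
    return sorted(name for name, count in first_counts.items() if count > 1)


# Runs only when a label (e.g. the CRAM file name) is given as the single argument;
# SAM records are then read from standard input.
if __name__ == "__main__" and len(sys.argv) == 2:
    label = sys.argv[1]
    for read_name in find_duplicate_first_mates(sys.stdin):
        print(f"{label}\t{read_name}")
```

Assuming the script is saved as `find_dup_first.py`, it could be run once per file, e.g. `samtools view -T ref.fa sample.cram chr1:1-1000000 | python find_dup_first.py sample.cram`, so that each hit is printed together with the file it came from. The sketch keeps every first-mate name in memory, so restricting it to the region that failed is probably advisable for whole-genome CRAMs.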

pthami07 commented 12 months ago

Hi JJ Farrel, please let me know what computing resources (cores, RAM, and time) you used to run GraphTyper on the 4,000 CRAMs.