cbg-ethz / PredictHaplo

This software aims at reconstructing haplotypes from next-generation sequencing data.
GNU General Public License v3.0

37452 Segmentation fault (core dumped) ./predicthaplo --sam ./example-inputs/$sam_filename --reference ./example-inputs/$reference_filename #28

Open Masterxilo opened 2 years ago

Masterxilo commented 2 years ago

Hi there

Since our other .bam file apparently contains only unpaired reads (#27), we have now started from a .bam file produced by a different processing pipeline from our NGS data.

Now we get a bit farther, but PredictHaplo still errors out in the end:

Configuration:
  prefix = predicthaplo_output/ph_
  cons = ./example-inputs/base_uuid=8d72b145-6257-41e3-afcf-429345f7b1a4.reference.fasta
  visualization_level = 1
  FASTAreads = ./example-inputs/base_uuid=8d72b145-6257-41e3-afcf-429345f7b1a4.a_aln_pe_output_bwa_mapped_sorted.bam.sam
  have_true_haplotypes = 1
  FASTAhaplos = 
  do_local_Analysis = 1
Warning: 0.5% of the reads were discarded because of an unsupported attribute.
Warning: 1.1% of the reads were discarded because they are unmapped.
Warning: 45.4% of the read pairs were discarded because the sequence is too short. The flag "--min_length" can be used to configure the minimum length.
Warning: 0.6% of the read pairs were discarded because the gap fraction is too high. The flag "--max_gap_fraction" can be used to configure the maximum gap fraction.
Warning: 0.2% of the read pairs were discarded because the alignment score fraction (scaled value of TAG "AS") is too low. The flag "--min_align_score_fraction" can be used to configure the minimum alignment score fraction.
After parsing the reads in file ./example-inputs/base_uuid=8d72b145-6257-41e3-afcf-429345f7b1a4.a_aln_pe_output_bwa_mapped_sorted.bam.sam: average read length= 375.248 276950
First read considered in the analysis starts at position 7. Last read ends at position 9677
There are 276950 reads
Median of read lengths: 353.000
Local window size: 247
Minimum overlap of reads to local analysis windows: 209
./run: line 60: 39538 Segmentation fault      (core dumped) ./predicthaplo --sam ./example-inputs/$sam_filename --reference ./example-inputs/$reference_filename
ERROR  ./run:59                                :  produced nonzero exit code 139 - terminating

Does this tell you anything?

I am running this on a fairly powerful machine with 64 GB of RAM, so I don't think the crash is caused by running out of memory.
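
For reference, the failing command from ./run, with the variables expanded from the configuration dump above (the rest of the script is not shown, so treat this as a reconstruction):

  # expanded form of the call that segfaults (paths taken from the config dump)
  sam_filename='base_uuid=8d72b145-6257-41e3-afcf-429345f7b1a4.a_aln_pe_output_bwa_mapped_sorted.bam.sam'
  reference_filename='base_uuid=8d72b145-6257-41e3-afcf-429345f7b1a4.reference.fasta'
  ./predicthaplo --sam ./example-inputs/$sam_filename --reference ./example-inputs/$reference_filename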

DrYak commented 2 years ago

Usually, segmentation faults require carefully running the data through a debugger (such as gdb or valgrind): would it be possible for you to send us the alignment file so that a C/C++ developer could try it?
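
For example, something along these lines (a rough sketch, assuming the binary was built with debug symbols and using the same invocation as in your ./run script):

  # run under gdb, then print a backtrace once it hits the segfault
  gdb --args ./predicthaplo --sam ./example-inputs/$sam_filename --reference ./example-inputs/$reference_filename
  (gdb) run
  (gdb) bt

  # alternatively, let valgrind report the first invalid memory access (much slower)
  valgrind --track-origins=yes ./predicthaplo --sam ./example-inputs/$sam_filename --reference ./example-inputs/$reference_filename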

Also:

  Warning: 45.4% of the read pairs were discarded because the sequence is too short. The flag "--min_length" can be used to configure the minimum length.

This seems a bit high, though I'm not that experienced with predicthaplo (@kpj: do you have any feedback?)
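
One quick sanity check (a sketch, assuming samtools is installed and pointing it at the same SAM file as above) is to look at the read-length distribution that actually reaches predicthaplo:

  # read-length histogram from samtools stats ("RL" rows: length, count)
  samtools stats ./example-inputs/$sam_filename | grep ^RL | cut -f 2-

  # or directly from the SAM: length of the SEQ field (column 10) per record
  awk '!/^@/ {print length($10)}' ./example-inputs/$sam_filename | sort -n | uniq -c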

What is your current experimental setup? How were your reads generated? (Are they asymmetric read pairs? E.g. a MiSeq run with 300 bp on Read1 and 200 bp on Read2.)

Masterxilo commented 2 years ago

Hi Ivan/@DrYak, nice to hear from you.

As the data originates from a virus mix and not from a patient, that should be no problem. I sent you a download link along with a Dockerfile that reproduces exactly how I invoke it.

As for how the reads were generated, I don't know the specifics. I have included links to the raw FASTQ files as well. I believe our read lengths are symmetric: the technique used is 2x250 bp, so they should all be the same length. Lisa: can you comment on that?

It is worth noting that on the same samples we got #27 when we produced the .bam file using https://github.com/medvir/SmaltAlign

This .bam file was produced with a pipeline by Sandra that apparently uses bwa.