dellytools / delly

DELLY2: Structural variant discovery by integrated paired-end and split-read analysis
BSD 3-Clause "New" or "Revised" License

Delly error: Segmentation fault (core dumped) #267

Closed: pabloangulo7 closed this issue 2 years ago

pabloangulo7 commented 2 years ago

Hello

I get an error using delly. When I try to run germline CNV calling, it fails with: Segmentation fault (core dumped). I don't think it is an installation problem, because I use the Docker version, and the BAM files work with other programs. Only delly gives this error.

[Screenshot: delly_error]

Thanks

tobiasrausch commented 2 years ago

This indeed looks like a potential bug. Can you share these input files? If yes, just write me an email and I can send you an upload link to our server. My email address is in the delly paper abstract. Thanks.

pabloangulo7 commented 2 years ago

Hello,

I would like to send you the input files to see if it is possible to solve this problem.

Regards

Pablo

On Tue, 26 Oct 2021, 9:12, Tobias Rausch <@.***> wrote:

This indeed looks like a potential bug. Can you share these input files? If yes, just write me an email and I can send you an upload link to our server. My email address is in the delly paper abstract https://academic.oup.com/bioinformatics/article/28/18/i333/245403. Thanks.


tobiasrausch commented 2 years ago

No, that's not a bug. Your mappability map flags only 0.01% of the bases as uniquely mappable. That's not going to work, because delly uses these positions to estimate the GC bias and normalize the coverage. As a rule of thumb, at least 75% of the genome should be uniquely mappable. This is either an issue with the assembly, if it is a very rough draft, or something went wrong while building the mappability map.

pabloangulo7 commented 2 years ago

Ok, thanks for helping. Is there another way to obtain the mappability map (with another program or in another format), or to modify the current one so that I can run the tool? In the code, I saw that dicey assigns one of three letters to each base: N, A or C. What does each letter mean? If I understand the process, maybe I could modify the file manually.

tobiasrausch commented 2 years ago

So you already tried the workflow described in the FAQ? Then I suppose it is indeed a matter of assembly quality.

pabloangulo7 commented 2 years ago

Yes, I aligned the reads with bwa mem, then I sorted the BAM files, and finally I ran dicey. So I would like to ask about the meaning of the dicey output (what the letters N, A and C mean and how they relate to the mappability of the genome), so I can better understand why I obtained that result.

tobiasrausch commented 2 years ago

N is not uniquely mappable, A is uniquely mappable at low mapping quality (depends on the -q parameter), and C is uniquely mappable at high mapping quality.
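Given these three codes, one quick way to see how much of a genome dicey flagged as uniquely mappable is to count the letters in the map. Below is a minimal sketch, assuming the map has been decompressed to plain FASTA text (header lines start with `>`, every other line holds map letters); `map_summary` is a hypothetical helper, not part of delly or dicey. Which fraction delly compares against is not stated in this thread, so it prints both the C-only percentage and the A-or-C percentage, either of which can be checked against the 75% rule of thumb mentioned earlier.

```shell
#!/usr/bin/env bash
# Hypothetical helper: summarise a dicey mappability map read from stdin.
# Assumes uncompressed FASTA whose sequence letters are N, A or C:
#   N = not uniquely mappable
#   A = uniquely mappable at low mapping quality (depends on dicey -q)
#   C = uniquely mappable at high mapping quality
map_summary() {
  awk '
    /^>/ { next }                 # skip FASTA header lines
    {
      # gsub() returns the number of substitutions, i.e. the letter count;
      # replacing each letter with itself leaves the line unchanged.
      n += gsub(/N/, "N")
      a += gsub(/A/, "A")
      c += gsub(/C/, "C")
    }
    END {
      total = n + a + c
      if (total == 0) { print "no map letters found"; exit 1 }
      printf "N=%d A=%d C=%d highMapQ=%.2f%% anyUnique=%.2f%%\n",
        n, a, c, 100 * c / total, 100 * (a + c) / total
    }'
}

# Example (file name assumed): zcat map.fa.gz | map_summary
```

A result like the 0.01% reported above would show up immediately as a near-zero percentage, before any delly run.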

pabloangulo7 commented 2 years ago

Thanks for helping. I also have a few more questions about the delly cnv options. I'm not sure what the read-depth window options --window-offset and --fraction-window mean, nor the CNV calling options --cn-offset and --sdrd (min. SD read-depth shift), nor the GC fragment normalization option --fraction-unique. I suppose that, by default, the program uses a window size of 10000 for read depth and GC fragment normalization. Finally, in the output file, the FORMAT field with the CNV likelihoods contains only ten values. If the predicted copy number is 25, do the likelihoods correspond to copy numbers 15-25? Does this mean that the probabilities that do not appear are very low in comparison?

Thanks in advance for your patience, and sorry for so many questions.

trinidadmartin commented 2 years ago

Hello Tobias! I think I need a bit of help here... I ran delly with this call:

`./src/delly call -q 40 -g /mnt/d/Pooled_linkage_analysis/S288C_genome.fasta -o delly7B.bcf /mnt/server/trini_tonterias/Pooled_linkage_analysis/files4_SNPanalysis/7B_bwa_mem_S288c_alignment.sorted.markedDuplicates.AddReadGroups.bam /mnt/server/trini_tonterias/Pooled_linkage_analysis/files4_SNPanalysis/7B_control_bwa_mem_S288c_alignment.sorted.markedDuplicates.AddReadGroups.bam`

I have two BAM files, aligned with BWA, sorted, and with duplicates marked using Picard tools. As you can see, I am trying to compare them: one is the control and the other is, let's say, the mutant.

After hours of running, I got the same error:

```
[2021-Dec-16 17:08:47] Paired-end and split-read scanning
0% 10 20 30 40 50 60 70 80 90 100%
|----|----|----|----|----|----|----|----|----|----|
[2021-Dec-16 18:24:34] Split-read clustering
0% 10 20 30 40 50 60 70 80 90 100%
|----|----|----|----|----|----|----|----|----|----|
[2021-Dec-16 18:24:42] Paired-end clustering
0% 10 20 30 40 50 60 70 80 90 100%
|----|----|----|----|----|----|----|----|----|----|
[2021-Dec-16 18:25:03] Split-read assembly
0% 10 20 30 40 50 60 70 80 90 100%
|----|----|----|----|----|----|----|----|----|----|
[2021-Dec-16 20:51:37] Generate REF and ALT probes
0% 10 20 30 40 50 60 70 80 90 100%
|----|----|----|----|----|----|----|----|----|----|
[2021-Dec-16 20:51:37] SV annotation
0% 10 20 30 40 50 60 70 80 90 100%
|----|----|----|----|----|----|----|----|----|----|
[E::bcf_hdr_add_sample_len] Duplicated sample name '20'
[2021-Dec-16 23:03:54] Genotyping
0% 10 20 30 40 50 60 70 80 90 100%
|----|----|----|----|----|----|----|----|----|----|
*
Segmentation fault (core dumped)
trini@Wavemaster:/mnt/d/Pooled_linkage_analysis/SV/delly$
```

Any ideas??

Thanks a lot!! Trini

tobiasrausch commented 2 years ago

The two BAM files use the same sample name, and that causes the crash. BAM files should always have unique sample names; I have now added a sanity check for this to the code.

trinidadmartin commented 2 years ago

Hello!!

Thanks for the super fast answer!! They do not have the same name: the first one is 7B and the second one is 7B_control, so I guess that's not the problem ;)

Thanks again!! T

tobiasrausch commented 2 years ago

What matters is the sample name stored inside the BAM file (the SM field of the @RG header line), and according to the error message, "Duplicated sample name '20'", both files use '20'.

trinidadmartin commented 2 years ago

Could you tell me how to check that?? It is weird... They always had different names before the alignment...

Thanks! T

tobiasrausch commented 2 years ago

`samtools view -H input.bam | grep --color "SM:"`
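Building on that command, here is a minimal sketch for checking several BAM files at once before starting a long delly run: `sm_names` extracts the SM values from the @RG lines of a SAM header, and `dup_samples` prints any sample name that appears in more than one file. Both function names are hypothetical; only `samtools view -H` comes from this thread. Empty output means every file carries its own sample name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the SM (sample) values found in the @RG lines
# of a SAM header read on stdin, one per line.
sm_names() {
  awk -F'\t' '/^@RG/ {
    for (i = 2; i <= NF; i++)
      if ($i ~ /^SM:/) print substr($i, 4)   # strip the leading "SM:"
  }'
}

# Hypothetical helper: report sample names that occur in more than one BAM.
# Requires samtools on PATH; each file contributes each name at most once,
# so several read groups sharing one SM inside a single file are not flagged.
dup_samples() {
  local bam
  for bam in "$@"; do
    samtools view -H "$bam" | sm_names | sort -u
  done | sort | uniq -d
}

# Example (file names assumed): dup_samples 7B.bam 7B_control.bam
```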

trinidadmartin commented 2 years ago

Thanks a lot!! I changed the SM tag and it worked.
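For anyone hitting the same problem: the sample name lives only in the BAM header (the SM field of each @RG line; reads carry just the read-group ID in their RG tag), so one way to change it without realigning is to edit the header and apply it with `samtools reheader`. The sketch below assumes the old name is exactly the SM value reported by the `samtools view -H` command above; `rename_sm` and all file names are hypothetical. Re-indexing afterwards is needed because the new header shifts the file offsets recorded in the old index.

```shell
#!/usr/bin/env bash
# Hypothetical helper: rewrite SM:<old> to SM:<new> on @RG lines of a SAM
# header read on stdin; all other header lines pass through unchanged.
rename_sm() {
  local old="$1" new="$2"
  awk -F'\t' -v OFS='\t' -v old="SM:$old" -v new="SM:$new" '
    /^@RG/ { for (i = 2; i <= NF; i++) if ($i == old) $i = new }
    { print }'
}

# Usage sketch (file names assumed):
#   samtools view -H 7B_control.bam | rename_sm 20 7B_control > header.sam
#   samtools reheader header.sam 7B_control.bam > 7B_control.renamed.bam
#   samtools index 7B_control.renamed.bam
```

Matching the whole field (rather than a substring) avoids accidentally renaming, for example, `SM:200` when the old name is `20`.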