bioinform / metasv

MetaSV: An accurate and integrative structural-variant caller for next generation sequencing
http://bioinform.github.io/metasv/
BSD 2-Clause "Simplified" License

Deletion assembly and soft clip #117

Open Madelinehazel opened 7 years ago

Madelinehazel commented 7 years ago

Hello,

There seems to be a bug in MetaSV when the --boost_sc option is included. With this option, my deletion detection sensitivity drops dramatically, from ~80% to 5% (as measured using VarSim). My command is as follows, using MetaSV v0.5.3:

    run_metasv.py --bam $BAM \
      --reference $REF \
      --sample $SAMPLE --boost_sc \
      --cnvnator_native $SAMPLE.bam_CNVcall.100 \
      --lumpy_vcf $SAMPLE.bam_lumpy.vcf \
      --spades /home/hpcuser01/SPAdes-3.6.2-Linux/bin/spades.py \
      --age /home/hpcuser01/AGE/age_align \
      --min_support_ins 2 \
      --max_ins_intervals 500000 --isize_mean $INSMEAN --isize_sd $INSSD \
      --num_threads $THREADS --outdir $SAMPLE.metaSV.out --workdir $SAMPLE.metaSV.work

Even if I specify --svs_to_assemble INS, deletions still drop out. I'm not sure why this is.

marghoob commented 7 years ago

Hi @Madelinehazel, could you please try out MetaSV 0.5.4 to see if the problem still persists? We have made a couple of bugfixes since then. By the way, what tools did you use to call deletions earlier? It looks like there are only two tools here, LUMPY and CNVnator, and since only overlapping calls are reported as high-confidence, the drop in accuracy could be explained if the set of tools has changed.
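To illustrate why the set of input tools matters: if a call only counts as high-confidence when another tool reports an overlapping call, then changing the tools changes which calls survive. The sketch below is not MetaSV's merge code. The 50% reciprocal-overlap threshold, the function names, and the example intervals are all assumptions made for illustration.

```python
def reciprocal_overlap(a, b):
    """Return the reciprocal overlap fraction of two (chrom, start, end) intervals."""
    if a[0] != b[0]:
        return 0.0
    overlap = min(a[2], b[2]) - max(a[1], b[1])
    if overlap <= 0:
        return 0.0
    # The overlap must cover a large fraction of BOTH intervals.
    return min(overlap / (a[2] - a[1]), overlap / (b[2] - b[1]))


def supported_calls(calls_a, calls_b, min_ro=0.5):
    """Keep calls from calls_a that at least one call in calls_b overlaps reciprocally."""
    return [a for a in calls_a
            if any(reciprocal_overlap(a, b) >= min_ro for b in calls_b)]


# Hypothetical deletion intervals, not taken from the attached files.
lumpy_dels = [("1", 1000, 2000), ("1", 5000, 5600), ("2", 300, 900)]
cnvnator_dels = [("1", 1100, 2100), ("2", 5000, 6000)]
print(supported_calls(lumpy_dels, cnvnator_dels))
```

With only two tools, any deletion that one of them misses cannot be supported by the other, so the high-confidence set is sensitive to both inputs.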

Madelinehazel commented 7 years ago

I will try out 0.5.4. I used Lumpy and CNVnator previously, as well, without assembly.

Madelinehazel commented 7 years ago

I'm having the same issue with the latest release. When the --boost_sc option is used to improve insertion detection sensitivity, it interferes with DEL detection, decreasing the sensitivity from 80% to 5%, even if --svs_to_assemble specifies only INS, INV, and DUP.

marghoob commented 7 years ago

I see. Can you attach the log file (if too large, then in the compressed form)? It might also have something to do with the parameters. What was the coverage in the samples (there is a --mean_read_coverage option for specifying coverage)?

Madelinehazel commented 7 years ago

Attached: metaSV_output.txt.gz. The mean coverage was 30X.

Madelinehazel commented 7 years ago

@marghoob Have you identified the issue?

marghoob commented 7 years ago

@Madelinehazel I had a look at the log file, and it is not clear from it why we are missing deletions. How many deletions did you expect to be high-confidence (a rough number is fine)? It would also be good to attach the compressed VCF from LUMPY and the calls from CNVnator. When you measured accuracy, did you include or skip low-confidence calls? As an intermediate file, could you also attach /pre_asm.vcf, since that will show whether calls were lost during the merge itself? Apologies for the delay due to recent travels.

Madelinehazel commented 7 years ago

I expect about 800 high-confidence deletions and a VarSim sensitivity of 80%. That is the output when the --boost_sc option is not used. With --boost_sc, I get 99 high-confidence deletions and a VarSim sensitivity of 5%. Here are the files:

varSim_hs37d5.bam_CNVcall.100.gz varSim_hs37d5.bam_CNVcall.100.vcf.gz varSim_hs37d5.bam_lumpy.vcf.gz

pre_asm.vcf.gz

Thanks again for your help.

marghoob commented 7 years ago

Thanks, Madeline. The pre-assembly VCF looks fine, and the number of PASS DEL calls matches what you expect (you should also check the accuracy of this VCF using VarSim). Could you also attach the final VCF from MetaSV so that we can see whether the calls are being missed?

Best Marghoob.
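One way to compare the pre-assembly VCF with the final VCF is to tally records by SV type and FILTER value in each file. The sketch below is only an illustration: it assumes each record carries an SVTYPE key in its INFO column and that the FILTER column separates PASS calls from the rest. The function name is made up here and is not part of MetaSV.

```python
import gzip
from collections import Counter


def count_sv_calls(vcf_path):
    """Count VCF records by (SVTYPE, FILTER); reads plain or gzipped VCFs."""
    opener = gzip.open if vcf_path.endswith(".gz") else open
    counts = Counter()
    with opener(vcf_path, "rt") as handle:
        for line in handle:
            if line.startswith("#") or not line.strip():
                continue  # skip header and blank lines
            fields = line.rstrip("\n").split("\t")
            # INFO is a ';'-separated list of KEY=VALUE pairs and bare flags.
            info = dict(
                item.split("=", 1) if "=" in item else (item, True)
                for item in fields[7].split(";")
            )
            counts[(info.get("SVTYPE", "NA"), fields[6])] += 1
    return counts

# Example (paths are placeholders):
# print(count_sv_calls("pre_asm.vcf.gz")[("DEL", "PASS")])
```

Running it on both files and comparing the ("DEL", "PASS") counts would show whether deletions disappear between the merge and the final output.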

Madelinehazel commented 7 years ago

Here's the vcf: metaSV_boostsc.vcf.gz

marghoob commented 7 years ago

Hi Madeline, it looks like we need to look at multiple intermediate files. So, it would be good if you can tar-gzip all the BED files in the work directory and attach the archive.

Best Marghoob.

Madelinehazel commented 7 years ago

All the BED files, including those in the numbered subfolders within the work directory?

Madelinehazel commented 7 years ago

Hi Marghoob, here's a tar.gz file with the bed files contained in the work directory. varSim_metaSV.0.5.4.INS.DUP.INV.AS.work.bed.tar.gz

marghoob commented 7 years ago

Hi @Madelinehazel, could you also attach workdir/genotyping/genotyped.bed? We had a look, and it seems that calls are getting lost at the last step, when the final VCF is generated. We try to resolve calls that overlap calls of other types, and this might be causing issues, so we need to see what's happening at that step.
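As a rough way to see which calls that resolution step could affect, one can list deletions that overlap a call of a different type. This is a hypothetical sketch, not MetaSV's resolution logic. The (chrom, start, end, sv_type) tuple layout is an assumption, since the column layout of genotyped.bed is not described in this thread, and the example intervals are invented.

```python
def cross_type_overlaps(calls):
    """Return pairs of overlapping calls on the same chromosome with different SV types.

    Each call is a (chrom, start, end, sv_type) tuple (assumed layout).
    """
    ordered = sorted(calls, key=lambda c: (c[0], c[1], c[2]))
    pairs = []
    for i, first in enumerate(ordered):
        for second in ordered[i + 1:]:
            if second[0] != first[0] or second[1] >= first[2]:
                break  # sorted order: nothing further can overlap `first`
            if second[3] != first[3]:
                pairs.append((first, second))
    return pairs


# Invented intervals for illustration only.
calls = [
    ("1", 100, 500, "DEL"),
    ("1", 300, 350, "INS"),
    ("1", 600, 900, "DEL"),
    ("1", 650, 700, "DEL"),
    ("2", 100, 200, "DUP"),
]
for first, second in cross_type_overlaps(calls):
    print(first, "overlaps", second)
```

If many expected deletions show up in such pairs when soft-clip calls are enabled, that would be consistent with them being dropped during cross-type resolution.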

Madelinehazel commented 7 years ago

OK. Uploaded as txt instead of bed, as bed isn't supported. genotyped.txt