Closed fpbarthel closed 6 years ago
If you can add SVABA or GROM, that would work.
Lumpy is failing on canine and doubt if Emmanuel have tried Delly. SVABA runs good (6 hrs or so) on canine data but outputs mostly BND calls.
I have recently tried GROM based on Hashpreet’s suggestion, and it seems to at least output much less BND calls than SVABA.
Stupid me as I thought BND calls are calls-undetermined. Should be relatively easy to convert BND calls to conventional calls like DEL, DUP, INV. svaba/issues/4 and http://simpsonlab.github.io/2015/06/15/merging-sv-calls/
https://academic.oup.com/gigascience/article/6/10/1/4160384
Also, need to check if any of callers we use for SV are compatible with SV-merge tools like SURVIVOR or FUSORSV.
Samir
I think we are going to try and run our ensemble of 8 callers on these tumor normal BAMs. Tricky part is the readgroups, but if we can get VCFs out of the callers we can definitely integrate using fusorSV.
Ankit
Here is a tumor/normal pair with splitters and discordant reads extracted. Let me know if this works out. @lslochov
Tumor
/fastscratch/verhaak-lab/GLASS-WG/results/bqsr/TCGA-DU-6407-TP-XX8PCJ.realn.mdup.bqsr.bam
/fastscratch/verhaak-lab/GLASS-WG/results/lumpy/TCGA-DU-6407-TP-XX8PCJ.realn.mdup.bqsr.discordant.sorted.bam
/fastscratch/verhaak-lab/GLASS-WG/results/lumpy/TCGA-DU-6407-TP-XX8PCJ.realn.mdup.bqsr.splitters.sorted.bam
Control
/fastscratch/verhaak-lab/GLASS-WG/results/bqsr/TCGA-DU-6407-NB-XCIM8I.realn.mdup.bqsr.bam
/fastscratch/verhaak-lab/GLASS-WG/results/lumpy/TCGA-DU-6407-NB-XCIM8I.realn.mdup.bqsr.discordant.sorted.bam
/fastscratch/verhaak-lab/GLASS-WG/results/lumpy/TCGA-DU-6407-NB-XCIM8I.realn.mdup.bqsr.splitters.sorted.bam
Note that these have multiple readgroups:
barthf@helix180$ samtools view -H /fastscratch/verhaak-lab/GLASS-WG/results/bqsr/TCGA-DU-6407-TP-XX8PCJ.realn.mdup.bqsr.bam | grep @RG
@RG ID:C25HA.3 PL:illumina PU:C25HAACXX130609.3.TTCGCTGA LB:Solexa-161833 PI:0 DT:2013-06-09T00:00:00-0400 SM:TCGA-DU-6407-01A-13D-1705-08 CN:BI
@RG ID:D26WH.5 PL:illumina PU:D26WHACXX130609.5.TTCGCTGA LB:Solexa-161840 PI:0 DT:2013-06-09T00:00:00-0400 SM:TCGA-DU-6407-01A-13D-1705-08 CN:BI
@RG ID:D26WH.6 PL:illumina PU:D26WHACXX130609.6.TTCGCTGA LB:Solexa-161840 PI:0 DT:2013-06-09T00:00:00-0400 SM:TCGA-DU-6407-01A-13D-1705-08 CN:BI
barthf@helix180$ samtools view -H /fastscratch/verhaak-lab/GLASS-WG/results/bqsr/TCGA-DU-6407-NB-XCIM8I.realn.mdup.bqsr.bam | grep @RG
@RG ID:H0NDT.1 PL:illumina PU:H0NDTADXX130601.1.CCTTCGCA LB:Solexa-162258 PI:0 DT:2013-06-01T00:00:00-0400 SM:TCGA-DU-6407-10A-01D-1705-08 CN:BI
@RG ID:H0NDT.2 PL:illumina PU:H0NDTADXX130601.2.CCTTCGCA LB:Solexa-162258 PI:0 DT:2013-06-01T00:00:00-0400 SM:TCGA-DU-6407-10A-01D-1705-08 CN:BI
@RG ID:H0PJD.1 PL:illumina PU:H0PJDADXX130611.1.CCTTCGCA LB:Solexa-162266 PI:0 DT:2013-06-11T00:00:00-0400 SM:TCGA-DU-6407-10A-01D-1705-08 CN:BI
@RG ID:H0PJD.2 PL:illumina PU:H0PJDADXX130611.2.CCTTCGCA LB:Solexa-162266 PI:0 DT:2013-06-11T00:00:00-0400 SM:TCGA-DU-6407-10A-01D-1705-08 CN:BI
Hi, I noticed that the splitter and discordant BAMs have disappeared from /fastscratch/verhaak-lab/GLASS-WG/results/lumpy today. Did they get moved somewhere?
@lslochov I don't know what happened here, I do not see them either. I wonder if they were removed by the fastscratch
scrubber. Very sorry about this. I am regenerating them now, but this will take a few hours. I will update you when done.
@lslochov the two files I mentioned above are back at the same location as before and other files are being generated. Sorry again about this (and that it had to happen twice), not sure what happened to them.
Hi @lslochov @MalhotraAnkit, I just returned from a trip abroad and curious where you guys are on this. I am available for a brief meeting any time this week, if you'd like.
Hi Floris, we've successfully run the T/N sample you gave us on Delly and Manta. Unfortunately, Lumpy had a segmentation fault during execution that we have yet to resolve. I've seen the Lumpy authors deal with specific BAMs that have this problem on the Lumpy forums. However, this usually involves the authors viewing the problematic BAMs, and I don't know if that's OK with these BAMs.
Regardless, the Delly output gave us 143 somatic variants, and Manta gave us 662 somatic variants. You can find these files on Helix at:
Delly: /home/lochol/projects/data/delly-cancer/TCGA-14-1402.somatic.vcf Manta: /home/lochol/projects/data/manta-cancer/results/variants/somaticSV.vcf
What do you think of these results?
Thanks Lucas,
Maybe we can setup a meeting to discuss this more,
Ankit
The results look good. My experiences with LUMPY are similar, and that it's not exactly user friendly. Maybe try one other T/N pair and see if it segfaults, too? I've included one below. Another option may be to not use LUMPY and resort to other callers, but its not my preferred option (but may be necessary).
Yes, also would be nice to meet face to face @Kcjohnson @MalhotraAnkit @lslochov. Are you available any time this week or next?
Tumor
/fastscratch/verhaak-lab/GLASS-WG/results/bqsr/TCGA-DU-6407-TP-XX8PCJ.realn.mdup.bqsr.bam
/fastscratch/verhaak-lab/GLASS-WG/results/lumpy/TCGA-DU-6407-TP-XX8PCJ.realn.mdup.bqsr.splitters.sorted.bam
/fastscratch/verhaak-lab/GLASS-WG/results/lumpy/TCGA-DU-6407-TP-XX8PCJ.realn.mdup.bqsr.discordant.sorted.bam
Control
/fastscratch/verhaak-lab/GLASS-WG/results/bqsr/TCGA-DU-6407-NB-XCIM8I.realn.mdup.bqsr.bam
/fastscratch/verhaak-lab/GLASS-WG/results/lumpy/TCGA-DU-6407-NB-XCIM8I.realn.mdup.bqsr.discordant.sorted.bam
/fastscratch/verhaak-lab/GLASS-WG/results/lumpy/TCGA-DU-6407-NB-XCIM8I.realn.mdup.bqsr.splitters.sorted.bam
NB. These files may be removed by scrubber soon, so let me know if you can access them
Using bams produced by a similar, yet distinct workflow, a former labmate ran lumpy (via speedseq) on this TCGA sample. For this tumor, lumpy called 104 pre-filtered somatic variants.
I haven't had an opportunity to look at vcd files, but we can talk more about this on Wednesday.
Hi, that’s actually the same T/N pair I’ve already tried in Lumpy.
--
Lucas Lochovsky Postdoctoral Associate The Jackson Laboratory for Genomic Medicine Ten Discovery Dr Farmington, CT 06032 860-837-2155 lucas.lochovsky@jax.org www.jax.org
The Jackson Laboratory: Leading the search for tomorrow's cures
From: fpbarthel notifications@github.com Reply-To: TheJacksonLaboratory/GLASS-WG reply@reply.github.com Date: Monday, July 16, 2018 at 3:17 PM To: TheJacksonLaboratory/GLASS-WG GLASS-WG@noreply.github.com Cc: Lucas Lochovsky Lucas.Lochovsky@jax.org, Mention mention@noreply.github.com Subject: Re: [TheJacksonLaboratory/GLASS-WG] To-Do: SV pipeline (#7)
My experiences with LUMPY are similar, and that it's not exactly user friendly. Maybe try one other T/N pair and see if it segfaults, too? I've included one below.
Yes, also would be nice to meet face to face @Kcjohnsonhttps://github.com/Kcjohnson @MalhotraAnkithttps://github.com/MalhotraAnkit @lslochovhttps://github.com/lslochov. Are you available any time this week or next?
Tumor
/fastscratch/verhaak-lab/GLASS-WG/results/bqsr/TCGA-DU-6407-TP-XX8PCJ.realn.mdup.bqsr.bam
/fastscratch/verhaak-lab/GLASS-WG/results/lumpy/TCGA-DU-6407-TP-XX8PCJ.realn.mdup.bqsr.splitters.sorted.bam
/fastscratch/verhaak-lab/GLASS-WG/results/lumpy/TCGA-DU-6407-TP-XX8PCJ.realn.mdup.bqsr.discordant.sorted.bam
Control
/fastscratch/verhaak-lab/GLASS-WG/results/bqsr/TCGA-DU-6407-NB-XCIM8I.realn.mdup.bqsr.bam
/fastscratch/verhaak-lab/GLASS-WG/results/lumpy/TCGA-DU-6407-NB-XCIM8I.realn.mdup.bqsr.discordant.sorted.bam
/fastscratch/verhaak-lab/GLASS-WG/results/lumpy/TCGA-DU-6407-NB-XCIM8I.realn.mdup.bqsr.splitters.sorted.bam
NB. These files may be removed by scrubber soon, so let me know if you can access them
The information in this email, including attachments, may be confidential and is intended solely for the addressee(s). If you believe you received this email by mistake, please notify the sender by return email as soon as possible.
Was the issue with multiple readgroups ? It’s more complicated and not default behavior to process a single BAM with multiple read groups with Lumpy,
Ankit
@sbamin also pointed out that he has had segmentation errors in the past with LUMPY when it was writing to /tmp
(which would reach capacity) rather than a user specified temporary directory (eg. /fastscratch/xxx
)
Hi, I put the scripts I use for SV calling with the SVE callers under the "helix_sv_scripts" folder.
Also added the code I use for calculating the read group insert sizes.
@lslochov do you not need to supply metrics to lumpy as parameter?
eg.
lumpy \
-mw 4 \
-tt 0 \
-pe id:sample1,bam_file:sample1.discordants.bam,read_group:rg1,read_group:rg2,histo_file:sample1.lib1.histo,mean:500,stdev:50,read_length:101,min_non_overlap:101,discordant_z:5,back_distance:10,weight:1,min_mapping_threshold:20 \
-pe id:sample1,bam_file:sample1.discordants.bam,read_group:rg3,histo_file:sample1.lib2.histo,mean:500,stdev:50,read_length:101,min_non_overlap:101,discordant_z:5,back_distance:10,weight:1,min_mapping_threshold:20 \
-pe id:sample2,bam_file:sample2.discordants.bam,read_group:rg4,histo_file:sample2.lib1.histo,mean:500,stdev:50,read_length:101,min_non_overlap:101,discordant_z:5,back_distance:10,weight:1,min_mapping_threshold:20 \
-sr id:sample1,bam_file:sample1.splitters.bam,back_distance:10,weight:1,min_mapping_threshold:20 \
-sr id:sample2,bam_file:sample2.splitters.bam,back_distance:10,weight:1,min_mapping_threshold:20 \
> multi_sample.vcf
Or does lumpy-express take care of those calculations? Why do you use lumpy express over regular lumpy? If lumpy express calculates insert sizes and other parameters already, is there a need to run the insert size calculator tool?
@lslochov and @MalhotraAnkit I tried to look for you both on the JAX slack but I could not find you there. It would maybe be useful to make a slack channel for this discussion. Is there any way you guys can get access to the slack?
We have slack, but I don’t think we are members of the JAX slack channel – someone will have to add us to that,
We use lumpy express because that’s what we had used for the SVE/fusorSV paper to keep default parameters suggested by the authors.
Ankit
It looks like I need to reinstall the software or use pre-installed versions on helix since your installations don't usually work for me (manta, being python, the only exception). Can you let me know if the versions mentioned here for helix installs are OK? Also, let me know the path to Svelter since it's not listed in the file. Lastly, @lslochov you should have received a Slack invite from Dave Mellert. I've already created a group with Ankit.
Delly -- doesnt work -- helix has 0.7.2 installed
barthf@helix180:/projects/barthf/GLASS-WG$ /projects/lochol/code/SVE/src/delly/src/delly
/projects/lochol/code/SVE/src/delly/src/delly: error while loading shared libraries: libboost_iostreams.so.1.57.0: cannot open shared object file: No such file or directory
Lumpy -- doesn't work - helix has 0.2.12 installed
barthf@helix180:/projects/barthf/GLASS-WG$ /projects/lochol/tools/lumpy-sv/bin/lumpyexpress
Sourcing executables from /projects/lochol/tools/lumpy-sv/bin/lumpyexpress.config ...
Checking for required python modules (/home/lochol/tools/mypy/bin/python)...
/projects/lochol/tools/lumpy-sv/bin/lumpyexpress: line 95: /home/lochol/tools/mypy/bin/python: Permission denied
Manta -- works
barthf@helix180:/projects/barthf/GLASS-WG$ /projects/lochol/code/manta-build/bin/configManta.py
Usage: configManta.py [options]
Version: 1.3.2
Svelter -- no executable directory specified in script you supplied here. There is no existing helix installation.
CNVnator -- SVE doesn't work for me and I would prefer to run standalone application. CNVnator 0.3 is already installed on helix.
Hi Floris, I thought I had set the permissions on the executables I provided so that anyone could use them. My apologies—I think I’ve rectified it now. Anyway, I’ve updated the SVelter script to include the path to the executable, and provided a list of the versions of the SV calling software we use in SVE:
CNVnator: v0.3.3 Delly: v2 GenomeSTRiP: v2 Lumpy: v0.2.13 SVelter: v0.1
Let me know if there’s anything else I can help with.
--
Lucas Lochovsky Postdoctoral Associate The Jackson Laboratory for Genomic Medicine Ten Discovery Dr Farmington, CT 06032 860-837-2155 lucas.lochovsky@jax.org www.jax.org
The Jackson Laboratory: Leading the search for tomorrow's cures
From: fpbarthel notifications@github.com Reply-To: TheJacksonLaboratory/GLASS-WG reply@reply.github.com Date: Friday, July 20, 2018 at 9:42 AM To: TheJacksonLaboratory/GLASS-WG GLASS-WG@noreply.github.com Cc: Lucas Lochovsky Lucas.Lochovsky@jax.org, Mention mention@noreply.github.com Subject: Re: [TheJacksonLaboratory/GLASS-WG] To-Do: SV pipeline (#7)
It looks like I need to reinstall the software or use pre-installed versions on helix since your installations don't usually work for me (manta, being python, the only exception). Can you let me know if the versions mentioned here for helix installs are OK? Also, let me know the path to Svelter since it's not listed in the file. Lastly, @lslochovhttps://github.com/lslochov you should have received a Slack invite from Dave Mellert. I've already created a group with Ankit.
Delly -- doesnt work -- helix has 0.7.2 installed
barthf@helix180:/projects/barthf/GLASS-WG$ /projects/lochol/code/SVE/src/delly/src/delly
/projects/lochol/code/SVE/src/delly/src/delly: error while loading shared libraries: libboost_iostreams.so.1.57.0: cannot open shared object file: No such file or directory
Lumpy -- doesn't work - helix has 0.2.12 installed
barthf@helix180:/projects/barthf/GLASS-WG$ /projects/lochol/tools/lumpy-sv/bin/lumpyexpress
Sourcing executables from /projects/lochol/tools/lumpy-sv/bin/lumpyexpress.config ...
Checking for required python modules (/home/lochol/tools/mypy/bin/python)...
/projects/lochol/tools/lumpy-sv/bin/lumpyexpress: line 95: /home/lochol/tools/mypy/bin/python: Permission denied
Manta -- works
barthf@helix180:/projects/barthf/GLASS-WG$ /projects/lochol/code/manta-build/bin/configManta.py
Usage: configManta.py [options]
Version: 1.3.2
Svelter -- no executable directory specified in script you supplied herehttps://github.com/TheJacksonLaboratory/GLASS-WG/blob/master/helix_sv_scripts/svelter-helix.sh. There is no existing helix installation.
CNVnator -- SVE doesn't work for me and I would prefer to run standalone application. CNVnator 0.3 is already installed on helix.
The information in this email, including attachments, may be confidential and is intended solely for the addressee(s). If you believe you received this email by mistake, please notify the sender by return email as soon as possible.
@lslochov can you double check delly release? there is no delly v2 that I can find.
@fpbarthel You can find Delly 2 here: https://github.com/dellytools/delly
@lslochov what version though?
Isn’t it available through SVE ?
Ankit
SVE doesn't work for me
@fpbarthel I think you can just pull the latest version. Also, I adjusted the permissions on the paths to the executables I provided. Can you check if you can access my SVE install now? Otherwise, you can just pull directly from the SVE Github repo.
OK lumpyexpress is running on all samples, working fine. However, the output VCF file do not seem to be compatible with bcftools, tabix and GATK! Any suggestions?
When I try to index the output VCF file, I get the following error:
barthf@helix180$ bcftools index -t tmp.vcf.gz
[E::hts_idx_push] Unsorted positions on sequence #1: 31129362 followed by 17684780
index: failed to create index for "tmp.vcf.gz"
The error suggested that perhaps I should sort the VCF file, but when I try that I get the following error:
barthf@helix180$ bcftools sort tmp.vcf.gz
Writing to /tmp/bcftools-sort.V7Ihdb
[W::vcf_parse] Contig '1' is not defined in the header. (Quick workaround: index the file with tabix.)
[W::vcf_parse] Contig '2' is not defined in the header. (Quick workaround: index the file with tabix.)
...
[E::bcf_write] Unchecked error (1), exiting
If I try to index using tabix, I get the following error:
barthf@helix180$ tabix tmp.vcf.gz
[E::hts_idx_push] Unsorted positions on sequence #1: 31129362 followed by 17684780
tbx_index_build failed: tmp.vcf.gz
I then figured, perhaps I need to add contigs back into the VCF, but that gives the following error:
barthf@helix180$ gatk UpdateVCFSequenceDictionary -V tmp.vcf.gz --source-dictionary /projects/verhaak-lab/verhaak_ref/gatk-legacy-bundles/b37/human_g1k_v37_decoy.dict -O tmp.dict.vcf.gz
A USER ERROR has occurred: An index is required but was not found for file drivingVariantFile:/gpfs/ctgs0/fastscratch/verhaak-lab/GLASS-WG/results/lumpy/call/tmp.vcf.gz. Support for unindexed block-compressed files has been temporarily disabled. Try running IndexFeatureFile on the input.
You can guess by now, that using GATK to try and index the file also did not work:
barthf@helix180$ gatk IndexFeatureFile -F tmp.vcf.gz
A USER ERROR has occurred: Error while trying to create index for /gpfs/ctgs0/fastscratch/verhaak-lab/GLASS-WG/results/lumpy/call/tmp.vcf.gz. Error was: htsjdk.tribble.TribbleException.MalformedFeatureFile: Input file is not sorted by start position.
We saw a record with a start of 1:17684780 after a record with a start of 1:31129362, for input source: /gpfs/ctgs0/fastscratch/verhaak-lab/GLASS-WG/results/lumpy/call/tmp.vcf.gz
Ignore above message, I was able to resolve this by using UpdateVCFSequenceDictionary
on the uncompressed VCF
Hi @lslochov , some questions
@fpbarthel From the looks of your error text in running CNVnator 0.3.3, it looks like you just need to install the subprocess32 module. You can run pip install subprocess32
and it should take care of things.
Alternatively, you can try the CNVnator executable directly here:
/home/lochol/code/SVE/src/CNVnator_v0.3.3/src/cnvnator
I'll get back to you on running Lumpy with CNVnator input.
@lslochov writes:
Oh yeah, I found there’s a script in the Lumpy distribution for taking CNVnator output and turning it into a BEDPE file, which you can give to Lumpy with the
-bedpe
option.The script is: https://github.com/arq5x/lumpy-sv/blob/master/scripts/cnvanator_to_bedpes.py GitHub arq5x/lumpy-sv lumpy-sv - lumpy: a general probabilistic framework for structural variant discovery
Looks like that option isn’t available on the express version. I guess you’ll have to use the traditional Lumpy.
Pipelines are running as they should. Delly is still slow but that is addressed in another issue. Closing.
Currently splitting BAM files into splitters and discordant files to provide @lslochov with a tumor/control pair for testing.
Pipeline components