fpbarthel / GLASS

GLASS consortium
MIT License
37 stars 13 forks source link

To-Do: SV pipeline #7

Closed fpbarthel closed 6 years ago

fpbarthel commented 6 years ago

Currently splitting BAM files into splitters and discordant files to provide @lslochov with a tumor/control pair for testing.

Pipeline components

sbamin commented 6 years ago

If you can add SVABA or GROM, that would work.

Lumpy is failing on canine and doubt if Emmanuel have tried Delly. SVABA runs good (6 hrs or so) on canine data but outputs mostly BND calls.

I have recently tried GROM based on Hashpreet’s suggestion, and it seems to at least output much less BND calls than SVABA.

Stupid me as I thought BND calls are calls-undetermined. Should be relatively easy to convert BND calls to conventional calls like DEL, DUP, INV. svaba/issues/4 and http://simpsonlab.github.io/2015/06/15/merging-sv-calls/

https://academic.oup.com/gigascience/article/6/10/1/4160384

Also, need to check if any of callers we use for SV are compatible with SV-merge tools like SURVIVOR or FUSORSV.

Samir

MalhotraAnkit commented 6 years ago

I think we are going to try and run our ensemble of 8 callers on these tumor normal BAMs. Tricky part is the readgroups, but if we can get VCFs out of the callers we can definitely integrate using fusorSV.

Ankit

fpbarthel commented 6 years ago

Here is a tumor/normal pair with splitters and discordant reads extracted. Let me know if this works out. @lslochov

Tumor

/fastscratch/verhaak-lab/GLASS-WG/results/bqsr/TCGA-DU-6407-TP-XX8PCJ.realn.mdup.bqsr.bam
/fastscratch/verhaak-lab/GLASS-WG/results/lumpy/TCGA-DU-6407-TP-XX8PCJ.realn.mdup.bqsr.discordant.sorted.bam
/fastscratch/verhaak-lab/GLASS-WG/results/lumpy/TCGA-DU-6407-TP-XX8PCJ.realn.mdup.bqsr.splitters.sorted.bam

Control

/fastscratch/verhaak-lab/GLASS-WG/results/bqsr/TCGA-DU-6407-NB-XCIM8I.realn.mdup.bqsr.bam
/fastscratch/verhaak-lab/GLASS-WG/results/lumpy/TCGA-DU-6407-NB-XCIM8I.realn.mdup.bqsr.discordant.sorted.bam
/fastscratch/verhaak-lab/GLASS-WG/results/lumpy/TCGA-DU-6407-NB-XCIM8I.realn.mdup.bqsr.splitters.sorted.bam
fpbarthel commented 6 years ago

Note that these have multiple readgroups:

barthf@helix180$ samtools view -H /fastscratch/verhaak-lab/GLASS-WG/results/bqsr/TCGA-DU-6407-TP-XX8PCJ.realn.mdup.bqsr.bam | grep @RG
@RG     ID:C25HA.3      PL:illumina     PU:C25HAACXX130609.3.TTCGCTGA   LB:Solexa-161833        PI:0    DT:2013-06-09T00:00:00-0400     SM:TCGA-DU-6407-01A-13D-1705-08      CN:BI
@RG     ID:D26WH.5      PL:illumina     PU:D26WHACXX130609.5.TTCGCTGA   LB:Solexa-161840        PI:0    DT:2013-06-09T00:00:00-0400     SM:TCGA-DU-6407-01A-13D-1705-08      CN:BI
@RG     ID:D26WH.6      PL:illumina     PU:D26WHACXX130609.6.TTCGCTGA   LB:Solexa-161840        PI:0    DT:2013-06-09T00:00:00-0400     SM:TCGA-DU-6407-01A-13D-1705-08      CN:BI
barthf@helix180$ samtools view -H /fastscratch/verhaak-lab/GLASS-WG/results/bqsr/TCGA-DU-6407-NB-XCIM8I.realn.mdup.bqsr.bam | grep @RG
@RG     ID:H0NDT.1      PL:illumina     PU:H0NDTADXX130601.1.CCTTCGCA   LB:Solexa-162258        PI:0    DT:2013-06-01T00:00:00-0400     SM:TCGA-DU-6407-10A-01D-1705-08 CN:BI
@RG     ID:H0NDT.2      PL:illumina     PU:H0NDTADXX130601.2.CCTTCGCA   LB:Solexa-162258        PI:0    DT:2013-06-01T00:00:00-0400     SM:TCGA-DU-6407-10A-01D-1705-08 CN:BI
@RG     ID:H0PJD.1      PL:illumina     PU:H0PJDADXX130611.1.CCTTCGCA   LB:Solexa-162266        PI:0    DT:2013-06-11T00:00:00-0400     SM:TCGA-DU-6407-10A-01D-1705-08 CN:BI
@RG     ID:H0PJD.2      PL:illumina     PU:H0PJDADXX130611.2.CCTTCGCA   LB:Solexa-162266        PI:0    DT:2013-06-11T00:00:00-0400     SM:TCGA-DU-6407-10A-01D-1705-08 CN:BI
lslochov commented 6 years ago

Hi, I noticed that the splitter and discordant BAMs have disappeared from /fastscratch/verhaak-lab/GLASS-WG/results/lumpy today. Did they get moved somewhere?

fpbarthel commented 6 years ago

@lslochov I don't know what happened here, I do not see them either. I wonder if they were removed by the fastscratch scrubber. Very sorry about this. I am regenerating them now, but this will take a few hours. I will update you when done.

fpbarthel commented 6 years ago

@lslochov the two files I mentioned above are back at the same location as before and other files are being generated. Sorry again about this (and that it had to happen twice), not sure what happened to them.

fpbarthel commented 6 years ago

Hi @lslochov @MalhotraAnkit, I just returned from a trip abroad and curious where you guys are on this. I am available for a brief meeting any time this week, if you'd like.

lslochov commented 6 years ago

Hi Floris, we've successfully run the T/N sample you gave us on Delly and Manta. Unfortunately, Lumpy had a segmentation fault during execution that we have yet to resolve. I've seen the Lumpy authors deal with specific BAMs that have this problem on the Lumpy forums. However, this usually involves the authors viewing the problematic BAMs, and I don't know if that's OK with these BAMs.

Regardless, the Delly output gave us 143 somatic variants, and Manta gave us 662 somatic variants. You can find these files on Helix at:

Delly: /home/lochol/projects/data/delly-cancer/TCGA-14-1402.somatic.vcf Manta: /home/lochol/projects/data/manta-cancer/results/variants/somaticSV.vcf

What do you think of these results?

MalhotraAnkit commented 6 years ago

Thanks Lucas,

Maybe we can setup a meeting to discuss this more,

Ankit

fpbarthel commented 6 years ago

The results look good. My experiences with LUMPY are similar, and that it's not exactly user friendly. Maybe try one other T/N pair and see if it segfaults, too? I've included one below. Another option may be to not use LUMPY and resort to other callers, but its not my preferred option (but may be necessary).

Yes, also would be nice to meet face to face @Kcjohnson @MalhotraAnkit @lslochov. Are you available any time this week or next?

Tumor

/fastscratch/verhaak-lab/GLASS-WG/results/bqsr/TCGA-DU-6407-TP-XX8PCJ.realn.mdup.bqsr.bam
/fastscratch/verhaak-lab/GLASS-WG/results/lumpy/TCGA-DU-6407-TP-XX8PCJ.realn.mdup.bqsr.splitters.sorted.bam
/fastscratch/verhaak-lab/GLASS-WG/results/lumpy/TCGA-DU-6407-TP-XX8PCJ.realn.mdup.bqsr.discordant.sorted.bam

Control

/fastscratch/verhaak-lab/GLASS-WG/results/bqsr/TCGA-DU-6407-NB-XCIM8I.realn.mdup.bqsr.bam
/fastscratch/verhaak-lab/GLASS-WG/results/lumpy/TCGA-DU-6407-NB-XCIM8I.realn.mdup.bqsr.discordant.sorted.bam
/fastscratch/verhaak-lab/GLASS-WG/results/lumpy/TCGA-DU-6407-NB-XCIM8I.realn.mdup.bqsr.splitters.sorted.bam

NB. These files may be removed by scrubber soon, so let me know if you can access them

Kcjohnson commented 6 years ago

Using bams produced by a similar, yet distinct workflow, a former labmate ran lumpy (via speedseq) on this TCGA sample. For this tumor, lumpy called 104 pre-filtered somatic variants.

I haven't had an opportunity to look at vcd files, but we can talk more about this on Wednesday.

lslochov commented 6 years ago

Hi, that’s actually the same T/N pair I’ve already tried in Lumpy.

--

Lucas Lochovsky Postdoctoral Associate The Jackson Laboratory for Genomic Medicine Ten Discovery Dr Farmington, CT 06032 860-837-2155 lucas.lochovsky@jax.org www.jax.org

The Jackson Laboratory: Leading the search for tomorrow's cures

From: fpbarthel notifications@github.com Reply-To: TheJacksonLaboratory/GLASS-WG reply@reply.github.com Date: Monday, July 16, 2018 at 3:17 PM To: TheJacksonLaboratory/GLASS-WG GLASS-WG@noreply.github.com Cc: Lucas Lochovsky Lucas.Lochovsky@jax.org, Mention mention@noreply.github.com Subject: Re: [TheJacksonLaboratory/GLASS-WG] To-Do: SV pipeline (#7)

My experiences with LUMPY are similar, and that it's not exactly user friendly. Maybe try one other T/N pair and see if it segfaults, too? I've included one below.

Yes, also would be nice to meet face to face @Kcjohnsonhttps://github.com/Kcjohnson @MalhotraAnkithttps://github.com/MalhotraAnkit @lslochovhttps://github.com/lslochov. Are you available any time this week or next?

Tumor

/fastscratch/verhaak-lab/GLASS-WG/results/bqsr/TCGA-DU-6407-TP-XX8PCJ.realn.mdup.bqsr.bam

/fastscratch/verhaak-lab/GLASS-WG/results/lumpy/TCGA-DU-6407-TP-XX8PCJ.realn.mdup.bqsr.splitters.sorted.bam

/fastscratch/verhaak-lab/GLASS-WG/results/lumpy/TCGA-DU-6407-TP-XX8PCJ.realn.mdup.bqsr.discordant.sorted.bam

Control

/fastscratch/verhaak-lab/GLASS-WG/results/bqsr/TCGA-DU-6407-NB-XCIM8I.realn.mdup.bqsr.bam

/fastscratch/verhaak-lab/GLASS-WG/results/lumpy/TCGA-DU-6407-NB-XCIM8I.realn.mdup.bqsr.discordant.sorted.bam

/fastscratch/verhaak-lab/GLASS-WG/results/lumpy/TCGA-DU-6407-NB-XCIM8I.realn.mdup.bqsr.splitters.sorted.bam

NB. These files may be removed by scrubber soon, so let me know if you can access them

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHubhttps://github.com/TheJacksonLaboratory/GLASS-WG/issues/7#issuecomment-405351329, or mute the threadhttps://github.com/notifications/unsubscribe-auth/AF14vBnPQNUvI_nH6SY49S1JNuKhKrxCks5uHOargaJpZM4U4DwL.

The information in this email, including attachments, may be confidential and is intended solely for the addressee(s). If you believe you received this email by mistake, please notify the sender by return email as soon as possible.

MalhotraAnkit commented 6 years ago

Was the issue with multiple readgroups ? It’s more complicated and not default behavior to process a single BAM with multiple read groups with Lumpy,

Ankit

fpbarthel commented 6 years ago

@sbamin also pointed out that he has had segmentation errors in the past with LUMPY when it was writing to /tmp (which would reach capacity) rather than a user specified temporary directory (eg. /fastscratch/xxx)

lslochov commented 6 years ago

Hi, I put the scripts I use for SV calling with the SVE callers under the "helix_sv_scripts" folder.

lslochov commented 6 years ago

Also added the code I use for calculating the read group insert sizes.

fpbarthel commented 6 years ago

@lslochov do you not need to supply metrics to lumpy as parameter?

eg.

lumpy \
    -mw 4 \
    -tt 0 \
    -pe id:sample1,bam_file:sample1.discordants.bam,read_group:rg1,read_group:rg2,histo_file:sample1.lib1.histo,mean:500,stdev:50,read_length:101,min_non_overlap:101,discordant_z:5,back_distance:10,weight:1,min_mapping_threshold:20 \
    -pe id:sample1,bam_file:sample1.discordants.bam,read_group:rg3,histo_file:sample1.lib2.histo,mean:500,stdev:50,read_length:101,min_non_overlap:101,discordant_z:5,back_distance:10,weight:1,min_mapping_threshold:20 \
    -pe id:sample2,bam_file:sample2.discordants.bam,read_group:rg4,histo_file:sample2.lib1.histo,mean:500,stdev:50,read_length:101,min_non_overlap:101,discordant_z:5,back_distance:10,weight:1,min_mapping_threshold:20 \
    -sr id:sample1,bam_file:sample1.splitters.bam,back_distance:10,weight:1,min_mapping_threshold:20 \
    -sr id:sample2,bam_file:sample2.splitters.bam,back_distance:10,weight:1,min_mapping_threshold:20 \
    > multi_sample.vcf

Or does lumpy-express take care of those calculations? Why do you use lumpy express over regular lumpy? If lumpy express calculates insert sizes and other parameters already, is there a need to run the insert size calculator tool?

@lslochov and @MalhotraAnkit I tried to look for you both on the JAX slack but I could not find you there. It would maybe be useful to make a slack channel for this discussion. Is there any way you guys can get access to the slack?

MalhotraAnkit commented 6 years ago

We have slack, but I don’t think we are members of the JAX slack channel – someone will have to add us to that,

We use lumpy express because that’s what we had used for the SVE/fusorSV paper to keep default parameters suggested by the authors.

Ankit

fpbarthel commented 6 years ago

It looks like I need to reinstall the software or use pre-installed versions on helix since your installations don't usually work for me (manta, being python, the only exception). Can you let me know if the versions mentioned here for helix installs are OK? Also, let me know the path to Svelter since it's not listed in the file. Lastly, @lslochov you should have received a Slack invite from Dave Mellert. I've already created a group with Ankit.

Delly -- doesnt work -- helix has 0.7.2 installed

barthf@helix180:/projects/barthf/GLASS-WG$ /projects/lochol/code/SVE/src/delly/src/delly
/projects/lochol/code/SVE/src/delly/src/delly: error while loading shared libraries: libboost_iostreams.so.1.57.0: cannot open shared object file: No such file or directory

Lumpy -- doesn't work - helix has 0.2.12 installed

barthf@helix180:/projects/barthf/GLASS-WG$ /projects/lochol/tools/lumpy-sv/bin/lumpyexpress 
Sourcing executables from /projects/lochol/tools/lumpy-sv/bin/lumpyexpress.config ...

Checking for required python modules (/home/lochol/tools/mypy/bin/python)...
/projects/lochol/tools/lumpy-sv/bin/lumpyexpress: line 95: /home/lochol/tools/mypy/bin/python: Permission denied

Manta -- works

barthf@helix180:/projects/barthf/GLASS-WG$ /projects/lochol/code/manta-build/bin/configManta.py
Usage: configManta.py [options]

Version: 1.3.2

Svelter -- no executable directory specified in script you supplied here. There is no existing helix installation.

CNVnator -- SVE doesn't work for me and I would prefer to run standalone application. CNVnator 0.3 is already installed on helix.

lslochov commented 6 years ago

Hi Floris, I thought I had set the permissions on the executables I provided so that anyone could use them. My apologies—I think I’ve rectified it now. Anyway, I’ve updated the SVelter script to include the path to the executable, and provided a list of the versions of the SV calling software we use in SVE:

CNVnator: v0.3.3 Delly: v2 GenomeSTRiP: v2 Lumpy: v0.2.13 SVelter: v0.1

Let me know if there’s anything else I can help with.

--

Lucas Lochovsky Postdoctoral Associate The Jackson Laboratory for Genomic Medicine Ten Discovery Dr Farmington, CT 06032 860-837-2155 lucas.lochovsky@jax.org www.jax.org

The Jackson Laboratory: Leading the search for tomorrow's cures

From: fpbarthel notifications@github.com Reply-To: TheJacksonLaboratory/GLASS-WG reply@reply.github.com Date: Friday, July 20, 2018 at 9:42 AM To: TheJacksonLaboratory/GLASS-WG GLASS-WG@noreply.github.com Cc: Lucas Lochovsky Lucas.Lochovsky@jax.org, Mention mention@noreply.github.com Subject: Re: [TheJacksonLaboratory/GLASS-WG] To-Do: SV pipeline (#7)

It looks like I need to reinstall the software or use pre-installed versions on helix since your installations don't usually work for me (manta, being python, the only exception). Can you let me know if the versions mentioned here for helix installs are OK? Also, let me know the path to Svelter since it's not listed in the file. Lastly, @lslochovhttps://github.com/lslochov you should have received a Slack invite from Dave Mellert. I've already created a group with Ankit.

Delly -- doesnt work -- helix has 0.7.2 installed

barthf@helix180:/projects/barthf/GLASS-WG$ /projects/lochol/code/SVE/src/delly/src/delly

/projects/lochol/code/SVE/src/delly/src/delly: error while loading shared libraries: libboost_iostreams.so.1.57.0: cannot open shared object file: No such file or directory

Lumpy -- doesn't work - helix has 0.2.12 installed

barthf@helix180:/projects/barthf/GLASS-WG$ /projects/lochol/tools/lumpy-sv/bin/lumpyexpress

Sourcing executables from /projects/lochol/tools/lumpy-sv/bin/lumpyexpress.config ...

Checking for required python modules (/home/lochol/tools/mypy/bin/python)...

/projects/lochol/tools/lumpy-sv/bin/lumpyexpress: line 95: /home/lochol/tools/mypy/bin/python: Permission denied

Manta -- works

barthf@helix180:/projects/barthf/GLASS-WG$ /projects/lochol/code/manta-build/bin/configManta.py

Usage: configManta.py [options]

Version: 1.3.2

Svelter -- no executable directory specified in script you supplied herehttps://github.com/TheJacksonLaboratory/GLASS-WG/blob/master/helix_sv_scripts/svelter-helix.sh. There is no existing helix installation.

CNVnator -- SVE doesn't work for me and I would prefer to run standalone application. CNVnator 0.3 is already installed on helix.

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHubhttps://github.com/TheJacksonLaboratory/GLASS-WG/issues/7#issuecomment-406604224, or mute the threadhttps://github.com/notifications/unsubscribe-auth/AF14vO_IR-AJng3ZPNtvEHr_Mia4XLjWks5uId43gaJpZM4U4DwL.

The information in this email, including attachments, may be confidential and is intended solely for the addressee(s). If you believe you received this email by mistake, please notify the sender by return email as soon as possible.

fpbarthel commented 6 years ago

@lslochov can you double check delly release? there is no delly v2 that I can find.

lslochov commented 6 years ago

@fpbarthel You can find Delly 2 here: https://github.com/dellytools/delly

fpbarthel commented 6 years ago

@lslochov what version though?

MalhotraAnkit commented 6 years ago

Isn’t it available through SVE ?

Ankit

fpbarthel commented 6 years ago

SVE doesn't work for me

lslochov commented 6 years ago

@fpbarthel I think you can just pull the latest version. Also, I adjusted the permissions on the paths to the executables I provided. Can you check if you can access my SVE install now? Otherwise, you can just pull directly from the SVE Github repo.

fpbarthel commented 6 years ago

OK lumpyexpress is running on all samples, working fine. However, the output VCF file do not seem to be compatible with bcftools, tabix and GATK! Any suggestions?

When I try to index the output VCF file, I get the following error:

barthf@helix180$ bcftools index -t tmp.vcf.gz
[E::hts_idx_push] Unsorted positions on sequence #1: 31129362 followed by 17684780
index: failed to create index for "tmp.vcf.gz"

The error suggested that perhaps I should sort the VCF file, but when I try that I get the following error:

barthf@helix180$ bcftools sort tmp.vcf.gz
Writing to /tmp/bcftools-sort.V7Ihdb
[W::vcf_parse] Contig '1' is not defined in the header. (Quick workaround: index the file with tabix.)
[W::vcf_parse] Contig '2' is not defined in the header. (Quick workaround: index the file with tabix.)
...
[E::bcf_write] Unchecked error (1), exiting

If I try to index using tabix, I get the following error:

barthf@helix180$ tabix tmp.vcf.gz
[E::hts_idx_push] Unsorted positions on sequence #1: 31129362 followed by 17684780
tbx_index_build failed: tmp.vcf.gz

I then figured, perhaps I need to add contigs back into the VCF, but that gives the following error:

barthf@helix180$ gatk UpdateVCFSequenceDictionary -V tmp.vcf.gz --source-dictionary /projects/verhaak-lab/verhaak_ref/gatk-legacy-bundles/b37/human_g1k_v37_decoy.dict -O tmp.dict.vcf.gz

A USER ERROR has occurred: An index is required but was not found for file drivingVariantFile:/gpfs/ctgs0/fastscratch/verhaak-lab/GLASS-WG/results/lumpy/call/tmp.vcf.gz. Support for unindexed block-compressed files has been temporarily disabled. Try running IndexFeatureFile on the input.

You can guess by now, that using GATK to try and index the file also did not work:

barthf@helix180$ gatk IndexFeatureFile -F tmp.vcf.gz

A USER ERROR has occurred: Error while trying to create index for /gpfs/ctgs0/fastscratch/verhaak-lab/GLASS-WG/results/lumpy/call/tmp.vcf.gz. Error was: htsjdk.tribble.TribbleException.MalformedFeatureFile: Input file is not sorted by start position. 
We saw a record with a start of 1:17684780 after a record with a start of 1:31129362, for input source: /gpfs/ctgs0/fastscratch/verhaak-lab/GLASS-WG/results/lumpy/call/tmp.vcf.gz
fpbarthel commented 6 years ago

Ignore above message, I was able to resolve this by using UpdateVCFSequenceDictionary on the uncompressed VCF

fpbarthel commented 6 years ago

Hi @lslochov , some questions

  1. Is it OK to use CNVnator 0.3.0 instead of 0.3.3, we cannot get the latter version working
  2. How to feed CNVnator results to lumpy for improved calls? Currently we ran lumpyexpress without CNVnator input
  3. Is it OK to use delly 0.7.6 instead of 0.7.8. The former works without problems but the latter is giving a lot of problems. update: seems to work now?
lslochov commented 6 years ago

@fpbarthel From the looks of your error text in running CNVnator 0.3.3, it looks like you just need to install the subprocess32 module. You can run pip install subprocess32 and it should take care of things.

Alternatively, you can try the CNVnator executable directly here:

/home/lochol/code/SVE/src/CNVnator_v0.3.3/src/cnvnator

I'll get back to you on running Lumpy with CNVnator input.

fpbarthel commented 6 years ago

@lslochov writes:

Oh yeah, I found there’s a script in the Lumpy distribution for taking CNVnator output and turning it into a BEDPE file, which you can give to Lumpy with the -bedpe option.

The script is: https://github.com/arq5x/lumpy-sv/blob/master/scripts/cnvanator_to_bedpes.py GitHub arq5x/lumpy-sv lumpy-sv - lumpy: a general probabilistic framework for structural variant discovery

Looks like that option isn’t available on the express version. I guess you’ll have to use the traditional Lumpy.

fpbarthel commented 6 years ago

Pipelines are running as they should. Delly is still slow but that is addressed in another issue. Closing.