brentp / duphold

don't get DUP'ed or DEL'ed by your putative SVs.
MIT License

Couldn't find sample from bam or ENV in vcf #34

Closed moldach closed 4 years ago

moldach commented 4 years ago

I'm getting an error when trying to use duphold:

~/projects/lumpy$ duphold --vcf maddog_sorted.vcf --bam maddog_bam_trim_bwaMEM_sort_dedupped.bam -f ~/projects/data/celegans/c_elegans.PRJNA13758.WS265.genomic.fa 
couldn't find sample from bam:maddog_bam or ENV in vcf which had:maddog

I tried setting the DUPHOLD_SAMPLE_NAME environment variable, but I still get the same error:

~/projects/lumpy$ DUPHOLD_SAMPLE_NAME=maddog
~/projects/lumpy$ duphold --vcf maddog_sorted.vcf --bam maddog_bam_trim_bwaMEM_sort_dedupped.bam -f ~/projects/data/celegans/c_elegans.PRJNA13758.WS265.genomic.fa 
couldn't find sample from bam:maddog_bam or ENV in vcf which had:maddog

brentp commented 4 years ago

try: export DUPHOLD_SAMPLE_NAME=maddog or: export DUPHOLD_SAMPLE_NAME=maddog_bam
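
The earlier attempt most likely failed because a bare NAME=value assignment on its own line only creates a shell variable; a child process such as duphold inherits it only once it has been exported. A minimal sketch of the difference, using a hypothetical variable DEMO_SAMPLE_NAME (not a real duphold setting) so that it can be run safely anywhere:

```shell
#!/bin/sh
# DEMO_SAMPLE_NAME is a hypothetical variable used only for illustration.
unset DEMO_SAMPLE_NAME

# Plain assignment: visible in this shell, but not passed to child processes.
DEMO_SAMPLE_NAME=maddog
before=$(sh -c 'printf "%s" "${DEMO_SAMPLE_NAME:-unset}"')

# After export, child processes inherit the variable.
export DEMO_SAMPLE_NAME
after=$(sh -c 'printf "%s" "${DEMO_SAMPLE_NAME:-unset}"')

echo "before export: $before"   # before export: unset
echo "after export:  $after"    # after export:  maddog
```

Putting the assignment on the same line as the command (VAR=value command ...) also passes the variable to that single command, which is a standard shell alternative to export.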

moldach commented 4 years ago

Hi @brentp thanks for the quick response.

export DUPHOLD_SAMPLE_NAME=maddog did the trick.

I have a number of VCF files with different sample names, so I just always set it to whatever name the error message reports for the VCF, e.g.:

couldn't find sample from bam:maddog_bam or ENV in vcf which had:maddog_trim_bwaMEM_sort_dedupped

Then I set: export DUPHOLD_SAMPLE_NAME=maddog_trim_bwaMEM_sort_dedupped

brentp commented 4 years ago

yes. if there's no match between what's in the VCF samples and the bam/cram read-groups, you must set the name manually with that env variable.
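
One way to spot a mismatch before running duphold is to compare the sample columns of the VCF header with the sample names in the BAM header. The sketch below assumes the BAM-side name comes from the SM tag of the @RG lines (the thread only says "read-groups"); the helper names vcf_samples and bam_samples are hypothetical, and the two header lines are inlined demo text modelled on the error message above rather than real files.

```shell
#!/bin/sh
# Hypothetical helpers; both read header text on stdin.

# Print VCF sample names: columns 10+ of the tab-separated #CHROM line.
vcf_samples() {
  awk -F'\t' '/^#CHROM/ { for (i = 10; i <= NF; i++) print $i }'
}

# Print the unique SM values found on @RG lines of a SAM/BAM header.
bam_samples() {
  awk -F'\t' '/^@RG/ { for (i = 2; i <= NF; i++) if ($i ~ /^SM:/) print substr($i, 4) }' | sort -u
}

# Inline demo headers mirroring the names in the error message.
vcf_name=$(printf '#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tmaddog\n' | vcf_samples)
bam_name=$(printf '@RG\tID:1\tSM:maddog_bam\tPL:ILLUMINA\n' | bam_samples)

if [ "$vcf_name" != "$bam_name" ]; then
  # No match, so name the VCF sample explicitly for child processes.
  export DUPHOLD_SAMPLE_NAME="$vcf_name"
fi
echo "VCF: $vcf_name  BAM: $bam_name  DUPHOLD_SAMPLE_NAME: ${DUPHOLD_SAMPLE_NAME:-unset}"
```

With real files, the header text could be piped in from samtools view -H (for the BAM) and bcftools view -h (for the VCF) instead of the printf lines.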

moldach commented 4 years ago

How would I deal with a case like:

couldn't find sample from bam:maddog_bam or ENV in vcf which had:

I've tried export DUPHOLD_SAMPLE_NAME='' but it doesn't work.

brentp commented 4 years ago

your VCF has no samples? you'll need to have a VCF with samples in order to use duphold.

moldach commented 4 years ago

I'm confused as to why the VCF would have no samples?

I've got a call set from Manta here as an example (with dummy data for the BND):

##fileformat=VCFv4.1
##fileDate=20200313
##source=GenerateSVCandidates 1.6.0
##reference=file:///WS265_wormbase/c_elegans.PRJNA13758.WS265.genomic.fa
##contig=<ID=I,length=15072434>
##contig=<ID=II,length=15279421>
##contig=<ID=III,length=13783801>
##contig=<ID=IV,length=17493829>
##contig=<ID=V,length=20924180>
##contig=<ID=X,length=17718942>
##contig=<ID=MtDNA,length=13794>
##INFO=<ID=IMPRECISE,Number=0,Type=Flag,Description="Imprecise structural variation">
##INFO=<ID=SVTYPE,Number=1,Type=String,Description="Type of structural variant">
##INFO=<ID=SVLEN,Number=.,Type=Integer,Description="Difference in length between REF and ALT alleles">
##INFO=<ID=END,Number=1,Type=Integer,Description="End position of the variant described in this record">
##INFO=<ID=CIPOS,Number=2,Type=Integer,Description="Confidence interval around POS">
##INFO=<ID=CIEND,Number=2,Type=Integer,Description="Confidence interval around END">
##INFO=<ID=CIGAR,Number=A,Type=String,Description="CIGAR alignment for each alternate indel allele">
##INFO=<ID=MATEID,Number=.,Type=String,Description="ID of mate breakend">
##INFO=<ID=EVENT,Number=1,Type=String,Description="ID of event associated to breakend">
##INFO=<ID=HOMLEN,Number=.,Type=Integer,Description="Length of base pair identical homology at event breakpoints">
##INFO=<ID=HOMSEQ,Number=.,Type=String,Description="Sequence of base pair identical homology at event breakpoints">
##INFO=<ID=SVINSLEN,Number=.,Type=Integer,Description="Length of insertion">
##INFO=<ID=SVINSSEQ,Number=.,Type=String,Description="Sequence of insertion">
##INFO=<ID=LEFT_SVINSSEQ,Number=.,Type=String,Description="Known left side of insertion for an insertion of unknown length">
##INFO=<ID=RIGHT_SVINSSEQ,Number=.,Type=String,Description="Known right side of insertion for an insertion of unknown length">
##INFO=<ID=PAIR_COUNT,Number=1,Type=Integer,Description="Read pairs supporting this variant where both reads are confidently mapped">
##INFO=<ID=BND_PAIR_COUNT,Number=1,Type=Integer,Description="Confidently mapped reads supporting this variant at this breakend (mapping may not be confident at remote breakend)">
##INFO=<ID=UPSTREAM_PAIR_COUNT,Number=1,Type=Integer,Description="Confidently mapped reads supporting this variant at the upstream breakend (mapping may not be confident at downstream breakend)">
##INFO=<ID=DOWNSTREAM_PAIR_COUNT,Number=1,Type=Integer,Description="Confidently mapped reads supporting this variant at this downstream breakend (mapping may not be confident at upstream breaken>
##ALT=<ID=DEL,Description="Deletion">
##ALT=<ID=INS,Description="Insertion">
##ALT=<ID=DUP:TANDEM,Description="Tandem Duplication">
##cmdline=/home/tamaroi/bin/configManta.py --bam /home/tamaroi/scratch/work/strains/maddog/alignment/bwa/maddog_trim_bwaMEM_sort_dedupped.bam --referenceFasta /home/tamaroi/projects/def-mtarailo>
#CHROM  POS     ID      REF     ALT     QUAL    FILTER  INFO
I       111111  MantaBND:1:0:1:0:0:0:0  A       ]II:111111]A   .       .       SVTYPE=BND;MATEID=MantaBND:1:0:1:0:0:0:1;IMPRECISE;CIPOS=-153,154;BND_PAIR_COUNT=6;PAIR_COUNT=6
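
In the header above, the #CHROM line stops at INFO: there is no FORMAT column and no sample column after it, which is what "no samples" means here and why the error message prints an empty name after "which had:". A rough check for this, as a sketch: the helper name vcf_sample_count is hypothetical, and it assumes a tab-delimited header line as in a VCF file on disk (the paste above may show spaces).

```shell
#!/bin/sh
# Hypothetical helper: count sample columns on the #CHROM line (stdin).
# Columns 1-8 are fixed, column 9 is FORMAT, samples start at column 10.
vcf_sample_count() {
  awk -F'\t' '/^#CHROM/ { n = NF - 9; if (n < 0) n = 0; print n; exit }'
}

# Header line shaped like the Manta output above: it stops at INFO.
no_samples=$(printf '#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n' | vcf_sample_count)

# Header line with FORMAT and one sample column, for comparison.
one_sample=$(printf '#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tmaddog\n' | vcf_sample_count)

echo "header ending at INFO: $no_samples sample(s)"
echo "header with one sample: $one_sample sample(s)"
```

A count of 0 means there is no sample name that DUPHOLD_SAMPLE_NAME could be matched against, so setting the variable to an empty string cannot help.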