Closed ghost closed 5 years ago
David;
Thanks for the report, it looks like VarDict is producing some problematic output from your analysis. Before we get to the failure, I'm not sure you're configuration is as intended. Do you want to analyze these as a tumor/normal pair? If so, then you need to also supply the same variantcaller
and other options to the normal sample so bcbio knows to batch them together for VarDict. As is, bcbio is running this as tumor-only, which I'm not sure you intended.
If you did want that, we'll need to debug further. Could you try running the vardict-java command outside of bcbio and then feeding the output to teststrandbias.R
:
unset JAVA_HOME && export PATH=/usr/local/share/bcbio/anaconda/bin:$PATH && export VAR_DICT_OPTS='-Xms750m -Xmx3500m -XX:+UseSerialGC' && vardict-java -G /usr/local/share/bcbio/genomes/Hsapiens/GRCh37/seq/GRCh37.fa -f 0.01 -N TUMOR -b /home/ubuntu/align/TUMOR/TUMOR-sort.bam -c 1 -S 2 -E 3 -g 4 -Q 10 /home/ubuntu/vardict/1/NEO_E2E_run1-1_93068150_108578394-unmerged-regions-nolcr-regionlimit.bed > raw_vardict.out
cat raw_vardict.out | teststrandbias.R
If that fails and you could provide more details about the output of raw_vardict.out
happy to help debug further to determine what is wrong. Thanks much.
Hi Brad,
thanks for clarification on the yaml configuration. However, I am still receiving the Error: Incorrect input detected in teststrandbias.R
.
new yaml:
---
upload:
dir: /home/ubuntu/project/final
details:
- files: [/home/ubuntu/project/work/sample2_f.fastq.gz, /home/ubuntu/project/work/sample2_r.fastq.gz]
description: TUMOR
metadata:
batch: batch1
phenotype: tumor
analysis: variant2
genome_build: GRCh37
algorithm:
aligner: bwa
coverage_interval: regional
mark_duplicates: true
remove_lcr: true
recalibrate: false
realign: false
quality_format: standard
variantcaller:
somatic: [freebayes, vardict, varscan]
germline: [freebayes, vardict, varscan]
variant_regions: /home/ubuntu/project/work/exome.bed
ensemble:
numpass: 2
clinical_reporting: true
min_allele_fraction: 1
effects_transcripts: canonical
- files: [/home/ubuntu/project/work/sample1_f.fastq.gz, /home/ubuntu/project/work/sample1_r.fastq.gz]
description: NORMAL
metadata:
batch: batch1
phenotype: normal
analysis: variant2
genome_build: GRCh37
algorithm:
aligner: bwa
coverage_interval: regional
mark_duplicates: true
remove_lcr: true
recalibrate: false
realign: false
quality_format: standard
variantcaller:
somatic: [freebayes, vardict, varscan]
germline: [freebayes, vardict, varscan]
variant_regions: /home/ubuntu/project/work/exome.bed
ensemble:
numpass: 2
clinical_reporting: true
min_allele_fraction: 1
effects_transcripts: canonical
svcaller: cnvkit
fc_date: '20170406'
fc_name: test_run
I followed your instructions and ran
$ unset JAVA_HOME && export PATH=/usr/local/share/bcbio/anaconda/bin:$PATH && export VAR_DICT_OPTS='-Xms750m -Xmx3500m -XX:+UseSerialGC' && vardict-java -G /usr/local/share/bcbio/genomes/Hsapiens/GRCh37/seq/GRCh37.fa -f 0.01 -N NORMAL-germline -b /home/ubuntu/project/align/NORMAL/NORMAL-sort.bam -c 1 -S 2 -E 3 -g 4 -Q 10 /home/ubuntu/project/vardict/1/NORMAL-germline-1_93298425_109001698-unmerged-regions-nolcr-regionlimit.bed > raw_Vardict.out
$ head raw_Vardict.out
NORMAL-germline 1 93306206 93306207 C CT 53 3 19 31 2 1 C/+1 0.0566 2;2 26.7 1 38.7 1 60.0 6.000 0.0566 0 9 10.000 1 0.0 3 53 AAAAAGAAGAGGTATGTCGT TTTTTTTTTGTCTTTTCAAG 1:93306087-93306421 Insertion
NORMAL-germline 1 93309641 93309641 C T 95 3 40 52 2 1 C/T 0.0316 2;2 9.0 1 30.3 1 60.0 2.000 0.0217 0 0 3.000 1 0.7 2 92 TTCATTCAGTTGAAGAAGTG CCATGCCGACTTTGCTTCTC 1:93308939-93309802 SNV
NORMAL-germline CCDC18-AS1_Exon1 1 93341983 93341983 C T 108 2 41 65 1 1 C/T 0.0185 2;2 21.5 1 38.0 0 60.0 4.000 0.0187 0 1 1.000 1 1.5 2 107 TCATCCGCACATATGAGAAG GAGCCTGTGAGTAAAAAAGA 1:93341851-93342083 SNV
NORMAL-germline CCDC18-AS1_Exon1 1 93342028 93342028 C A 118 2 24 92 0 2 C/A 0.0169 2;0 20.0 1 39.0 1 60.0 4.000 0.0171 0 2 3.000 1 1.0 2 117 AACGATCATTCATTCAGCAC AAATCAGCATTCAATGTCCT 1:93341851-93342083 SNV
NORMAL-germline CCDC18-AS1_Exon1 1 93342055 93342055 C T 103 2 16 85 1 1 C/T 0.0194 2;2 13.5 1 32.5 1 60.0 4.000 0.0198 0 0 1.000 1 2.0 2 101 GCATTCAATGTCCTAGAACA GACTTCTCACCCTTCTTGGA 1:93341851-93342083 SNV
NORMAL-germline FNBP1L_Exon1 1 93545157 93545157 G A 50 50 0 0 11 39 A/A 1.0000 0;2 37.9 1 36.7 1 60.0 15.667 1.0000 0 2 2.000 1 1.3 47 47 TGGGGGAAGGAGGAAGAAAG AAGCAAGTATAAAAGGGAAA 1:93545032-93545248 SNV
NORMAL-germline BCAR3_Exon1 1 93575912 93575912 G A 73 2 20 51 0 2 G/A 0.0274 2;0 34.5 1 32.5 1 60.0 4.000 0.0274 0 1 1.000 1 1.5 2 73 TAAAGCCAAAAAGCCAGCAT TAAATTTGAAGAGGGTCAGG 1:93575749-93576192 SNV
NORMAL-germline BCAR3_Exon1 1 93580581 93580581 G C 25 25 0 0 9 16 C/C 1.0000 0;2 36.6 1 38.2 1 60.0 50.000 1.0000 0 1 1.000 1 1.3 25 25 TCACACACCTCATATTGATT CAGTGTGATTGATTCAGATG 1:93580506-93580682 SNV
NORMAL-germline BCAR3_Exon1 1 93581024 93581024 T C 57 27 18 12 9 18 T/C 0.4737 2;2 29.8 1 38.4 1 60.0 54.000 0.4737 0 0 1.000 1 1.5 27 57 ACATTTTTTTCCCTCTAATG AGAGGGGTGGTGCACTTAAG 1:93580999-93581192 SNV
NORMAL-germline 1 93602194 93602194 C T 25 25 0 0 16 9 T/T 1.0000 0;2 33.6 1 36.8 1 60.0 24.000 1.0000 0 0 1.000 1 1.2 24 24 CATAGGATGCTAAAAAAGCT AGTAGACAATTCTGGCCTTC 1:93602186-93602554 SNV
Could the issue be the empty value in the second column of the first two rows? cut raw_Vardict.out -f 34
is returning the Insertion/SNV
column so the output does have 34 columns as teststrandbias.R requires. loading the raw_Vardict.out in R with read.table
has dimensions of 413 rows and 34 columns, but running line 8 of teststrandbias.R returns a value of 33 for the first two rows that have empty values in the second column. Any suggestions?
thanks.
David;
Thanks for the details on this. On the practical side, you probably want to use germline focused callers rather than VarDict and VarScan for the germline
specification. You'd be fine here with a single caller and doing germline: [freebayes]
but if you really want 3 callers and have a GATK license (or are an academic lab) I'd use germline: [freebayes, gatk-haplotype, samtools]
.
On the actual VarDict error, I'll have to page @mjafin for this. Miika, do you know what column 2 is meant to be there and why it's empty in some cases?
Thanks, closing as this is a stale issue and we haven't had other reports of it.
I am trying to perform ensemble variant analysis against GRCh37 with Freebayes, Varscan and VarDict. However, there seems to be a hold up on VarDict's inference:
I tried updating vardict with
bcbio_conda install -c bioconda vardict
, but that did not seem to solve the issue. This is the yaml file:thanks in advance.