Closed parlar closed 8 years ago
Pär; Sorry about the problems with VarScan calling. It looks like some process in the pipe fails but it's tough to tell what is wrong as there is not a useful error message (the mpileup message is standard and tell you it's downsampling at 8000 reads).
Re-running the command manually is the best way to do, although as you noticed you have to replace the temporary directories since those are created/removed by bcbio as part of processing. It's a bit tricky with VarScan since bcbio also creates a sample list in the temporary directory: You should be able to do:
/usr/local/bin/samtools mpileup -f /usr/local/share/bcbio/genomes/Hsapiens/GRCh37/seq/GRCh37.fa -d 1000 -L 1000 -l /home/lindak/Data/LiquidBiopsies/TruSight15/160115/calling/work/varscan/7/H1975-B-7_0_55089310-raw-regions.bed /home/lindak/Data/LiquidBiopsies/TruSight15/160115/calling/work/bamprep/H1975-B_tumor/7/H1975-B_tumor-sort-7_0_55089310-prep.bam | grep -v -P '\t0\t\t$' | ifne varscan -Xms681m -Xmx1818m -Duser.language=en -Duser.country=US -XX:+UseSerialGC mpileup2cns --min-coverage 5 --p-value 0.98 --output-vcf --variants | /usr/local/share/bcbio/anaconda/bin/py -x 'bcbio.variation.varscan.fix_varscan_output(x)' | awk -F$'\t' -v OFS='\t' '{if ($0 !~ /^#/) gsub(/[KMRYSWBVHDX]/, "N", $4) } {print}' | awk -F$'\t' -v OFS='\t' '{if ($0 !~ /^#/) gsub(/[KMRYSWBVHDX]/, "N", $5) } {print}' | ifne vcfuniqalleles > H1975-B-7_0_55089310-raw.vcf
Depending on if you can get any useful errors from that, you might have to break the line down to see what is happening. Sorry to not have a better way to debug -- happy to try and offer suggestions if you can identify any other error messages that give us a better idea of what is going wrong. Hope this helps.
the mpileup output generated by samtools is empty.
The regions file (H1975-B-7_0_55089310-raw-regions.bed )contains only one row:
7 55089120 55089310
.
But, if I run samtools mpileup on the bam file (H1975-B_tumor/7/H1975-B_tumor-sort-7_0_55089310-prep.bam), I get the output below. The regions don't overlap, so it makes sense I guess. But looking at the bam file using samtools view shows that there are indeed reads there. I guess that they are removed by some default filtering criterion (qual,min cov ... ?).
I suppose that the empty mpileup output must be to blame here? Any way to allow for empty mpileup outputs without resulting in the error?
$ samtools mpileup H1975-B_tumor/7/H1975-B_tumor-sort-7_0_55089310-prep.bam
7 47671587 C 1 ^]. A
7 47671588 T 1 . B
7 47671589 G 1 . A
7 47671590 A 1 . B
7 47671591 G 1 .+1T B
7 47671592 C 1 . F
7 47671593 A 1 . F
7 47671594 G 1 . F
7 47671595 G 1 . F
7 47671596 C 1 . F
7 47671597 C 1 . B
7 47671598 C 1 . G
7 47671599 T 1 . G
7 47671600 T 1 . G
7 47671601 A 1 . G
7 47671602 A 1 . G
7 47671603 C 1 . G
7 47671604 T 1 . G
7 47671605 A 1 . G
7 47671606 G 1 . G
7 47671607 C 1 . G
7 47671608 T 1 . H
7 47671609 C 1 . H
7 47671610 G 1 . G
7 47671611 C 1 . G
7 47671612 C 1 . G
7 47671613 T 1 . G
7 47671614 T 1 . G
7 47671615 C 1 . H
7 47671616 C 1 . H
7 47671617 T 1 . H
7 47671618 G 1 . G
7 47671619 C 1 . H
7 47671620 A 1 . H
7 47671621 G 1 . H
7 47671622 G 1 . H
7 47671623 C 1 . G
7 47671624 A 1 . H
7 47671625 T 1 . H
7 47671626 C 1 . H
7 47671627 T 1 . G
7 47671628 T 1 . H
7 47671629 T 1 . H
7 47671630 T 1 . G
7 47671631 G 1 . H
7 47671632 C 1 . H
7 47671633 A 1 . H
7 47671634 C 1 . H
7 47671635 A 1 . H
7 47671636 C 1 . H
7 47671637 C 1 . G
7 47671638 T 1 . H
7 47671639 G 1 . G
7 47671640 C 1 . H
7 47671641 T 1 . H
7 47671642 G 1 . H
7 47671643 T 1 . H
7 47671644 G 1 . H
7 47671645 T 1 . H
7 47671646 G 1 . H
7 47671647 C 1 . H
7 47671648 C 1 . H
7 47671649 G 1 . G
7 47671650 G 1 . G
7 47671651 G 1 . G
7 47671652 T 1 . 1
7 47671653 G 1 . E
7 47671654 C 1 . ?
7 47671655 T 1 . G
7 47671656 G 1 . G
7 47671657 G 1 . H
7 47671658 T 1 . H
7 47671659 T 1 . H
7 47671660 G 1 . G
7 47671661 A 1 . H
7 47671662 A 1 . E
7 47671663 G 1 . G
7 47671664 A 1 . H
7 47671665 T 1 . H
7 47671666 G 1 . H
7 47671667 C 1 . H
7 47671668 A 1 . H
7 47671669 A 1 . H
7 47671670 C 1 . H
7 47671671 C 1 . H
7 47671672 T 1 . H
7 47671673 C 1 . H
7 47671674 T 1 . H
7 47671675 G 1 . H
7 47671676 A 1 . H
7 47671677 C 1 . H
7 47671678 C 1 . H
7 47671679 T 1 . H
7 47671680 T 1 . H
7 47671681 T 1 . H
7 47671682 A 1 . H
7 47671683 G 1 . H
7 47671684 G 1 . H
7 47671685 G 1 . H
7 47671686 A 1 . H
7 47671687 T 1 . H
7 47671688 T 1 . H
7 47671689 G 1 . H
7 47671690 C 1 . G
7 47671691 A 1 . H
7 47671692 A 1 . H
7 47671693 G 1 . H
7 47671694 C 1 . G
7 47671695 T 1 . H
7 47671696 T 1 . H
7 47671697 G 1 . H
7 47671698 A 1 . H
7 47671699 G 1 . H
7 47671700 C 1 . H
7 47671701 T 1 . H
7 47671702 C 1 . H
7 47671703 G 1 . G
7 47671704 G 1 . G
7 47671705 T 1 . G
7 47671706 G 1 . G
7 47671707 G 1 . G
7 47671708 G 1 . G
7 47671709 G 1 . G
7 47671710 A 1 . G
7 47671711 C 1 . G
7 47671712 A 1 . G
7 47671713 C 1 . H
7 47671714 A 1 . H
7 47671715 G 1 . H
7 47671716 A 1 . H
7 47671717 T 1 . H
7 47671718 G 1 . H
7 47671719 A 1 . H
7 47671720 T 1 . H
7 47671721 A 1 . H
7 47671722 G 1 . e
7 47671723 A 1 . o
7 47671724 T 1 . m
7 47671725 A 1 . n
7 47671726 A 1 . n
7 47671727 A 1 . i
7 47671728 T 1 . Z
7 47671729 A 1 . n
7 47671730 A 1 . i
7 47671731 C 1 . k
7 47671732 A 1 . n
7 47671733 C 1 . o
7 47671734 A 1 . o
7 47671735 G 1 . n
7 47671736 T 1 .$ X
7 47671737 G 1 , H
7 47671738 T 1 , H
7 47671739 G 1 , H
7 47671740 A 1 , H
7 47671741 T 1 , G
7 47671742 A 1 , H
7 47671743 A 1 , B
7 47671744 C 1 , H
7 47671745 C 1 , H
7 47671746 A 1 , H
7 47671747 T 1 , F
7 47671748 C 1 , H
7 47671749 T 1 , H
7 47671750 G 1 , H
7 47671751 G 1 , G
7 47671752 T 1 , H
7 47671753 C 1 , H
7 47671754 A 1 , H
7 47671755 G 1 , H
7 47671756 G 1 , H
7 47671757 G 1 , H
7 47671758 A 1 , H
7 47671759 T 1 , H
7 47671760 G 1 , H
7 47671761 G 1 , G
7 47671762 A 1 , G
7 47671763 A 1 , F
7 47671764 C 1 , H
7 47671765 C 1 , H
7 47671766 T 1 , H
7 47671767 G 1 , G
7 47671768 A 1 , H
7 47671769 T 1 , H
7 47671770 G 1 , H
7 47671771 A 1 , H
7 47671772 G 1 , H
7 47671773 G 1 , H
7 47671774 A 1 , H
7 47671775 A 1 , H
7 47671776 G 1 , H
7 47671777 A 1 , H
7 47671778 A 1 , G
7 47671779 G 1 , H
7 47671780 T 1 , H
7 47671781 G 1 , G
7 47671782 A 1 , F
7 47671783 C 1 , 4
7 47671784 T 1 , H
7 47671785 G 1 , H
7 47671786 G 1 , F
7 47671787 A 1 , H
7 47671788 A 1 , H
7 47671789 T 1 , G
7 47671790 C 1 , G
7 47671791 C 1 , G
7 47671792 T 1 , H
7 47671793 C 1 , F
7 47671794 C 1 , H
7 47671795 T 1 , F
7 47671796 G 1 , E
7 47671797 G 1 , E
7 47671798 G 1 , G
7 47671799 C 1 , H
7 47671800 T 1 , H
7 47671801 G 1 , H
7 47671802 G 1 , H
7 47671803 A 1 , H
7 47671804 A 1 , H
7 47671805 G 1 , G
7 47671806 G 1 , G
7 47671807 T 1 , G
7 47671808 G 1 , G
7 47671809 G 1 , G
7 47671810 G 1 , G
7 47671811 G 1 , H
7 47671812 G 1 , H
7 47671813 A 1 , H
7 47671814 G 1 , F
7 47671815 C 1 , H
7 47671816 T 1 , G
7 47671817 T 1 , G
7 47671818 T 1 , H
7 47671819 T 1 , H
7 47671820 A 1 , H
7 47671821 T 1 , H
7 47671822 G 1 , H
7 47671823 T 1 , H
7 47671824 C 1 , H
7 47671825 T 1 , H
7 47671826 G 1 , H
7 47671827 C 1 , H
7 47671828 A 1 , G
7 47671829 G 1 , H
7 47671830 T 1 , H
7 47671831 C 1 , H
7 47671832 T 1 , H
7 47671833 C 1 , B
7 47671834 T 1 , H
7 47671835 T 1 , G
7 47671836 T 1 , H
7 47671837 G 1 , H
7 47671838 G 1 , F
7 47671839 G 1 , H
7 47671840 T 1 , G
7 47671841 A 1 , G
7 47671842 G 1 , G
7 47671843 G 1 , H
7 47671844 G 1 , H
7 47671845 A 1 , H
7 47671846 A 1 , F
7 47671847 G 1 , G
7 47671848 A 1 , H
7 47671849 A 1 , F
7 47671850 A 1 , H
7 47671851 A 1 , G
7 47671852 T 1 , G
7 47671853 G 1 , G
7 47671854 A 1 , G
7 47671855 A 1 , G
7 47671856 G 1 , C
7 47671857 C 1 , E
7 47671858 A 1 , G
7 47671859 G 1 , C
7 47671860 G 1 , C
7 47671861 G 1 , B
7 47671862 G 1 , F
7 47671863 A 1 , F
7 47671864 G 1 , 4
7 47671865 G 1 , @
7 47671866 C 1 , B
7 47671867 C 1 , A
7 47671868 T 0
7 47671869 T 0
Pär; Thanks for the debugging information. This helped a lot and I think I identified the underlying issue and pushed a fix to the latest development. This should now correctly pass through empty mpileups without failing. Please let me know if you run into any other issues at all.
great, will try asap
Hi Chapmanb,
I updated bcbio pipeline version from 0.9.6 to 0.9.7 to fix this problem, but I guess I still have the same problem as followings.
[2016-05-11T13:23Z] Timing: structural variation initial [2016-05-11T13:23Z] Timing: hla typing [2016-05-11T13:23Z] run local -- checkpoint passed: full [2016-05-11T13:23Z] Timing: alignment post-processing [2016-05-11T13:23Z] multiprocessing: piped_bamprep [2016-05-11T13:23Z] Timing: variant calling [2016-05-11T13:23Z] multiprocessing: variantcall_sample [2016-05-11T13:42Z] Uncaught exception occurred Traceback (most recent call last): File "/project/ibilab/tools/bcbio/anaconda/lib/python2.7/site-packages/bcbio/provenance/do.py", line 21, in run _do_run(cmd, checks, log_stdout) File "/project/ibilab/tools/bcbio/anaconda/lib/python2.7/site-packages/bcbio/provenance/do.py", line 95, in _do_run raise subprocess.CalledProcessError(exitcode, error_msg) CalledProcessError: Command 'set -o pipefail; /project/ibilab/tools/bcbio/anaconda/bin/samtools mpileup -f /project/ibilab/tools/bcbio/genomes/Hsapiens/hg19/seq/hg19.fa -d 1000 -L 1000 -l /project/ibilab/projects/Martin_Carroll_DNAseqOfAML_Anne_2016_2/large_data/bcbio-result2/work/varscan/chr14/3918_S8_L001-chr14_36199824_52492964-regions.bed /project/ibilab/projects/Martin_Carroll_DNAseqOfAML_Anne_2016_2/large_data/bcbio-result2/work/bamprep/3918_S8_L001/chr14/3918_S8_L001-sort-chr14_36199824_52492964-prep.bam | ifne grep -v -P '\t0\t\t$' | ifne varscan -Xms681m -Xmx1818m -Duser.language=en -Duser.country=US -XX:+UseSerialGC -Djava.io.tmpdir=/project/ibilab/projects/Martin_Carroll_DNAseqOfAML_Anne_2016_2/large_data/bcbio-result2/work/tx/tmphX3aev mpileup2cns --min-coverage 5 --p-value 0.98 --vcf-sample-list /project/ibilab/projects/Martin_Carroll_DNAseqOfAML_Anne_2016_2/large_data/bcbio-result2/work/varscan/chr14/tx/tmpeC4VYD/3918_S8_L001-chr14_36199824_52492964-sample_list.txt --output-vcf --variants | /project/ibilab/tools/bcbio/anaconda/bin/py -x 'bcbio.variation.varscan.fix_varscan_output(x)' | awk -F$'\t' -v OFS='\t' '{if ($0 !~ /^#/) gsub(/[KMRYSWBVHDX]/, "N", $4) } {print}' | awk -F$'\t' -v OFS='\t' '{if ($0 !~ /^#/) gsub(/[KMRYSWBVHDX]/, "N", $5) } {print}' | ifne vcfuniqalleles > /project/ibilab/projects/Martin_Carroll_DNAseqOfAML_Anne_2016_2/large_data/bcbio-result2/work/varscan/chr14/tx/tmpeC4VYD/3918_S8_L001-chr14_36199824_52492964.vcf [mpileup] 1 samples in 1 input files
Sorry about the issue. I'm trying to replicate this by feeding an empty file into the start of the chained post-processing but get an empty file and zero error code so am not sure what part is causing an issue on your run. Would you be able to step through the rest of the command:
| ifne grep -v -P '\t0\t\t$' | ifne varscan -Xms681m -Xmx1818m -Duser.language=en -Duser.country=US -XX:+UseSerialGC -Djava.io.tmpdir=/project/ibilab/projects/Martin_Carroll_DNAseqOfAML_Anne_2016_2/large_data/bcbio-result2/work/tx/tmphX3aev mpileup2cns --min-coverage 5 --p-value 0.98 --vcf-sample-list /project/ibilab/projects/Martin_Carroll_DNAseqOfAML_Anne_2016_2/large_data/bcbio-result2/work/varscan/chr14/tx/tmpeC4VYD/3918_S8_L001-chr14_36199824_52492964-sample_list.txt --output-vcf --variants | /project/ibilab/tools/bcbio/anaconda/bin/py -x 'bcbio.variation.varscan.fix_varscan_output(x)' | awk -F$'\t' -v OFS='\t' '{if ($0 !~ /^#/) gsub(/[KMRYSWBVHDX]/, "N", $4) } {print}' | awk -F$'\t' -v OFS='\t' '{if ($0 !~ /^#/) gsub(/[KMRYSWBVHDX]/, "N", $5) } {print}' | ifne vcfuniqalleles
and add each separate command to the end of the run one at a time, and then echo $?
afterwards to see which command returns a non-zero exit code. If you can identify the underlying cause we can adjust to fix. It's possible the awk commands do something different on your machine with empty input as those are not protected with ifne
.
Thanks much for the help debugging.
Yes. Interestingly, I did not see any error message and also found an empty result file (work/varscan/chr14/3918_S8_L001-chr14_36199824_52492964.vcf) when I run the following script which I copied from the error log (except changing "temp" dir for the result).
/project/ibilab/tools/bcbio/anaconda/bin/samtools mpileup -f /project/ibilab/tools/bcbio/genomes/Hsapiens/hg19/seq/hg19.fa -d 1000 -L 1000 -l /project/ibilab/projects/Martin_Carroll_DNAseqOfAML_Anne_2016_2/large_data/bcbio-result2/work/varscan/chr14/3918_S8_L001-chr14_36199824_52492964-regions.bed /project/ibilab/projects/Martin_Carroll_DNAseqOfAML_Anne_2016_2/large_data/bcbio-result2/work/bamprep/3918_S8_L001/chr14/3918_S8_L001-sort-chr14_36199824_52492964-prep.bam | ifne grep -v -P '\t0\t\t$' | ifne varscan -Xms681m -Xmx1818m -Duser.language=en -Duser.country=US -XX:+UseSerialGC -Djava.io.tmpdir=/project/ibilab/projects/Martin_Carroll_DNAseqOfAML_Anne_2016_2/large_data/bcbio-result2/work/tx/tmphX3aev mpileup2cns --min-coverage 5 --p-value 0.98 --vcf-sample-list /project/ibilab/projects/Martin_Carroll_DNAseqOfAML_Anne_2016_2/large_data/bcbio-result2/work/varscan/chr14/tx/tmpeC4VYD/3918_S8_L001-chr14_36199824_52492964-sample_list.txt --output-vcf --variants | /project/ibilab/tools/bcbio/anaconda/bin/py -x 'bcbio.variation.varscan.fix_varscan_output(x)' | awk -F$'\t' -v OFS='\t' '{if ($0 !~ /^#/) gsub(/[KMRYSWBVHDX]/, "N", $4) } {print}' | awk -F$'\t' -v OFS='\t' '{if ($0 !~ /^#/) gsub(/[KMRYSWBVHDX]/, "N", $5) } {print}' | ifne vcfuniqalleles > /project/ibilab/projects/Martin_Carroll_DNAseqOfAML_Anne_2016_2/large_data/bcbio-result2/work/varscan/chr14/3918_S8_L001-chr14_36199824_52492964.vcf
But, it kept throwing error when I run it as a whole pipeline as followings. (I ran it serveral times.)
bcbio_nextgen.py /project/ibilab/projects/Martin_Carroll_DNAseqOfAML_Anne_2016_2/large_data/bcbio-result2/config/ensemble_trim.yaml -t local -n 1
I wonder whether differences on i) running on a whole pipeline and ii) running as a chained single script make any difference on the sensitivity of error. It seems like this error only occurs when it run as a whole pipeline, but it seems to work well running as a chained single script in the same cluster. Thank you!
Thanks much for helping debug this. My guess is that there won't be an error message but one of the subcommands returns a non-zero exit code and a (correct) empty file, so only running and looking at the output won't help identify it.
If you look at the error code (with echo $?
) after running the command externally, is it returning 0 or 1? My guess is 1 based on the bcbio error message, and to debug where this happens you can keep removing commands from the end of the piped process until it returns zero, then we'll identify where the problem occurs and can fix.
Sorry for the debugging process and hopefully this helps identify it.
Got it! I ran the command by adding each separate command to the end of the run one at a time, and did echo $?. I finally found the place that make an error as followings.
$ /project/ibilab/tools/bcbio/anaconda/bin/samtools mpileup -f /project/ibilab/tools/bcbio/genomes/Hsapiens/hg19/seq/hg19.fa -d 1000 -L 1000 -l /project/ibilab/projects/Martin_Carroll_DNAseqOfAML_Anne_2016_2/large_data/bcbio-result2/work/varscan/chr14/3918_S8_L001-chr14_36199824_52492964-regions.bed /project/ibilab/projects/Martin_Carroll_DNAseqOfAML_Anne_2016_2/large_data/bcbio-result2/work/bamprep/3918_S8_L001/chr14/3918_S8_L001-sort-chr14_36199824_52492964-prep.bam | ifne grep -v -P '\t0\t\t$' [mpileup] 1 samples in 1 input files
Thanks for debugging and isolating the issue. I'm confused as to what is going on since the ifne
is supposed to protect against this -- it should skip the grep step which raises the non-zero output. What type of system are you running this on? I can try to replicate locally.
To help with debugging, does this give the expected return codes:
touch empty.txt
cat empty.txt | grep -v -P '\t0\t\t$'
echo $? # should be 1
cat empty.txt | ifne grep -v -P '\t0\t\t$'
echo $? # should be 0
Sorry about all the back and forth and not having a good handle about what is going on. Hopefully we can isolate and fix soon to get your analysis running.
Yes. I tried the commands and I have 1 and 0 for the first and the second command respectively, which is expected.
I am using a node from a linux cluster for this analysis and here is the system information.
$ cat /proc/version Linux version 2.6.32-504.el6.x86_64 (mockbuild@x86-023.build.eng.bos.redhat.com) (gcc version 4.4.7 20120313 (Red Hat 4.4.7-11) (GCC) ) #1 SMP Tue Sep 16 01:56:35 EDT 2014
Please let me know if you need any other information. Thank you!
Thanks for following up on this. Now I'm really confused as am not sure why the initial input doesn't get caught by the ifne
check. Could you try doing:
/project/ibilab/tools/bcbio/anaconda/bin/samtools mpileup -f /project/ibilab/tools/bcbio/genomes/Hsapiens/hg19/seq/hg19.fa -d 1000 -L 1000 -l /project/ibilab/projects/Martin_Carroll_DNAseqOfAML_Anne_2016_2/large_data/bcbio-result2/work/varscan/chr14/3918_S8_L001-chr14_36199824_52492964-regions.bed /project/ibilab/projects/Martin_Carroll_DNAseqOfAML_Anne_2016_2/large_data/bcbio-result2/work/bamprep/3918_S8_L001/chr14/3918_S8_L001-sort-chr14_36199824_52492964-prep.bam > problem-pileup.txt
cat problem_pileup.txt | ifne grep -v -P '\t0\t\t$'
echo $? # should be 0
If that doesn't give you that, would you be able to upload the problem_pileup.txt
file as a gist (https://gist.github.com/) and I can try to replicate from there. Thanks again.
I got exit code 1 for the following scripts. Please find the file 'problem-pileup.txt' at https://gist.github.com/hyong2000/07613b3155e75dddc0305c0d5d16f142 . If you need any other information, please let me know. Thank you!
/project/ibilab/tools/bcbio/anaconda/bin/samtools mpileup -f /project/ibilab/tools/bcbio/genomes/Hsapiens/hg19/seq/hg19.fa -d 1000 -L 1000 -l /project/ibilab/projects/Martin_Carroll_DNAseqOfAML_Anne_2016_2/large_data/bcbio-result2/work/varscan/chr14/3918_S8_L001-chr14_36199824_52492964-regions.bed /project/ibilab/projects/Martin_Carroll_DNAseqOfAML_Anne_2016_2/large_data/bcbio-result2/work/bamprep/3918_S8_L001/chr14/3918_S8_L001-sort-chr14_36199824_52492964-prep.bam > problem-pileup.txt cat problem-pileup.txt | ifne grep -v -P '\t0\t\t$' echo $?
I learned that grep returns the exit code ‘1’ if no matches were found. Is there a way to avoid exit code '1' for no matches?
Thanks so much for the debugging details. You're exactly right, the issue was due to removing all reads from this input (which is all empty bases). I pushed a fix to avoid the problem, so if you update to the latest development:
bcbio_nextgen.py upgrade -u development
it should hopefully work correctly for you now. Thanks so much for the patience debugging this and hope it gets your analysis finished.
Great! Thank you! I will let you know whether it works correctly.
I worked correctly for this issue, but the new development version (0.9.8a0) fired other error as followings.
[2016-05-16T17:41Z] Timing: hla typing [2016-05-16T17:41Z] Timing: alignment post-processing [2016-05-16T17:41Z] multiprocessing: piped_bamprep [2016-05-16T23:11Z] Timing: variant calling [2016-05-16T23:11Z] multiprocessing: variantcall_sample [2016-05-17T16:43Z] multiprocessing: concat_variant_files [2016-05-17T16:43Z] Uncaught exception occurred Traceback (most recent call last): File "/project/ibilab/tools/bcbio/anaconda/lib/python2.7/site-packages/bcbio/provenance/do.py", line 21, in run _do_run(cmd, checks, log_stdout) File "/project/ibilab/tools/bcbio/anaconda/lib/python2.7/site-packages/bcbio/provenance/do.py", line 95, in _do_run raise subprocess.CalledProcessError(exitcode, error_msg) CalledProcessError: Command 'set -o pipefail; /project/ibilab/tools/bcbio/anaconda/bin/bcftools concat --allow-overlaps -O z --file-list /project/ibilab/projects/Martin_Carroll_DNAseqOfAML_Anne_2016_2/large_data/bcbio-result4/work/mutect/3500_S6_L001-files.list -o /project/ibilab/projects/Martin_Carroll_DNAseqOfAML_Anne_2016_2/large_data/bcbio-result4/work/mutect/tx/tmpsKfHk6/3500_S6_L001.vcf.gz Different number of samples in /project/ibilab/projects/Martin_Carroll_DNAseqOfAML_Anne_2016_2/large_data/bcbio-result4/work/mutect/chr1/3500_S6_L001-chr1_0_16044553.vcf.gz. Perhaps "bcftools merge" is what you are looking for? ' returned non-zero exit status 255
Is there a way that I can just fix this part based on the stable version of bcbio (0.9.7)?
Sorry about the problem, it appears you might have an older version of bcftools that doesn't support the concat operation. What does:
/project/ibilab/tools/bcbio/anaconda/bin/bcftools --version
report? It should be version 1.3. If you're bcftools is out of date you can update with:
bcbio_nextgen.py upgrade --tools
and hopefully this will get everything working for you.
I have bcftools 1.3. Please see below.
$ /project/ibilab/tools/bcbio/anaconda/bin/bcftools --version bcftools 1.3 Using htslib 1.3 Copyright (C) 2015 Genome Research Ltd. License Expat: The MIT/Expat license This is free software: you are free to change and redistribute it. There is NO WARRANTY, to the extent permitted by law.
Taehyong; Apologies for sending you in the wrong direction. After reading the error again, it looks like something is strange about your MuTect output. It's trying to concatenate together the parallelized variant calls over individual regions but finding that the VCF has different samples in some of the regions. I'm not sure how this would happen, but you could start debugging by looking at the samples trying to be concatenated:
zgrep ^#CHROM `cat /project/ibilab/projects/Martin_Carroll_DNAseqOfAML_Anne_2016_2/large_data/bcbio-result4/work/mutect/3500_S6_L001-files.list`
Those should all be identical with the same samples. If that doesn't provide any clues for your process you could post the result of that command and I can try to provide more help. Thanks for the work debugging this.
Thanks for helping!
First thing I want to mention is that this worked before with the previous stable version (0.9.7), but it didn't work after updating bcbio to 0.9.8a0. So something must be happened between these versions.
Anyway, here is the result for the script that you gave me. https://gist.github.com/hyong2000/39e57a84c9fcbd4b4a7ec2054922b805. This generated all of list of the intermediate results for mutect variant calling.
zgrep ^#CHROM cat /project/ibilab/projects/Martin_Carroll_DNAseqOfAML_Anne_2016_2/large_data/bcbio-result4/work/mutect/3500_S6_L001-files.list
Now I looked at files of the region (chr1) that error occur and I could not find any problem. (all sample files in the generated list were found in the folder.) Here is the list of file in the folder, /project/ibilab/projects/Martin_Carroll_DNAseqOfAML_Anne_2016_2/large_data/bcbio-result4/work/mutect/chr1.
$ ls 3500*vcf.gz 3500_S6_L001-chr1_0_16044553-mutect-orig.vcf.gz 3500_S6_L001-chr1_0_16044553-mutect.vcf.gz 3500_S6_L001-chr1_0_16044553-somaticIndels-gatkann.vcf.gz 3500_S6_L001-chr1_0_16044553-somaticIndels.vcf.gz 3500_S6_L001-chr1_0_16044553.vcf.gz 3500_S6_L001-chr1_109217682_143283354-mutect-orig.vcf.gz 3500_S6_L001-chr1_109217682_143283354-mutect.vcf.gz 3500_S6_L001-chr1_109217682_143283354-somaticIndels-gatkann.vcf.gz 3500_S6_L001-chr1_109217682_143283354-somaticIndels.vcf.gz 3500_S6_L001-chr1_109217682_143283354.vcf.gz 3500_S6_L001-chr1_143767309_159883975-mutect-orig.vcf.gz 3500_S6_L001-chr1_143767309_159883975-mutect.vcf.gz 3500_S6_L001-chr1_143767309_159883975-somaticIndels-gatkann.vcf.gz 3500_S6_L001-chr1_143767309_159883975-somaticIndels.vcf.gz 3500_S6_L001-chr1_143767309_159883975.vcf.gz 3500_S6_L001-chr1_159912052_175645097-mutect-orig.vcf.gz 3500_S6_L001-chr1_159912052_175645097-mutect.vcf.gz 3500_S6_L001-chr1_159912052_175645097-somaticIndels-gatkann.vcf.gz 3500_S6_L001-chr1_159912052_175645097-somaticIndels.vcf.gz 3500_S6_L001-chr1_159912052_175645097.vcf.gz 3500_S6_L001-chr1_16044895_32081736-mutect-orig.vcf.gz 3500_S6_L001-chr1_16044895_32081736-mutect.vcf.gz 3500_S6_L001-chr1_16044895_32081736-somaticIndels-gatkann.vcf.gz 3500_S6_L001-chr1_16044895_32081736-somaticIndels.vcf.gz 3500_S6_L001-chr1_16044895_32081736.vcf.gz 3500_S6_L001-chr1_175734859_192880803-mutect-orig.vcf.gz 3500_S6_L001-chr1_175734859_192880803-mutect.vcf.gz 3500_S6_L001-chr1_175734859_192880803-somaticIndels-gatkann.vcf.gz 3500_S6_L001-chr1_175734859_192880803-somaticIndels.vcf.gz 3500_S6_L001-chr1_175734859_192880803.vcf.gz 3500_S6_L001-chr1_199878716_219943090-mutect-orig.vcf.gz 3500_S6_L001-chr1_199878716_219943090-mutect.vcf.gz 3500_S6_L001-chr1_199878716_219943090-somaticIndels-gatkann.vcf.gz 3500_S6_L001-chr1_199878716_219943090-somaticIndels.vcf.gz 3500_S6_L001-chr1_199878716_219943090.vcf.gz 3500_S6_L001-chr1_220500052_236240522-mutect-orig.vcf.gz 3500_S6_L001-chr1_220500052_236240522-mutect.vcf.gz 3500_S6_L001-chr1_220500052_236240522-somaticIndels-gatkann.vcf.gz 3500_S6_L001-chr1_220500052_236240522-somaticIndels.vcf.gz 3500_S6_L001-chr1_220500052_236240522.vcf.gz 3500_S6_L001-chr1_236245066_249250621-mutect-orig.vcf.gz 3500_S6_L001-chr1_236245066_249250621-mutect.vcf.gz 3500_S6_L001-chr1_236245066_249250621-somaticIndels-gatkann.vcf.gz 3500_S6_L001-chr1_236245066_249250621-somaticIndels.vcf.gz 3500_S6_L001-chr1_236245066_249250621.vcf.gz 3500_S6_L001-chr1_32082041_48267374-mutect-orig.vcf.gz 3500_S6_L001-chr1_32082041_48267374-mutect.vcf.gz 3500_S6_L001-chr1_32082041_48267374-somaticIndels-gatkann.vcf.gz 3500_S6_L001-chr1_32082041_48267374-somaticIndels.vcf.gz 3500_S6_L001-chr1_32082041_48267374.vcf.gz 3500_S6_L001-chr1_48691284_65312469-mutect-orig.vcf.gz 3500_S6_L001-chr1_48691284_65312469-mutect.vcf.gz 3500_S6_L001-chr1_48691284_65312469-somaticIndels-gatkann.vcf.gz 3500_S6_L001-chr1_48691284_65312469-somaticIndels.vcf.gz 3500_S6_L001-chr1_48691284_65312469.vcf.gz 3500_S6_L001-chr1_65323283_81025012-mutect-orig.vcf.gz 3500_S6_L001-chr1_65323283_81025012-mutect.vcf.gz 3500_S6_L001-chr1_65323283_81025012-somaticIndels-gatkann.vcf.gz 3500_S6_L001-chr1_65323283_81025012-somaticIndels.vcf.gz 3500_S6_L001-chr1_65323283_81025012.vcf.gz 3500_S6_L001-chr1_82644031_101324027-mutect-orig.vcf.gz 3500_S6_L001-chr1_82644031_101324027-mutect.vcf.gz 3500_S6_L001-chr1_82644031_101324027-somaticIndels-gatkann.vcf.gz 3500_S6_L001-chr1_82644031_101324027-somaticIndels.vcf.gz 3500_S6_L001-chr1_82644031_101324027.vcf.gz
If you need any other information, please let me know. Thanks!
Taehyong;
Thanks much for the details. It appears as if VCFs where we skipped processing or have no variants get a single sample in the VCF file (3500_S6_L001
) while those that were variant called have two sample names (3500_S6_L001 sample
).
Seeing your output files it looks like you have somatic indel calling going on -- which caller are you using for this? I'm not sure where the generic sample
name is coming from -- is it present in the calls in the somaticIndels.vcf.gz
file? I'm guessing this might be an issue with calling using indelcaller with a tumor-only input so might need more details to help isolate the problem.
Hi Chapman,
For mutect, indel calling was done by scalpel since mutect does not have indel variant calling function. Please see my configuration yaml file for this. (But, as I said, it worked well with the previous version (0.9.7) and this configuration.)
Let me know if you need other information. Thanks!
Thanks for the details. It looks like scalpel is now using sample
for the default name in the FORMAT line when running with a single sample (non-paired). I pushed a fix to correctly translate this to the correct sample name after analysis. If you update to the latest development, you'll need to remove the mutect directory and re-run that analysis and it should give you correct VCFs now. Sorry about the problem and hope this gets your analysis finished.
Got it! I will update to the latest development and try it again. Thank you!
Hi Chapman, Everything worked correctly now. Thank you so much for your help!!!
Hello! It looks like that I have a similar issue with different samples (reads) now. Please see my debug-log below. I am using bcbio version 0.9.8a0 (bcbio_nextgen.py -v).
########################################################################## [2016-07-07T13:54Z] ..##### ERROR ------------------------------------------------------------------------------------------ [2016-07-07T13:54Z] ##### ERROR stack trace [2016-07-07T13:54Z] java.lang.IllegalStateException: Key ADP found in VariantContext field INFO at chr1:3300162 but this key isn't defined in the VCFHeader. We require all VCFs to have complete VCF headers by default. [2016-07-07T13:54Z] at htsjdk.variant.vcf.VCFEncoder.fieldIsMissingFromHeaderError(VCFEncoder.java:177) [2016-07-07T13:54Z] at htsjdk.variant.vcf.VCFEncoder.encode(VCFEncoder.java:115) [2016-07-07T13:54Z] at htsjdk.variant.variantcontext.writer.VCFWriter.add(VCFWriter.java:222) [2016-07-07T13:54Z] at org.broadinstitute.gatk.tools.CatVariants.execute(CatVariants.java:296) [2016-07-07T13:54Z] at org.broadinstitute.gatk.utils.commandline.CommandLineProgram.start(CommandLineProgram.java:248) [2016-07-07T13:54Z] at org.broadinstitute.gatk.utils.commandline.CommandLineProgram.start(CommandLineProgram.java:155) [2016-07-07T13:54Z] at org.broadinstitute.gatk.tools.CatVariants.main(CatVariants.java:312) [2016-07-07T13:54Z] ##### ERROR ------------------------------------------------------------------------------------------ [2016-07-07T13:54Z] ##### ERROR A GATK RUNTIME ERROR has occurred (version 3.5-21-g80158d2): [2016-07-07T13:54Z] ##### ERROR [2016-07-07T13:54Z] ##### ERROR This might be a bug. Please check the documentation guide to see if this is a known problem. [2016-07-07T13:54Z] ##### ERROR If not, please post the error message, with stack trace, to the GATK forum. [2016-07-07T13:54Z] ##### ERROR Visit our website and forum for extensive documentation and answers to [2016-07-07T13:54Z] ##### ERROR commonly asked questions http://www.broadinstitute.org/gatk [2016-07-07T13:54Z] ##### ERROR [2016-07-07T13:54Z] ##### ERROR MESSAGE: Key ADP found in VariantContext field INFO at chr1:3300162 but this key isn't defined in the VCFHeader. We require all VCFs to have complete VCF headers by default. [2016-07-07T13:54Z] ##### ERROR ------------------------------------------------------------------------------------------ [2016-07-07T13:54Z] bcftools concat variants [2016-07-07T13:54Z] Different number of samples in /project/ibilab/projects/Martin_Carroll_DNAseqOfAML_Anne_2016_2/large_data/bcbio-result/work/varscan/chr1/4615_S3-chr1_0_16221490.vcf.gz. Perhaps "bcftools merge" is what you are looking for? [2016-07-07T13:54Z] Uncaught exception occurred Traceback (most recent call last): File "/project/ibilab/tools/bcbio/anaconda/lib/python2.7/site-packages/bcbio/provenance/do.py", line 21, in run _do_run(cmd, checks, log_stdout) File "/project/ibilab/tools/bcbio/anaconda/lib/python2.7/site-packages/bcbio/provenance/do.py", line 95, in _do_run raise subprocess.CalledProcessError(exitcode, error_msg) CalledProcessError: Command 'set -o pipefail; /project/ibilab/tools/bcbio/anaconda/bin/bcftools concat --allow-overlaps -O z --file-list /project/ibilab/projects/Martin_Carroll_DNAseqOfAML_Anne_2016_2/large_data/bcbio-result/work/varscan/4615_S3-files.list -o /project/ibilab/projects/Martin_Carroll_DNAseqOfAML_Anne_2016_2/large_data/bcbio-result/work/varscan/tx/tmp0lokL4/4615_S3.vcf.gz Different number of samples in /project/ibilab/projects/Martin_Carroll_DNAseqOfAML_Anne_2016_2/large_data/bcbio-result/work/varscan/chr1/4615_S3-chr1_0_16221490.vcf.gz. Perhaps "bcftools merge" is what you are looking for? ' returned non-zero exit status 255 ######################################################################
So, I tried to run the following on my command prompt to check whether it works or not.
/project/ibilab/tools/bcbio/anaconda/bin/bcftools concat --allow-overlaps -O z --file-list /project/ibilab/projects/Martin_Carroll_DNAseqOfAML_Anne_2016_2/large_data/bcbio-result/work/varscan/4615_S3-files.list -o /project/ibilab/projects/Martin_Carroll_DNAseqOfAML_Anne_2016_2/large_data/bcbio-result/work/varscan/testVarscan/4615_S3.vcf.gz Different number of samples in /project/ibilab/projects/Martin_Carroll_DNAseqOfAML_Anne_2016_2/large_data/bcbio-result/work/varscan/chr1/4615_S3-chr1_0_16221490.vcf.gz. Perhaps "bcftools merge" is what you are looking for?
I have no result and no detailed error message. I have no idea what to do next. Could you help me out? Thank you!
I found a clue for this error from the error message below.
I checked a command for this error.
/project/ibilab/tools/bcbio/anaconda/bin/gatk-framework org.broadinstitute.gatk.tools.CatVariants -R /project/ibilab/tools/bcbio/genomes/Hsapiens/hg19/seq/hg19.fa -V /project/ibilab/projects/Martin_Carroll_DNAseqOfAML_Anne_2016_2/large_data/bcbio-result/work/varscan/4615_S3-files.list -out /project/ibilab/projects/Martin_Carroll_DNAseqOfAML_Anne_2016_2/large_data/bcbio-result/work/varscan/testVarscan/4615_S3.vcf.gz -assumeSorted -Xms750m -Xmx2000m -XX:+UseSerialGC
To find the VCF file that generates the error, I checked the list of variants calling files from /project/ibilab/projects/Martin_Carroll_DNAseqOfAML_Anne_2016_2/large_data/bcbio-result/work/varscan/4615_S3-files.list. In the list, I manually checked files from the beginning and I found that the very first file does not include VCF headers nor variants (shown below).
#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT 4615_S3
Other than this file, they include VCF headers and variants as followings.
##fileformat=VCFv4.1
##source=VarScan2
##INFO=<ID=ADP,Number=1,Type=Integer,Description="Average per-sample depth of bases with Phred score >= 15">
##INFO=<ID=WT,Number=1,Type=Integer,Description="Number of samples called reference (wild-type)">
##INFO=<ID=HET,Number=1,Type=Integer,Description="Number of samples called heterozygous-variant">
##INFO=<ID=HOM,Number=1,Type=Integer,Description="Number of samples called homozygous-variant">
##INFO=<ID=NC,Number=1,Type=Integer,Description="Number of samples not called">
##FILTER=<ID=str10,Description="Less than 10% or more than 90% of variant supporting reads on one strand">
##FILTER=<ID=indelError,Description="Likely artifact due to indel reads at this position">
##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype">
##FORMAT=<ID=GQ,Number=1,Type=Integer,Description="Genotype Quality">
##FORMAT=<ID=SDP,Number=1,Type=Integer,Description="Raw Read Depth as reported by SAMtools">
##FORMAT=<ID=DP,Number=1,Type=Integer,Description="Quality Read Depth of bases with Phred score >= 15">
##FORMAT=<ID=RD,Number=1,Type=Integer,Description="Depth of reference-supporting bases (reads1)">
##FORMAT=<ID=AD,Number=1,Type=Integer,Description="Depth of variant-supporting bases (reads2)">
##FORMAT=<ID=FREQ,Number=1,Type=Float,Description="Variant allele frequency">
##FORMAT=<ID=PVAL,Number=1,Type=String,Description="P-value from Fisher's Exact Test">
##FORMAT=<ID=RBQ,Number=1,Type=Integer,Description="Average quality of reference-supporting bases (qual1)">
##FORMAT=<ID=ABQ,Number=1,Type=Integer,Description="Average quality of variant-supporting bases (qual2)">
##FORMAT=<ID=RDF,Number=1,Type=Integer,Description="Depth of reference-supporting bases on forward strand (reads1plus)">
##FORMAT=<ID=RDR,Number=1,Type=Integer,Description="Depth of reference-supporting bases on reverse strand (reads1minus)">
##FORMAT=<ID=ADF,Number=1,Type=Integer,Description="Depth of variant-supporting bases on forward strand (reads2plus)">
##FORMAT=<ID=ADR,Number=1,Type=Integer,Description="Depth of variant-supporting bases on reverse strand (reads2minus)">
#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT 4615_S3
chr1 3300162 . C T 0 PASS ADP=22;HET=1;HOM=0;NC=0;WT=0 GT:GQ:SDP:DP:RD:AD:FREQ:PVAL:RBQ:ABQ:RDF:RDR:ADF:ADR 0/1:6:22:22:20:2:9.09%:2.4419E-1:16:15:20:0:2:0
For a test, I added the VCF header and a fake variant into the first VCF file and I found that the command worked well! So, my guess is that there should not be an empty variant file in the list of the variant file at /project/ibilab/projects/Martin_Carroll_DNAseqOfAML_Anne_2016_2/large_data/bcbio-result/work/varscan/4615_S3-files.list. Please fix this problem and let me know if you have any question. Thank you!
Taehyong;
Thank you for the report and all the work tracking down the underlying issue. To fix this it looks like we'd need to add a call to vcf_has_variants
(https://github.com/chapmanb/bcbio-nextgen/blob/abfa4b323af5d70cf3fde7e7c77308c2646b6216/bcbio/variation/vcfutils.py#L294) and remove files from the merge list that have no variants here:
If you feel up for testing that fixes your issue and sending in a pull request I'd be happy to merge it. Thank you again for all the work tracking down the underlying issue.
Sure! I would like to try it whether it fixes my issue. Should I just upgrade bcbio and rerun it against my samples?
bcbio_nextgen.py upgrade -u development
Taehyong -- sorry for the confusion, I was trying to give details hoping you might want to fix and provide a pull request yourself. Happy to also take care of this myself and I pushed a fix now which should resolve it. If you update from development as you describe above you should be able to test. Hope this gets things working for you.
Thank you! But I have similar problem for mutect2. When there is no called variant in a certain region of a chromosome, it makes the problem. I think it would be better to double-check codes of merging steps across different variant callers. Please find the error trace below. Thank you!!
[2016-07-17T14:22Z] Concat variant files
[2016-07-17T14:22Z] INFO 10:22:45,079 HelpFormatter - -------------------------------------------------------
[2016-07-17T14:22Z] INFO 10:22:45,081 HelpFormatter - Program Name: org.broadinstitute.gatk.tools.CatVariants
[2016-07-17T14:22Z] INFO 10:22:45,085 HelpFormatter - Program Args: -R /project/ibilab/tools/bcbio/genomes/Hsapiens/hg19/seq/hg19.fa -V /project/ibilab/projects/Devraj_Basu_HNSCC_Devraj_2016_3/large_data/bcbio-result5/work/mutect2/720-files.list -out /project/ibilab/projects/Devraj_Basu_HNSCC_Devraj_2016_3/large_data/bcbio-result5/work/mutect2/tx/tmp_O87mm/720.vcf.gz -assumeSorted
[2016-07-17T14:22Z] INFO 10:22:45,091 HelpFormatter - Executing as taehyong@node119.hpc.local on Linux 2.6.32-504.el6.x86_64 amd64; OpenJDK 64-Bit Server VM 1.7.0_91-b15.
[2016-07-17T14:22Z] INFO 10:22:45,092 HelpFormatter - Date/Time: 2016/07/17 10:22:45
[2016-07-17T14:22Z] INFO 10:22:45,092 HelpFormatter - -------------------------------------------------------
[2016-07-17T14:22Z] INFO 10:22:45,092 HelpFormatter - -------------------------------------------------------
[2016-07-17T14:22Z] .........10.........20.........30.........40.........50.........60.........70.........80.........90.........100.........110.........120.........130.........140.........150.........160.........170.##### ERROR ------------------------------------------------------------------------------------------
[2016-07-17T14:22Z] ##### ERROR stack trace
[2016-07-17T14:22Z] htsjdk.tribble.TribbleException$MalformedFeatureFile: Unable to parse header with error: Your input file has a malformed header: We never saw the required CHROM header line (starting with one #) for the input VCF file, for input source: /project/ibilab/projects/Devraj_Basu_HNSCC_Devraj_2016_3/large_data/bcbio-result5/work/mutect2/chr17/720-chr17_78458800_81195210.vcf.gz
[2016-07-17T14:22Z] at htsjdk.tribble.TabixFeatureReader.readHeader(TabixFeatureReader.java:90)
[2016-07-17T14:22Z] at htsjdk.tribble.TabixFeatureReader.<init>(TabixFeatureReader.java:74)
[2016-07-17T14:22Z] at htsjdk.tribble.AbstractFeatureReader.getFeatureReader(AbstractFeatureReader.java:85)
[2016-07-17T14:22Z] at htsjdk.tribble.AbstractFeatureReader.getFeatureReader(AbstractFeatureReader.java:66)
[2016-07-17T14:22Z] at org.broadinstitute.gatk.tools.CatVariants.getFeatureReader(CatVariants.java:188)
[2016-07-17T14:22Z] at org.broadinstitute.gatk.tools.CatVariants.execute(CatVariants.java:280)
[2016-07-17T14:22Z] at org.broadinstitute.gatk.utils.commandline.CommandLineProgram.start(CommandLineProgram.java:248)
[2016-07-17T14:22Z] at org.broadinstitute.gatk.utils.commandline.CommandLineProgram.start(CommandLineProgram.java:155)
[2016-07-17T14:22Z] at org.broadinstitute.gatk.tools.CatVariants.main(CatVariants.java:312)
[2016-07-17T14:22Z] Caused by: htsjdk.tribble.TribbleException$InvalidHeader: Your input file has a malformed header: We never saw the required CHROM header line (starting with one #) for the input VCF file
[2016-07-17T14:22Z] at htsjdk.variant.vcf.VCFCodec.readActualHeader(VCFCodec.java:119)
[2016-07-17T14:22Z] at htsjdk.tribble.AsciiFeatureCodec.readHeader(AsciiFeatureCodec.java:88)
[2016-07-17T14:22Z] at htsjdk.tribble.AsciiFeatureCodec.readHeader(AsciiFeatureCodec.java:41)
[2016-07-17T14:22Z] at htsjdk.tribble.TabixFeatureReader.readHeader(TabixFeatureReader.java:88)
[2016-07-17T14:22Z] ... 8 more
[2016-07-17T14:22Z] ##### ERROR ------------------------------------------------------------------------------------------
[2016-07-17T14:22Z] ##### ERROR A GATK RUNTIME ERROR has occurred (version 3.5-21-g80158d2):
[2016-07-17T14:22Z] ##### ERROR
[2016-07-17T14:22Z] ##### ERROR This might be a bug. Please check the documentation guide to see if this is a known problem.
[2016-07-17T14:22Z] ##### ERROR If not, please post the error message, with stack trace, to the GATK forum.
[2016-07-17T14:22Z] ##### ERROR Visit our website and forum for extensive documentation and answers to
[2016-07-17T14:22Z] ##### ERROR commonly asked questions http://www.broadinstitute.org/gatk
[2016-07-17T14:22Z] ##### ERROR
[2016-07-17T14:22Z] ##### ERROR MESSAGE: Unable to parse header with error: Your input file has a malformed header: We never saw the required CHROM header line (starting with one #) for the input VCF file, for input source: /project/ibilab/projects/Devraj_Basu_HNSCC_Devraj_2016_3/large_data/bcbio-result5/work/mutect2/chr17/720-chr17_78458800_81195210.vcf.gz
[2016-07-17T14:22Z] ##### ERROR ------------------------------------------------------------------------------------------
Traceback (most recent call last):
File "/project/ibilab/tools/bcbio/tool/bin/bcbio_nextgen.py", line 226, in <module>
main(**kwargs)
File "/project/ibilab/tools/bcbio/tool/bin/bcbio_nextgen.py", line 43, in main
run_main(**kwargs)
File "/project/ibilab/tools/bcbio/anaconda/lib/python2.7/site-packages/bcbio/pipeline/main.py", line 43, in run_main
fc_dir, run_info_yaml)
File "/project/ibilab/tools/bcbio/anaconda/lib/python2.7/site-packages/bcbio/pipeline/main.py", line 87, in _run_toplevel
for xs in pipeline(config, run_info_yaml, parallel, dirs, samples):
File "/project/ibilab/tools/bcbio/anaconda/lib/python2.7/site-packages/bcbio/pipeline/main.py", line 151, in variant2pipeline
samples = genotype.parallel_variantcall_region(samples, run_parallel)
File "/project/ibilab/tools/bcbio/anaconda/lib/python2.7/site-packages/bcbio/variation/genotype.py", line 176, in parallel_variantcall_region
"variantcall_sample", "concat_variant_files",
File "/project/ibilab/tools/bcbio/anaconda/lib/python2.7/site-packages/bcbio/distributed/split.py", line 35, in grouped_parallel_split_combine
parallel_fn(combine_name, combine_args)
File "/project/ibilab/tools/bcbio/anaconda/lib/python2.7/site-packages/bcbio/distributed/multi.py", line 28, in run_parallel
return run_multicore(fn, items, config, parallel=parallel)
File "/project/ibilab/tools/bcbio/anaconda/lib/python2.7/site-packages/bcbio/distributed/multi.py", line 86, in run_multicore
for data in joblib.Parallel(parallel["num_jobs"])(joblib.delayed(fn)(x) for x in items):
File "/project/ibilab/tools/bcbio/anaconda/lib/python2.7/site-packages/joblib/parallel.py", line 800, in __call__
while self.dispatch_one_batch(iterator):
File "/project/ibilab/tools/bcbio/anaconda/lib/python2.7/site-packages/joblib/parallel.py", line 658, in dispatch_one_batch
self._dispatch(tasks)
File "/project/ibilab/tools/bcbio/anaconda/lib/python2.7/site-packages/joblib/parallel.py", line 566, in _dispatch
job = ImmediateComputeBatch(batch)
File "/project/ibilab/tools/bcbio/anaconda/lib/python2.7/site-packages/joblib/parallel.py", line 180, in __init__
self.results = batch()
File "/project/ibilab/tools/bcbio/anaconda/lib/python2.7/site-packages/joblib/parallel.py", line 72, in __call__
return [func(*args, **kwargs) for func, args, kwargs in self.items]
File "/project/ibilab/tools/bcbio/anaconda/lib/python2.7/site-packages/bcbio/utils.py", line 51, in wrapper
return apply(f, *args, **kwargs)
File "/project/ibilab/tools/bcbio/anaconda/lib/python2.7/site-packages/bcbio/distributed/multitasks.py", line 199, in concat_variant_files
def variantcall_sample(*args):
File "/project/ibilab/tools/bcbio/anaconda/lib/python2.7/site-packages/bcbio/variation/vcfutils.py", line 301, in concat_variant_files
out_handle.write(fname + "\n")
File "/project/ibilab/tools/bcbio/anaconda/lib/python2.7/site-packages/bcbio/provenance/do.py", line 21, in run
_do_run(cmd, checks, log_stdout)
File "/project/ibilab/tools/bcbio/anaconda/lib/python2.7/site-packages/bcbio/provenance/do.py", line 95, in _do_run
raise subprocess.CalledProcessError(exitcode, error_msg)
subprocess.CalledProcessError: Command '/project/ibilab/tools/bcbio/anaconda/bin/gatk-framework org.broadinstitute.gatk.tools.CatVariants -R /project/ibilab/tools/bcbio/genomes/Hsapiens/hg19/seq/hg19.fa -V /project/ibilab/projects/Devraj_Basu_HNSCC_Devraj_2016_3/large_data/bcbio-result5/work/mutect2/720-files.list -out /project/ibilab/projects/Devraj_Basu_HNSCC_Devraj_2016_3/large_data/bcbio-result5/work/mutect2/tx/tmp_O87mm/720.vcf.gz -assumeSorted -Xms750m -Xmx2000m -XX:+UseSerialGC
INFO 10:22:45,079 HelpFormatter - -------------------------------------------------------
INFO 10:22:45,081 HelpFormatter - Program Name: org.broadinstitute.gatk.tools.CatVariants
INFO 10:22:45,085 HelpFormatter - Program Args: -R /project/ibilab/tools/bcbio/genomes/Hsapiens/hg19/seq/hg19.fa -V /project/ibilab/projects/Devraj_Basu_HNSCC_Devraj_2016_3/large_data/bcbio-result5/work/mutect2/720-files.list -out /project/ibilab/projects/Devraj_Basu_HNSCC_Devraj_2016_3/large_data/bcbio-result5/work/mutect2/tx/tmp_O87mm/720.vcf.gz -assumeSorted
INFO 10:22:45,091 HelpFormatter - Executing as taehyong@node119.hpc.local on Linux 2.6.32-504.el6.x86_64 amd64; OpenJDK 64-Bit Server VM 1.7.0_91-b15.
INFO 10:22:45,092 HelpFormatter - Date/Time: 2016/07/17 10:22:45
INFO 10:22:45,092 HelpFormatter - -------------------------------------------------------
INFO 10:22:45,092 HelpFormatter - -------------------------------------------------------
.........10.........20.........30.........40.........50.........60.........70.........80.........90.........100.........110.........120.........130.........140.........150.........160.........170.##### ERROR ------------------------------------------------------------------------------------------
##### ERROR stack trace
htsjdk.tribble.TribbleException$MalformedFeatureFile: Unable to parse header with error: Your input file has a malformed header: We never saw the required CHROM header line (starting with one #) for the input VCF file, for input source: /project/ibilab/projects/Devraj_Basu_HNSCC_Devraj_2016_3/large_data/bcbio-result5/work/mutect2/chr17/720-chr17_78458800_81195210.vcf.gz
at htsjdk.tribble.TabixFeatureReader.readHeader(TabixFeatureReader.java:90)
at htsjdk.tribble.TabixFeatureReader.<init>(TabixFeatureReader.java:74)
at htsjdk.tribble.AbstractFeatureReader.getFeatureReader(AbstractFeatureReader.java:85)
at htsjdk.tribble.AbstractFeatureReader.getFeatureReader(AbstractFeatureReader.java:66)
at org.broadinstitute.gatk.tools.CatVariants.getFeatureReader(CatVariants.java:188)
at org.broadinstitute.gatk.tools.CatVariants.execute(CatVariants.java:280)
at org.broadinstitute.gatk.utils.commandline.CommandLineProgram.start(CommandLineProgram.java:248)
at org.broadinstitute.gatk.utils.commandline.CommandLineProgram.start(CommandLineProgram.java:155)
at org.broadinstitute.gatk.tools.CatVariants.main(CatVariants.java:312)
Caused by: htsjdk.tribble.TribbleException$InvalidHeader: Your input file has a malformed header: We never saw the required CHROM header line (starting with one #) for the input VCF file
at htsjdk.variant.vcf.VCFCodec.readActualHeader(VCFCodec.java:119)
at htsjdk.tribble.AsciiFeatureCodec.readHeader(AsciiFeatureCodec.java:88)
at htsjdk.tribble.AsciiFeatureCodec.readHeader(AsciiFeatureCodec.java:41)
at htsjdk.tribble.TabixFeatureReader.readHeader(TabixFeatureReader.java:88)
... 8 more
##### ERROR ------------------------------------------------------------------------------------------
##### ERROR A GATK RUNTIME ERROR has occurred (version 3.5-21-g80158d2):
##### ERROR
##### ERROR This might be a bug. Please check the documentation guide to see if this is a known problem.
##### ERROR If not, please post the error message, with stack trace, to the GATK forum.
##### ERROR Visit our website and forum for extensive documentation and answers to
##### ERROR commonly asked questions http://www.broadinstitute.org/gatk
##### ERROR
##### ERROR MESSAGE: Unable to parse header with error: Your input file has a malformed header: We never saw the required CHROM header line (starting with one #) for the input VCF file, for input source: /project/ibilab/projects/Devraj_Basu_HNSCC_Devraj_2016_3/large_data/bcbio-result5/work/mutect2/chr17/720-chr17_78458800_81195210.vcf.gz
##### ERROR ------------------------------------------------------------------------------------------
' returned non-zero exit status 1
Taehyong -- sorry about the MuTect2 problems. We haven't seen this in testing so I'm not sure what the underlying issue is. From the errors, it looks like something is problematic about one of the input files:
/project/ibilab/projects/Devraj_Basu_HNSCC_Devraj_2016_3/large_data/bcbio-result5/work/mutect2/chr17/720-chr17_78458800_81195210.vcf.gz
Would it be possible to share that file with us? Understanding what went wrong would help us either work around it or submit an issue upstream to the MuTect2 team. Thanks much for the help debugging.
Thanks for looking this! 720-chr17_78458800_81195210.vcf.gz is an empty file. I don't think it is a problem of MuTect2 but it is a problem when variant result files were concatenated with an empty result file, which create an error due to no header or variants in it (an empty file). Please see the error message.
ERROR MESSAGE: Unable to parse header with error: Your input file has a malformed header: We never saw the required CHROM header line (starting with one #) for the input VCF file, for input source: /project/ibilab/projects/Devraj_Basu_HNSCC_Devraj_2016_3/large_data/bcbio-result5/work/mutect2/chr17/720-chr17_78458800_81195210.vcf.gz
If you need any other information, please let me know. Thank you!!
Taehyong, thanks for the quick followup with details about that file. The underlying issue is that MuTect2 is producing an empty file here -- it should produce a VCF header with no variants, not a completely empty file. The concatenate variants fails on this, but the real error is upstream. As an aside, do you have the latest development version (or stable release from this morning)? That should also skip over this and the traceback looks like it might be from an older version.
Practically, it would be helpful to see if you can reproduce the GATK MuTect2 empty file error outside of bcbio. If you do:
grep MuTect2 log/bcbio-variation-commands.log | grep 720-chr17_78458800_81195210.vcf.gz
You hopefully can find the initial command that created this. If you re-run that, can you replicate the creation of an empty file? If so, you might want to upgrade to GATK 3.6 (also supported in the latest development) and see if that also has a problem. If so, we should try to prepare and file a bug upstream to the GATK team.
Thanks again for the help debugging.
Thanks for quick reply! I sort-of agree on this. But please consider what I thought to fix the problem. (Please also check the discussion above of this thread since this can be solved with the same way).
As you can see in the following command line, CatVariants tries to concatenate variants result files from the list, '/project/ibilab/projects/Devraj_Basu_HNSCC_Devraj_2016_3/large_data/bcbio-result5/work/mutect2/720-files.list', which includes the empty file, 720-chr17_78458800_81195210.vcf.gz. So, I think that it would be better to check whether variant results file has actual variants or not (empty or not) when 720-files.list is created. This could be a better way to avoid error (concatenating an empty file) regardless of generating empty files from GATK.
/project/ibilab/tools/bcbio/anaconda/bin/gatk-framework org.broadinstitute.gatk.tools.CatVariants -R /project/ibilab/tools/bcbio/genomes/Hsapiens/hg19/seq/hg19.fa -V /project/ibilab/projects/Devraj_Basu_HNSCC_Devraj_2016_3/large_data/bcbio-result5/work/mutect2/720-files.list -out /project/ibilab/projects/Devraj_Basu_HNSCC_Devraj_2016_3/large_data/bcbio-result5/work/mutect2/tx/tmp_O87mm/720.vcf.gz -assumeSorted -Xms750m -Xmx2000m -XX:+UseSerialGC
Taehyong -- I definitely agree with you that it should catch this during the concatenation, and the fix I put in place earlier for the other problem should also avoid using empty files. Is it possible you don't have the latest development version with this fix in place? Hopefully upgrading to the latest development will skip over this if you re-try.
Right! I upgraded to 0.9.9 and it works! Thank you so much!
I believe that I'm getting the same error as https://github.com/chapmanb/bcbio-nextgen/issues/1154 ,see traceback below.
Running
gives this output:
I notice that the temporary directory
.../tmpxlP4EC/..
, where the vcf output should go, does not exist, i.e./home/lindak/Data/LiquidBiopsies/TruSight15/160115/calling/work/varscan/7/tx/
is empty. Could this be the source of the problem?