chapmanb / bcbio.variation

Toolkit to analyze genomic variation data, built on the GATK with Clojure

ensemble variant calling using bcbio.variation - java error #12

shalabhsuman closed 11 years ago

shalabhsuman commented 11 years ago

Hello,

I have 3 multi-sample VCF files (from GATK UnifiedGenotyper, FreeBayes, and VarScan), and I am trying to test ensemble variant calling using the following command:

java -Xmx32g -jar bcbio.variation-0.1.0-SNAPSHOT-standalone.jar variant-ensemble params.yaml /Resources/Data/genome/hg19_canonical_correct_chr_order.fa ./output/out.vcf gatkUG_baa.vcf freebayes_baa.vcf varscan_baa.vcf 

...and I get the following error (screenshot below):

[Screenshot: 2013-08-20 at 9:51:48 AM]

Note: For params.yaml, I am using the exact example config file provided on your website here: (https://github.com/chapmanb/bcbio.variation/blob/master/config/ensemble/combine-callers.yaml). Also, I am using the snapshot version of the bcbio.variation jar instead of the standard one.

Please suggest.

Shalabh Suman

chapmanb commented 11 years ago

Shalabh; I've been trying to reproduce this problem without success. Can you share the result of cat params.yaml? The error indicates something is wrong with the YAML configuration file but it's tough to identify from the traceback alone.

shalabhsuman commented 11 years ago

I have used the exact same contents in the config file as provided on your site here: https://github.com/chapmanb/bcbio.variation/blob/master/config/ensemble/combine-callers.yaml

This is what cat params.yaml looks like:

[Screenshot: 2013-08-21 at 10:13:29 AM]

chapmanb commented 11 years ago

Shalabh; It looks like something happened with the indentation in that file compared to the version on GitHub. Specifically there appears to be a space in front of the --- line. If you edit to remove that, it should work cleanly.
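The indentation problem described above can be checked for mechanically. The following is a minimal Python sketch, not part of bcbio.variation: it flags any line that consists only of the YAML document start marker `---` but is preceded by whitespace, and offers a helper that strips that whitespace. The function names and the sample text are hypothetical and used only for illustration.

```python
def find_indented_doc_markers(text):
    """Return 1-based line numbers where a lone '---' is preceded by whitespace."""
    hits = []
    for number, line in enumerate(text.splitlines(), start=1):
        # A document start marker is only recognized at the start of a line.
        if line.strip() == "---" and line != line.lstrip():
            hits.append(number)
    return hits


def fix_indented_doc_markers(text):
    """Strip leading whitespace from lines that contain only '---'."""
    fixed = []
    for line in text.splitlines(keepends=True):
        if line.strip() == "---":
            fixed.append(line.lstrip())
        else:
            fixed.append(line)
    return "".join(fixed)


if __name__ == "__main__":
    # Hypothetical sample, not the real combine-callers.yaml contents.
    sample = " ---\nkey: value\n"
    print(find_indented_doc_markers(sample))
    print(repr(fix_indented_doc_markers(sample)))
```

To apply this to a real file, read it with `open("params.yaml").read()`, inspect the reported line numbers, and write the fixed text back only after reviewing the change.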

shalabhsuman commented 11 years ago

The job failed again. I have sent you an email with a link to the error log file. Thanks.

chapmanb commented 11 years ago

Shalabh; Thanks for passing along the files for testing. The code was not handling no-call genotypes, which led to this error. I pushed a new version of 0.1.0-SNAPSHOT that will work cleanly on your input datasets. I took a quick look at the outputs and it looks like work needs to be done to tune the SVM classifier for multi-sample ensemble calling. That is more of a long term project, but hopefully this run will give you consolidated calls to evaluate. Thanks for all the patience getting this running and looking forward to your feedback.
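For readers who want to see whether their own multi-sample VCF files contain no-call genotypes before running an ensemble, here is a minimal Python sketch. It is not the fix that went into bcbio.variation, whose details are not shown in this thread. It assumes a tab-delimited VCF data line and treats a genotype as a no-call when any allele in the GT subfield is `.`, as in `./.`. The function name and the example line are hypothetical.

```python
import re


def no_call_samples(vcf_line):
    """Return 0-based sample indexes whose GT contains a missing allele ('.')."""
    fields = vcf_line.rstrip("\n").split("\t")
    # Columns 0-7 are fixed, column 8 is FORMAT, samples start at column 9.
    if len(fields) < 10:
        return []
    format_keys = fields[8].split(":")
    if "GT" not in format_keys:
        return []
    gt_index = format_keys.index("GT")

    missing = []
    for i, sample in enumerate(fields[9:]):
        parts = sample.split(":")
        # A truncated sample field without a GT entry is treated as a no-call.
        gt = parts[gt_index] if gt_index < len(parts) else "."
        alleles = re.split(r"[/|]", gt)  # '/' is unphased, '|' is phased
        if any(allele == "." for allele in alleles):
            missing.append(i)
    return missing


if __name__ == "__main__":
    # Hypothetical data line with three samples: one called, two no-calls.
    line = "chr1\t100\t.\tA\tG\t50\tPASS\t.\tGT:DP\t0/1:12\t./.:0\t.\n"
    print(no_call_samples(line))
```

Header lines starting with `#` should be skipped before calling the function. Counting the lines that return a non-empty list gives a rough idea of how often no-calls occur in each caller's output.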