chapmanb / bcbio.variation

Toolkit to analyze genomic variation data, built on the GATK with Clojure

bcbio.variation using Lein #11

Closed shalabhsuman closed 11 years ago

shalabhsuman commented 11 years ago

Hello,

I am uncertain how to correctly install and use bcbio.variation as a library from Leiningen.

I have gone through the documentation, but it didn't help much.

I have installed the lein shell script in /bin, but I am not sure how to proceed from there. In other words, how do I start using bcbio.variation "0.0.8"? Which command should I use to correctly pull the package and its dependencies?

Thanks

Shalabh Suman

chapmanb commented 11 years ago

Shalabh; Perhaps it would help to understand your specific goal with bcbio.variation. The leiningen route is meant for using it as a library from other Clojure code. Is that what you're trying to do, or do you want to use some particular functionality?

shalabhsuman commented 11 years ago

I am trying to install bcbio.variation so that I can test it as a standalone package on multiple VCF files and generate the ensemble results in a single VCF file. I have already installed bcbio.variation-0.0.9-standalone.jar; however, it is not clear to me how to start using the jar for the ensemble process.

On a secondary note, if it's possible, I would also like to learn how to use bcbio.variation as a library from other Clojure code via the leiningen route.

chapmanb commented 11 years ago

Shalabh; Thanks much for the explanation. I wanted to be sure you had the same goals we'd discussed. I put together an initial interface along the lines of our e-mail conversation. Documentation on usage is here:

https://github.com/chapmanb/bcbio.variation#ensemble-variant-calls

This requires a snapshot release of the code from today available here:

https://s3.amazonaws.com/bcbio.variation/bcbio.variation-0.1.0-SNAPSHOT-standalone.jar

This provides a proof of concept but will eventually need additional work for your use case:

Hope this helps you get started.

shalabhsuman commented 11 years ago

Thanks for the response.

Right now we are generating multi-sample VCF files using 2 methods (GATK UnifiedGenotyper & FreeBayes), but since you suggested combining 3 approaches, I will generate a third multi-sample VCF file using GATK HaplotypeCaller so that I can test the ensemble variant calling using bcbio.variation-0.1.0-SNAPSHOT-standalone.jar.

Let me run an ensemble test using the combination of these 3 VCF files. I will keep you in the loop.

Thank you

Shalabh Suman

shalabhsuman commented 11 years ago

Hello,

I have finally been able to put together 3 multi-sample VCF files (GATK UnifiedGenotyper, FreeBayes, VarScan), and I am trying to test ensemble variant calling using the following command:

java -Xmx32g -jar bcbio.variation-0.1.0-SNAPSHOT-standalone.jar variant-ensemble params.yaml /mnt/nfs/gigantor/ifs/DCEG/CGF/Resources/Data/genome/hg19_canonical_correct_chr_order.fa ./output/out.vcf gatkUG_baa.vcf freebayes_baa.vcf varscan_baa.vcf 

I get the following error: [screenshot: screen shot 2013-08-20 at 9 51 48 am]

PS: For params.yaml, I am using the exact same example config file provided on your website (https://github.com/chapmanb/bcbio.variation/blob/master/config/ensemble/combine-callers.yaml). Also, I am using the snapshot version of the jar instead of the standard one.

Please advise.

Shalabh Suman
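
The argument order in the command above (configuration file, reference FASTA, output VCF, then the input VCFs) can be sketched as a small shell helper. This is only an illustration under stated assumptions: `ensemble_cmd` and `check_inputs` are hypothetical names, not part of bcbio.variation, and the sketch merely prints the command and checks that the inputs are readable. It does not reproduce or diagnose the error in the screenshot.

```shell
# Sketch only: ensemble_cmd and check_inputs are hypothetical helpers,
# not part of bcbio.variation. Argument order follows the command in the thread:
#   variant-ensemble <params.yaml> <reference.fa> <out.vcf> <in1.vcf> [<in2.vcf> ...]

# Print the java command line for review; does not run it.
ensemble_cmd() {
  if [ "$#" -lt 5 ]; then
    echo "usage: ensemble_cmd JAR PARAMS REF OUT VCF [VCF...]" >&2
    return 1
  fi
  jar=$1; params=$2; ref=$3; out=$4
  shift 4
  cmd="java -Xmx32g -jar $jar variant-ensemble $params $ref $out"
  for vcf in "$@"; do
    cmd="$cmd $vcf"
  done
  printf '%s\n' "$cmd"
}

# Report every input that is missing or unreadable; non-zero exit if any is.
check_inputs() {
  rc=0
  for f in "$@"; do
    if [ ! -r "$f" ]; then
      echo "missing or unreadable: $f" >&2
      rc=1
    fi
  done
  return "$rc"
}
```

Running `check_inputs params.yaml gatkUG_baa.vcf freebayes_baa.vcf varscan_baa.vcf` before launching Java reports any file that cannot be read from the current directory. `ensemble_cmd` takes the jar, config, reference, output path, and input VCFs in that order and prints the resulting command.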