chapmanb / bcbio.variation

Toolkit to analyze genomic variation data, built on the GATK with Clojure

Handling of multi-sample vcf #17

a113n closed 10 years ago

a113n commented 10 years ago

Hi Brad,

I am not sure whether "variant-prep" supports normalization of multi-sample VCFs. I ran "variant-prep" on a multi-sample VCF generated by GATK 3.0, but only the genotype calls for the first sample appear in the output.

The culprit appears to be the SelectVariants step, where only the first sample is passed to the "--sample_name" parameter. Here is an excerpt of the output:

Cleaning input VCF: fullprep
Merging multiple input files: fullprep
Prepare VCF, resorting to genome build: fullprep
INFO  15:54:03,756 HelpFormatter - --------------------------------------------------------------------------------
INFO  15:54:03,757 HelpFormatter - The Genome Analysis Toolkit (GATK) v3.0-6-g9909b24, Compiled 2014/03/08 10:08:34
INFO  15:54:03,757 HelpFormatter - Copyright (c) 2010 The Broad Institute
INFO  15:54:03,758 HelpFormatter - For support and documentation go to http://www.broadinstitute.org/gatk
INFO  15:54:03,761 HelpFormatter - Program Args: -T SelectVariants --read_filter BadCigar --read_filter NotPrimaryAlignment -R human_g1k_v37_decoy.fasta --sample_name sample1 --variant mutations.vcf --unsafe ALL --out sample1-fullprep.vcf --excludeNonVariants --excludeFiltered
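One way to confirm the symptom is to compare the sample columns in the input and output VCF headers. The following is a minimal Python sketch, not part of bcbio.variation; the function name `vcf_sample_names` is hypothetical, and it assumes an uncompressed, tab-delimited VCF whose `#CHROM` header line lists the nine fixed columns (through FORMAT) followed by the sample names.

```python
import io


def vcf_sample_names(handle):
    """Return the sample names listed on the #CHROM header line of a VCF.

    `handle` is any iterable of text lines (an open file, io.StringIO, ...).
    Returns an empty list if no #CHROM line is found before the records.
    """
    for line in handle:
        if line.startswith("#CHROM"):
            fields = line.rstrip("\r\n").split("\t")
            # Columns 1-9 are fixed (CHROM .. INFO, FORMAT); samples follow.
            return fields[9:]
        if not line.startswith("#"):
            # Reached the variant records without seeing a #CHROM line.
            break
    return []


if __name__ == "__main__":
    # Tiny in-memory example with two samples (illustrative data only).
    example = (
        "##fileformat=VCFv4.1\n"
        "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tsample1\tsample2\n"
        "1\t100\t.\tA\tG\t50\tPASS\t.\tGT\t0/1\t1/1\n"
    )
    print(vcf_sample_names(io.StringIO(example)))
```

Running this on both mutations.vcf and sample1-fullprep.vcf (by passing an open file handle) would show whether samples were dropped between input and output.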

FYI, I am using bcbio.variation v0.1.7-SNAPSHOT-20140518. I hope you can give me some hints on this issue. Thank you very much!

Best regards, Allen

chapmanb commented 10 years ago

Allen; Thanks much for the report and apologies about the issue. I pushed a fix that skips the single-sample subsetting when more than one sample is present. This allows multi-sample inputs into variant-prep and will hopefully do what you need. I created a new snapshot if you have time to test, and we'll roll this into the next release:

https://github.com/chapmanb/bcbio.variation/releases/download/v0.1.7-SNAPSHOT-20140528/bcbio.variation-0.1.7-SNAPSHOT-standalone.jar

Thanks again.
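For readers following along, the behavior described in the fix can be sketched roughly as below. This is an illustration in Python, not the actual Clojure change in bcbio.variation, and the function name `select_variants_args` is hypothetical. It assumes one reading of "avoid the subset": the `--sample_name` argument is passed only when the input holds exactly one sample. The GATK flags are the ones visible in the log excerpt above.

```python
def select_variants_args(ref_file, in_vcf, out_vcf, samples):
    """Build a GATK 3 SelectVariants argument list.

    Assumption for illustration: subsetting by --sample_name happens only
    for single-sample inputs, so multi-sample VCFs keep every genotype column.
    """
    args = [
        "-T", "SelectVariants",
        "-R", ref_file,
        "--variant", in_vcf,
        "--out", out_vcf,
        "--excludeNonVariants",
        "--excludeFiltered",
    ]
    if len(samples) == 1:
        # Single-sample input: restrict output to that sample, as before.
        args += ["--sample_name", samples[0]]
    # With two or more samples, no --sample_name is added, so none are dropped.
    return args


if __name__ == "__main__":
    print(select_variants_args(
        "human_g1k_v37_decoy.fasta", "mutations.vcf",
        "mutations-fullprep.vcf", ["sample1", "sample2"]))
```

The output file name in the usage example is made up; the real naming scheme used by variant-prep for multi-sample inputs is not shown in this thread.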