Illumina / gvcfgenotyper

A utility for merging and genotyping Illumina-style GVCFs.
Apache License 2.0

providing a list of variants to genotype #9

Open clairepalles opened 5 years ago

clairepalles commented 5 years ago

Is it possible to give gvcfgenotyper a list of chromosome positions to genotype GVCFs at? For example I have two sets of samples joint genotyped using gvcfgenotyper. Some variants are only present in set 1 and I would like to know whether that site could be genotyped in all of the samples in set 2. I am trying to avoid having to re-run gvcfgenotyper on all of the samples combined. Perhaps it is not possible but I wanted to ask?

olest commented 5 years ago

Hi, you can give gvcfgenotyper a list of regions using the -r command-line argument. The syntax is identical to that of "bcftools view": https://samtools.github.io/bcftools/bcftools.html#common_options. We don't yet support the -R option, which reads the regions from a file, but it wouldn't be hard to add if it helps.

clairepalles commented 5 years ago

Thanks for getting back to me

We did try the -r command-line argument, which works great when we are multi-sample calling variants on a single chromosome. However, when we provide -r chr:position we get a blank VCF back (header only). The position we are querying is not variant in the set of samples being tested. We know that, but we were hoping to get back reference or missing genotypes for the samples whose GVCFs we are providing. If it would be possible to support the -R option, that would be great, so long as what we are trying to do is actually possible with gvcfgenotyper?

Thanks again for the help!

Claire


olest commented 5 years ago

Hi Claire,

I realized that I did not really understand your problem. Thanks for the clarification.

This is not directly supported by gvcfgenotyper but you could try to "hack" it.

If you are familiar with bcftools, you could use "bcftools view" to slice out the variants that you are interested in and store them in a vcf.gz or bcf file. So something like:

bcftools view -r region_around_variants -Ob -o variants.bcf

You could then add this variants.bcf file to the list of input files for gvcfgenotyper and run gvcfgt only on these regions (this is important).

I have not tested this myself but I think this should work.
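
A minimal sketch of the hack above, untested like the suggestion itself. It assumes the slice is taken from the indexed, joint-genotyped file that already contains the variants (the comment does not say which input to slice). Every file name and the helper name hack_cmds are hypothetical placeholders. Only flags already mentioned in this thread (-r, -f, -l, -Ob, -o) are used. The function prints the commands rather than running them, so they can be reviewed first:

```shell
# Sketch only: prints the three commands instead of executing them.
# Assumption: paths contain no spaces; all names are placeholders.
hack_cmds() {
    region=$1      # chrom:start-end around the variant(s) of interest
    set1_bcf=$2    # indexed VCF/BCF that contains the variants (assumed input)
    gvcf_list=$3   # text file listing the GVCF paths, one per line
    fasta=$4       # reference FASTA
    out=$5         # output BCF

    # 1. Slice the variants of interest into variants.bcf.
    printf 'bcftools view -r %s -Ob -o variants.bcf %s\n' "$region" "$set1_bcf"
    # 2. Add variants.bcf to the list of input files (written to a new list file).
    printf '{ cat %s; echo variants.bcf; } > inputs_with_variants.txt\n' "$gvcf_list"
    # 3. Run gvcfgenotyper restricted to the same region.
    printf 'gvcfgenotyper -f %s -l inputs_with_variants.txt -r %s -Ob -o %s\n' \
        "$fasta" "$region" "$out"
}
```

For example, hack_cmds chr1:1000000-1000200 set1.bcf set2_gvcfs.txt path/to/fasta set2_at_set1_sites.bcf prints the commands for one region. Once they look right, the output can be piped to sh. Note that "bcftools view -r" needs an indexed input file.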

Adding an option for gvcfgenotyper to force-genotype a set of variants is an interesting idea. It is not straightforward to add. I cannot make any promises but I could add it to the backlog and see if we find time to do it.

clairepalles commented 5 years ago

Thanks for the speedy reply and the suggested hack. We will give that a go. If you could also add the force-genotype option to your list of possible future jobs I would be grateful.

Many thanks

Claire


hcurley2 commented 5 years ago

Hello,

I tried this hack as suggested but I get an error explaining that the GVCFs are interrupted (not contiguous) and therefore, it terminates.

"You could then add this variants.bcf file to the list of input files for gvcfgenotyper and run gvcfgt only on these regions (this is important)."

I am assuming I may be missing something in the code to ensure that it is only this region?

gvcfgenotyper -f path/to/fasta -l listofGVCFpaths.txt -Ob -o genotyped_GVCFs.bcf

What if I have variants from multiple regions?

olest commented 5 years ago

Hi,

You can restrict gvcfgenotyper to a region using the -r command-line argument. The syntax is the same as in bcftools (chrom:start-end).

If you have variants from multiple regions, I recommend cutting out several slices around these variants using the bcftools view command mentioned above and then running several gvcfgenotyper jobs, one for each slice.
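
A rough sketch of the one-job-per-slice idea, under stated assumptions: the helper name per_slice_jobs and the file names slice_N.list and slice_N.bcf are hypothetical, and each slice_N.list is assumed to already hold the GVCF paths plus the BCF slice for region N. The gvcfgenotyper flags are the ones quoted earlier in this thread, with -r added. As before, the commands are only printed:

```shell
# Sketch only: prints one gvcfgenotyper command per region.
# Assumption: slice_1.list, slice_2.list, ... were prepared beforehand,
# one per region, in the same order as the regions given here.
per_slice_jobs() {
    fasta=$1       # reference FASTA
    shift          # remaining arguments are regions (chrom:start-end)
    n=0
    for region in "$@"; do
        n=$((n + 1))
        printf 'gvcfgenotyper -f %s -l slice_%d.list -r %s -Ob -o slice_%d.bcf\n' \
            "$fasta" "$n" "$region" "$n"
    done
}
```

Calling per_slice_jobs path/to/fasta chr1:100-200 chr2:500-600 prints two independent jobs, which could then be run one after another or submitted separately.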

ameynert commented 4 years ago

I also need this functionality - both the force output of reference genotypes and the list of input variants. The hack hasn't worked for me either.