Hi @fnothaft -
I'd like to demonstrate joint genotype calling with Avocado over specific genomic regions, using the binned, "hive-style" partitioned data.
Input:
1) gVCF files for 10s to 100s of samples, saved as bin-range-partitioned ADAM Parquet datasets
2) BAM files saved as ADAM bin-partitioned datasets.
The application I have in mind is on-the-fly recalling of specific regions: new samples are added, and a set of candidate regions needs to be examined in near real-time. This would include a feature that lets the user provide a BED file of regions to call, as GenotypeGVCFs allows in GATK/HaplotypeCaller.
My plan is to make Avocado able to load partitioned data from my ADAM "hive" binned dataset branch. With that in place I think it will just work, and I'll measure performance.
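To make the region lookup concrete, here is a minimal sketch of how BED intervals could be mapped to the bins that would need to be read. It is plain Python rather than ADAM or Avocado code, and it rests on assumptions: a fixed bin width of 1,000,000 bp and a bin index computed as floor(start / bin width). The names `BIN_SIZE` and `bed_to_bins` are hypothetical and chosen only for illustration.

```python
from typing import Dict, Iterable, Set

# Assumed bin width in base pairs (hypothetical; the real dataset may differ).
BIN_SIZE = 1_000_000


def bed_to_bins(bed_lines: Iterable[str], bin_size: int = BIN_SIZE) -> Dict[str, Set[int]]:
    """Map BED intervals to the set of bin indices they overlap, per contig.

    BED coordinates are 0-based and half-open, so the last covered base
    is end - 1.
    """
    bins: Dict[str, Set[int]] = {}
    for line in bed_lines:
        line = line.strip()
        # Skip blank lines and BED header/comment lines.
        if not line or line.startswith(("#", "track", "browser")):
            continue
        fields = line.split()
        contig, start, end = fields[0], int(fields[1]), int(fields[2])
        if end <= start:
            continue  # ignore empty or malformed intervals
        first_bin = start // bin_size
        last_bin = (end - 1) // bin_size
        bins.setdefault(contig, set()).update(range(first_bin, last_bin + 1))
    return bins
```

The resulting contig-to-bins map could then drive a partition filter so that only the matching directories are scanned. If records are binned by start position, a read or gVCF block that starts in the previous bin can still overlap the region, so a real query would probably also need to look back by some flanking distance.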
Let me know if you have suggestions / comments about the usefulness of this.