Alexander-Stuckey opened 5 years ago
Hi, I'd like to help with this; it's usually a bit of a project to get all the details right and to calibrate the quantitative thresholds. (The Strelka2 config itself is super rough.) Do you have any less-sensitive test article gVCF files that could be shared for that purpose, publicly or privately? You're welcome to email me at mlin at dnanexus.com if you'd like to set up a call to discuss further!
I'll see if I can wrangle some test gVCFs that I can share; unfortunately, I can't share the ones we have internally.
I'm trialing GLnexus for merging gVCFs produced by the isaac/starling workflow from Illumina, as an alternative to gvcfgenotyper, which is extremely memory-heavy.
I know it's not a supported configuration, but I've had some success using the Strelka YAML config with some minor modifications, such as changing the allele_dp_format field from min_dp to dp.
This does work and produces a merged gVCF, but I've noticed the following issues:

1. The AD and GQ output FORMAT fields can contain `.` as an entry instead of an integer value.
2. If min_GQ is set higher than both min_AQ1 and min_AQ2, every variant is filtered out.
The second issue is easy to work around, but the first causes problems when using the merged gVCF downstream. For example, importing it into Hail throws an error because Hail expects integers in those fields.
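To narrow down the first issue, here is a minimal Python sketch (my own, not part of GLnexus) that counts how many sample values of AD or GQ contain a missing `.` element. It assumes the BCF output has first been converted to plain-text VCF, for example with `bcftools view`. The `count_missing` and `has_missing` helpers are hypothetical names. Note that `.` is the VCF specification's standard missing value, so these records are not malformed VCF. The trouble is only with consumers that require an integer there.

```python
# Hypothetical helper (not part of GLnexus): count AD/GQ sample values that
# contain a missing "." element in a plain-text VCF.
import io
from collections import Counter
from typing import Iterable, TextIO

WATCHED_FIELDS = ("AD", "GQ")


def has_missing(value: str) -> bool:
    """True if any comma-separated element of a FORMAT value is '.'."""
    return any(part == "." for part in value.split(","))


def count_missing(vcf: TextIO, fields: Iterable[str] = WATCHED_FIELDS) -> Counter:
    """Map each watched FORMAT field to the number of sample values with a '.'."""
    fields = tuple(fields)
    counts: Counter = Counter()
    for line in vcf:
        if line.startswith("#"):
            continue  # meta-information and column header lines
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 10:
            continue  # sites-only record: no FORMAT/sample columns
        keys = cols[8].split(":")
        for sample in cols[9:]:
            values = sample.split(":")
            for field in fields:
                if field not in keys:
                    continue
                idx = keys.index(field)
                # VCF allows trailing FORMAT values to be dropped; count as missing.
                value = values[idx] if idx < len(values) else "."
                if has_missing(value):
                    counts[field] += 1
    return counts


if __name__ == "__main__":
    demo = io.StringIO(
        "##fileformat=VCFv4.2\n"
        "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tS1\tS2\n"
        "chr1\t100\t.\tA\tG\t.\t.\t.\tGT:AD:GQ\t0/1:5,7:30\t0/0:.:.\n"
    )
    print(count_missing(demo))
```

A partially missing vector such as `5,.` is counted as well, since a strict integer parser may reject it just like a bare `.`.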
Is this something you would be able to help with or advise on?
Below is the full config file that I am using:
```yaml
unifier_config:
  drop_filtered: false
  min_allele_copy_number: 1
  min_AQ1: 0
  min_AQ2: 0
  min_GQ: 0
  max_alleles_per_site: 0
  monoallelic_sites_for_lost_alleles: false
  preference: common
genotyper_config:
  revise_genotypes: false
  min_assumed_allele_frequency: 0.0001
  required_dp: 0
  allow_partial_data: false
  allele_dp_format: AD
  ref_dp_format: DP
  output_residuals: false
  squeeze: false
  output_format: BCF
  liftover_fields:
    - {orig_names: [MIN_DP, DP, DPI], name: DP, description: "##FORMAT=<ID=DP,Number=1,Type=Integer,Description=\"Approximate read depth (reads with MQ=255 or with bad mates are filtered)\">", type: int, number: basic, default_type: missing, count: 1, combi_method: min, ignore_non_variants: true}
    - {orig_names: [AD], name: AD, description: "##FORMAT=<ID=AD,Number=.,Type=Integer,Description=\"Allelic depths for the ref and alt alleles in the order listed\">", type: int, number: alleles, default_type: zero, count: 0, combi_method: min, ignore_non_variants: true}
    - {orig_names: [GQ], name: GQ, description: "##FORMAT=<ID=GQ,Number=1,Type=Integer,Description=\"Genotype Quality\">", type: float, number: basic, default_type: missing, count: 1, combi_method: min, ignore_non_variants: true}
    - {orig_names: [FILTER], name: FT, description: "##FORMAT=<ID=FT,Number=1,Type=String,Description=\"FILTER field from sample gVCF\">", type: string, number: basic, default_type: missing, count: 1, combi_method: missing, ignore_non_variants: true}
```
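Because the second issue depends on how min_GQ compares to min_AQ1 and min_AQ2, a small pre-flight check can catch that combination before a run. The Python sketch below is hypothetical: `check_config` and the type mapping are my own, not GLnexus APIs. It assumes the YAML has already been loaded into a dict, for example with PyYAML's `yaml.safe_load`. It also cross-checks each liftover field's `type:` against the `Type=` declared in its header description.

```python
# Hypothetical check (not part of GLnexus) over a config already parsed into a
# dict, e.g. via PyYAML's yaml.safe_load. The min_GQ rule only encodes the
# behaviour observed in this issue, not documented GLnexus semantics.
import re
from typing import Any, Dict, List

HEADER_TYPE = re.compile(r"Type=(\w+)")
# Only the liftover types that appear in this config are mapped.
VCF_TO_LIFTOVER = {"Integer": "int", "Float": "float", "String": "string"}


def check_config(cfg: Dict[str, Any]) -> List[str]:
    """Return human-readable warnings; an empty list means nothing was flagged."""
    warnings: List[str] = []
    unifier = cfg.get("unifier_config", {})
    gq = unifier.get("min_GQ", 0)
    aq1, aq2 = unifier.get("min_AQ1", 0), unifier.get("min_AQ2", 0)
    if gq > aq1 and gq > aq2:
        warnings.append(
            f"min_GQ={gq} is above min_AQ1={aq1} and min_AQ2={aq2}; "
            "observed here to filter out every variant"
        )
    for field in cfg.get("genotyper_config", {}).get("liftover_fields", []):
        match = HEADER_TYPE.search(field.get("description", ""))
        if match is None:
            continue
        expected = VCF_TO_LIFTOVER.get(match.group(1))
        if expected is not None and field.get("type") != expected:
            warnings.append(
                f"{field.get('name')}: header says Type={match.group(1)} "
                f"but liftover type is {field.get('type')!r}"
            )
    return warnings


if __name__ == "__main__":
    cfg = {
        "unifier_config": {"min_AQ1": 0, "min_AQ2": 0, "min_GQ": 0},
        "genotyper_config": {"liftover_fields": [
            {"name": "GQ", "type": "float",
             "description": '##FORMAT=<ID=GQ,Number=1,Type=Integer,Description="Genotype Quality">'},
        ]},
    }
    for warning in check_config(cfg):
        print(warning)
```

When run against the config above, this check would flag only the GQ entry, whose header declares Type=Integer while the liftover entry uses type: float. I haven't verified whether that mismatch contributes to the missing GQ values.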