dnanexus-rnd / GLnexus

Scalable gVCF merging and joint variant calling for population sequencing projects
Apache License 2.0

Why was this site rejected #226

Open ghost opened 4 years ago

ghost commented 4 years ago

Hello,

From DeepVariant, one of my gVCFs contains the following record:

Chrom_3 7768199 .   C   T,<*>   6.6 PASS    .   GT:GQ:DP:AD:VAF:PL  0/1:7:81:55,24,0:0.296296,0:5,0,48,990,990,990

However, it doesn't seem to have been kept in the final pVCF. I suppose this is because of its AQ (which is computed from PL, right?).

In that case, for DeepVariant WGS, would setting

min_AQ1: 0
min_AQ2: 0

have the effect of keeping all variants? If I understand correctly, the DeepVariant WGS preset only filters on AQ. Therefore I tried a custom YAML:

cat config_custom.yml
 unifier_config:
        min_AQ1: 0
        min_AQ2: 0
        min_GQ: 0
        monoallelic_sites_for_lost_alleles: true
        max_alleles_per_site: 32
    genotyper_config:
        required_dp: 0
        revise_genotypes: true
        allow_partial_data: true
        more_PL: true
        trim_uncalled_alleles: true
        liftover_fields:
            - orig_names: [MIN_DP, DP]
              name: DP
              description: '##FORMAT=<ID=DP,Number=1,Type=Integer,Description="Approximate read depth (reads with MQ=255 or with bad mates are filtered)">'
              type: int
              combi_method: min
              number: basic
              count: 1
              ignore_non_variants: true
            - orig_names: [AD]
              name: AD
              description: '##FORMAT=<ID=AD,Number=R,Type=Integer,Description="Allelic depths for the ref and alt alleles in the order listed">'
              type: int
              number: alleles
              combi_method: min
              default_type: zero
              count: 0
            - orig_names: [GQ]
              name: GQ
              description: '##FORMAT=<ID=GQ,Number=1,Type=Integer,Description="Genotype Quality">'
              type: int
              number: basic
              combi_method: min
              count: 1
              ignore_non_variants: true
            - orig_names: [PL]
              name: PL
              description: '##FORMAT=<ID=PL,Number=G,Type=Integer,Description="Phred-scaled genotype Likelihoods">'
              type: int
              number: genotype
              combi_method: missing
              count: 0
              ignore_non_variants: true

But then I get:

./GLnexus/glnexus_cli --config config_custom.yml  *.g.vcf.gz|bcftools view|bgzip -@ 8 -c > joint_noAQfilter.vcf.gz
[17095] [2020-06-11 13:18:04.730] [GLnexus] [info] glnexus_cli release v1.2.6-0-g4d057dc Thu Mar 19 09:26:57 2020
[17095] [2020-06-11 13:18:04.730] [GLnexus] [warning] jemalloc absent, which will impede performance with high thread counts. See https://github.com/dnanexus-rnd/GLnexus/wiki/Performance
[17095] [2020-06-11 13:18:04.730] [GLnexus] [info] Loading config YAML file config_custom.yml
[17095] [2020-06-11 13:18:04.730] [GLnexus] [error] Failed to load unifier/genotyper configuration: IOError: loading configuration YAML file (config_custom.yml)
Failed to read from standard input: unknown file type
mlin commented 4 years ago

Yes to all. The AQ is only about 5 on the Phred scale. It looks to me like the indentation may just be off in your YAML: unifier_config: and genotyper_config: should be at the same indentation level.
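
For illustration, here is a sketch of how the top of config_custom.yml might look with unifier_config: and genotyper_config: at the same indentation level. The keys and values are copied unchanged from the post above; only the indentation differs. This has not been run against glnexus_cli here, and the two-space indent width is an arbitrary choice, so treat it as a starting point rather than a verified configuration.

```yaml
# config_custom.yml, re-indented so both top-level sections start in column 0.
# All keys and values are taken from the original post; nothing new is added.
unifier_config:
  min_AQ1: 0
  min_AQ2: 0
  min_GQ: 0
  monoallelic_sites_for_lost_alleles: true
  max_alleles_per_site: 32
genotyper_config:
  required_dp: 0
  revise_genotypes: true
  allow_partial_data: true
  more_PL: true
  trim_uncalled_alleles: true
  liftover_fields:
    - orig_names: [MIN_DP, DP]
      name: DP
      description: '##FORMAT=<ID=DP,Number=1,Type=Integer,Description="Approximate read depth (reads with MQ=255 or with bad mates are filtered)">'
      type: int
      combi_method: min
      number: basic
      count: 1
      ignore_non_variants: true
    # The remaining liftover_fields entries (AD, GQ, PL) follow here,
    # unchanged from the original post and at this same list indentation.
```

In the original file, unifier_config: is indented by one space and genotyper_config: by four, so a YAML parser cannot read them as sibling keys of the same mapping, which would explain the "loading configuration YAML file" error.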