Column labels in annotation files #131

I used slivar to annotate the small variants in a whole-genome sequence (from PacBio). I have some questions about some of the column labels in the annotated file.

In the genotype column, do the numbers 1 and 2 indicate the number of alternate alleles? In gnomad_af, what does the number -1 mean: is it an unknown allele frequency? From which populations was the allele frequency determined?

In the gnomad_nhomalt and gnomad_ac columns, what is -1? I read the paper and I understand that the gnomad_nhomalt values are from the GRCh38 assembly; what does the value denote here?

Thanks, Priya

Hi, this depends on the commands that you used for slivar expr and (I assume) slivar tsv. If you can share those, then I can help explain the columns. I expect that gnomad_nhomalt is the number across all populations.

-1 indicates that the variant is not present in gnomAD (to differentiate it from 0, where the variant was present and then filtered).

For genotypes, yes, 1 and 2 indicate the number of alternate alleles.
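As a rough illustration of the two conventions described above, here is a minimal Python sketch. The function names and the threshold value are hypothetical (they are not part of slivar), and the label for genotype 0 is an assumption based on the usual convention. It also shows a consequence of the sentinel: because -1 is smaller than any non-negative cutoff, a variant that is absent from gnomAD passes a "value <= max" style filter.

```python
MISSING = -1  # sentinel described above: variant not present in gnomAD


def describe_gnomad_value(value):
    """Interpret a gnomAD-derived column value (hypothetical helper)."""
    if value == MISSING:
        return "not present in gnomAD"
    if value == 0:
        return "present in gnomAD, but filtered there"
    return "present in gnomAD"


def describe_genotype(alt_count):
    """Interpret the genotype column as a count of alternate alleles."""
    labels = {
        0: "homozygous reference",  # assumption: usual meaning of 0 alt alleles
        1: "heterozygous",
        2: "homozygous alternate",
    }
    return labels[alt_count]


def passes_max_threshold(value, max_allowed):
    """Mirror a 'value <= max' comparison such as a maximum allele frequency."""
    return value <= max_allowed


max_gnomad_af = 0.01  # hypothetical cutoff, not taken from any real config

# A variant absent from gnomAD (-1) still passes a maximum-frequency filter.
absent_passes = passes_max_threshold(MISSING, max_gnomad_af)
```

With these definitions, `describe_gnomad_value(-1)` reports that the variant is not in gnomAD, and `absent_passes` is true, so unseen variants are kept rather than dropped by a maximum-frequency cutoff.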

The rules below were run in the Snakemake workflow:

slivar_filters = [
        f"""--info 'variant.FILTER==\"PASS\" \
                && INFO.gnomad_af <= {config['max_gnomad_af']} \
                && INFO.hprc_af <= {config['max_hprc_af']} \
                && INFO.gnomad_nhomalt <= {config['max_gnomad_nhomalt']} \
                && INFO.hprc_nhomalt <= {config['max_hprc_nhomalt']}'""",
        "--family-expr 'recessive:fam.every(segregating_recessive)'",
        "--family-expr 'x_recessive:(variant.CHROM == \"chrX\") && fam.every(segregating_recessive_x)'",
        f"""--family-expr 'dominant:fam.every(segregating_dominant) \
                       && INFO.gnomad_ac <= {config['max_gnomad_ac']} \
                       && INFO.hprc_ac <= {config['max_hprc_ac']}'""",
        f"""--family-expr 'x_dominant:(variant.CHROM == \"chrX\") \
                       && fam.every(segregating_dominant_x) \
                       && INFO.gnomad_ac <= {config['max_gnomad_ac']} \
                       && INFO.hprc_ac <= {config['max_hprc_ac']}'""",
]
if singleton:
    # singleton
    slivar_filters.append(f"--sample-expr 'comphet_side:sample.het && sample.GQ > {config['min_gq']}'")
else:
    # trio cohort
    slivar_filters.append("--trio 'comphet_side:comphet_side(kid, mom, dad) && kid.affected'")
rule slivar_small_variant:
    input:
        bcf = f"cohorts/{cohort}/slivar/{cohort}.{ref}.deepvariant.phased.norm.bcf",
        csi = f"cohorts/{cohort}/slivar/{cohort}.{ref}.deepvariant.phased.norm.bcf.csi",
        ped = f"cohorts/{cohort}/{cohort}.ped",
        gnomad_af = {config['ref']['gnomad_gnotate']},
        hprc_af = {config['ref']['hprc_dv_gnotate']},
        js = config['slivar_js'],
        gff = config['ref']['ensembl_gff'],
        ref = config['ref']['fasta']
    output: f"cohorts/{cohort}/slivar/{cohort}.{ref}.deepvariant.phased.slivar.vcf"
    log: f"cohorts/{cohort}/logs/slivar/filter/{cohort}.{ref}.deepvariant.phased.slivar.vcf.log"
    benchmark: f"cohorts/{cohort}/benchmarks/slivar/filter/{cohort}.{ref}.deepvariant.phased.slivar.tsv"
    params: filters = slivar_filters
    threads: 12
    conda: "envs/slivar.yaml"
    message: "Executing {rule}: Annotating {input.bcf} and applying filters."
    shell:
        """
        (pslivar --processes {threads} \
            --fasta {input.ref} \
            --pass-only \
            --js {input.js} \
            {params.filters} \
            --gnotate {input.gnomad_af} \
            --gnotate {input.hprc_af} \
            --vcf {input.bcf} \
            --ped {input.ped} \
            | bcftools csq -l -s - --ncsq 40 \
                -g {input.gff} -f {input.ref} - -o {output}) > {log} 2>&1
        """

The above rule was taken from this link:


Yes, so what I have described above should hold. The values from gnomAD simply come from pooling all samples and reporting, e.g., the allele frequency or the number of homozygous-alternate samples.

Thank you. So, does a value of 2 in gnomad_nhomalt mean that the number of alternate alleles is the same in 2 different populations?

No, a value of 2 in gnomad_nhomalt means that 2 samples in gnomAD were homozygous for the alternate allele at that site.
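To make the pooled counting concrete, here is a small Python sketch with made-up genotypes. It is only an illustration of the idea, not how gnomAD or slivar compute these fields; the function name and the toy data are hypothetical, and all samples are assumed to be diploid at the site.

```python
def pooled_counts(alt_counts):
    """Summarise one site from per-sample alternate-allele counts (0, 1 or 2).

    All samples are pooled, regardless of population.
    """
    n_samples = len(alt_counts)
    ac = sum(alt_counts)                             # alternate allele count
    an = 2 * n_samples                               # total alleles, assuming diploid samples
    nhomalt = sum(1 for g in alt_counts if g == 2)   # samples homozygous for the alt allele
    af = ac / an if an else 0.0                      # pooled allele frequency
    return {"ac": ac, "an": an, "af": af, "nhomalt": nhomalt}


# Toy data: eight samples pooled from any number of populations.
toy_genotypes = [0, 1, 2, 0, 2, 1, 0, 0]
counts = pooled_counts(toy_genotypes)
```

In this toy example two samples carry two alternate alleles each, so `counts["nhomalt"]` is 2, whichever populations those two samples come from.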

Thank you