Closed priyambial123 closed 2 years ago
Hi, this depends on the commands that you used for slivar expr
and (I assume) slivar tsv
. If you can share those, then I can help explain the columns.
I expect that gnomad_nhomalt
is the number across all populations.
-1 indicates that the variant is not present (to differentiate from 0, where the variant was present and then filtered) in gnomAD
for genotypes, yes, 1 and 2 indicate the number of alternate alleles.
The below rules was run in the snakemake workflow
slivar_filters = [
f"""--info 'variant.FILTER==\"PASS\" \
&& INFO.gnomad_af <= {config['max_gnomad_af']} \
&& INFO.hprc_af <= {config['max_hprc_af']} \
&& INFO.gnomad_nhomalt <= {config['max_gnomad_nhomalt']} \
&& INFO.hprc_nhomalt <= {config['max_hprc_nhomalt']}'""",
"--family-expr 'recessive:fam.every(segregating_recessive)'",
"--family-expr 'x_recessive:(variant.CHROM == \"chrX\") && fam.every(segregating_recessive_x)'",
f"""--family-expr 'dominant:fam.every(segregating_dominant) \
&& INFO.gnomad_ac <= {config['max_gnomad_ac']} \
&& INFO.hprc_ac <= {config['max_hprc_ac']}'""",
f"""--family-expr 'x_dominant:(variant.CHROM == \"chrX\") \
&& fam.every(segregating_dominant_x) \
&& INFO.gnomad_ac <= {config['max_gnomad_ac']} \
&& INFO.hprc_ac <= {config['max_hprc_ac']}'""",
]
if singleton:
# singleton
slivar_filters.append(f"--sample-expr 'comphet_side:sample.het && sample.GQ > {config['min_gq']}'")
else:
# trio cohort
slivar_filters.append("--trio 'comphet_side:comphet_side(kid, mom, dad) && kid.affected'")
rule slivar_small_variant:
input:
bcf = f"cohorts/{cohort}/slivar/{cohort}.{ref}.deepvariant.phased.norm.bcf",
csi = f"cohorts/{cohort}/slivar/{cohort}.{ref}.deepvariant.phased.norm.bcf.csi",
ped = f"cohorts/{cohort}/{cohort}.ped",
gnomad_af = {config['ref']['gnomad_gnotate']},
hprc_af = {config['ref']['hprc_dv_gnotate']},
js = config['slivar_js'],
gff = config['ref']['ensembl_gff'],
ref = config['ref']['fasta']
output: f"cohorts/{cohort}/slivar/{cohort}.{ref}.deepvariant.phased.slivar.vcf"
log: f"cohorts/{cohort}/logs/slivar/filter/{cohort}.{ref}.deepvariant.phased.slivar.vcf.log"
benchmark: f"cohorts/{cohort}/benchmarks/slivar/filter/{cohort}.{ref}.deepvariant.phased.slivar.tsv"
params: filters = slivar_filters
threads: 12
conda: "envs/slivar.yaml"
message: "Executing {rule}: Annotating {input.bcf} and applying filters."
shell:
"""
(pslivar --processes {threads} \
--fasta {input.ref}\
--pass-only \
--js {input.js} \
{params.filters} \
--gnotate {input.gnomad_af} \
--gnotate {input.hprc_af} \
--vcf {input.bcf} \
--ped {input.ped} \
| bcftools csq -l -s - --ncsq 40 \
-g {input.gff} -f {input.ref} - -o {output}) > {log} 2>&1
Above rule was taken from this link: https://github.com/PacificBiosciences/pb-human-wgs-workflow-snakemake/blob/main/rules/cohort_svpack.smk
Thanks
Yes, so what I have described above should hold. The values from gnomad are simply from pooling all samples and reporting, e.g. allele frequency or number of homalts.
Thank you. So, the value of 2 in gnomad_nhomalt means that number of alternate alleles is same in 2 different populations?
the value of 2 in gnomad_nhomalt means that 2 samples in gnomad were homozygous for the alternate allele at that site.
Thank you
Hello
Using the slivar, the small variants in the whole genome sequence (from PacBio) was annotated. I have some doubts on the some of the column labels in the annotated file.
In the genotype column, the number 1 and 2 indicates the number of alternate alleles? In the gnomad_ad, what does the number -1 mean, is it unknown allele frequency? From which populations was the allele frequency was determined?
In the gnomad_homalt and in gnomad_ac columns, what is -1?. I read the paper (doi.org/10.1038/s41525-021-00227-3) and I understand that Gnomad_homalt values are from GRCh38 assembly, does the value denote here?
Thanks Priya