AndersenLab / annotation-nf

Annotate VCF with snpeff and bcsq

Variant Annotations are missing from the 20220216 releases #2

Open mckeowr1 opened 2 years ago

mckeowr1 commented 2 years ago

Variant annotations for consequence == 'intron_variant' are missing in the most recent releases. The variants are still present but are not annotated with a gene or consequence.

mckeowr1 commented 2 years ago

I looked at the VCF after it is annotated by BCSQ, and the intronic variant annotations are there. The loss most likely happens during generation of the flat file.

mckeowr1 commented 2 years ago

The first place I thought we could be losing these is the process bcsq_extract_scores, which runs a bcftools query to pull out the information used to make the TSV file. The intronic annotations are still present in its output:

head BCSQ_scores.tsv

III     1331    A       G       synonymous|WBGene00008352|cTel54X.1.1|protein_coding|-|324G|1331A>G     6       NA      94.46
III     7934    G       C       intron|WBGene00019183||protein_coding   NA      NA      NA
III     8031    T       C       intron|WBGene00019183||protein_coding   NA      NA      NA
III     23328   A       C       intron|WBGene00019185||protein_coding   NA      NA      NA
III     23344   G       C       intron|WBGene00019185||protein_coding   NA      NA      NA
III     23345   T       TC      intron|WBGene00019185||protein_coding   NA      NA      NA
III     23352   TA      T       intron|WBGene00019185||protein_coding   NA      NA      NA
III     23356   C       CCG     intron|WBGene00019185||protein_coding   NA      NA      NA
III     23358   AAG     A       intron|WBGene00019185||protein_coding   NA      NA      NA

mckeowr1 commented 2 years ago

In the next process, bcsq_extract_samples, it appears that we are losing the intronic annotation:

head BCSQ_samples.tsv

III     1331    A       G       JU2234:synonymous|WBGene00008352|cTel54X.1.1|protein_coding|-|324G|1331A>G=
III     7934    G       C
III     8031    T       C
III     23328   A       C
III     23344   G       C
III     23345   T       TC
III     23352   TA      T
III     23356   C       CCG
III     23358   AAG     A
III     23363   A       G
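
To list every affected row rather than eyeballing `head`, something like the following could work. It is only a sketch: the function name `rows_missing_sample_annotation` is made up for illustration, and it assumes the file is whitespace/tab-delimited with the per-sample annotation in the fifth column, as in the output above.

```python
def rows_missing_sample_annotation(lines):
    """Return (CHROM, POS, REF, ALT) for rows that have no per-sample annotation.

    Assumption: columns are whitespace/tab separated and the fifth column,
    when present, holds the SAMPLE:annotation string.
    """
    missing = []
    for line in lines:
        fields = line.split()
        if len(fields) < 4:
            continue  # skip blank or malformed lines
        if len(fields) == 4:
            # Only CHROM, POS, REF, ALT are present: the annotation was dropped.
            missing.append(tuple(fields))
    return missing


# Rows copied from the BCSQ_samples.tsv output above.
rows = [
    "III\t1331\tA\tG\tJU2234:synonymous|WBGene00008352|cTel54X.1.1|protein_coding|-|324G|1331A>G=",
    "III\t7934\tG\tC",
    "III\t8031\tT\tC",
]

for key in rows_missing_sample_annotation(rows):
    print(*key, sep="\t")
```

On the real file, the same function could be fed `open("BCSQ_samples.tsv")` instead of the inline list.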

BCSQ_score_parsed.tsv, which is generated from the same VCF, still has these intronic variant annotations:

CHROM   POS REF ALT ANNOTATION  BLOSUM  Grantham    Percent_Protein
III 1331    A   G   synonymous|WBGene00008352|cTel54X.1.1|protein_coding|-|324G|1331A>G 6   NA  94.46
III 7934    G   C   intron|WBGene00019183||protein_coding   NA  NA  NA
III 8031    T   C   intron|WBGene00019183||protein_coding   NA  NA  NA
III 23328   A   C   intron|WBGene00019185||protein_coding   NA  NA  NA
III 23344   G   C   intron|WBGene00019185||protein_coding   NA  NA  NA
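
To confirm that only intronic consequences are being dropped, the two files could be cross-checked by site. This is a rough sketch, not part of the pipeline: `lost_consequences` is a hypothetical helper, and it assumes both files are whitespace/tab-delimited, keyed by CHROM, POS, REF, ALT, with the consequence as the first `|`-separated field of the annotation column.

```python
from collections import Counter


def lost_consequences(score_lines, sample_lines):
    """Count consequence types annotated in the scores file but absent from the samples file."""
    # Sites that still carry a per-sample annotation (a fifth column is present).
    annotated_in_samples = set()
    for line in sample_lines:
        fields = line.split()
        if len(fields) >= 5:
            annotated_in_samples.add(tuple(fields[:4]))

    lost = Counter()
    for line in score_lines:
        fields = line.split()
        if len(fields) < 5 or fields[0] == "CHROM":
            continue  # skip the header and malformed lines
        if tuple(fields[:4]) not in annotated_in_samples:
            consequence = fields[4].split("|", 1)[0]
            lost[consequence] += 1
    return lost


# Rows copied from the outputs in this thread.
scores = [
    "CHROM\tPOS\tREF\tALT\tANNOTATION\tBLOSUM\tGrantham\tPercent_Protein",
    "III\t1331\tA\tG\tsynonymous|WBGene00008352|cTel54X.1.1|protein_coding|-|324G|1331A>G\t6\tNA\t94.46",
    "III\t7934\tG\tC\tintron|WBGene00019183||protein_coding\tNA\tNA\tNA",
    "III\t8031\tT\tC\tintron|WBGene00019183||protein_coding\tNA\tNA\tNA",
]
samples = [
    "III\t1331\tA\tG\tJU2234:synonymous|WBGene00008352|cTel54X.1.1|protein_coding|-|324G|1331A>G=",
    "III\t7934\tG\tC",
    "III\t8031\tT\tC",
]

lost = lost_consequences(scores, samples)
print(dict(lost))
```

If the full files report anything other than intron here, the problem is broader than the intronic annotations.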