Open mckeowr1 opened 2 years ago
I looked at the VCF after it is annotated by BCSQ, and the intronic variant annotations are present there. The loss likely happens during generation of the flat file.
The first place I thought we could be losing these is the process bcsq_extract_scores, which runs a bcftools query to pull out the information for the TSV file. The annotations are still present at that step:
head BCSQ_scores.tsv
III 1331 A G synonymous|WBGene00008352|cTel54X.1.1|protein_coding|-|324G|1331A>G 6 NA 94.46
III 7934 G C intron|WBGene00019183||protein_coding NA NA NA
III 8031 T C intron|WBGene00019183||protein_coding NA NA NA
III 23328 A C intron|WBGene00019185||protein_coding NA NA NA
III 23344 G C intron|WBGene00019185||protein_coding NA NA NA
III 23345 T TC intron|WBGene00019185||protein_coding NA NA NA
III 23352 TA T intron|WBGene00019185||protein_coding NA NA NA
III 23356 C CCG intron|WBGene00019185||protein_coding NA NA NA
III 23358 AAG A intron|WBGene00019185||protein_coding NA NA NA
In the next process, bcsq_extract_samples, it appears that we are losing the intronic annotation:
head BCSQ_samples.tsv
III 1331 A G JU2234:synonymous|WBGene00008352|cTel54X.1.1|protein_coding|-|324G|1331A>G=
III 7934 G C
III 8031 T C
III 23328 A C
III 23344 G C
III 23345 T TC
III 23352 TA T
III 23356 C CCG
III 23358 AAG A
III 23363 A G
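To pin down which variants lose their annotation between the two steps, the two flat files can be compared by variant key. The following is a minimal Python sketch, not part of the pipeline. It assumes both files are tab-separated, with CHROM, POS, REF, ALT in the first four columns and the annotation in the fifth. The function names read_annotations and dropped_annotations are hypothetical. The inline rows are copied from the head output above.

```python
import csv
import io


def read_annotations(handle):
    """Map (CHROM, POS, REF, ALT) to the fifth column ("" when missing)."""
    annotations = {}
    for row in csv.reader(handle, delimiter="\t"):
        if len(row) < 4 or row[0] == "CHROM":  # skip short rows and a header
            continue
        annotations[tuple(row[:4])] = row[4].strip() if len(row) > 4 else ""
    return annotations


def dropped_annotations(scores, samples):
    """List variants annotated in the scores file but blank in the samples file."""
    return [
        (key, annotation)
        for key, annotation in scores.items()
        if annotation and key in samples and not samples[key]
    ]


# Rows copied from the head output above; real use would open the two files.
scores_tsv = (
    "III\t1331\tA\tG\tsynonymous|WBGene00008352|cTel54X.1.1|protein_coding|-|324G|1331A>G\t6\tNA\t94.46\n"
    "III\t7934\tG\tC\tintron|WBGene00019183||protein_coding\tNA\tNA\tNA\n"
    "III\t8031\tT\tC\tintron|WBGene00019183||protein_coding\tNA\tNA\tNA\n"
)
samples_tsv = (
    "III\t1331\tA\tG\tJU2234:synonymous|WBGene00008352|cTel54X.1.1|protein_coding|-|324G|1331A>G=\n"
    "III\t7934\tG\tC\n"
    "III\t8031\tT\tC\n"
)

scores = read_annotations(io.StringIO(scores_tsv))
samples = read_annotations(io.StringIO(samples_tsv))
dropped = dropped_annotations(scores, samples)

for (chrom, pos, ref, alt), annotation in dropped:
    print(chrom, pos, ref, alt, annotation, sep="\t")
```

With the real files, the io.StringIO(...) arguments would be replaced by open(path, newline="") handles for BCSQ_scores.tsv and BCSQ_samples.tsv.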
The BCSQ_score_parsed.tsv file, which is also generated from the same VCF, still has these intronic variant annotations:
CHROM POS REF ALT ANNOTATION BLOSUM Grantham Percent_Protein
III 1331 A G synonymous|WBGene00008352|cTel54X.1.1|protein_coding|-|324G|1331A>G 6 NA 94.46
III 7934 G C intron|WBGene00019183||protein_coding NA NA NA
III 8031 T C intron|WBGene00019183||protein_coding NA NA NA
III 23328 A C intron|WBGene00019185||protein_coding NA NA NA
III 23344 G C intron|WBGene00019185||protein_coding NA NA NA
Variant annotations for consequence == 'intron_variant' are missing in the most recent releases. The variants are still present but are not annotated with a gene or consequence.