FINNGEN / autoreporting

MIT License

add functional variant gene in addition to the consequence #183

Closed Fedja closed 3 years ago

Fedja commented 3 years ago

Now we have chr6_30163934_G_C|missense_variant|0.0747; we want chr6_30163934_G_C|missense_variant|TRIM15|0.0747, i.e. with the gene added after the consequence.
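A minimal sketch of the requested string format, not the actual autoreporting code. The function name and argument names are hypothetical; only the field order and the `|` separator come from the example above.

```python
def format_functional_variant(variant, consequence, gene, af):
    """Build 'variant|consequence|gene|af' from already-resolved fields.

    Hypothetical helper: the names are illustrative, only the field
    order and the '|' separator follow the example in this issue.
    """
    return "|".join(str(field) for field in (variant, consequence, gene, af))


formatted = format_functional_variant(
    "chr6_30163934_G_C", "missense_variant", "TRIM15", 0.0747
)
print(formatted)
```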

Lipastomies commented 3 years ago

That's actually a bit tricky because of how the annotations are used. The functional variant annotation (column functional_category) is taken from gs://r4_data_west1/gnomad_functional_variants/fin_enriched_genomes_select_columns.txt.gz, which has the following columns:

grch38_locus
alleles
locus
chrom
pos
ref
alt
rsid
consequence
fin.AC
fin.AF
fin.AN
fin.homozygote_count
nfsee.AC
nfsee.AN
nfsee.homozygote_count
nfsee.AF
enrichment
enrichment_ds
enrichment_nfsee
enrichment_pseudo_nfsee
fet.p_value
fet.odds_ratio
fet_ds.p_value
fet_ds.odds_ratio
fet_nfsee.p_value
fet_nfsee.odds_ratio

Then, in the FinnGen annotation, we get the most severe gene and consequence from gs://r3_data/annotations/R3_vep_annot.tsv.gz. So, strictly speaking, we don't have the associated gene in autoreporting (unless it's taken from the same R3 VEP annotation file?).

I'm not completely sure where the functional variants annotation gets its consequence column. I think it would make sense to review the different annotations we use in FinnGen and check that they are all up to date.
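If the gene were taken from the VEP annotation file, the lookup could be sketched roughly as below. This is an illustration only: the column names (`variant`, `gene_most_severe`, `most_severe`) and the variant ID format are assumptions, not confirmed from the actual file, and the in-memory example stands in for the real gzipped TSV.

```python
import csv
import io


def load_gene_lookup(handle, variant_col="variant",
                     gene_col="gene_most_severe",
                     consequence_col="most_severe"):
    """Map variant ID -> (gene, consequence) from a tab-separated annotation.

    All column names are hypothetical defaults; adjust them to the
    real header of the annotation file.
    """
    reader = csv.DictReader(handle, delimiter="\t")
    return {
        row[variant_col]: (row[gene_col], row[consequence_col])
        for row in reader
    }


# Tiny in-memory stand-in for the annotation file (assumed layout).
example = io.StringIO(
    "variant\tgene_most_severe\tmost_severe\n"
    "chr6_30163934_G_C\tTRIM15\tmissense_variant\n"
)
lookup = load_gene_lookup(example)
gene, consequence = lookup["chr6_30163934_G_C"]
```

For the real file, the same function should work on a handle opened with `gzip.open(path, "rt")`.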

Fedja commented 3 years ago

Agh, that's bad... We should be getting our functional annotation from each FinnGen release annotation file and not from gnomAD, as we will be missing variants that can't be lifted over to build 38 (and maybe also variants in that file where the strand was changed).

Everything else except enrichment in autoreporting should be based on FinnGen annotations.

And these should be the coding variants:

transcript_ablation splice_donor_variant stop_gained splice_acceptor_variant frameshift_variant stop_lost start_lost inframe_insertion inframe_deletion missense_variant protein_altering_variant
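As a sketch, the consequence list above could be used as a filter like this. The row layout and the `most_severe` key are assumptions made for illustration; only the consequence terms themselves come from the list.

```python
# Coding consequences, copied from the list in this comment.
CODING_CONSEQUENCES = frozenset({
    "transcript_ablation",
    "splice_donor_variant",
    "stop_gained",
    "splice_acceptor_variant",
    "frameshift_variant",
    "stop_lost",
    "start_lost",
    "inframe_insertion",
    "inframe_deletion",
    "missense_variant",
    "protein_altering_variant",
})


def is_coding(consequence):
    """Return True if the consequence is one of the listed coding terms."""
    return consequence in CODING_CONSEQUENCES


def select_functional_variants(rows, consequence_key="most_severe"):
    """Keep rows whose consequence is coding; the key name is hypothetical."""
    return [row for row in rows if is_coding(row[consequence_key])]


rows = [
    {"variant": "chr6_30163934_G_C", "most_severe": "missense_variant"},
    {"variant": "example_intronic_variant", "most_severe": "intron_variant"},
]
functional = select_functional_variants(rows)
```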

juhis commented 3 years ago

Btw, we're still using gnomAD 2 (build 37), as gnomAD 3 (build 38) doesn't have European subpopulations, so we can't do NF(S)EE enrichment with gnomAD 3, as you know. They're planning to release gnomAD 4 with a big increase in exomes in the fall, so we should probably switch to that then and go with the current setup for now.

(Email reply to the comment above, written by Mitja Kurki on May 25, 2021.)

Fedja commented 3 years ago

Yeah, for enrichment we can use the current annotation, but for picking functional variants we should use our own build 38-based annotation. It's not a big difference, but we are missing functional variants because of that.

Lipastomies commented 3 years ago

Done