nawatts opened this issue 3 years ago
Looks like some RefSeq annotations are missing in VEP 101, which is used for annotations in gnomAD v3.1. https://github.com/Ensembl/ensembl-vep/issues/847
In gnomAD v3, variants in BRCA1 have only Ensembl annotations.
```python
import hail as hl
ds = hl.read_table("gs://gcp-public-data--gnomad/release/3.1.1/ht/genomes/gnomad.genomes.v3.1.1.sites.ht")
# Restrict to the BRCA1 region (GRCh38)
ds = hl.filter_intervals(ds, [hl.parse_locus_interval("chr17:43044295-43125364", reference_genome="GRCh38")])
# Collect the distinct gene IDs across all VEP transcript consequences
ds.aggregate(hl.agg.explode(hl.agg.collect_as_set, ds.vep.transcript_consequences.map(lambda csq: csq.gene_id)))
# frozenset({'ENSG00000012048', 'ENSG00000198496', 'ENSG00000240828'})
```
Variants in PCSK9, by contrast, have both Ensembl and RefSeq annotations:
```python
import hail as hl
ds = hl.read_table("gs://gcp-public-data--gnomad/release/3.1.1/ht/genomes/gnomad.genomes.v3.1.1.sites.ht")
# Restrict to the PCSK9 region (GRCh38)
ds = hl.filter_intervals(ds, [hl.parse_locus_interval("chr1:55039447-55064852", reference_genome="GRCh38")])
# Collect the distinct gene IDs across all VEP transcript consequences
ds.aggregate(hl.agg.explode(hl.agg.collect_as_set, ds.vep.transcript_consequences.map(lambda csq: csq.gene_id)))
# frozenset({'23358', '255738', 'ENSG00000162402', 'ENSG00000169174'})
```
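To check other regions the same way, the comparison reduces to inspecting the collected gene IDs. The sketch below uses hypothetical helpers (`split_gene_ids` and `has_refseq_annotations` are not part of Hail or gnomAD) and assumes, as in the two outputs above, that Ensembl gene IDs start with `ENSG` and that any other ID comes from a RefSeq transcript. The input sets are copied from the results above.

```python
def split_gene_ids(gene_ids):
    """Split gene IDs collected from vep.transcript_consequences into
    (Ensembl IDs, other IDs).

    Assumption: Ensembl gene IDs start with "ENSG"; every other ID is treated
    as coming from a RefSeq transcript (the numeric IDs in the PCSK9 output).
    """
    ensembl = frozenset(g for g in gene_ids if g.startswith("ENSG"))
    other = frozenset(gene_ids) - ensembl
    return ensembl, other


def has_refseq_annotations(gene_ids):
    """True if at least one non-Ensembl gene ID was observed in the region."""
    _, other = split_gene_ids(gene_ids)
    return len(other) > 0


# Results copied from the two queries above.
brca1_region = frozenset({"ENSG00000012048", "ENSG00000198496", "ENSG00000240828"})
pcsk9_region = frozenset({"23358", "255738", "ENSG00000162402", "ENSG00000169174"})

print(has_refseq_annotations(brca1_region))     # False
print(has_refseq_annotations(pcsk9_region))     # True
print(sorted(split_gene_ids(pcsk9_region)[1]))  # ['23358', '255738']
```

Feeding the set returned by `ds.aggregate(...)` for any other interval into `has_refseq_annotations` gives a quick per-region check for the missing RefSeq annotations.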
This may have to wait until we update to a different version of VEP.
This can be worked on now that the correct RefSeq GTF file has been identified. We can either delay the release until the annotations are fixed or flag the affected genes in the browser.
From the 2024-03-19 roadmapping meeting, two things were decided:
1) Annotating transcripts with MANE Plus Clinical would be valuable. This would essentially be a second asterisk marking transcripts with this annotation.
2) Getting all RefSeq transcripts would be nice, but is more difficult than 1).
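One reading of the "second asterisk" idea is a distinct marker per MANE category: `*` for MANE Select and `**` for MANE Plus Clinical. Because a MANE Plus Clinical transcript is always a different transcript from its gene's MANE Select transcript, a separate marker keeps the two distinguishable. The sketch assumes that interpretation and that the existing asterisk marks MANE Select; the `Transcript` class and its fields are hypothetical, not the browser's data model.

```python
from dataclasses import dataclass


@dataclass
class Transcript:
    # Hypothetical fields for illustration only.
    transcript_id: str
    is_mane_select: bool = False
    is_mane_plus_clinical: bool = False


def transcript_label(t: Transcript) -> str:
    """Return the display label: one asterisk for MANE Select,
    two asterisks for MANE Plus Clinical, none otherwise (assumed scheme)."""
    if t.is_mane_select:
        return t.transcript_id + "*"
    if t.is_mane_plus_clinical:
        return t.transcript_id + "**"
    return t.transcript_id


print(transcript_label(Transcript("TX_A", is_mane_select=True)))         # TX_A*
print(transcript_label(Transcript("TX_B", is_mane_plus_clinical=True)))  # TX_B**
print(transcript_label(Transcript("TX_C")))                              # TX_C
```

The transcript IDs above are placeholders; a footnote in the transcript table would still be needed to explain what each marker means.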
Currently, gnomAD can only be browsed by Ensembl genes and transcripts. It should also support RefSeq genes and transcripts.
This applies only to gnomAD v3.1+, since it requires VEP annotations with RefSeq transcripts, which were not included in earlier versions of gnomAD.
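Supporting both kinds of identifiers means deciding which annotation set to query and rejecting RefSeq lookups for datasets older than v3.1. The sketch below is hypothetical (`identifier_source` is not part of the gnomAD browser). It assumes RefSeq genes are addressed by numeric NCBI Gene IDs, as in the PCSK9 output above, and RefSeq transcripts by `NM_`/`NR_`/`XM_`/`XR_` accessions.

```python
import re

# Hypothetical identifier patterns (optional ".version" suffix allowed).
ENSEMBL_GENE = re.compile(r"ENSG\d{11}(\.\d+)?")
ENSEMBL_TRANSCRIPT = re.compile(r"ENST\d{11}(\.\d+)?")
REFSEQ_TRANSCRIPT = re.compile(r"[NX][MR]_\d+(\.\d+)?")
NCBI_GENE = re.compile(r"\d+")


def parse_version(version: str) -> tuple:
    """Turn a dataset version such as "3.1.1" into (3, 1, 1)."""
    return tuple(int(part) for part in version.split("."))


def identifier_source(identifier: str, gnomad_version: str) -> tuple:
    """Return (annotation source, feature type) for an identifier.

    Ensembl identifiers work for any version; RefSeq identifiers are
    rejected before v3.1, where RefSeq VEP annotations are absent.
    """
    if ENSEMBL_GENE.fullmatch(identifier):
        return ("ensembl", "gene")
    if ENSEMBL_TRANSCRIPT.fullmatch(identifier):
        return ("ensembl", "transcript")

    if REFSEQ_TRANSCRIPT.fullmatch(identifier):
        refseq = ("refseq", "transcript")
    elif NCBI_GENE.fullmatch(identifier):
        refseq = ("refseq", "gene")
    else:
        raise ValueError(f"Unrecognized identifier: {identifier}")

    # Tuple comparison: (2, 1, 1) < (3, 1) but (3, 1, 1) is not.
    if parse_version(gnomad_version) < (3, 1):
        raise ValueError("RefSeq annotations are only available in gnomAD v3.1+")
    return refseq
```

For example, `identifier_source("255738", "3.1.1")` returns `("refseq", "gene")`, while the same call with `"2.1.1"` raises `ValueError`.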