broadinstitute / gnomad-browser

Explore gnomAD datasets on the web
https://gnomad.broadinstitute.org
MIT License

Browse by RefSeq gene/transcript #741

Open nawatts opened 3 years ago

nawatts commented 3 years ago

Currently, gnomAD can only be browsed by Ensembl genes / transcripts. It should also support RefSeq genes / transcripts.

This applies only to gnomAD v3.1+, because it requires VEP annotations with RefSeq transcripts, which were not included in earlier versions of gnomAD.

nawatts commented 3 years ago

Looks like some RefSeq annotations are missing in VEP 101, which is used for annotations in gnomAD v3.1. https://github.com/Ensembl/ensembl-vep/issues/847

In gnomAD v3.1.1, variants in BRCA1 have only Ensembl annotations.

import hail as hl

# Collect the distinct gene IDs in VEP transcript consequences for variants in the BRCA1 region.
ds = hl.read_table("gs://gcp-public-data--gnomad/release/3.1.1/ht/genomes/gnomad.genomes.v3.1.1.sites.ht")
ds = hl.filter_intervals(ds, [hl.parse_locus_interval("chr17:43044295-43125364", reference_genome="GRCh38")])
ds.aggregate(hl.agg.explode(hl.agg.collect_as_set, ds.vep.transcript_consequences.map(lambda csq: csq.gene_id)))
# frozenset({'ENSG00000012048', 'ENSG00000198496', 'ENSG00000240828'})

In contrast, variants in PCSK9 have both Ensembl and RefSeq annotations.

import hail as hl

# Same query for the PCSK9 region; the numeric gene IDs come from RefSeq annotations.
ds = hl.read_table("gs://gcp-public-data--gnomad/release/3.1.1/ht/genomes/gnomad.genomes.v3.1.1.sites.ht")
ds = hl.filter_intervals(ds, [hl.parse_locus_interval("chr1:55039447-55064852", reference_genome="GRCh38")])
ds.aggregate(hl.agg.explode(hl.agg.collect_as_set, ds.vep.transcript_consequences.map(lambda csq: csq.gene_id)))
# frozenset({'23358', '255738', 'ENSG00000162402', 'ENSG00000169174'})
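
To make the comparison between the two results explicit, here is a minimal sketch that splits a set of gene IDs by annotation source. It assumes that Ensembl gene IDs look like `ENSG` followed by 11 digits and that RefSeq-derived gene IDs from VEP are purely numeric NCBI gene IDs, as in the PCSK9 output above. The helper name `split_gene_ids_by_source` is hypothetical and not part of Hail or the gnomAD codebase.

```python
import re

# Assumed ID formats (see the note above): Ensembl gene IDs vs. numeric NCBI gene IDs.
ENSEMBL_GENE_ID = re.compile(r"ENSG\d{11}")
NCBI_GENE_ID = re.compile(r"\d+")


def split_gene_ids_by_source(gene_ids):
    """Partition gene IDs into Ensembl, RefSeq (numeric), and unrecognized groups."""
    groups = {"ensembl": set(), "refseq": set(), "other": set()}
    for gene_id in gene_ids:
        if ENSEMBL_GENE_ID.fullmatch(gene_id):
            groups["ensembl"].add(gene_id)
        elif NCBI_GENE_ID.fullmatch(gene_id):
            groups["refseq"].add(gene_id)
        else:
            groups["other"].add(gene_id)
    return groups


# Gene ID sets copied from the two aggregation results in this thread.
brca1_region = split_gene_ids_by_source(
    {"ENSG00000012048", "ENSG00000198496", "ENSG00000240828"}
)
pcsk9_region = split_gene_ids_by_source(
    {"23358", "255738", "ENSG00000162402", "ENSG00000169174"}
)

print(brca1_region["refseq"])  # empty: no RefSeq gene IDs in the BRCA1 region
print(sorted(pcsk9_region["refseq"]))
```

An empty `refseq` group for a region is the symptom described in the linked VEP issue.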

This may have to wait until we update to a different version of VEP.

nawatts commented 2 years ago

This can be worked on now that the correct RefSeq GTF file has been identified. We can choose to delay release until annotations are fixed or flag the affected genes in the browser.
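
If we go with flagging, one possible way to build the list of affected genes is sketched below. It assumes we already have, per gene symbol, the set of gene IDs observed in VEP transcript consequences for that gene's region (for example from aggregations like the ones earlier in this thread), and that RefSeq-derived IDs are purely numeric. The function name `genes_missing_refseq` and the input shape are hypothetical.

```python
def genes_missing_refseq(observed_ids_by_gene):
    """Return sorted gene symbols whose observed gene IDs include no numeric ID.

    Assumption: a numeric gene ID indicates a RefSeq-derived annotation.
    """
    return sorted(
        symbol
        for symbol, gene_ids in observed_ids_by_gene.items()
        if not any(gene_id.isdigit() for gene_id in gene_ids)
    )


# Example input built from the BRCA1 and PCSK9 results reported in this thread.
observed = {
    "BRCA1": {"ENSG00000012048", "ENSG00000198496", "ENSG00000240828"},
    "PCSK9": {"23358", "255738", "ENSG00000162402", "ENSG00000169174"},
}

flagged = genes_missing_refseq(observed)
print(flagged)
```

Note that this only checks whether any RefSeq ID appears in the region, so a gene whose own RefSeq annotations are missing could still pass if an overlapping gene has them. A real check would likely need to compare against the RefSeq GTF.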

mattsolo1 commented 8 months ago

At the 2024-03-19 roadmapping meeting, two things were decided:

1) Annotating transcripts with MANE Plus Clinical would be valuable. In practice, this would mean adding a second asterisk to mark the annotation.

2) Getting all RefSeq transcripts would be nice, but it is more difficult than 1).