MSKCC-Epi-Bio / gnomeR

Package to wrangle and visualize genomic data in R
https://mskcc-epi-bio.github.io/gnomeR/

Allow users to pass alternate alias reference tables to recode_alias() #225

Closed karissawhiting closed 1 year ago

karissawhiting commented 1 year ago

Right now we have vetted aliases for IMPACT panels only and use gnomeR::impact_alias_table as our reference alias dictionary in recode_alias(). It contains the main IMPACT genes, but some aliases for non-IMPACT genes will not be caught.

We have documented this limitation, but it would be nice to expand this functionality to non-IMPACT genes (e.g. GENIE panels). There is an alias_table argument in recode_alias() that is currently using gnomeR::impact_alias_table. We can explore providing more comprehensive alias lists here (or connecting to another service/database that does this already).

It may be worth looking into open-source gene name databases to connect to.
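To make the idea concrete, here is a minimal base R sketch of recoding against a user-supplied alias table. It does not call gnomeR and is not how recode_alias() is implemented; the helper `recode_with_table()`, the object `custom_alias_table`, and the column names `hugo_symbol` and `alias` are all hypothetical, chosen only for illustration.

```r
# Hypothetical alias table: one row per (accepted symbol, alias) pair.
# Column names are assumptions for this sketch, not gnomeR's schema.
custom_alias_table <- data.frame(
  hugo_symbol = c("KMT2A", "NSD3", "PRKN"),
  alias       = c("MLL", "WHSC1L1", "PARK2"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: replace any gene name found in the alias column
# with its accepted symbol and leave all other names unchanged.
recode_with_table <- function(genes, alias_table) {
  idx <- match(genes, alias_table$alias)
  ifelse(is.na(idx), genes, alias_table$hugo_symbol[idx])
}

genes <- c("MLL", "TP53", "WHSC1L1")
recoded <- recode_with_table(genes, custom_alias_table)
print(recoded)
```

With a lookup like this, supporting GENIE panels would mostly be a matter of supplying a larger table, which is why a user-facing alias table argument seems like a natural extension point.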

Details in PR #221

karissawhiting commented 1 year ago

@jalavery if we wanted to add a GENIE alias table that can resolve any genes from the GENIE data, do you have a sense of what gene panels should be included? Would it be any panel used in any of the GENIE datasets?

jalavery commented 1 year ago

I agree, I think we would want to look at any panel used in any of the GENIE BPC datasets (across all cancer types, because the inclusion criteria for years of sequencing varied across cancer types, so it's possible different panels were introduced over time).

GENIE BPC includes genomic + clinical data for patients from 4 institutions (MSK, Dana-Farber, Vanderbilt, UHN). These patients are a subset of broader GENIE, which is only genomic data. Broader GENIE has data from many more institutions (19+ but don't quote me), some of which are joining phase 2 of the BPC project.

I think to prioritize we could start with all panels used by the current 4 institutions in BPC, then expand to the institutions joining in phase 2 (starting in Jan 2023), and if eventually needed, think about the broader GENIE panels that aren't a part of BPC. I am not sure how much the broader GENIE data are used by our dept, so I am not sure this last part is worthwhile at the moment.

What do you think? Let me know if you want to chat. Thank you!