karissawhiting closed this issue 1 year ago.
@jalavery if we wanted to add a GENIE alias table that can resolve any gene in the GENIE data, do you have a sense of which gene panels should be included? Would it be every panel used in any of the GENIE datasets?
I agree. I think we would want to look at every panel used in any of the GENIE BPC datasets, across all cancer types, because the inclusion criteria and years of sequencing varied by cancer type, so different panels may have been introduced over time.
GENIE BPC includes genomic and clinical data for patients from 4 institutions (MSK, Dana-Farber, Vanderbilt, UHN). These are a subset of broader GENIE, which contains only genomic data. Broader GENIE has data from many more institutions (19+, but don't quote me), some of which are joining phase 2 of the BPC project.
To prioritize, I think we could start with all panels used by the current 4 BPC institutions, then expand to the institutions joining in phase 2 (starting in Jan 2023), and, if eventually needed, consider the broader GENIE panels that aren't part of BPC. I'm not sure how much our department uses the broader GENIE data, so this last step may not be worthwhile at the moment.
What do you think? Let me know if you want to chat. Thank you!
Right now we have vetted aliases for IMPACT panels only, and we use `gnomeR::impact_alias_table` as our reference alias dictionary in `recode_alias()`. It contains the main IMPACT genes, but some aliases for non-IMPACT genes will not be caught. We have documented this limitation, but it would be nice to extend this functionality to non-IMPACT genes (e.g., GENIE panels). `recode_alias()` has an `alias_table` argument that currently uses `gnomeR::impact_alias_table`. We can explore providing more comprehensive alias lists there (or connecting to another service/database that already does this). It may also be worth looking into open-source gene name databases to connect to.
Details in PR #221