Closed karissawhiting closed 1 year ago
Thank you so much for this! I re-tested and now get consistent results when running create_gene_binary() based on CNA data pulled from the portal versus CNA data that was pivoted with pivot_cna_longer() 🎉
The only discrepancy is a difference of 2 variables returned when I run it with the CNA data from the portal vs. the GENIE CNA data, but the columns are totally blank, so I'm not sure how much this matters. The column ADGRA2.Amp is only returned when create_gene_binary() is run on the portal data, and the column GPR124.Amp is only returned when create_gene_binary() is run on the transposed GENIE CNA data. Any ideas why that might be? Neither ADGRA2 nor GPR124 is in the underlying CNA files.
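A quick way to pin down this kind of column mismatch is to compare the column names of the two results directly. The sketch below uses small made-up data frames as stand-ins for the two create_gene_binary() outputs; the object names and values are hypothetical.

```r
# Toy stand-ins for the two create_gene_binary() results (hypothetical data)
binary_portal <- data.frame(TP53 = c(1, 0), ADGRA2.Amp = c(0, 0))
binary_genie  <- data.frame(TP53 = c(1, 0), GPR124.Amp = c(0, 0))

# Columns present in one result but not in the other
only_in_portal <- setdiff(names(binary_portal), names(binary_genie))
only_in_genie  <- setdiff(names(binary_genie), names(binary_portal))
only_in_portal
only_in_genie

# Check whether the mismatched columns carry any events at all
portal_events <- colSums(binary_portal[only_in_portal], na.rm = TRUE)
genie_events  <- colSums(binary_genie[only_in_genie], na.rm = TRUE)
```

If both sums are zero, the extra columns are empty and the difference is only in naming.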
Super minor: I also tested the messaging when passing in the GENIE CNA data before it's pivoted. The error returned is:
Error in sanitize_cna_input() at gnomeR/R/create-gene-binary.R:132:4: ! The following required columns are missing in your mutations data: sample_id and alteration. Is your data in wide format? If so, it must be long format. See gnomeR::pivot_cna_long() to reformat
Rather than saying columns are missing in the mutations data, should it say CNA data?
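One possible shape for a data-type-aware message is sketched below. The helper check_required_cols() and its data_name argument are hypothetical and only illustrate the idea; this is not how gnomeR's sanitize_cna_input() is implemented.

```r
# Hypothetical helper (not gnomeR's implementation): name the data type in the error
check_required_cols <- function(data, required, data_name) {
  missing_cols <- setdiff(required, names(data))
  if (length(missing_cols) > 0) {
    stop(
      "The following required columns are missing in your ", data_name,
      " data: ", paste(missing_cols, collapse = ", "), ".",
      call. = FALSE
    )
  }
  invisible(data)
}

# Wide-format toy input that lacks the long-format columns
wide_cna <- data.frame(Hugo_Symbol = "GPR124", sample.1 = 2)

# Capture the error message instead of stopping the script
msg <- tryCatch(
  check_required_cols(wide_cna, c("sample_id", "alteration"), "CNA"),
  error = function(e) conditionMessage(e)
)
msg
```

Passing the data type as an argument would let the same check serve mutation, fusion, and CNA inputs without a misleading message.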
Thanks @jalavery!
It looks like these genes are aliases of each other:
library(cbioportalR)
library(gnomeR)
#> Registered S3 method overwritten by 'GGally':
#> method from
#> +.gg ggplot2
#>
#> Attaching package: 'gnomeR'
#> The following object is masked from 'package:cbioportalR':
#>
#> impact_gene_info
set_cbioportal_db("public")
#> ✔ You are successfully connected!
#> ✔ base_url for this R session is now set to "www.cbioportal.org/api"
get_alias("ADGRA2")
#> # A tibble: 6 × 2
#> hugo_symbol alias
#> <chr> <chr>
#> 1 ADGRA2 DKFZp434C211
#> 2 ADGRA2 DKFZp434J0911
#> 3 ADGRA2 FLJ14390
#> 4 ADGRA2 GPR124
#> 5 ADGRA2 KIAA1531
#> 6 ADGRA2 TEM5
But neither is in the IMPACT panels:
which_impact_panel("ADGRA2")
#> # A tibble: 1 × 5
#> genes_in_panel IMPACT341 IMPACT410 IMPACT468 IMPACT505
#> <chr> <chr> <chr> <chr> <chr>
#> 1 ADGRA2 no no no no
which_impact_panel("GPR124")
#> # A tibble: 1 × 5
#> genes_in_panel IMPACT341 IMPACT410 IMPACT468 IMPACT505
#> <chr> <chr> <chr> <chr> <chr>
#> 1 GPR124 no no no no
vetted_alias <- gnomeR::impact_alias_table %>%
  tidyr::unnest(everything())
vetted_alias %>% dplyr::filter(hugo_symbol %in% c("ADGRA2", "GPR124"))
#> # A tibble: 0 × 4
#> # … with 4 variables: hugo_symbol <chr>, alias <chr>, entrez_id <int>,
#> # alias_entrez_id <int>
vetted_alias %>% dplyr::filter(alias %in% c("ADGRA2", "GPR124"))
#> # A tibble: 0 × 4
#> # … with 4 variables: hugo_symbol <chr>, alias <chr>, entrez_id <int>,
#> # alias_entrez_id <int>
They are actually different within the raw data itself:
library(dplyr)  # filter(), select()
library(purrr)  # map(), used further down
# It is ADGRA2 in cBioPortal data
x <- all_cna_from_portal %>%
  filter(hugoGeneSymbol %in% c("GPR124", "ADGRA2")) %>%
  select(hugoGeneSymbol, sampleId)
x
#> # hugoGeneSymbol sampleId
#> # <chr> <chr>
#> # 1 ADGRA2 GENIE-VICC-199259-unk-1
# It is GPR124 in GENIE data
y <- nsclc_public_cna %>%
  filter(Hugo_Symbol %in% c("GPR124", "ADGRA2"))
# Keep Hugo_Symbol plus any sample column whose CNA values sum to more than zero
y %>% select(Hugo_Symbol, which(map(y, ~sum(as.numeric(.x), na.rm = TRUE)) > 0))
#> # Hugo_Symbol GENIE.VICC.199259.unk.1
#> # 1 GPR124 2
This is where alias resolution gets really important, but unfortunately I don't think it's feasible to support vetted aliases for every panel. Right now we have vetted aliases for the IMPACT panels only and use gnomeR::impact_alias_table as our reference alias dictionary. It covers the main IMPACT genes, but some aliases for non-IMPACT genes will not be caught.
Here are my suggestions for moving forward:
1) Make sure the current limitations of the alias functionality are explicitly documented (maybe add something to the alias message in the console as well, saying IMPACT genes were checked? I think it says "common" right now.)
2) I remembered that I originally wrote the recode_alias and resolve_alias functions with future expansion in mind. There is an alias_table argument that currently uses gnomeR::impact_alias_table, but in the future we can explore providing more comprehensive alias lists (or connecting to another service/database that does this already?). Turn this into an issue?
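To make the alias_table idea concrete, here is a minimal sketch of recoding against a user-supplied table. The table and the helper recode_with_table() are hypothetical illustrations. They borrow the hugo_symbol and alias column names from the get_alias() output above and do not reproduce gnomeR's recode_alias() internals.

```r
# Hypothetical alias table using the hugo_symbol / alias columns shown earlier
custom_alias_table <- data.frame(
  hugo_symbol = c("ADGRA2", "ADGRA2"),
  alias = c("GPR124", "TEM5"),
  stringsAsFactors = FALSE
)

# Illustrative helper (not gnomeR's recode_alias()):
# replace any gene listed as an alias with its accepted symbol
recode_with_table <- function(genes, alias_table) {
  idx <- match(genes, alias_table$alias)
  ifelse(is.na(idx), genes, alias_table$hugo_symbol[idx])
}

recoded <- recode_with_table(c("GPR124", "TP53", "ADGRA2"), custom_alias_table)
recoded
```

With a table like this, GPR124 from the GENIE file and ADGRA2 from the portal would end up under the same name, while genes not in the table pass through unchanged.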
Let me know if you have any questions or thoughts.
Thanks!
Thank you for looking into this! I realized I was searching for the genes on the CNA files incorrectly, which is why I thought that they weren't on there, sorry about that. What you have above makes sense to me.
The current messaging about recoding is: "To ensure gene with multiple names/aliases are correctly grouped together, the following genes in your dataframe have been recoded (you can prevent this with recode_aliases = FALSE):". What about adding a sentence to the documentation under recode_alias? Something like: "Currently, alias recoding is only available for genes that are on MSK IMPACT panels"?
I think opening an issue for this could be good, though it's comforting that only 1 gene was affected by this, so it doesn't feel like such a big issue if there isn't an easy way to support non-IMPACT genes.
Let me know if it's helpful to chat about this!
Thanks @jalavery! I have updated the docs with your suggestion and opened issue #225 to address expanding this functionality to non-IMPACT genes. Feel free to edit/add details there if you have any suggestions or would like to work on it.
What changes are proposed in this pull request?
This PR fixes a bug in CNA processing in pivot_cna_longer().
Additionally, I changed the way CNA is coded internally and added an argument high_level_cna_only to allow users to only annotate high level dels/amps. By default it will count any type of alt as an event.
Another change is that pivot_cna_longer() now only returns events and all neutral events are filtered out.
Reviewer Checklist (if item does not apply, mark it as complete)
_pkgdown.yml
pkgdown::build_site(). Check the R console for errors, and review the rendered website.
withr::with_envvar(new = c("NOT_CRAN" = "true"), covr::report()). Begin in a fresh R session without any packages loaded.
usethis::use_spell_check() runs with no spelling errors in documentation
When the branch is ready to be merged into master:
NEWS.md with the changes from this pull request under the heading "# cbioportalR (development version)". If there is an issue associated with the pull request, reference it in parentheses at the end of the update (see NEWS.md for examples).
codemetar::write_codemeta()
usethis::use_spell_check() again
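The CNA behavior described in the pull request summary (neutral calls dropped, with an option to keep only high-level events) can be sketched in base R as follows. This is an illustration only, not the pivot_cna_longer() implementation. The toy data are made up, and the sketch assumes the common discrete CNA coding where 0 is neutral and -2 and 2 are the high-level deletion and amplification calls.

```r
# Toy wide-format CNA table (hypothetical values): one row per gene, one column per sample
wide_cna <- data.frame(
  Hugo_Symbol = c("GPR124", "TP53", "EGFR"),
  sample.1 = c(2, 0, -2),
  sample.2 = c(0, -1, 1),
  stringsAsFactors = FALSE
)

sample_cols <- setdiff(names(wide_cna), "Hugo_Symbol")

# Stack the sample columns into long format (base R stand-in for a pivot)
long_cna <- data.frame(
  hugo_symbol = rep(wide_cna$Hugo_Symbol, times = length(sample_cols)),
  sample_id = rep(sample_cols, each = nrow(wide_cna)),
  value = unlist(wide_cna[sample_cols], use.names = FALSE),
  stringsAsFactors = FALSE
)

# Drop neutral calls so that only events remain
events <- long_cna[long_cna$value != 0, ]

# Optionally keep only high-level events (assumed here to be -2 and 2)
high_level_events <- events[abs(events$value) == 2, ]

nrow(events)
nrow(high_level_events)
```

Filtering neutral rows at the pivot step keeps the long table small, since most gene-by-sample pairs in a CNA matrix are typically neutral.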