jalavery commented 1 month ago

What changes are proposed in this pull request?

Copied function & tests from summarize_by_gene() and modified accordingly
If a patient has an alteration on any sample, it counts as an alteration. If the gene is included on the panel for every sample and there is no alteration, the patient has no alteration. If there is no alteration and the gene is missing from the panel for any of a patient's sequenced samples, the overall alteration status is unknown.
Added a test to check this hierarchy
Also added a test to make sure results are the same as summarize_by_gene() when there is only 1 sample/patient
Note that max(c(.x, 0), na.rm = TRUE) on line 126 of summarize_by_patient() is there to avoid a warning for genes that are not on the panel for any of a patient's samples: max(c(NA, NA), na.rm = TRUE) returns -Inf with the warning "no non-missing arguments to max; returning -Inf".
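
For illustration, here is a minimal base R sketch of the hierarchy and of the max() behavior described above. The helper collapse_gene_status() is a hypothetical name and is not the code in summarize_by_patient(); it only assumes that sample-level statuses are coded 1 (altered), 0 (not altered), or NA (gene not on the panel).

```r
# Hypothetical helper, not the gnomeR implementation: collapse one gene's
# sample-level statuses (1 = altered, 0 = not altered, NA = not on panel)
# into a single patient-level status.
collapse_gene_status <- function(x) {
  if (any(x == 1, na.rm = TRUE)) {
    1          # altered on at least one sample
  } else if (anyNA(x)) {
    NA_real_   # no alteration seen, but gene untested on some sample
  } else {
    0          # tested on every sample, never altered
  }
}

collapse_gene_status(c(0, 1, NA))  # altered
collapse_gene_status(c(0, NA))     # unknown
collapse_gene_status(c(0, 0))      # not altered

# Padding with 0 gives max() a non-missing value even when every sample is NA,
# so it returns 0 instead of warning and returning -Inf.
max(c(NA, NA, 0), na.rm = TRUE)
```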

Let me know if there's anything you want me to add or modify once you have a chance to take a look!

If there is a GitHub issue associated with this pull request, please provide link.

#345

Reviewer Checklist (if item does not apply, mark it as complete)

[ ] PR branch has pulled the most recent updates from main branch. Ensure the pull request branch and your local version match and both have the latest updates from the main branch.
[ ] If a new function was added, function included in _pkgdown.yml
[ ] If a bug was fixed, a unit test was added for the bug check
[ ] Run pkgdown::build_site(). Check the R console for errors, and review the rendered website.
[ ] Code coverage is suitable for any new functions/features. Review coverage with withr::with_envvar(new = c("NOT_CRAN" = "true"), covr::report()). Begin in a fresh R session without any packages loaded.
[ ] R CMD Check runs without errors, warnings, and notes
[ ] usethis::use_spell_check() runs with no spelling errors in documentation

When the branch is ready to be merged into master:

[ ] Update NEWS.md with the changes from this pull request under the heading "# cbioportalR (development version)". If there is an issue associated with the pull request, reference it in parentheses at the end of the update (see NEWS.md for examples).
[ ] Run codemetar::write_codemeta()
[ ] Run usethis::use_spell_check() again
[ ] Approve Pull Request
[ ] Merge the PR

karissawhiting commented 2 weeks ago

Thank you @jalavery, this looks fabulous!!! Code looks perfect, though in the process of reviewing I thought of two potential amendments to functionality that I wanted to run by you:

1) I took out the extract_patient_id() part within the function and instead require users to input a data frame that already has a patient ID column. This makes it more flexible for non-IMPACT samples. If the input data frame doesn't already have a patient ID column, I added this suggestion to the error message:

To extract patient IDs from IMPACT sample IDs (e.g. P-XXXXXX-TXX-IMX), use gnomeR::extract_patient_id(data$sample_id)

The one annoying side effect is that in other functions, if you have patient_id in your data, you have to explicitly pass it to the other_vars argument, otherwise you get this error:

Error in `.abort_if_not_numeric()` at gnomeR/R/summarize-by-gene.R:61:3:
! All alterations in your gene binary must be numeric and only can have values of 0, 1, or NA. Please
  coerce the following columns to numeric or pass them to the `other_vars` argument before proceeding:
  patient_id

We could consider adding it as ignored automatically in other functions if it's present? Idk...
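
As a rough sketch of what ignoring it automatically might look like (check_gene_binary() and its auto_ignored argument are hypothetical names, not gnomeR's internal .abort_if_not_numeric()):

```r
# Hypothetical validator, not gnomeR's internal `.abort_if_not_numeric()`:
# columns named in `other_vars`, plus `patient_id` by default, skip the
# numeric check.
check_gene_binary <- function(gene_binary, other_vars = NULL,
                              auto_ignored = "patient_id") {
  ignored <- union(auto_ignored, other_vars)
  to_check <- setdiff(names(gene_binary), ignored)
  is_num <- vapply(gene_binary[to_check], is.numeric, logical(1))
  if (!all(is_num)) {
    stop(
      "Please coerce these columns to numeric or pass them to `other_vars`: ",
      paste(to_check[!is_num], collapse = ", "),
      call. = FALSE
    )
  }
  invisible(gene_binary)
}

# patient_id is character but is skipped, so this passes silently.
check_gene_binary(data.frame(patient_id = c("P1", "P2"), TP53 = c(1, 0)))
```

The downside is that a column that happens to be named patient_id would never be validated, so this trades strictness for convenience.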

2) At the end of the function we join the other_vars back from the input data frame. However, those variables may be on the sample level, in which case the resulting data frame will no longer have one unique observation per patient. Do we think this will be confusing? Maybe we could just add a warning with the number of unique patients and the number of resulting df rows if they differ?
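
Something like the following base R sketch could work for that warning. The function name warn_if_not_one_row_per_patient() and the patient_id column name are assumptions for illustration, not existing gnomeR code.

```r
# Hypothetical check, not gnomeR code: warn when the joined result has more
# rows than unique patients. `patient_id` is an assumed column name.
warn_if_not_one_row_per_patient <- function(df, id_col = "patient_id") {
  n_patients <- length(unique(df[[id_col]]))
  n_rows <- nrow(df)
  if (n_rows != n_patients) {
    warning(
      "Result has ", n_rows, " rows but only ", n_patients,
      " unique patients; some `other_vars` may vary by sample.",
      call. = FALSE
    )
  }
  invisible(df)
}

# Toy data: sample_type differs across P1's samples, so P1 keeps two rows.
joined <- data.frame(
  patient_id = c("P1", "P1", "P2"),
  TP53 = c(1, 1, 0),
  sample_type = c("Primary", "Metastasis", "Primary")
)
warn_if_not_one_row_per_patient(joined)  # warns: 3 rows, 2 unique patients
```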

jalavery commented 2 weeks ago

Thank you for taking a look! The updates all sound good to me - For #1: I like the modification to make it more general than just handling IMPACT samples. As a user I'd be okay having to specify the patient id in the other_vars argument. For #2: Great call. I think a warning would be informative. I prob won't have a chance to tinker with this today/tomorrow before going OOO next week, but can take a look once I'm back!

MSKCC-Epi-Bio / gnomeR

Add summarize by patient #346