Closed hfuchs5 closed 1 year ago
in summarize_by_gene()
on line 35 when you name the columns it is possible that they have the same name. After this you cannot use mutate or other dplyr functions. If you are trying to name the columns after the sample_id you can maybe do something like this: all_bin2[rownames(all_bin2)== "sample_id",]
@hfuchs5 I made some updates to simplify how to handle sample_id
(you're right that rownames can't be avoided unfortunately!) and add a few speed enhancements (e.g. using table()
instead of summarise
). I also added a check for numeric columns up front instead of transforming later which seems to speed things even more!
test replications elapsed relative user.self sys.self
1 summarize_by_gene_new 500 161.339 1.000 156.353 5.092
2 summarize_by_gene 500 229.061 1.420 222.734 6.407
Could you test it out with your large data set to make sure it still is faster?
Thanks for your work on this. I think we are close!
It ran through my dataset with 1000+ samples in 1.14 seconds! It looks good to me and I'm going to merge.
What changes are proposed in this pull request? Avoid use of pivot_*() functions in this because it was taking too long. I used base R matrices instead. Now it takes 10 seconds to run according to
tictoc
package.If there is an GitHub issue associated with this pull request, please provide link.
259
Reviewer Checklist (if item does not apply, mark is as complete)
_pkgdown.yml
pkgdown::build_site()
. Check the R console for errors, and review the rendered website.withr::with_envvar(new = c("NOT_CRAN" = "true"), covr::report())
. Begin in a fresh R session without any packages loaded.usethis::use_spell_check()
runs with no spelling errors in documentationWhen the branch is ready to be merged into master:
NEWS.md
with the changes from this pull request under the heading "# cbioportalR (development version)
". If there is an issue associated with the pull request, reference it in parentheses at the end update (seeNEWS.md
for examples).codemetar::write_codemeta()
usethis::use_spell_check()
again