Closed Syksy closed 11 months ago
Will try and take a look today. Thanks for the thorough update. acs
In analysis.Rmd, lines 126 and 135. Need to figure out the MatchedAssayExperiment
class but appears that's where they are talking.
edit1: intersect
results in 461 TCGA samples, MatchedAssayExperiment
produces 400. Easy to adjust but lose samples.
@varsha090597 If needed, can help with this as well. Mentioned helping rewrite the big nested if/for. Let me know.
Thanks @ACSoupir for your help, highly appreciated; any chance you could work on this soon @varsha090597 as Alex is also willing to help out?
In analysis.Rmd, lines 126 and 135. Need to figure out the
MatchedAssayExperiment
class but appears that's where they are talking.edit1:
intersect
results in 461 TCGA samples,MatchedAssayExperiment
produces 400. Easy to adjust but lose samples.
What are the sample types in these 461 and 400 samples? We'd like only primary samples in this case; is it possible that the 61 are normals and/or metastatic or other non-primary samples?
Will have to see (lot on my plate right now).
From reading documentation it seems like the suggested function/class keeps samples that have data in all slots. It's more restrictive than the original intersect
method, but don't know if intersect removed different samples that are now kept (i.e. intersect only kept primary, but there are primary and mets with complete data).
Dear all, with all the check-boxes ticked with what my in opinion is a good revision, I'll be submitting this back to Bioconductor review.
The latest R CMD check is:
- using log directory 'D:/Gits/curatedPCaData.Rcheck'
- using R Under development (unstable) (2023-07-23 r84741 ucrt)
- using platform: x86_64-w64-mingw32
- R was compiled by gcc.exe (GCC) 12.2.0 GNU Fortran (GCC) 12.2.0
- running under: Windows 10 x64 (build 19045)
- using session charset: UTF-8
- checking for file 'curatedPCaData/DESCRIPTION' ... OK
- this is package 'curatedPCaData' version '0.99.2'
- package encoding: UTF-8
- checking package namespace information ... OK
- checking package dependencies ... OK
- checking if this is a source package ... OK
- checking if there is a namespace ... OK
- checking for executable files ... OK
- checking for hidden files and directories ... OK
- checking for portable file names ... OK
- checking whether package 'curatedPCaData' can be installed ... OK
- checking installed package size ... OK
- checking package directory ... OK
- checking 'build' directory ... OK
- checking DESCRIPTION meta-information ... OK
- checking top-level files ... OK
- checking for left-over files ... OK
- checking index information ... OK
- checking package subdirectories ... OK
- checking R files for non-ASCII characters ... OK
- checking R files for syntax errors ... OK
- checking whether the package can be loaded ... OK
- checking whether the package can be loaded with stated dependencies ... OK
- checking whether the package can be unloaded cleanly ... OK
- checking whether the namespace can be loaded with stated dependencies ... OK
- checking whether the namespace can be unloaded cleanly ... OK
- checking loading without being on the library search path ... OK
- checking dependencies in R code ... OK
- checking S3 generic/method consistency ... OK
- checking replacement functions ... OK
- checking foreign function calls ... OK
- checking R code for possible problems ... OK
- checking Rd files ... OK
- checking Rd metadata ... OK
- checking Rd cross-references ... OK
- checking for missing documentation entries ... OK
- checking for code/documentation mismatches ... OK
- checking Rd \usage sections ... OK
- checking Rd contents ... OK
- checking for unstated dependencies in examples ... OK
- checking contents of 'data' directory ... OK
- checking data for non-ASCII characters ... OK
- checking data for ASCII and uncompressed saves ... OK
- checking R/sysdata.rda ... OK
- checking installed files from 'inst/doc' ... OK
- checking files in 'vignettes' ... OK
- checking examples ... OK
- checking for unstated dependencies in 'tests' ... OK
- checking tests ... Running 'native_tests.R' OK
- checking for unstated dependencies in vignettes ... OK
- checking package vignettes in 'inst/doc' ... OK
- checking running R code from vignettes ... 'overview.Rmd' using 'UTF-8'... OK NONE
- checking re-building of vignette outputs ... OK
- checking PDF version of manual ... OK
- DONE
Status: OK
The latest BiocCheck() is:
─ BiocCheckVersion: 1.37.7 ─ BiocVersion: 3.18 ─ Package: curatedPCaData ─ PackageVersion: 0.99.2 ─ sourceDir: D:/Gits/curatedPCaData ─ installDir: C:\Users\teemu\AppData\Local\Temp\Rtmp4OxmKI\file832452f04e5f ─ BiocCheckDir: D:/Gits/curatedPCaData.BiocCheck ─ platform: windows ─ isTarBall: FALSE
- Installing package...
- Checking for deprecated package usage...
- Checking for remote package usage...
- Checking for 'LazyData: true' usage...
- Checking version number...
- Checking version number validity... Package version 0.99.2; pre-release
- Checking R version dependency...
- Checking package size... Skipped... only checked on source tarball
- Checking individual file sizes...
- Checking biocViews...
- Checking that biocViews are present...
- Checking package type based on biocViews... ExperimentData
- Checking for non-trivial biocViews...
- Checking that biocViews come from the same category...
- Checking biocViews validity...
- Checking for recommended biocViews...
- Checking build system compatibility...
- Checking for blank lines in DESCRIPTION...
- Checking if DESCRIPTION is well formatted...
- Checking for proper Description: field...
- Checking for whitespace in DESCRIPTION field names...
- Checking that Package field matches directory/tarball name...
- Checking for Version field...
- Checking for valid maintainer...
- Checking License: for restrictive use...
- Checking for pinned package versions...
- Checking DESCRIPTION/NAMESPACE consistency...
- Checking .Rbuildignore...
- Checking for stray BiocCheck output folders...
- Checking for inst/doc folders...
- Checking vignette directory...
- Checking package installation calls in R code...
- Checking for library/require of curatedPCaData...
- Checking coding practice...
- Checking parsed R code in R directory, examples, vignettes...
- Checking function lengths...
- NOTE: The recommended function length is 50 lines or less. There are 3 functions greater than 50 lines.
- Checking man page documentation...
- Checking package NEWS...
- Checking unit tests...
- Checking skip_on_bioc() in tests...
- Checking formatting of DESCRIPTION, NAMESPACE, man pages, R source, and vignette source...
- NOTE: Consider shorter lines; 12 lines (0%) are > 80 characters long.
- NOTE: Consider multiples of 4 spaces for line indents; 11 lines (0%) are not. See https://contributions.bioconductor.org/r-code.html See styler package: https://cran.r-project.org/package=styler as described in the BiocCheck vignette.
- Checking if package already exists in CRAN...
- Checking for bioc-devel mailing list subscription...
- NOTE: Cannot determine whether maintainer is subscribed to the Bioc-Devel mailing list (requires admin credentials). Subscribe here: https://stat.ethz.ch/mailman/listinfo/bioc-devel
- Checking for support site registration... Maintainer is registered at support site. Package name is in support site watched tags.
─ BiocCheck results ── 0 ERRORS | 0 WARNINGS | 4 NOTES
The 4 NOTEs (3 really since one is just due to not having admin access) in my opinion ought to be tolerated; the shorter lines are primarily URLs that I was not able to cut shorter while maintaining integrity. The space indents are automatically generated for authors by roxygen, and uses the 2-spacing type. The long functions in this context could not be split into parts reasonably.
Thus, I will be submitting version 0.99.2.
The Bioconductor review process is now complete (accepted), and we'll continue working on this package as well as focus on the curatedPCaWorkflow. See: https://github.com/Bioconductor/Contributions/issues/3047#issuecomment-1795760676 Thanks all! Daniel
Dear all, we've received Bioconductor review requests at: https://github.com/Bioconductor/Contributions/issues/3047#issuecomment-1591516244
Breaking it down (and to be updated in respect to what's been addressed):
curatedPCaData https://github.com/Bioconductor/Contributions/issues/3047
OK; As discussed in the submission thread for BioConductor at https://github.com/Bioconductor/Contributions/issues/3047#issuecomment-1625486043 , these downstream vignettes are to be moved to a new
workflow
package calledcuratedPCaWorkflow
at: https://github.com/Syksy/curatedPCaWorkflow . Work will continue there-in regarding the proposed improvements. Please see answers below for specific actions taken toward this package-wide adjustment.Multiple steps have been taken to address this and are further elaborated in answers below; briefly:
overview
vignette has been moved to/R/getpcasummaries.R
, and these functions are generalizable to any requested variable, are not dataset-specific, and contain e.g. standard documentation and unit testing./R/wrappers.R
.metadata.csv
(for example functiongetPCaStudies
) as well as the clinical metadata accessible viacolData
(for example functionsgetPCaSummaryTable
,getPCaSummarySurv
,getPCaSummarySamples
, andgetPCaSummaryStudies
).DESCRIPTION
OK
OK; Addressed in https://github.com/Syksy/curatedPCaData/commit/7ba59f5fb1bd917a0a0fc48e5245aad34cfec51b ; in addition, the package dependencies have been made more lean as the functions at
/R/wrappers.R
are no longer present, removing theimport
need for e.g.dplyr
andstringr
.OK; CC-BY 4.0 should still be fine based on discussion at https://github.com/Bioconductor/Contributions/issues/3047#issuecomment-1625486043
NAMESPACE
OK; This has now been adjusted to using just native R tests in: https://github.com/Syksy/curatedPCaData/commit/4c1a481a82060d371da958c23980853ac0080ed5 and the tests themselves were subsequently extended to cover the new functions at https://github.com/Syksy/curatedPCaData/commit/32352235fd9f0df8c04fd5d677ae2fb53c2b984c and https://github.com/Syksy/curatedPCaData/commit/ed9d0b8b2b0fd1ab2c8fd7647224fa05a062caa8 .
vignettes
OK; this formatting issue has been addressed in https://github.com/Syksy/curatedPCaData/commit/bb76b0121fdd8fcd7e949c583160eb36d1c04db4 . Furthermore, the rest of the code has also been double-checked for formatting (multiples of 4 spacebar indents, no tabs, char width 80), of which vast majority are fixed in e.g. https://github.com/Syksy/curatedPCaData/commit/019a304e9dffaf0b92046a308486dd80b6b5eec8 . The 12 lines (~0% of all code) longer than 80 chars and 11 lines (~0% of all code) that violate 4*spacebar multiples are special cases such as long URLs or automatically generated package Rd TeX-like code, respectively.
OK; Notable effort has been put into revising the
overview
vignette so it exemplifies this effectively with generalizable code, while rest of the vignettes that had an analysis focus have been now shifted to the separatecuratedPCaWorkflow
-package. See comments below.OK; Done in https://github.com/Syksy/curatedPCaData/commit/ee1c94bd8e09d9e6d072e5c5f27dd1655bcbc872 by moving to use
curatedPCaWorkflow
instead.OK; This pointer has been addressed in the PR by @ACSoupir at https://github.com/Syksy/curatedPCaData/pull/45 . However, as the
analyses
vignette has now been incorporated into the workflow package instead, this fix was not merged intocuratedPCaData
. This fix will instead be incorporated into thecuratedPCaWorkflow
.OK; After the other fixes both into the
overview
vignette as well as the generalized functions provided now in/R/getpcasummaries.R
, all calls of typemae_obj[,"variable"]
are now instead formatted asmae_obj$variable
.OK, at least partially; this revision request was not implemented as suggested here per se. I explored both the
TCGAutils
(and thecuratedTCGAdata
) for this purpose. However, I ended up with the conclusion that this would've been best implemented as part of the data curation step incuratedPCaData
, rather than implemented here when we're just offering the "window" to the data. In all of our datasets, thesample_type
field incolData
was extracted from the original source and represents the sample types (primary, normal, metastatic, ...). In this respect, I feel it would be a bit much to bring in a dependency just for TCGA, when this information is already available in thecolData
. However, to address the original purpose of subsetting to certain sample types, I have now modified the maingetPCa
function to allow subsetting the creation of the MAE object viaMultiAssayExperiment::subsetByColData
which will utilize thesample_type
available in our data : https://github.com/Syksy/curatedPCaData/commit/ba50efa4e16d266593589403709ae6a4ffa9f9cf . I hope this adjustment fulfills the requested improvement to the package - if not, please let me know and I will revise accordingly to the best of my ability.OK; The oncoprint related functions and wrappers have been now moved to the new
curatedPCaWorkflow
package (e.g. commit https://github.com/Syksy/curatedPCaData/commit/62be8a42449421dac6888d1d3c787242be8ea192 ). In hindsight, they are somewhat downstream analyses already, and therefore are better suited for this workflow-package. Thus, the oncoprint and their related functions are no longer part of thecuratedPCaData
-package.OK; This is a fair point, and indeed should've been the design already in the start. Notable effort has now been done to move the functionality previously coded inside the
overview
vignette, and made exported by the package itself accompanied by suitable examples, testing, and documentation. To enumerate these functions and key commits, of which majority resides now in the file located atR/getpcasummaries.R
:getPCaSummaryTable
: https://github.com/Syksy/curatedPCaData/commit/6d1aaaae356f11c1cdc73715aba9540e353ee2ccgetPCaSummarySurv
: https://github.com/Syksy/curatedPCaData/commit/6d1aaaae356f11c1cdc73715aba9540e353ee2ccgetPCaSummarySamples
: https://github.com/Syksy/curatedPCaData/commit/e4129376e1b6280bf14483baf908e51852bdfc43getPCaSummaryStudies
: https://github.com/Syksy/curatedPCaData/commit/b6bffd1ef4b3e669583576e9acc256d5e9934d84 and https://github.com/Syksy/curatedPCaData/commit/f8ef61ce7aec3e6da965afdbe65b5a5b46b19beagetPCaStudies
: https://github.com/Syksy/curatedPCaData/commit/71c73c6cc4b44401c730d9e7f268f8a9942fcf0cWith the above functions now in use by the vignette and exported by the user, the revision(s) addresses the above mentioned issues and clean up the vignette (Rmd commits mainly in https://github.com/Syksy/curatedPCaData/commit/2f1b02a34948400e99cf18c88b5053b3b6b81981 , https://github.com/Syksy/curatedPCaData/commit/90a37f3ba3609a4870da170c8fc375bcaf475a16 , https://github.com/Syksy/curatedPCaData/commit/7abab8cab289c3f31e587341874ae6121d4a248b )
OK; The citations have now been revised as requested in https://github.com/Syksy/curatedPCaData/commit/1940b500d809ac06046f120d280bb9f2de49731a and https://github.com/Syksy/curatedPCaData/commit/dff5dee69bc1506bb6485913bea998d43db1ad86 .
OK; These mentioned triple backticks have been corrected to single backticks in https://github.com/Syksy/curatedPCaData/commit/0b46706998a9b783c023dcff28e79fa10f3045ea .
R
OK; The name of the argument has now been changed to
assays
in order to be more in line with the naming conventions inMultiAssayExperiment
via commit https://github.com/Syksy/curatedPCaData/commit/633d17eb510a5d9c1375b88901af409d20fe718f .OK; This function has now been moved to be as part of the
curatedPCaWorkflow
instead, as it relates to oncoprints ( https://github.com/Syksy/curatedPCaData/commit/62be8a42449421dac6888d1d3c787242be8ea192 ).tests
After adjustments such as moving the wrappers focused on e.g. oncoprints out of the package ( https://github.com/Syksy/curatedPCaData/commit/62be8a42449421dac6888d1d3c787242be8ea192 ), adding new functions for replacing the ones previously defined inside vignette (e.g. https://github.com/Syksy/curatedPCaData/commit/6d1aaaae356f11c1cdc73715aba9540e353ee2cc , https://github.com/Syksy/curatedPCaData/commit/e4129376e1b6280bf14483baf908e51852bdfc43 , https://github.com/Syksy/curatedPCaData/commit/b6bffd1ef4b3e669583576e9acc256d5e9934d84 , https://github.com/Syksy/curatedPCaData/commit/71c73c6cc4b44401c730d9e7f268f8a9942fcf0c), and extending the testing coverage ( https://github.com/Syksy/curatedPCaData/commit/32352235fd9f0df8c04fd5d677ae2fb53c2b984c ), the output for this coverage testing is as follows:
Thus, key functionality is tested, with the parts not covered related to marginal cases like download timeout.