ibkstore commented 1 year ago

Update the following URL to point to the GitHub repository of the package you wish to submit to Bioconductor

Repository: https://github.com/ibkstore/kmcut
Confirm the following by editing each check box to '[x]'
[ x] I understand that by submitting my package to Bioconductor, the package source and all review commentary are visible to the general public.
[ x] I have read the Bioconductor Package Submission instructions. My package is consistent with the Bioconductor Package Guidelines.
[ x] I understand Bioconductor Package Naming Policy and acknowledge Bioconductor may retain use of package name.
[ x] I understand that a minimum requirement for package acceptance is to pass R CMD check and R CMD BiocCheck with no ERROR or WARNINGS. Passing these checks does not result in automatic acceptance. The package will then undergo a formal review and recommendations for acceptance regarding other Bioconductor standards will be addressed.
[ x] My package addresses statistical or bioinformatic issues related to the analysis and comprehension of high throughput genomic data.
[x ] I am committed to the long-term maintenance of my package. This includes monitoring the support site for issues that users may have, subscribing to the bioc-devel mailing list to stay aware of developments in the Bioconductor community, responding promptly to requests for updates from the Core team in response to changes in R or underlying software.
[ x] I am familiar with the Bioconductor code of conduct and agree to abide by it.

I am familiar with the essential aspects of Bioconductor software management, including:

[ x] The 'devel' branch for new packages and features.
[x ] The stable 'release' branch, made available every six months, for bug fixes.
[ x] Bioconductor version control using Git (optionally via GitHub).

For questions/help about the submission process, including questions about the output of the automatic reports generated by the SPB (Single Package Builder), please use the #package-submission channel of our Community Slack. Follow the link on the home page of the Bioconductor website to sign up.

bioc-issue-bot commented 1 year ago

Hi @ibkstore

Thanks for submitting your package. We are taking a quick look at it and you will hear back from us soon.

The DESCRIPTION file for this package is:

Package: kmcut
Type: Package
Title: Optimized Kaplan Meier analysis and identification and validation of prognostic biomarkers
Version: 0.99.0
Authors@R: c(person(given = "Igor", family =  "Kuznetsov", role = c("aut", "cre"), email = "ikuznetsov@albany.edu"), person(given = "Javed", family = "Khan", role = "aut", email = "khanjav@mail.nih.gov")) 
Description: The purpose of the package is to identify prognostic biomarkers and an optimal numeric cutoff for each biomarker that can be used to stratify a group of test subjects (samples) into two sub-groups with significantly different survival (better vs. worse). The package was developed for the analysis of gene expression data, such as RNA-seq. However, it can be used with any quantitative variable that has a sufficiently large proportion of unique values.
License: Artistic-2.0
Encoding: UTF-8
LazyData: false
Imports: 
    data.table,
    stats,
    stringr,
    survival,
    tools,
    pracma, 
    grDevices, 
    graphics, 
    utils,
    doParallel,
    foreach, 
    parallel
RoxygenNote: 7.2.3
Suggests: 
    knitr,
    rmarkdown,
    testthat (>= 3.0.0)
VignetteBuilder: knitr
biocViews: Software, StatisticalMethod, GeneExpression, Survival
Config/testthat/edition: 3

vjcitn commented 1 year ago

Very nice work, and a good vignette. I see data requirements:

A file with right-censored survival data and

A file with gene expression-like features. The sample identifiers in both files must be exactly the same. The package contains built-in example files with survival data and RNA-seq gene expression data that describe 295 neuroblastoma tumor samples (Zhang et al, 2015).

The file with survival data must contain at least three columns labeled ‘sample_id’, ‘stime’, and ‘scens’. Column ‘sample_id’ contains a unique identifier of each sample (test subject) and must be the first column in the file. Column ‘stime’ contains the survival time for each sample. Column ‘scens’ contains the censoring variable for each sample (0 or 1). If other columns are present in the file, they will be ignored. ‘stime’ and ‘scens’ can be in any column in the file, except the first. An example file with survival data is provided with the package, its content can be printed as follows (the output of the code is not provided because it is too long):

Please use SummarizedExperiment to manage your data, it will help coordinate the samples as needed. You can have survival tkime objects in the colData. Unless there are strong arguments against using SummarizedExperiment or a derivate of that class, we really like to see them used, to foster interoperability and take advantage of familiarity to a large user base. See contributions.bioconductor.org for guidelines.

ibkstore commented 1 year ago

Very nice work, and a good vignette. I see data requirements:
A file with right-censored survival data and

A file with gene expression-like features. The sample identifiers in both files must be exactly the same. The package contains built-in example files with survival data and RNA-seq gene expression data that describe 295 neuroblastoma tumor samples (Zhang et al, 2015).

The file with survival data must contain at least three columns labeled ‘sample_id’, ‘stime’, and ‘scens’. Column ‘sample_id’ contains a unique identifier of each sample (test subject) and must be the first column in the file. Column ‘stime’ contains the survival time for each sample. Column ‘scens’ contains the censoring variable for each sample (0 or 1). If other columns are present in the file, they will be ignored. ‘stime’ and ‘scens’ can be in any column in the file, except the first. An example file with survival data is provided with the package, its content can be printed as follows (the output of the code is not provided because it is too long):
Please use SummarizedExperiment to manage your data, it will help coordinate the samples as needed. You can have survival tkime objects in the colData. Unless there are strong arguments against using SummarizedExperiment or a derivate of that class, we really like to see them used, to foster interoperability and take advantage of familiarity to a large user base. See contributions.bioconductor.org for guidelines.

Hi Vince,

Thank you for your comments. I apologize for late reply – I was traveling extensively from the end of May until the very end of June and did not check my GitHub thread (I did not receive an email notification about your post). Regarding your comment about the SummarizedExperiment object. The data requirement you are referring to is about two simple input tab-delimited text files. These two files are read into a standard matrix object. We believe that in the context of our package the use of complex SummarizedExperiment objects is not justified because of the simple nature of input files in our package. Below is a more detailed explanation.

Our package was written for researchers engaged in cancer genomics studies with a substantial clinical trials component, such as clinical oncologists, as a major user group. This group of scientists frequently do not have strong programming skills and rely on GUI-based software, such as GraphPad, for survival analysis. They also use Excel for basic manipulation of spreadsheets, such as a simple gene expression table that does not contain any metadata, or a table with key clinical data, such as survival data. Therefore, we designed our package to be as simple as possible to allow “copy-and-paste” approach – the users would copy and paste the example code and just change the input file names to match theirs. The input text file with expression data is very basic and does not contain any metadata, only gene and sample IDs. The input text file with survival data is also very basic. Both files can be easily created and altered in Excel, thus allowing even a beginner to use our package. The formats of these two files also mimic formats used by clinical oncologists. We believe that the SummarizedExperiment object is unnecessarily complex and confusing for such users. Besides, our package does not use any metadata. The expression and survival data are read into simple matrix objects from base R, which are then utilized by the functions.

Best regards, Igor

vjcitn commented 1 year ago

SummarizedExperiment object is unnecessarily complex and confusing for such users. Besides, our package does not use any metadata. The expression and survival data are read into simple matrix objects from base R. Sorry for delay in responding. Part of the rationale for self-describing classes in Bioconductor is to reduce risk of error arising from spreadsheet manipulations, incomplete labeling, loss of provenance information. Bioconductor practices can help reduce risk of problems like these, though they can't eliminate the risk completely. Current package contributors follow the contribution guidelines to implement these and related practices. Your user base may not find SummarizedExperiments valuable, but other developers/analysts who might want to reuse your open software to enhance their analytic environments will be at a disadvantage if you don't even try to use conventions that have helped the project and its users for two decades. I will pass this into review but you may expect some encouragement to use Bioconductor infrastructure and practices, even if you don't want to expose these to your user base.

lshep commented 1 year ago

I will also add that in general it is normally not overly complicated to incorporate use of SummarizedExperiment in functions, for example, you could continue to support matrix like objects AND SummarizedExperiment objects by checking for the class of the data presented -- and then within functions subset or manipulate objects for the remainder of your code . AND as @vjcitn pointed out there are a lot of benefits to using formalized classes to prevent mistakes and actually make management of data results easier

bioc-issue-bot commented 1 year ago

A reviewer has been assigned to your package. Learn what to expect during the review process.

IMPORTANT: Please read this documentation for setting up remotes to push to git.bioconductor.org. It is required to push a version bump to git.bioconductor.org to trigger a new build.

Bioconductor utilized your github ssh-keys for git.bioconductor.org access. To manage keys and future access you may want to active your Bioconductor Git Credentials Account

bioc-issue-bot commented 1 year ago

Dear Package contributor,

This is the automated single package builder at bioconductor.org.

Your package has been built on the Bioconductor Build System.

Congratulations! The package built without errors or warnings on all platforms.

Please see the build report for more details.

The following are build products from R CMD build on the Bioconductor Build System: macOS 12.6.5 Monterey: kmcut_0.99.0.tar.gz Linux (Ubuntu 22.04.2 LTS): kmcut_0.99.0.tar.gz

Links above active for 21 days.

Remember: if you submitted your package after July 7th, 2020, when making changes to your repository push to git@git.bioconductor.org:packages/kmcut to trigger a new build. A quick tutorial for setting up remotes and pushing to upstream can be found here.

hpages commented 1 year ago

Alright. Let me take a look at kmcut. Sorry for the slow response.

hpages commented 1 year ago

Hi @ibkstore,

Sorry again for the delay. Here is some feedback:

The vignette contains the following line:
```
Introduction to the KMCUT package
```
but the name of the package is kmcut (all lower-case). R in general, and package names in particular, are case-sensitive.
Man pages have very long titles, each of them made of several sentences that repeat exactly the content of the Description section. Good man page titles should be short (a few words only).
You seem to misunderstand the purpose of importing other packages in your package. The examples in most man pages start with many library() calls e.g. in ?kmoptpermcut:
```
library(survival)
library(stringr)
library(data.table)
library(tools)
library(pracma)
library(foreach)
library(doParallel)
library(parallel)
library(kmcut)
```
Please note that packages that are imported are automatically available to the code inside your package so there's no need to put the extra burden to manually load each of them on the user. For loading the kmcut package itself, a widely accepted convention is to not pollute the examples in the man pages of a given package (say foo) with explicit calls to library(foo). The reason for this convention is that it is considered reasonable to assume that the package is already loaded when the user is looking at its man pages.
Please minimize the dependencies of your package and only import what you actually need to import. For example, the package does not seem to use anything from the stringr package so that dependency is not needed. Please re-evaluate all the other dependencies and make sure to drop those that you don't need.
Also please re-evaluate the need for all the library() calls at the beginning of the vignette (although keeping library(kmcut) is fine there and is actually needed).
Bioconductor recommends against the use of flatcase style for your function names (e.g. kmoptpermcut, transposetable, kmqcut, kmucut, etc...), as it makes the names hard to read and/or remember. Please use snake_case or camelCase instead.
About the file with gene expression-like features. The vignette says: "The first column must contain sample identifiers and and the first row must contain gene identifiers." Isn't it the otherway around?
The kmoptpermcut() function is extremely slow, to a point that seems to make it almost unsuitable for real-world data. For example, calling it on the example_genes_295.txt file provided in the package takes 89 sec on my laptop when n_iter is set to 1000. However, the file only contains 2 rows (i.e. 2 genes) and the man page for kmoptpermcut() recommends to set n_iter to 10,000. This means that using the recommended n_iter on a file with 100 genes will take more than 10 hours!
How useful is the kmoptpermcut() function when called with wpdf=FALSE? In this case, the function will display a series of plot, one after the other, the next one replacing the previous one at a possibly fast pace, with barely enough time for the user to look at them!
If the goal is to make it as simple as possible for clinical oncologists or other scientists without strong programming skills to use the package, then some improvements on how errors are handled would be welcome. For example, I doubt that the targeted audience will be able to make sense of an error message like this one:
```
> kmoptscut(fname="survival_data_295.txt", sfname="example_genes_295b.txt")
Error in order(sdat$sample_id) : argument 1 is not a vector
```
Documenting a function argument like this:
```
fname: character vector that specifies the name of the file etc...
```
without specifying that the character vector must be of length 1 suggests that the character vector could be of arbitrary length. Note that many functions in R that work on character data are vectorized and can handle a character vector of arbitrary length (e.g. tolower() or grepl()) but that doesn't seem to be the case here. Please disambiguate.
Finally note that coordinating right-censored survival data and gene expression-like features, and keeping them aligned when subsetting the data, is exactly what the SummarizedExperiment container is for. In your case, the SummarizedExperiment object would have the feature ids as rownames and the sample ids as colnames, and the right-censored survival time data would be stored in the colData component of the object. All you would need to do is implement a function that loads the input data in a SummarizedExperiment object e.g. load_kmcut_input(). The function would essentially do something like this:
```
load_kmcut_input <- function(epath, spath)
{
    edat <- read.delim(epath, header = TRUE, stringsAsFactors = FALSE)
    sdat <- read.delim(spath, header = TRUE, row.names = 1)
    ...
    ... some sanity checks about 'edat' and 'sdat'
    ...
    assay <- as.matrix(edat[ , -1])
    rownames(assay) <- edat[ , 1]
    coldata <- DataFrame(sdat)
    SummarizedExperiment(list(expr=assay), colData=coldata)
}
```
Typical use:
```
epath <- system.file("extdata", "example_genes_295.txt", package = "kmcut")
spath <- system.file("extdata", "survival_data_295.txt", package = "kmcut")
kmcut_se <- load_kmcut_input(epath, spath)
kmcut_se
# class: SummarizedExperiment 
# dim: 2 295 
# metadata(0):
# assays(1): expr
# rownames(2): MYCN MYH2
# rowData names(0):
# colnames(295): SEQC_NB001 SEQC_NB002 ... SEQC_NB491 SEQC_NB492
# colData names(2): stime scens
```
Then all functions in the package would take such object as input instead of fname and sfname, making them slightly easier to use and avoiding the need to load the same data from disk again and again.

Furthermore, the user can now subset the data at will by doing things like kmcut_se["MYH2", c("SEQC_NB002", "SEQC_NB003")] with the guarantee that the right-censored survival data and the gene expression-like data remain aligned at all time, hence avoiding the need of utility functions like extractrows() or extractcolumns(). Every Bioconductor user knows how to perform such basic subsetting operation. This is the true Bioconductor way and it has proven to be a very valuable approach to software design.

Thanks, H.

hpages commented 1 year ago

Hi @ibkstore, do you intend to follow up on this submission?

Best, H.

ibkstore commented 1 year ago

Hi Herve, I do intend to follow-up on this submission. However, I am not sure how to proceed – this is my first R package. I am used to feedback from journal article reviewers, when comments are usually divided into major (the ones that must be addressed) and minor (optional). If the submission is rejected, then it is stated explicitly. There is also a specific time limit for revisions. Could you please clarify how the Bioconductor review system works? For instance, what is the time limit? Revision will take a considerable amount of time given the fact that I have heavy teaching and service load now. Can your comments be classified into “mandatory” and “recommended, but optional”? Regarding the kmoptpermcut function: yes, it is very slow and, unfortunately, cannot be made much faster. It estimates the unknown sampling distribution of the test statistics by applying the permutation test, which is very slow. It is recommended that kmoptpermcut be run on at least 16 CPUs (this can be achieved on a low/midrange desktop or high-end laptop). This function is meant to be applied to a relatively small subset of candidate genes identified by the kmoptscut function. Also, a typical number of samples in a dataset supplied to this function is expected to be much smaller than in the example dataset included into the package.

Thank you, Igor

hpages commented 1 year ago

Hi @ibkstore,

Thanks for getting back to me.

I do intend to follow-up on this submission.

Great!

I am used to feedback from journal article reviewers

The review process for Bioconductor software packages is slightly different than with articles submitted to peer-review journals.

If the submission is rejected, then it is stated explicitly.

This submission is still labelled with "review in progress", which means that the package was neither rejected or accepted yet.

Could you please clarify how the Bioconductor review system works?

See https://contributions.bioconductor.org/bioconductor-package-submissions.html#whattoexpect

what is the time limit?

There's no time limit. However we do ask contributors to respond to comment within 2-3 weeks (see above link). Even if they didn't get a chance to address the feedback, acknowledgement that they've received it and/or questions about it are welcome. Submission with no visible activity for a prolonged period of time (e.g. > 1 month) will be considered abandonned and might get closed for inactivity. Note that an issue closed for inactivity doesn't mean that the submission got rejected as the issue can still be re-opened at a later time by the submitter or reviewer.

FWIW this is actually the reason why last week I asked about your intentions. I was considering closing this issue for inactivity but thought I would first ask if you intended to follow up on it.

Can your comments be classified into “mandatory” and “recommended, but optional”?

All the items are mandatory and easy to fix, except items 8 and 12.

About item 8: Trying to speed up kmoptpermcut() would be nice though, if the function is meant to be used on real-world data in reasonable time. However, and to my surprise, it sounds like my naive assumption that real-world data will typically be bigger than the included dataset was wrong (see below). If that's the case, then maybe the speed of kmoptpermcut() is acceptable.

About item 12: @vjcitn and @lshep already pointed the benefits of using a SummarizedExperiment object to present your data to the end user, and to operate on such representation. Item 12 is a follow-up to this discussion. It provides some details about how to do that and what some of the concrete benefits would be. I also wanted to make sure that you're not dismissing this approach too quickly or for the wrong reasons (e.g. "this is too difficult"). Anyways, I will leave this to your consideration. I suppose that adding support for SummarizedExperiment could always be added to kmcut in the future, if there's demand for it.

Also, a typical number of samples in a dataset supplied to this function is expected to be much smaller than in the example dataset included into the package.

Why use an example dataset much bigger than the typical real-world dataset? This makes the code in your examples and vignette unnecessarily slower than it needs to be. Package authors usually do the opposite i.e. they include examples datasets that are smaller than the typical real-world dataset. This approach seems much more sensible!

Best, H.

lshep commented 10 months ago

@ibkstore Do you plan on revising the package to the suggestions from @hpages , @vjcitn and myself?

ibkstore commented 10 months ago

@ibkstore Do you plan on revising the package to the suggestions from @hpages , @vjcitn and myself?

Yes, I do plan to revise. I will finish the revisions after this semester ends and I have free time to work on the package.

lshep commented 9 months ago

We will close this issue for now. please comment back here when you are ready to continue and we will reopen the issue.

bioc-issue-bot commented 9 months ago

This issue is being closed because there has been no progress for an extended period of time. You may reopen the issue when you have the time to actively participate in the review / submission process. Please also keep in mind that a package accepted to Bioconductor requires a commitment on your part to ongoing maintenance.

Thank you for your interest in Bioconductor.

ibkstore commented 5 months ago

Hello,

I would like to reopen this issue. I created an updated version of the package that addresses all the reviewers' comments. Please advise how to proceed with the upload of the new version and the responses to comments. I am not sure what to do with the new and old versions and with the responses. Do I need to delete the old version and keep the new one and add a file with the responses to comments?

Thank you, Igor

lshep commented 5 months ago

Do the histories of the repositories not match? Did you not work in the same repository as before?

ibkstore commented 5 months ago

I am using the same repository. Does it mean that I can just copy the new version into that repository and reviewers will be able to use the history to access the previous version? I am new to GitHub and do not know how repository history functions and how to check it.

bioc-issue-bot commented 5 months ago

Dear @ibkstore ,

We have reopened the issue to continue the review process. Please remember to push a version bump to git.bioconductor.org to trigger a new build.

bioc-issue-bot commented 5 months ago

A reviewer has been assigned to your package for an indepth review. Please respond accordingly to any further comments from the reviewer.

lshep commented 5 months ago

You should familiarize yourself with git and git commands. Also this previous comment links to a git workflow for pushing to git.bioconductor.org and indicates that you should set up your BiocCredentials account to control ssh key access.

ibkstore commented 5 months ago

My Bioconductor Git Credentials are set up. I used the sequence of Git commands from the Bioconductor guidelines to push the updated package to git.bioconductor.org I used the following sequence:

git clone git@git.bioconductor.org:packages/kmcut cd kmcut

Copy updated/new package files/folders to the local folder 'kmcut', then

git add git commit git push

My responses to reviewer's comments are included in the commit message file. Does this look like a correct sequence? Do I need to perform any additional tasks at this point?

hpages commented 5 months ago

@ibkstore Welcome back!

Does this look like a correct sequence?

No, this is not how you are supposed to resync the kmcut repo at git.bioconductor.org with the repo on GitHub. You seem to have completely missed the link that @lshep gave to you. The link was sending you to this important comment at the beginning of this issue:

IMPORTANT: Please read this documentation for setting up remotes to push to git.bioconductor.org. It is required to push a version bump to git.bioconductor.org to trigger a new build.

Unfortunately the problem with the method you used is that the 2 repos now have divergent histories. As @lshep mentioned, you really need to familiarize yourself with git and git commands. This is an essential skill for being a Bioconductor package maintainer.

Also the git commit message is not really the place to put your responses to my earlier comments. Please put them in this issue. In the future, use the git commit message only to document the changes you are making to your package.

Finally, please bump the version of the package and push in order to trigger a new build.

Thanks, H.

ibkstore commented 5 months ago

Hi Herve, Sorry about the mess. I was confused by the chapter header "21.5 Subversion to Git Transition". Chapter "21 Git Version Control" seemed more appropriate and I used the info from there. Is there anything I can do to "undo" my mistakes before I proceed with Chapter 21.5 steps? Igor

bioc-issue-bot commented 5 months ago

Received a valid push on git.bioconductor.org; starting a build for commit id: d9cd8cf2539e4ab64b4097ac786dfbeb739d7c4c

bioc-issue-bot commented 5 months ago

Dear Package contributor,

This is the automated single package builder at bioconductor.org.

Your package has been built on the Bioconductor Single Package Builder.

On one or more platforms, the build results were: "ERROR". This may mean there is a problem with the package that you need to fix. Or it may mean that there is a problem with the build system itself.

Please see the build report for more details.

The following are build products from R CMD build on the Single Package Builder: Linux (Ubuntu 22.04.3 LTS): kmcut_0.99.1.tar.gz

Links above active for 21 days.

Remember: if you submitted your package after July 7th, 2020, when making changes to your repository push to git@git.bioconductor.org:packages/kmcut to trigger a new build. A quick tutorial for setting up remotes and pushing to upstream can be found here.

bioc-issue-bot commented 5 months ago

Received a valid push on git.bioconductor.org; starting a build for commit id: 0123ac513f8adf257a5491cabd66f720564713ff

bioc-issue-bot commented 5 months ago

Dear Package contributor,

This is the automated single package builder at bioconductor.org.

Your package has been built on the Bioconductor Single Package Builder.

On one or more platforms, the build results were: "ERROR". This may mean there is a problem with the package that you need to fix. Or it may mean that there is a problem with the build system itself.

Please see the build report for more details.

The following are build products from R CMD build on the Single Package Builder: Linux (Ubuntu 22.04.3 LTS): kmcut_0.99.2.tar.gz

Links above active for 21 days.

Remember: if you submitted your package after July 7th, 2020, when making changes to your repository push to git@git.bioconductor.org:packages/kmcut to trigger a new build. A quick tutorial for setting up remotes and pushing to upstream can be found here.

bioc-issue-bot commented 5 months ago

Received a valid push on git.bioconductor.org; starting a build for commit id: afdda1a804e2c4ac211797536645c63bb18d1f71

bioc-issue-bot commented 5 months ago

Dear Package contributor,

This is the automated single package builder at bioconductor.org.

Your package has been built on the Bioconductor Single Package Builder.

Congratulations! The package built without errors or warnings on all platforms.

Please see the build report for more details.

The following are build products from R CMD build on the Single Package Builder: Linux (Ubuntu 22.04.3 LTS): kmcut_0.99.3.tar.gz

Links above active for 21 days.

Remember: if you submitted your package after July 7th, 2020, when making changes to your repository push to git@git.bioconductor.org:packages/kmcut to trigger a new build. A quick tutorial for setting up remotes and pushing to upstream can be found here.

ibkstore commented 5 months ago

Hello, The updated version of the package, v.0.99.3, was uploaded and successfully compiled. Responses to all reviewer's comments are below.

COMMENT: The vignette contains the following line: Introduction to the KMCUT package

but the name of the package is kmcut (all lower-case). R in general, and package names in particular, are case-sensitive.

RESPONSE: The package name was changed to lower-case letters.

COMMENT:

Man pages have very long titles, each of them made of several sentences that repeat exactly the content of the Description section. Good man page titles should be short (a few words only).

RESPONSE: Titles of man pages were shortened.

COMMENT:

You seem to misunderstand the purpose of importing other packages in your package. The examples in most man pages start with many library() calls e.g. in kmoptpermcut: library(survival) library(stringr) library(data.table) library(tools) library(pracma) library(foreach) library(doParallel) library(parallel) library(kmcut)
Please note that packages that are imported are automatically available to the code inside your package so there's no need to put the extra burden to manually load each of them on the user. For loading the kmcut package itself, a widely accepted convention is to not pollute the examples in the man pages of a given package (say foo) with explicit calls to library(foo). The reason for this convention is that it is considered reasonable to assume that the package is already loaded when the user is looking at its man pages.

RESPONSE: All library() calls in examples were removed.

COMMENT:

Please minimize the dependencies of your package and only import what you actually need to import. For example, the package does not seem to use anything from the stringr package so that dependency is not needed. Please re-evaluate all the other dependencies and make sure to drop those that you don't need.

RESPONSE: The dependencies on and other unnecessary packages were removed.

COMMENT:

Also please re-evaluate the need for all the library() calls at the beginning of the vignette (although keeping library(kmcut) is fine there and is actually needed).

RESPONSE: All library() calls at the beginning of the vignette were removed, except for library(kmcut)

COMMENT:

Bioconductor recommend against the use of flatcase style for your function names (e.g. kmoptpermcut, transposetable, kmqcut, kmucut, etc...), as it makes the names hard to read and/or remember. Please use snake_case or camelCase instead.

RESPONSE: All function names were changed to include underscores (as in snake_case )

COMMENT:

About the file with gene expression-like features. The vignette says: "The first column must contain sample identifiers and and the first row must contain gene identifiers." Isn't it the otherway around?

RESPONSE: You are correct. This error was fixed.

COMMENT:

The kmoptpermcut() function is extremely slow, to a point that seems to make it almost unsuitable for real-world data. For example, calling it on the example_genes_295.txt file provided in the package takes 89 sec on my laptop when n_iter is set to 1000. However, the file only contains 2 rows (i.e. 2 genes) and the man page for kmoptpermcut() recommends to set n_iter to 10,000. This means that using the recommended n_iter on a file with 100 genes will take more than 10 hours!

RESPONSE: Unfortunately, km_opt_pcut() cannot be made much faster. It estimates the unknown sampling distribution of the test statistics by applying the permutation test, which is very slow. It is recommended that kmoptpermcut be run on at least 16 CPUs (this can be achieved on a low/midrange desktop or high-end laptop). This function is meant to be applied to a relatively small subset of candidate genes identified by the km_opt_scut function. The recommendation regarding application of km_opt_pcut() to a relatively small subset of genes identified by km_opt_scut() was added to the vignette. The size of the package's built-in example dataset was reduced to better reflect the number of samples in a real life dataset.

COMMENT:

How useful is the kmoptpermcut() function when called with wpdf=FALSE? In this case, the function will display a series of plot, one after the other, the next one replacing the previous one at a possibly fast pace, with barely enough time for the user to look at them!

RESPONSE: When km_opt_pcut() is used with wpdf=FALSE from RStudio or R GUI, after the execution is over, the user can go back and forth through the plots by clicking left and right arrows. The main reason why this option was added is to provide a functionality to embed the graphs during the vignette compilation.

COMMENT:

If the goal is to make it as simple as possible for clinical oncologists or other scientists without strong programming skills to use the package, then some improvements on how errors are handled would be welcome. For example, I doubt that the targeted audience will be able to make sense of an error message like this one:

kmoptscut(fname="survival_data_295.txt", sfname="example_genes_295b.txt") Error in order(sdat$sample_id) : argument 1 is not a vector

RESPONSE: The check of file existence and other argument checks were added to all functions.

COMMENT: Documenting a function argument like this: fname: character vector that specifies the name of the file etc...

without specifying that the character vector must be of length 1 suggests that the character vector could be of arbitrary length. Note that many functions in R that work on character data are vectorized and can handle a character vector of arbitrary length (e.g. tolower() or grepl()) but that doesn't seem to be the case here. Please disambiguate.

RESPONSE: This was clarified in the updated version, all man pages now state that file name must be a character vector of length one.

COMMENT:

Finally note that coordinating right-censored survival data and gene expression-like features, and keeping them aligned when subsetting the data, is exactly what the SummarizedExperiment container is for. In your case, the SummarizedExperiment object would have the feature ids as rownames and the sample ids as colnames, and the right-censored survival time data would be stored in the colData component of the object. All you would need to do is implement a function that loads the input data in a SummarizedExperiment object e.g. load_kmcut_input(). The function would essentially do something like this: load_kmcut_input <- function(epath, spath) { edat <- read.delim(epath, header = TRUE, stringsAsFactors = FALSE) sdat <- read.delim(spath, header = TRUE, row.names = 1) ... ... some sanity checks about 'edat' and 'sdat' ... assay <- as.matrix(edat[ , -1]) rownames(assay) <- edat[ , 1] coldata <- DataFrame(sdat) SummarizedExperiment(list(expr=assay), colData=coldata) } Typical use: epath <- system.file("extdata", "example_genes_295.txt", package = "kmcut") spath <- system.file("extdata", "survival_data_295.txt", package = "kmcut") kmcut_se <- load_kmcut_input(epath, spath) kmcut_se
class: SummarizedExperiment

dim: 2 295

metadata(0):

assays(1): expr

rownames(2): MYCN MYH2

rowData names(0):

colnames(295): SEQC_NB001 SEQC_NB002 ... SEQC_NB491 SEQC_NB492

colData names(2): stime scens
1. Then all functions in the package would take such object as input instead of fname and sfname, making them slightly easier to use and avoiding the need to load the same data from disk again and again. Furthermore, the user can now subset the data at will by doing things like kmcut_se["MYH2", c("SEQC_NB002", "SEQC_NB003")] with the guarantee that the right-censored survival data and the gene expression-like data remain aligned at all time, hence avoiding the need of utility functions like extractrows() or extractcolumns(). Every Bioconductor user knows how to perform such basic subsetting operation. This is the true Bioconductor way and it has proven to be a very valuable approach to software design.

RESPONSE: All core function were changed to take a SummarizedExperiment object as input. A function that creates SummarizedExperiment objects from two input files, create_se_object(), was also added.

hpages commented 3 months ago

Hi @ibkstore,

Thank your for all the improvements to the package and sorry for the slow response. All major issues have been addressed, congratulations! Only 3 minor things left and the package will be ready to go:

This R CMD check NOTE should not be ignored:

Undefined global functions or variables:
  as.formula dev.off grid is legend p.adjust pchisq pdf points predict
  quantile read.delim write.table
Consider adding
  importFrom("grDevices", "dev.off", "pdf")
  importFrom("graphics", "grid", "legend", "points")
  importFrom("methods", "is")
  importFrom("stats", "as.formula", "p.adjust", "pchisq", "predict",
             "quantile")
  importFrom("utils", "read.delim", "write.table")
to your NAMESPACE file (and ensure that your DESCRIPTION Imports field
contains 'methods').

Please address.

You use BiocStyle in your vignette so you must have it in your Suggests field.
Also you have testthat in your Suggests field but you don't have unit tests. So either add unit tests to the package or remove testthat from your Suggests field. We strongly recommend the former.

Thanks again, H.

bioc-issue-bot commented 3 months ago

Received a valid push on git.bioconductor.org; starting a build for commit id: e254d26bbb129240eba7cf1996ddb4288ddba921

bioc-issue-bot commented 3 months ago

Dear Package contributor,

This is the automated single package builder at bioconductor.org.

Your package has been built on the Bioconductor Single Package Builder.

Congratulations! The package built without errors or warnings on all platforms.

Please see the build report for more details.

The following are build products from R CMD build on the Single Package Builder: macOS 12.7.1 Monterey: kmcut_0.99.4.tar.gz Linux (Ubuntu 22.04.3 LTS): kmcut_0.99.4.tar.gz

Links above active for 21 days.

Remember: if you submitted your package after July 7th, 2020, when making changes to your repository push to git@git.bioconductor.org:packages/kmcut to trigger a new build. A quick tutorial for setting up remotes and pushing to upstream can be found here.

bioc-issue-bot commented 3 months ago

Received a valid push on git.bioconductor.org; starting a build for commit id: 8963bce299d0f3409954376b33731a199d775445

bioc-issue-bot commented 3 months ago

Dear Package contributor,

This is the automated single package builder at bioconductor.org.

Your package has been built on the Bioconductor Single Package Builder.

Congratulations! The package built without errors or warnings on all platforms.

Please see the build report for more details.

The following are build products from R CMD build on the Single Package Builder: macOS 12.7.1 Monterey: kmcut_0.99.5.tar.gz Linux (Ubuntu 22.04.3 LTS): kmcut_0.99.5.tar.gz

Links above active for 21 days.

Remember: if you submitted your package after July 7th, 2020, when making changes to your repository push to git@git.bioconductor.org:packages/kmcut to trigger a new build. A quick tutorial for setting up remotes and pushing to upstream can be found here.

ibkstore commented 3 months ago

Hello Herve, I uploaded an updated version of the package that addresses your comments below. Best, Igor

COMMENT: This R CMD check NOTE should not be ignored: Undefined global functions or variables: as.formula dev.off grid is legend p.adjust pchisq pdf points predict quantile read.delim write.table Consider adding importFrom("grDevices", "dev.off", "pdf") importFrom("graphics", "grid", "legend", "points") importFrom("methods", "is") importFrom("stats", "as.formula", "p.adjust", "pchisq", "predict", "quantile") importFrom("utils", "read.delim", "write.table") to your NAMESPACE file (and ensure that your DESCRIPTION Imports field contains 'methods').

RESPONSE: NAMESPACE and DESCRIPTION files have been updated. No more R CMD check NOTES appear.

COMMENT: You use BiocStyle in your vignette so you must have it in your Suggests field.

RESPONSE: BiocStyle has been added to Suggests field.

COMMENT: Also you have testthat in your Suggests field but you don't have unit tests. So either add unit tests to the package or remove testthat from your Suggests field. We strongly recommend the former.

RESPONSE: testthat has been removed from Suggests field.

bioc-issue-bot commented 2 months ago

Your package has been accepted. It will be added to the Bioconductor nightly builds.

Thank you for contributing to Bioconductor!

Reviewers for Bioconductor packages are volunteers from the Bioconductor community. If you are interested in becoming a Bioconductor package reviewer, please see Reviewers Expectations.

lshep commented 2 months ago

The default branch of your GitHub repository has been added to Bioconductor's git repository as branch devel.

To use the git.bioconductor.org repository, we need an 'ssh' key to associate with your github user name. If your GitHub account already has ssh public keys (https://github.com/ibkstore.keys is not empty), then no further steps are required. Otherwise, do the following:

See further instructions at

https://bioconductor.org/developers/how-to/git/

for working with this repository. See especially

https://bioconductor.org/developers/how-to/git/new-package-workflow/ https://bioconductor.org/developers/how-to/git/sync-existing-repositories/

to keep your GitHub and Bioconductor repositories in sync.

Your package will be included in the next nigthly 'devel' build (check-out from git at about 6 pm Eastern; build completion around 2pm Eastern the next day) at

https://bioconductor.org/checkResults/

(Builds sometimes fail, so ensure that the date stamps on the main landing page are consistent with the addition of your package). Once the package builds successfully, you package will be available for download in the 'Devel' version of Bioconductor using BiocManager::install("kmcut"). The package 'landing page' will be created at

https://bioconductor.org/packages/kmcut

If you have any questions, please contact the bioc-devel mailing list (https://stat.ethz.ch/mailman/listinfo/bioc-devel); this issue will not be monitored further.

Bioconductor / Contributions

KMCUT #3018

Copy updated/new package files/folders to the local folder 'kmcut', then

class: SummarizedExperiment

dim: 2 295

metadata(0):

assays(1): expr

rownames(2): MYCN MYH2

rowData names(0):

colnames(295): SEQC_NB001 SEQC_NB002 ... SEQC_NB491 SEQC_NB492

colData names(2): stime scens