Bioconductor / Contributions

Contribute Packages to Bioconductor
134 stars 33 forks source link

CaDrA #2850

Closed RC-88 closed 1 year ago

RC-88 commented 1 year ago

Update the following URL to point to the GitHub repository of the package you wish to submit to Bioconductor

Confirm the following by editing each check box to '[x]'

I am familiar with the essential aspects of Bioconductor software management, including:

For questions/help about the submission process, including questions about the output of the automatic reports generated by the SPB (Single Package Builder), please use the #package-submission channel of our Community Slack. Follow the link on the home page of the Bioconductor website to sign up.

bioc-issue-bot commented 1 year ago

Hi @RC-88

Thanks for submitting your package. We are taking a quick look at it and you will hear back from us soon.

The DESCRIPTION file for this package is:

Package: CaDrA
Type: Package
Title: Candidate Driver Analysis
Version: 0.99.0
Date: 2022-04-25
Authors@R: 
    c(person(given="Reina", family="Chau", role=c("aut","cre"), email="rchau88@bu.edu"), 
      person(given="Katia", family="Bulekova", role=c("aut"), email="ktrn@bu.edu", comment=c(ORCID="0000-0003-1560-2146")),
      person(given="Vinay", family="Kartha", role=c("aut"), email="vkartha@bu.edu"), 
      person(given="Stefano", family="Monti", role=c("aut"), email="smonti@bu.edu", comment=c(ORCID="0000-0002-9376-0660"))
     )
Description: Performs both stepwise and backward heuristic search for candidate (epi)genetic drivers based on a binary multi-omics dataset. CaDrA's main objective is to identify features which, together, are significantly skewed or enriched pertaining to a given vector of continuous scores (e.g. sample-specific scores representing a phenotypic readout of interest, such as protein expression, pathway activity, etc.), based on the union occurence (i.e. logical OR) of the events.
Depends: R (>= 4.0.0)
LazyData: false
License: GPL-3 + file LICENSE
URL: https://github.com/montilab/CaDrA/
Packaged: 2017-07-10 18:44:45 UTC; 
Encoding: UTF-8
RoxygenNote: 7.2.1
Imports: 
    Biobase,
    doParallel,
    plyr,
    ggplot2,
    grid,
    gplots,
    gtable,
    R.cache,
    reshape2,
    magrittr,
    purrr,
    stats,
    methods,
    MASS,
    misc3d,
    ppcor,
    graphics
Suggests: 
    BiocStyle,
    devtools,
    rmarkdown,
    knitr,
    testthat (>= 3.0.0)
Config/testthat/edition: 3
biocViews: Microarray, RNASeq, GeneExpression, Software, FeatureExtraction
VignetteBuilder: knitr
BugReports: https://github.com/montilab/CaDrA/issues
bioc-issue-bot commented 1 year ago

A reviewer has been assigned to your package. Learn what to expect during the review process.

IMPORTANT: Please read this documentation for setting up remotes to push to git.bioconductor.org. It is required to push a version bump to git.bioconductor.org to trigger a new build.

Bioconductor utilized your github ssh-keys for git.bioconductor.org access. To manage keys and future access you may want to active your Bioconductor Git Credentials Account

bioc-issue-bot commented 1 year ago

Dear Package contributor,

This is the automated single package builder at bioconductor.org.

Your package has been built on Linux, Mac, and Windows.

On one or more platforms, the build results were: "ERROR". This may mean there is a problem with the package that you need to fix. Or it may mean that there is a problem with the build system itself.

Please see the build report for more details. This link will be active for 21 days.

Remember: if you submitted your package after July 7th, 2020, when making changes to your repository push to git@git.bioconductor.org:packages/CaDrA to trigger a new build. A quick tutorial for setting up remotes and pushing to upstream can be found here.

bioc-issue-bot commented 1 year ago

Received a valid push on git.bioconductor.org; starting a build for commit id: ce83766062d223ba033e43027302141ba1d3053d

bioc-issue-bot commented 1 year ago

Dear Package contributor,

This is the automated single package builder at bioconductor.org.

Your package has been built on Linux, Mac, and Windows.

On one or more platforms, the build results were: "ERROR". This may mean there is a problem with the package that you need to fix. Or it may mean that there is a problem with the build system itself.

Please see the build report for more details. This link will be active for 21 days.

Remember: if you submitted your package after July 7th, 2020, when making changes to your repository push to git@git.bioconductor.org:packages/CaDrA to trigger a new build. A quick tutorial for setting up remotes and pushing to upstream can be found here.

LiNk-NY commented 1 year ago

Hi Reina, Thank you for your submission. Please see the review below. Best regards, Marcel


CaDrA #2850

Unfortunately, I do not see a strong use case of Bioconductor in the package. The ExpressionSet class has been superseded by SummarizedExperiment for many years now. Many of the exported functions in the package return a list or data.frame which do not integrate well with the rest of Bioconductor. We recommend using SummarizedExperiment as the main representation for input and output objects.

DESCRIPTION

NAMESPACE

vignettes

R

tests

LiNk-NY commented 1 year ago

Hi Reina, @RC-88

Any updates in response to the review? If you need more time, we can close the issue and you can re-open when ready.

Best regards, Marcel

RC-88 commented 1 year ago

Hi Marcel, @LiNk-NY

Thanks for your feedback! and Sorry for the lack of responses! My collaborators and I are working diligently to address some of the concerns that you mentioned. We are almost there. Just a few more unit tests in place to make sure the tool is working as expected. In brevity, we made quite a significant changes to our package based on your feedback (such as replacing ExpressionSet with SummarizedExperiment object, renaming our variable "ES" to "FS" in order to align with our vignettes and documentation, reduce complexity of our core functions, etc.) I will keep you posted next week with in-depth details of our changes, and hopefully, these changes will meet up to current BioConductor's usage.

RC-88 commented 1 year ago

Hi Marcel, @LiNk-NY

My collaborators and I are almost done with the revision of the CaDrA package with newly optimized functions ready for resubmission. Just curious how long does it take until a BioConductor submission expired? Thanks!

LiNk-NY commented 1 year ago

Hi @RC-88 Submissions do not really expire. But we try to stick to a max 14 day non-response period.

RC-88 commented 1 year ago

@LiNk-NY, I see. Apologize for the lack of responses! Will try to keep you up-to-date with our progress or changes!

bioc-issue-bot commented 1 year ago

Received a valid push on git.bioconductor.org; starting a build for commit id: 1957fe24b6ff42857f896531aee4747a5b392807

bioc-issue-bot commented 1 year ago

Dear Package contributor,

This is the automated single package builder at bioconductor.org.

Your package has been built on Linux, Mac, and Windows.

On one or more platforms, the build results were: "ERROR". This may mean there is a problem with the package that you need to fix. Or it may mean that there is a problem with the build system itself.

Please see the build report for more details. This link will be active for 21 days.

Remember: if you submitted your package after July 7th, 2020, when making changes to your repository push to git@git.bioconductor.org:packages/CaDrA to trigger a new build. A quick tutorial for setting up remotes and pushing to upstream can be found here.

RC-88 commented 1 year ago

Hi @LiNk-NY, how are you? I recently pushed out the new build for our updated package CaDrA. However, it seems that the package passed RCMD check in MAC OS platform but failed in Linux. I tried to recheck the build on our Linux system, and it seems that the package built successfully without any errors. I expect some package dependencies required by CaDrA (such as R.cache and SummarizedExperiment) does not exist in the library of BioConductor Linux build system. We used R.cache package to do some caching for future loading of the permutation-based testing results. This step would help users save time in recomputing the permutation-based testings with similar parameters as this task is intensive and time-consuming. We also updated our BioBase:ExpressionSet() objects to use SummarizedExperiment::assay() objects as recommended. Is there a way for us to resolve this issue? Thanks so much!

bioc-issue-bot commented 1 year ago

Received a valid push on git.bioconductor.org; starting a build for commit id: d357edab2071f5af8da8ef3ff979ef72ca533707

bioc-issue-bot commented 1 year ago

Dear Package contributor,

This is the automated single package builder at bioconductor.org.

Your package has been built on Linux, Mac, and Windows.

On one or more platforms, the build results were: "WARNINGS". This may mean there is a problem with the package that you need to fix. Or it may mean that there is a problem with the build system itself.

Please see the build report for more details. This link will be active for 21 days.

Remember: if you submitted your package after July 7th, 2020, when making changes to your repository push to git@git.bioconductor.org:packages/CaDrA to trigger a new build. A quick tutorial for setting up remotes and pushing to upstream can be found here.

bioc-issue-bot commented 1 year ago

Received a valid push on git.bioconductor.org; starting a build for commit id: 641099f1f6b5d2c94c9ab3ed2877a9fb57f8f339

bioc-issue-bot commented 1 year ago

Dear Package contributor,

This is the automated single package builder at bioconductor.org.

Your package has been built on Linux, Mac, and Windows.

Congratulations! The package built without errors or warnings on all platforms.

Please see the build report for more details. This link will be active for 21 days.

Remember: if you submitted your package after July 7th, 2020, when making changes to your repository push to git@git.bioconductor.org:packages/CaDrA to trigger a new build. A quick tutorial for setting up remotes and pushing to upstream can be found here.

LiNk-NY commented 1 year ago

Hi Reina, @RC-88

Apologies for the delayed response. We get a lot of packages to review around this time. Please respond to the issue bullet points for the review to continue.

Best regards, Marcel

RC-88 commented 1 year ago

Hi Marcel, @LiNk-NY

No worries! I'm glad we didn't lose you and would need to resubmit our package again. Thanks for being patience with us!

Finally, my collaborators and I had finalized all of the changes needed to push our CaDrA package toward review again. We had spent a significant amount of effort to address the feedback that you shared with us a while back.

Here is a list of things that we didn't changed and why:

  1. We didn't change the name of our package, CaDrA, and would like to keep its original name. The reason is that CaDrA is actually short for Candidate Driver Analysis. It is a statistical framework that our lab developed to identify (epi) genetic drivers of a molecular phenotype of interest. We previously published a paper that is associated with the name of this package. Please see our associated manuscript, Kartha et al. (2019), for more details.
  2. Right now, some of our exported functions are still returning a list of objects such as candidate_search() and CaDrA() functions. The reason we prefer a list of objects as output rather than a SummarizedExperiment object is because our core functions performs heuristic search to identify candidate drivers of a molecular phenotype of interest over a given set of features with top N scores. Therefore, the result will consist of a list of objects including a set of meta-features in form of a SummarizedExperiment object, its observed input_score as a vector of continuous scores, and a corresponding best score pertaining to each top N feature searches. Thus, it is quite difficult to contain the results into one giant SummarizedExperiment object. In the future, we are considering to convert our analysis results into a S6 class object.

Here is a list of things that we had changed:

  1. Removed the %>% pipe and used the native |> pipe
  2. Removed BioBase and used SummarizedExperiment package instead
  3. Moved vignettes that reproduce manuscript figures into their own repository rather than the CaDrA package. Completed! We created a separate Github repo, CaDrA-shiny, which we are currently used to showcase the examples of how run CaDrA through an R Shiny interface (also see our web portal, https://cadra.bu.edu/)
  4. The vignettes should exhibit the functionality of the package with a small example dataset. Agreed, right now, we are using a small simulation dataset to document all of the vignettes in the package.
  5. Removed devtools::load_all function calls
  6. All functions in the package currently take a SummarizedExperiment/data matrix as inputs and some time both especially for the custom_rowscore() function
  7. Avoided ambiguous abbreviations, e.g., ES in arguments stands for ExpressionSet but some functions have ES for enrichment score. Agreed. We had thoroughly rewritten the arguments to make it clearer for our users to look up our documentation (see the reference on our github pages).
  8. Added additional unit tests
  9. Optimized the complexity of our core functions using cyclocomp

before:

name cyclocomp
candidate_search 72
revealer_genescore_mat 57
CaDrA 47
revealer_genescore 43
custom_genescore_mat 41
ks_gene_score_mat 35

after:

name cyclocomp
custom_rowscore 33
check_data_input 31
CaDrA 26
candidate_search 23
wilcox_score 21
check_top_N 16
cond_assoc 10
generate_permutations 6
stacked_gtable_max 5
forward_backward_check 4
ks_rowscore_calc 4
ks_test_double_wrap 4
revealer_rowscore 4
calc_rowscore 3
ks_genescore_wrap 3
ks_rowscore 3
topn_plot 3
verbose 3
ks_plot_coordinates 2
permutation_plot 2
prefilter_data 2
setup_tab_heights 2
cond_mutual_inf 1
ks_genescoremat 1
ks_genescorewrap 1
ks_plot 1
ks_plotwrap 1
ks_test_dwrap 1
meta_plot 1

Let me know if there are other areas that you think we should improve. Look forward to hear from you!

LiNk-NY commented 1 year ago

Hi Reina, @RC-88

Thank you for making those changes.

  1. We didn't change the name of our package, CaDrA, and would like to keep its original name. The reason is that CaDrA is actually short for Candidate Driver Analysis. It is a statistical framework that our lab developed to identify (epi) genetic drivers of a molecular phenotype of interest. We previously published a paper that is associated with the name of this package. Please see our associated manuscript, Kartha et al. (2019), for more details.

That's okay but users may find the mixed case inconvenient. Note that the cadra.R file does not have mixed case. So if it is inconvenient for the developer, it may likely be inconvenient for users.

  1. Right now, some of our exported functions are still returning a list of objects such as candidate_search() and CaDrA() functions. The reason we prefer a list of objects as output rather than a SummarizedExperiment object is because our core functions performs heuristic search to identify candidate drivers of a molecular phenotype of interest over a given set of features with top N scores. Therefore, the result will consist of a list of objects including a set of meta-features in form of a SummarizedExperiment object, its observed input_score as a vector of continuous scores, and a corresponding best score pertaining to each top N feature searches. Thus, it is quite difficult to contain the results into one giant SummarizedExperiment object. In the future, we are considering to convert our analysis results into a S6 class object.

This deviates from Bioconductor standards of a centralized representation. It is not clear to me how an S6 class would be an improvement.

Here is a list of things that we had changed:

...

  1. All functions in the package currently take a SummarizedExperiment/data matrix as inputs and some time both especially for the custom_rowscore() function

This doesn't seem to be the case. Am I missing something here? https://github.com/montilab/CaDrA/blob/9226dbd3f093461b3b1ae519c6cba53bc42f9226/R/custom_rowscore.R#L7-L9

Best, Marcel

RC-88 commented 1 year ago

Hi Marcel, @LiNk-NY

Thanks for your feedback and to address your previous comments:

  1. Thank you for your concern, and we understand the possible inconvenience of having mixed cases. However, we accept the trade-off, as we deem maintaining the connection with the published methodology more important.

  2. The use of a SummarizedExperiment is not really an option, since the value returned by the function is a list of multiple objects of different sizes/dimensions for which the SS is not suitable. CaDrA does not simply perform some processing of a SummarizedExperiment (such as, e.g., scaling or normalization), rather it uses a SS as one of its inputs, and the output has multiple parts. We could define a new R class to represent its multi-part output, but it seemed a bit of an overkill for our purposes.

  3. To clarify, calc_rowscore() is actually the main function that is used to run the custom_rowscore() (behind the scene) by specifying method = "custom". The required parameters to run both calc_rowcore() and custom_rowscore() are a set of binary features (e.g. somatic mutations, CNV, etc.) and a vector of continuous scores representing a phenotypic readout of interest (e.g. protein expression, pathway activity, etc.) For the custom method only, you have full control of your defined function, thus, you can easily pass the feature set as a summarized experiment or a data matrix to your customized function as long as your function has a way to handle different types of feature set objects.

Here are two examples that demonstrate the usage of having a summarized experiment or a data matrix as an input to the custom method:

Installation

library(devtools)
devtools::install_github("montilab/CaDrA", ref="devel")

Quickstart

library(CaDrA)
library(SummarizedExperiment)

A customized function using ks.test() that can handle both SummarizedExperiment or data matrix as an input object

customized_rowscore <- function(FS_mat, input_score, alternative="less"){

  # Check if FS_mat is a matrix or a SummarizedExperiment class object
  if(!is(FS_mat, "SummarizedExperiment") && !is(FS_mat, "matrix"))
    stop("'FS_mat' must be a matrix or a SummarizedExperiment class object
         from SummarizedExperiment package")

  # Retrieve the feature binary matrix
  if(is(FS_mat, "SummarizedExperiment")){
    mat <- as.matrix(SummarizedExperiment::assay(FS_mat))
  }else{
    mat <- FS_mat
  }

  ks <- apply(mat, 1, function(r){
    x = input_score[which(r==1)];
    y = input_score[which(r==0)];
    res <- ks.test(x, y, alternative=alternative)
    return(c(res$statistic, res$p.value))
  })

  # Obtain score statistics and p-values from KS method
  stat <- ks[1,]

  # Change values of 0 to the machine lowest value to avoid taking -log(0)
  pval <- ks[2,]
  pval[which(pval == 0)] <- .Machine$double.xmin

  # Compute the -log(pval)
  # Make sure scores has names that match the row names of FS_mat object
  scores <- -log(pval)
  names(scores) <- rownames(mat)

  return(scores)

}

A customized function using ks.test() with a data matrix as an input

# Load pre-computed feature set
data(sim_FS)

# Load pre-computed input-score
data(sim_Scores)

# Main function is calc_rowscore() with a given method = "custom"
custom_rowscore_result_v1 <- calc_rowscore(
  FS_mat = SummarizedExperiment::assay(sim_FS),   ## <--- Here FS_mat takes assay(sim_FS) which is a data matrix
  input_score = sim_Scores,
  method = "custom",
  custom_function = customized_rowscore,
  custom_parameters = NULL
)
head(custom_rowscore_result_v1)
## TN_716 TN_963 TP_8 TP_9 TN_844 TN_108
## 7.206006 5.471405 5.387492 5.387492 5.387492 5.289084

A customized function using a SummarizedExperiment object as an input

  # Load pre-computed feature set
  data(sim_FS)

  # Load pre-computed input-score
  data(sim_Scores)

  # Main function is calc_rowscore() with a given method = "custom"
  custom_rowscore_result_v2 <- calc_rowscore(
    FS_mat = sim_FS,              ## <--- here FS_mat takes sim_FS which is originally a SummarizedExperiment object
    input_score = sim_Scores,
    method = "custom",
    custom_function = customized_rowscore,
    custom_parameters = NULL
  )
head(custom_rowscore_result_v2)
## TN_716 TN_963 TP_8 TP_9 TN_844 TN_108
## 7.206006 5.471405 5.387492 5.387492 5.387492 5.289084

Hope we address your concerns this time and look forward to your feedback!

Best,

Reina C.

LiNk-NY commented 1 year ago

Hi Reina, @RC-88

Thank you for your response.

It seems that you expect the user to be able to write a "customized" function that can handle both matrix and SummarizedExperiment. I am wondering, why not have the function handle both type of inputs?

This strategy would improve the usability of your calc_rowscore function. In other words, calc_rowscore would accept and work with both types of inputs for the FS_mat argument (which may have to be renamed).

The examples that you show above would then be reduced into a single function call because the user will not have to jump through hoops to use your interface.

Best, Marcel

bioc-issue-bot commented 1 year ago

Received a valid push on git.bioconductor.org; starting a build for commit id: 55fad803bd1145a99398fc05de4f2f3677e7a399

bioc-issue-bot commented 1 year ago

Dear Package contributor,

This is the automated single package builder at bioconductor.org.

Your package has been built on Linux, Mac, and Windows.

Congratulations! The package built without errors or warnings on all platforms.

Please see the build report for more details. This link will be active for 21 days.

Remember: if you submitted your package after July 7th, 2020, when making changes to your repository push to git@git.bioconductor.org:packages/CaDrA to trigger a new build. A quick tutorial for setting up remotes and pushing to upstream can be found here.

RC-88 commented 1 year ago

Hi Marcel, @LiNk-NY

Thank you for your feedback! I agreed with your strategy of utilizing both SummarizedExperiment and data matrix as inputs to our core functions (calc_rowscore(), candidate_search(), CaDrA()). This definitely improves the usability of our package. I had implemented an update to the package which now can handle both type of inputs.

I attached a pdf document generated from an Rmd which showcases the usage of taking both SummarizedExperiment and data matrix as inputs to our core functions

testing_cadra_functions.pdf

Hope this updated changes would get your approval :) Thanks again!

RC-88 commented 1 year ago

Hi Marcel, how are you?

I just wanted to check in to see if you are okay with our updated changes.

Best,

Reina C.

LiNk-NY commented 1 year ago

Hi Reina, Sorry for the delay. Thank you for your patience. The changes look good. Consider removing the donttest markers in the examples and using a small example dataset if possible. Best, Marcel

bioc-issue-bot commented 1 year ago

Received a valid push on git.bioconductor.org; starting a build for commit id: 008d229cca9fb8177d22e90a6b2a5f760c1a3825

bioc-issue-bot commented 1 year ago

Dear Package contributor,

This is the automated single package builder at bioconductor.org.

Your package has been built on Linux, Mac, and Windows.

Congratulations! The package built without errors or warnings on all platforms.

Please see the build report for more details. This link will be active for 21 days.

Remember: if you submitted your package after July 7th, 2020, when making changes to your repository push to git@git.bioconductor.org:packages/CaDrA to trigger a new build. A quick tutorial for setting up remotes and pushing to upstream can be found here.

RC-88 commented 1 year ago

Hi Marcel, how are you?

I had removed the donttest markers in the examples. The sim_FS dataset that was used in the samples is quite small, 100 features by 100 samples. Let me know if this works with you.

Thanks again!

Reina C.

LiNk-NY commented 1 year ago

Hi Reina, @RC-88 Thank you for making those changes. The small dataset is great. Your package has been accepted. Best regards, Marcel

bioc-issue-bot commented 1 year ago

Your package has been accepted. It will be added to the Bioconductor nightly builds.

Thank you for contributing to Bioconductor!

Reviewers for Bioconductor packages are volunteers from the Bioconductor community. If you are interested in becoming a Bioconductor package reviewer, please see Reviewers Expectations.

RC-88 commented 1 year ago

Hi Marcel, @LiNk-NY

It is a great news to hear that our package has gotten accepted! Thank you so much for your thorough review that guided me through this submission process!

Best,

Reina C.

lshep commented 1 year ago

The default branch of your GitHub repository has been added to Bioconductor's git repository as branch devel.

To use the git.bioconductor.org repository, we need an 'ssh' key to associate with your github user name. If your GitHub account already has ssh public keys (https://github.com/RC-88.keys is not empty), then no further steps are required. Otherwise, do the following:

  1. Add an SSH key to your github account
  2. Submit your SSH key to Bioconductor

See further instructions at

https://bioconductor.org/developers/how-to/git/

for working with this repository. See especially

https://bioconductor.org/developers/how-to/git/new-package-workflow/ https://bioconductor.org/developers/how-to/git/sync-existing-repositories/

to keep your GitHub and Bioconductor repositories in sync.

Your package will be included in the next nigthly 'devel' build (check-out from git at about 6 pm Eastern; build completion around 2pm Eastern the next day) at

https://bioconductor.org/checkResults/

(Builds sometimes fail, so ensure that the date stamps on the main landing page are consistent with the addition of your package). Once the package builds successfully, you package will be available for download in the 'Devel' version of Bioconductor using BiocManager::install("CaDrA"). The package 'landing page' will be created at

https://bioconductor.org/packages/CaDrA

If you have any questions, please contact the bioc-devel mailing list (https://stat.ethz.ch/mailman/listinfo/bioc-devel); this issue will not be monitored further.