gabrielodom / pathwayPCA

integrative pathway analysis with modern PCA methodology and gene selection
https://gabrielodom.github.io/pathwayPCA/
11 stars 2 forks source link

Bioconductor Review Comments #68

Closed gabrielodom closed 5 years ago

gabrielodom commented 5 years ago

We now have the comments from the Bioconductor review team: https://github.com/Bioconductor/Contributions/issues/1000

gabrielodom commented 5 years ago

This thread will be the continuation of the work done in Issue #64.

gabrielodom commented 5 years ago

General

Build report

readme

Add Bioconductor installation instructions as well

Description

User Authors@R (recommended) or Author:/Maintiner: not both.

Namespace

You have exported a lot of functions. Are all these functions intended to be used directly by the user or are some used internally only and be removed as an export?

news

Should have only one news file please remove NEWS (suggested removal) or NEWS.md (suggested keep)

inst/vignette_drafts/vignette_docx

Does not seem like this should be included here and exposed to users. Please remove. Move any relevant vignettes to the vignette directory

inst/misc_scripts

What are these scripts? Again are these relevant that they should be exposed to the user. Else remove

inst/extdata

Only include data files in extdata. Create a new inst/scripts directory to contain scripts for generating the data. These scripts (either R/sudo code/or paragraph) should indicate the source of the data and how to replicate creation.

data

In the man pages for colonSurv_df.Rd please include the source information for data where this subset was taken from.

docs/testing_files

What are the purpose of these directories? They probably should not be git tracked? These directories will be 'ignored' by R CMD build but still cloned over. The desired state is closer to only package relevant files on the master/git repository.

pathwayPCA.pdf

Remove this from the top level directory (git rm)

vignette: Introduction_to_pathwayPCA

But it is unclear to me why? Also couldn't this be automated for the user instead of manually run?

R code

While there is a lot of it - generally seems well organized and well written.

gabrielodom commented 5 years ago

title: "Response to Bioconductor Review" author: "Gabriel Odom, Lizhong Liu, James Ban, Lily Wang, and Steven Chen" date: "March 28, 2019" output: html_document

Introduction

We thank the Bioconductor review team, in particular Lori Shepherd, for their valuable time spent reviewing our package and for their insightful feedback. We have carefully thought about all comments and made many revisions accordingly in the revised package and documentations. We hope this resubmission address the reviewer’s concerns well.


General Comments

Comment:

We encourage the use of BiocParallel over parallel for parallelization implementation. Please explain why this was not used if not updating

Response: We appreciate your comment. During our package design phase, we did consider using the BiocParallel package due to its superb multi-machine/supercluster functionality and flexibility. However, after considerable thought, we chose to implement Snow-style parallel computing. We felt that the base parallel package was quite sufficient for our needs. Given that many of our package’s current users work in Windows environments, and that the pathwayPCA algorithm is not overly complex, we decided to use a simple single-machine Snow cluster. We have extensively tested our parallel computing setup on Windows, Macintosh, and Linux systems.

Comment:

Standard Biconductor objects and classes for data should be utilized whereever possible instead of creating new. Please explain why no existing structure could be used or extended.

Response: Thanks for your comment. In response, to make our package compatible with SummarizedExperiment objects, we have now added a function SE2Tidy() that can import a SummarizedExperiment object.

Comment:

Can this be integrated with other Bioconductor packages/input/output

Response: Yes. Our package now interfaces with the SummarizedExperiment package for assay data and the rWikiPathways package for pathway collections.


Build Report

Comment:

Normally require/library calls are not necessary in code if the proper packages (or select functions from packages) were imported in the NAMESPACE. Please investigate this WARNING

Response: This WARNING is triggered by loading a necessary package onto the Snow-style cluster suite. Per the Introduction to BiocParallel section 4.1.2 on clusters, this library() call would be required even with the BiocParallel package. I don't see a way around this.

Comment:

We strongly encourage units tests. They can be helpful in debugging code and to ensure the code is working as intended.

Response: This is an excellent suggestion. In response, we have added unit tests in the /tests/testthat/ directory, as suggested in the Bioconductor Unit Testing guidelines.


Top-Level Files

README.md

Comment:

Add Bioconductor installation instructions as well

Response: We have updated the README.md file, the GitHub README, and the package home page with Bioconductor installation instructions.

DESCRIPTION

Comment:

User Authors@R (recommended) or Author:/Maintiner: not both.

Response: We have removed the redundant "Maintainer" field.

NAMESPACE

Comment:

You have exported a lot of functions. Are all these functions intended to be used directly by the user or are some used internally only and be removed as an export?

Response: As you suggested, we have removed a few dozen internal functions from the NAMESPACE of our package. However, in order to keep the examples we wrote for these functions (to aid in future development), we have wrapped them in a \dontrun{} environment. This creates a new NOTE. Please disregard this NOTE.

NEWS.md

Comment:

Should have only one news file please remove NEWS (suggested removal) or NEWS.md (suggested keep)

Response: As you suggested, we have removed the NEWS file and kept the NEWS.md file.


inst/ Files

vignette_drafts/

Comment:

Does not seem like this should be included here and exposed to users. Please remove. Move any relevant vignettes to the vignette directory

Response: This directory has been removed.

vignette_docx/

Comment:

Does not seem like this should be included here and exposed to users. Please remove. Move any relevant vignettes to the vignette directory

Response: This directory has been removed.

misc_scripts/

Comment:

What are these scripts? Again are these relevant that they should be exposed to the user. Else remove

Response: This directory has been removed.

extdata/

Comment:

Only include data files in extdata. Create a new inst/scripts directory to contain scripts for generating the data. These scripts (either R/sudo code/or paragraph) should indicate the source of the data and how to replicate creation.

Response: As you suggested, we have moved all *.R files relevant to data generation to inst/scripts.


Top-Level Directories

docs/

Comment:

What are the purpose of these directories? They probably should not be git tracked? These directories will be 'ignored' by R CMD build but still cloned over. The desired state is closer to only package relevant files on the master/git repository.

Response: To clarify, the pkgdown website-builder utility package builds our package website and stores the files for this website in the docs/ directory.

testing_files/

Comment:

What are the purpose of these directories? They probably should not be git tracked? These directories will be 'ignored' by R CMD build but still cloned over. The desired state is closer to only package relevant files on the master/git repository.

Response (from Gabriel Odom): To clarify, I changed the directory from tests/ to testing_files/ so that R wouldn't run my tests every time I checked the package. I have now removed this directory from the package as you suggested.


Vignette Comments

Comment:

Include Bioconductor installation instructions

Response: We have added Bioconductor installation instructions to the "Introduction to PathwayPCA" and "Supplement 1 - Quickstart Guide" vignettes.

Comment:

If you created your own class, you show have accessor methods for common slots of interest to avoid direct access. eg. [1] "aespcOut" "pathwayPCA" "list" and ovarian_aespcOut$pVals_df. Could perhaps benefit from an accessor method for pVals?

Response: This is an excellent suggestion. As you suggested, we have added getPathpVals() as an accessor function for the table of $p$-values returned in objects of class aespcOut and superpcOut. The vignettes "Introduction to PathwayPCA", "Supplement 1 - Quickstart Guide", and "Supplement 4 - Methods Walkthrough" now all make use of this new accessor function.

Comment:

You perform a modification

ovOutGather_df <-
ovarian_aespcOut$pVals_df %>%
mutate(score = -log(rawp)) %>%
slice(1:20) %>% 
select(-pathways, -n_tested, -rawp, -FDR_BH)

But it is unclear to me why? Also couldn't this be automated for the user instead of manually run?

Response: As you suggested, we have implemented a new function that automates this modification. The new accessor function getPathpVals() has functionality to select the most significant $k$ pathways, transform the $p$-values, and remove the pathways and n_tested columns.


Miscellaneous

Data Documentation

Comment:

In the man pages for colonSurv_df.Rd please include the source information for data where this subset was taken from.

Response: We have included a citation of GEO GSE17538 in the documentation for the colonSurv_df data object.

PDF Manual

Comment:

Remove this from the top level directory (git rm)

Response: The file pathwayPCA.pdf has been removed from the package and from git.