Bioconductor / Contributions

Contribute Packages to Bioconductor
135 stars 33 forks source link

PDATK #1881

Closed ChristopherEeles closed 3 years ago

ChristopherEeles commented 3 years ago

Update the following URL to point to the GitHub repository of the package you wish to submit to Bioconductor

Confirm the following by editing each check box to '[x]'

I am familiar with the essential aspects of Bioconductor software management, including:

For help with submitting your package, please subscribe and post questions to the bioc-devel mailing list.

bioc-issue-bot commented 3 years ago

Hi @ChristopherEeles

Thanks for submitting your package. We are taking a quick look at it and you will hear back from us soon.

The DESCRIPTION file for this package is:

Package: PDATK
Type: Package
Title: Pancreatic Ductal Adenocarinoma Tool-Kit
Version: 0.99.0
Date: 2021-02-05
Authors@R: c(
    person('Vandana', 'Sandhu', role=c('aut')),
    person('Heewon', 'Seo', role=c('aut')),
    person('Christopher', 'Eeles', role=c('aut')),
    person('Benjamin', 'Haibe-Kains', role=c('aut', 'cre'),
        email="benjamin.haibe.kains@utoronto.ca"))
Description: A unique set of 89 pancreatic cancer samples profiled using both 
  sequencing and microarray platform was used for training the PCOSP 
  (Pancreatic cancer overall survival predictor) to predict patients with 
  early death (within >1 yr) after surgery. Further, to validate the model we 
  used genomic profiles of 823 samples curated from public domain. The
  simplistic k-top scoring pair approach was used to build the model, where we 
  looked at relative expression of gene pairs within the patient.
License: MIT + file LICENSE
Encoding: UTF-8
Depends: R (>= 4.1), SummarizedExperiment
Imports:
    data.table,
    CoreGx,
    methods,
    S4Vectors, 
    BiocGenerics,
    survival,
    stats,
    plyr,
    BiocParallel,
    rlang,
    piano,
    scales,
    survcomp,
    genefu,
    ggplot2,
    switchBox,
    caret,
    reportROC,
    pROC,
    verification
Suggests:
    testthat (>= 3.0.0),
    msigdbr,
    BiocStyle,
    rmarkdown,
    knitr
VignetteBuilder: knitr
biocViews: GeneExpression, Pharmacogenetics, Pharmacogenomics, Software,
    Classification,  Survival
BugReports: https://github.com/bhklab/PDATK/issues
RoxygenNote: 7.1.1
Collate: 
    'class-CohortList.R'
    'class-SurvivalExperiment.R'
    'class-SurvivalModel.R'
    'class-ClinicalModel.R'
    'class-GeneFuModel.R'
    'class-ModelComparison.R'
    'class-PCOSP.R'
    'class-RandomGeneAssignmentModel.R'
    'class-RandomLabelShufflingModel.R'
    'classUnions.R'
    'data.R'
    'globals.R'
    'methods-assignSubtypes.R'
    'methods-barPlotModelComparison.R'
    'methods-compareModels.R'
    'methods-densityPlotModelComparison.R'
    'methods-dropNotCensored.R'
    'methods-findCommonGenes.R'
    'methods-findCommonSamples.R'
    'methods-forestPlot.R'
    'methods-getTopFeatures.R'
    'methods-merge.R'
    'methods-plotROC.R'
    'methods-predictClasses.R'
    'methods-runGSEA.R'
    'methods-subset.R'
    'methods-trainModel.R'
    'methods-validateModel.R'
    'utilities.R'
bioc-issue-bot commented 3 years ago

A reviewer has been assigned to your package. Learn what to expect during the review process.

IMPORTANT: Please read this documentation for setting up remotes to push to git.bioconductor.org. It is required to push a version bump to git.bioconductor.org to trigger a new build.

Bioconductor utilized your github ssh-keys for git.bioconductor.org access. To manage keys and future access you may want to active your Bioconductor Git Credentials Account

bioc-issue-bot commented 3 years ago

Dear Package contributor,

This is the automated single package builder at bioconductor.org.

Your package has been built on Linux, Mac, and Windows.

On one or more platforms, the build results were: "TIMEOUT, WARNINGS". This may mean there is a problem with the package that you need to fix. Or it may mean that there is a problem with the build system itself.

Please see the build report for more details. This link will be active for 21 days.

Remember: if you submitted your package after July 7th, 2020, when making changes to your repository push to git@git.bioconductor.org:packages/PDATK to trigger a new build. A quick tutorial for setting up remotes and pushing to upstream can be found here.

LiNk-NY commented 3 years ago

Hi Christopher, @ChristopherEeles

Thank you for your submission. Please see the review below. Feel free to ask any questions.

Best regards, Marcel


PDATK #1881

DESCRIPTION

NAMESPACE

vignettes

Minor:

R

tests

bioc-issue-bot commented 3 years ago

Received a valid push; starting a build. Commits are:

0b6f115 Version bump for Bioc build

bioc-issue-bot commented 3 years ago

Received a valid push; starting a build. Commits are:

80511ef Version bump for Bioc build

bioc-issue-bot commented 3 years ago

Dear Package contributor,

This is the automated single package builder at bioconductor.org.

Your package has been built on Linux, Mac, and Windows.

On one or more platforms, the build results were: "TIMEOUT". This may mean there is a problem with the package that you need to fix. Or it may mean that there is a problem with the build system itself.

Please see the build report for more details. This link will be active for 21 days.

Remember: if you submitted your package after July 7th, 2020, when making changes to your repository push to git@git.bioconductor.org:packages/PDATK to trigger a new build. A quick tutorial for setting up remotes and pushing to upstream can be found here.

bioc-issue-bot commented 3 years ago

Dear Package contributor,

This is the automated single package builder at bioconductor.org.

Your package has been built on Linux, Mac, and Windows.

On one or more platforms, the build results were: "TIMEOUT". This may mean there is a problem with the package that you need to fix. Or it may mean that there is a problem with the build system itself.

Please see the build report for more details. This link will be active for 21 days.

Remember: if you submitted your package after July 7th, 2020, when making changes to your repository push to git@git.bioconductor.org:packages/PDATK to trigger a new build. A quick tutorial for setting up remotes and pushing to upstream can be found here.

bioc-issue-bot commented 3 years ago

Received a valid push; starting a build. Commits are:

476c1d0 Add: unit tests for SurvivalExperiment class; fixe... dd8eab7 Add: units tests for SurvivalExperiment coercion a... 056c1f7 Add: unit tests for SurvivalModel accessor methdos 6036d7a Fix: error in .generateTSPModels due incorret subs... 1f8a709 Add: Unit tests for modelling workflow methods for... 4d8bf1d Add: unit tests for RLSModel constructor a9e7bf5 Fix: wrong argument order in predictClasses call i... ea68f61 Fix: Resolved error introduced by adding model cla... 67a68de Fix: Error in PCOSP vignette yaml header; Issue wh... a250657 Add: unit tests for RGAModel constructor f638c33 Add: unit tests for ModelComparison constructor an... fe82027 Add: unit tests for ClinicalModel constructor 2b49af4 Add: unit test for creating empty SurvivalExperime... 07e4c3e Add: tests for automatic column type coercion in S... 264fe55 Fix: Errors in latest SurvivalExperiment construct... 40c4221 Add: final check in SurvivalExperiment constructor... 5aed30d Fix: Error in predictClasses for ClinicalModel due... b31ac1f Add: unit tests for trainModel, predictClasses and... a40dbe8 Add: unit tests for GeneFuModel constructor" 932c557 Add: trainModel method for GeneFuModel class which... b694e48 Add: unit tests for GeneFuModel trainModel, predic... 1ba0ac8 Update: modified validateModel method for PCOSP cl... 055914d Fix: Added additional logic in .getExectionContext... 42cc32d Add: validateModel now calcualtes per subtype stat... 64873c7 Fix: Bug in validateModel,PCOSP_or_RLS_or_RGA-meth... 484cc48 Update: validateModel when model has subtypes work... c16f7d4 Fix: resaved all package data with xz compression e80beb6 Merge with master a64751a Fix: missed a conflict in merge with master 2157819 Update: Debugging build and version bump for Bioc ... 55396de Development version passing tests and checks after... ced9803 Update: passing BiocCheck and ready to merge with ... 628bd89 Merge branch 'master' into development 39e0ea6 Update: master passing BiocCheck after merge with ...

bioc-issue-bot commented 3 years ago

Dear Package contributor,

This is the automated single package builder at bioconductor.org.

Your package has been built on Linux, Mac, and Windows.

On one or more platforms, the build results were: "TIMEOUT". This may mean there is a problem with the package that you need to fix. Or it may mean that there is a problem with the build system itself.

Please see the build report for more details. This link will be active for 21 days.

Remember: if you submitted your package after July 7th, 2020, when making changes to your repository push to git@git.bioconductor.org:packages/PDATK to trigger a new build. A quick tutorial for setting up remotes and pushing to upstream can be found here.

ChristopherEeles commented 3 years ago

Hi @LiNk-NY,

I am having a hard time fixing the timeout issues on Windows Server. The problem arises because I have to redo model training, prediction and validation for any of the downstream functions like plotting or model comparison. I am already pushing package size limits so I can't just save a pre-validated model to use for those examples. I have done some searching to see if it is possible to cache the results of one example to be loaded in another, but haven't found anything about this. Do you have any recommendations?

NAMESPACE

Is there a reason for exporting class unions (e.g.,
PCOSP_or_ClinicalModel)? These should be internal to the
package for easier method dispatch.
- This is fixed

vignettes

Include an Installation section in the vignette
- Done

Include the PCOSP paper reference.
- Done

It may be better to use the assays argument instead of a sumExp one in
the constructor function of SurvivalExperiment for ease-of-use and simplicity
you can test for class membership inside your constructor function.

- I am not sure what you mean by this. I originally tried just passing an existing SummarizedExperiment through to the dots
  of the SummarizedExperiment constructor, but found that this stripped off existing metadata and thus didn't achieve 
  the result I was looking for (which is to allow users to easily convert a SummarizedExperiment to a SurvivalExperiment.

Add references to both vignettes
- Done

Minor:

Use grep with value = TRUE instead of x[grepl(..., x)]
- I can update this but I am curious why it's important?

R

We do not recommend setting a randomSeed inside functions. This should
be done outside the function by the user (in trainModel and others).
See: http://bioconductor.org/developers/package-guidelines/#rcode
- I am aware of this guideline, but couldn't think of another way to achieve the goal of a completely reproducible 
  self-contained model object without setting the seed internally. I originally wanted to capture the existing seed, 
  but as far as I can tell it isn't possible to reconstruct the original set.seed call from .Random.seed. The issue with 
  setting the seed outside of the method is that I then have no way to record all of the model configuration 
  parameters needed to reproduce a specific run. Setting the seed outside of the method also has the drawback of depending   
  on the number of ticks to the random seed since it was set, so the samples will depend not only on the seed, but also 
  how many random  samples were taken before the function was called.
- I guess my question is, is this a HARD rule? I agree that in most cases it is not a good idea to set.seed inside of a function 
  call, but in this case it is a feature to ensure that a model produces the same results regardless of where it is called. 
  We actually had an issue with reproducing the original PCOSP publication in the package because of that exact issue.
- If we can compromise, maybe I can set the default random seed to NA, so no seed it set, then if a user specifies a 
  randomSeed when building the model object I can throw a warning that I modified the seed locally inside the fuction call?

Avoid using inprecise comparisons when merging data such as colData in
!all(colDTL[[1]] == colDTL[[2]], na.rm=TRUE) if there are NA values in
different cells the two datasets may still pass this test even though they are
not equal. I would suggest merging using a key identifier and with
all = TRUE.

tests

We highly recommend adding tests to the package for maintaining robustness
of the code.
- I added tests; ~50% coverage, ~80% if you include the examples in the covr run; I will add more but am working on a 
   number of project right now.
LiNk-NY commented 3 years ago

Hi Christopher, @ChristopherEeles

I am having a hard time fixing the timeout issues on Windows Server. The problem arises because I have to redo model training, prediction and validation for any of the downstream functions like plotting or model comparison. I am already pushing package size limits so I can't just save a pre-validated model to use for those examples. I have done some searching to see if it is possible to cache the results of one example to be loaded in another, but haven't found anything about this. Do you have any recommendations?

My recommendation would be to reduce the size of the data that you have in the package. Your examples should demonstrate the package functionality rather than a real-world workflow. Large computations are often the cause of timeouts. By reducing the size of the data, you'll lower the timings. Windows takes longer because of the two architectures.

It may be better to use the assays argument instead of a sumExp one in the constructor function of SurvivalExperiment for ease-of-use and simplicity you can test for class membership inside your constructor function.

  • I am not sure what you mean by this. I originally tried just passing an existing SummarizedExperiment through to the dots of the SummarizedExperiment constructor, but found that this stripped off existing metadata and thus didn't achieve the result I was looking for (which is to allow users to easily convert a SummarizedExperiment to a SurvivalExperiment.

Your constructor function should be able to find and recognize SummarizedExperiment inputs in the ... argument. Take the list and go through it with:

inputs <- list(...)
is(inputs[[1]], "SummarizedExperiment")

Add references to both vignettes

  • Done

Minor:

Use grep with value = TRUE instead of x[grepl(..., x)]
- I can update this but I am curious why it's important?

This is a minor point but in general, it's good to reduce the number of operations for readability and performance.

R


We do not recommend setting a randomSeed inside functions. This should
be done outside the function by the user (in trainModel and others).
See: http://bioconductor.org/developers/package-guidelines/#rcode
- I am aware of this guideline, but couldn't think of another way to achieve the goal of a completely reproducible 
  self-contained model object without setting the seed internally. I originally wanted to capture the existing seed, 
  but as far as I can tell it isn't possible to reconstruct the original set.seed call from .Random.seed. The issue with 
  setting the seed outside of the method is that I then have no way to record all of the model configuration 
  parameters needed to reproduce a specific run. Setting the seed outside of the method also has the drawback of depending   
  on the number of ticks to the random seed since it was set, so the samples will depend not only on the seed, but also 
  how many random  samples were taken before the function was called.
- I guess my question is, is this a HARD rule? I agree that in most cases it is not a good idea to set.seed inside of a function 
  call, but in this case it is a feature to ensure that a model produces the same results regardless of where it is called. 
  We actually had an issue with reproducing the original PCOSP publication in the package because of that exact issue.
- If we can compromise, maybe I can set the default random seed to NA, so no seed it set, then if a user specifies a 
  randomSeed when building the model object I can throw a warning that I modified the seed locally inside the fuction call?

The seed should be recorded by the user and not the function. You can record the seed without a call to set.seed inside the function and at the same time record it in the metadata. This will be up to the user to document.

set.seed(1)
trainModel(..., seed.used = 1) {
    metadata(object)$randomSeed <- seed.used
}

Best, Marcel

bioc-issue-bot commented 3 years ago

Received a valid push; starting a build. Commits are:

085c8a4 Fix: removed all set.seed calls inside functions a... d1bcc9f Add: getModelSeed generic and method for SurvivalM... af71a7b Fix: unit tests working since randomSeed changes 0df0679 Update: passing checks 09a6a14 Fix: using coef instead of d.index from survcomp::... c70c21a Fix: removed stray print statemnts from testing co... c47d48c Fix: taking log of lower and upper for the log_D_i... 2b3c4de Fix: all tests passing again 3fcef95 Update: global rename is_deceased to event_occurre... 5f65ee4 Fix: examples and tests passing with new data and ... adc8661 Fix: Passing R CMD check with new SurvivalExperime... c7581d8 Merge branch 'master' into development ee5c678 Merge branch 'master' of github.com:bhklab/PDATK

bioc-issue-bot commented 3 years ago

Received a valid push; starting a build. Commits are:

2c70a6e FFix: Version bump for Bioc build

bioc-issue-bot commented 3 years ago

Dear Package contributor,

This is the automated single package builder at bioconductor.org.

Your package has been built on Linux, Mac, and Windows.

On one or more platforms, the build results were: "TIMEOUT". This may mean there is a problem with the package that you need to fix. Or it may mean that there is a problem with the build system itself.

Please see the build report for more details. This link will be active for 21 days.

Remember: if you submitted your package after July 7th, 2020, when making changes to your repository push to git@git.bioconductor.org:packages/PDATK to trigger a new build. A quick tutorial for setting up remotes and pushing to upstream can be found here.

bioc-issue-bot commented 3 years ago

Dear Package contributor,

This is the automated single package builder at bioconductor.org.

Your package has been built on Linux, Mac, and Windows.

On one or more platforms, the build results were: "TIMEOUT". This may mean there is a problem with the package that you need to fix. Or it may mean that there is a problem with the build system itself.

Please see the build report for more details. This link will be active for 21 days.

Remember: if you submitted your package after July 7th, 2020, when making changes to your repository push to git@git.bioconductor.org:packages/PDATK to trigger a new build. A quick tutorial for setting up remotes and pushing to upstream can be found here.

LiNk-NY commented 3 years ago

Hi Christopher, @ChristopherEeles Can you bump the version of the package and push? Thanks, Marcel

bioc-issue-bot commented 3 years ago

Received a valid push; starting a build. Commits are:

97d2cde Fix: Incorrect check of length(dots) in SurvivalEx... 80397aa Fix: Passing R CMD check; examples still too slow ... c6d48b2 Update: Version bump for Bioc build

bioc-issue-bot commented 3 years ago

Received a valid push; starting a build. Commits are:

1ae2785 Fix: Another version bump for Bioc?

bioc-issue-bot commented 3 years ago

Dear Package contributor,

This is the automated single package builder at bioconductor.org.

Your package has been built on Linux, Mac, and Windows.

On one or more platforms, the build results were: "TIMEOUT". This may mean there is a problem with the package that you need to fix. Or it may mean that there is a problem with the build system itself.

Please see the build report for more details. This link will be active for 21 days.

Remember: if you submitted your package after July 7th, 2020, when making changes to your repository push to git@git.bioconductor.org:packages/PDATK to trigger a new build. A quick tutorial for setting up remotes and pushing to upstream can be found here.

bioc-issue-bot commented 3 years ago

Dear Package contributor,

This is the automated single package builder at bioconductor.org.

Your package has been built on Linux, Mac, and Windows.

On one or more platforms, the build results were: "TIMEOUT". This may mean there is a problem with the package that you need to fix. Or it may mean that there is a problem with the build system itself.

Please see the build report for more details. This link will be active for 21 days.

Remember: if you submitted your package after July 7th, 2020, when making changes to your repository push to git@git.bioconductor.org:packages/PDATK to trigger a new build. A quick tutorial for setting up remotes and pushing to upstream can be found here.

bioc-issue-bot commented 3 years ago

Dear Package contributor,

This is the automated single package builder at bioconductor.org.

Your package has been built on Linux, Mac, and Windows.

On one or more platforms, the build results were: "TIMEOUT, WARNINGS". This may mean there is a problem with the package that you need to fix. Or it may mean that there is a problem with the build system itself.

Please see the build report for more details. This link will be active for 21 days.

Remember: if you submitted your package after July 7th, 2020, when making changes to your repository push to git@git.bioconductor.org:packages/PDATK to trigger a new build. A quick tutorial for setting up remotes and pushing to upstream can be found here.

bioc-issue-bot commented 3 years ago

Received a valid push; starting a build. Commits are:

e18d319 Update: version bump for Bioc build

bioc-issue-bot commented 3 years ago

Dear Package contributor,

This is the automated single package builder at bioconductor.org.

Your package has been built on Linux, Mac, and Windows.

On one or more platforms, the build results were: "TIMEOUT". This may mean there is a problem with the package that you need to fix. Or it may mean that there is a problem with the build system itself.

Please see the build report for more details. This link will be active for 21 days.

Remember: if you submitted your package after July 7th, 2020, when making changes to your repository push to git@git.bioconductor.org:packages/PDATK to trigger a new build. A quick tutorial for setting up remotes and pushing to upstream can be found here.

bioc-issue-bot commented 3 years ago

Received a valid push; starting a build. Commits are:

961dffe Fix: Running examples with SerialParam speeds up ~...

bioc-issue-bot commented 3 years ago

Dear Package contributor,

This is the automated single package builder at bioconductor.org.

Your package has been built on Linux, Mac, and Windows.

On one or more platforms, the build results were: "TIMEOUT". This may mean there is a problem with the package that you need to fix. Or it may mean that there is a problem with the build system itself.

Please see the build report for more details. This link will be active for 21 days.

Remember: if you submitted your package after July 7th, 2020, when making changes to your repository push to git@git.bioconductor.org:packages/PDATK to trigger a new build. A quick tutorial for setting up remotes and pushing to upstream can be found here.

bioc-issue-bot commented 3 years ago

Received a valid push; starting a build. Commits are:

e069abf Fix: changed all(..., na.rm=TRUE) to all.equal cal... 70c2bec Fix: Using SerialParam for tests on Windows for sp...

bioc-issue-bot commented 3 years ago

Dear Package contributor,

This is the automated single package builder at bioconductor.org.

Your package has been built on Linux, Mac, and Windows.

On one or more platforms, the build results were: "WARNINGS". This may mean there is a problem with the package that you need to fix. Or it may mean that there is a problem with the build system itself.

Please see the build report for more details. This link will be active for 21 days.

Remember: if you submitted your package after July 7th, 2020, when making changes to your repository push to git@git.bioconductor.org:packages/PDATK to trigger a new build. A quick tutorial for setting up remotes and pushing to upstream can be found here.

ChristopherEeles commented 3 years ago

Hi Marcel @LiNk-NY,

I fixed my TIMEOUT issues. Turns out the example size was fine, but BiocParallel was using SnowParam with a SOCK cluster on Windows and that was slowing down my examples and tests ~8x. Any idea why the overhead is so high?

I register SerialParam before running the parallelized examples in the package now, but wondered if there is a more 'elegant' way to accomplish the same result, such as setting global settings for examples in a package?

The remaining warning on Windows seems like a false positive due to bad file names in the auto-generated docs:

Found the following significant warnings:
  Rd warning: cannot open file 'C:/Users/pkgbuild/packagebuilder/workers/jobs/1881/PDATK_20210224192007/PDATK.buildbin-libdir/00LOCK-PDATK/00new/PDATK/help/models<-,SurvivalModel,SimpleList-method.html': Invalid argument
  Rd warning: cannot open file 'C:/Users/pkgbuild/packagebuilder/workers/jobs/1881/PDATK_20210224192007/PDATK.buildbin-libdir/00LOCK-PDATK/00new/PDATK/help/models<-.html': Invalid argument
  Rd warning: cannot open file 'C:/Users/pkgbuild/packagebuilder/workers/jobs/1881/PDATK_20210224192007/PDATK.buildbin-libdir/00LOCK-PDATK/00new/PDATK/help/validationData<-,SurvivalModel,CohortList-method.html': Invalid argument
  Rd warning: cannot open file 'C:/Users/pkgbuild/packagebuilder/workers/jobs/1881/PDATK_20210224192007/PDATK.buildbin-libdir/00LOCK-PDATK/00new/PDATK/help/validationData<-.html': Invalid argument
  Rd warning: cannot open file 'C:/Users/pkgbuild/packagebuilder/workers/jobs/1881/PDATK_20210224192007/PDATK.buildbin-libdir/00LOCK-PDATK/00new/PDATK/help/validationStats<-,SurvivalModel,data.frame-method.html': Invalid argument
  Rd warning: cannot open file 'C:/Users/pkgbuild/packagebuilder/workers/jobs/1881/PDATK_20210224192007/PDATK.buildbin-libdir/00LOCK-PDATK/00new/PDATK/help/validationStats<-.html': Invalid argument
See 'C:/Users/pkgbuild/packagebuilder/workers/jobs/1881/PDATK_20210224192007/PDATK.Rcheck/00install.out' for details.

Regarding the rest of the review:

The seed should be recorded by the user and not the function. You can record the seed without a call to set.seed inside the function and at the same time record it in the metadata. This will be up to the user to document.

This is a minor point but in general, it's good to reduce the number of operations for readability and performance.

  • I can't find the call to x[grepl(..., x)] anymore. I deleted a bunch of unused utility code from a previous refactor, so I think this snippet was probably in there.

Your constructor function should be able to find and recognize SummarizedExperiment inputs in the ... argument. Take the list and go through it with:

  • This is fixed and I now check the first item of dots instead of adding the sumExp parameter.

I believe that should address all of your concerns. If you require further changes please let me know.

Thanks for you assistance.

Best, Chris

ChristopherEeles commented 3 years ago

Missed a review comment:

Avoid using inprecise comparisons when merging data such as colData in !all(colDTL[[1]] == colDTL[[2]], na.rm=TRUE) if there are NA values in different cells the two datasets may still pass this test even though they are not equal. I would suggest merging using a key identifier and with all = TRUE.

  • I fixed this by doing my comparison with all.equal instead
LiNk-NY commented 3 years ago

Hi Christopher, @ChristopherEeles

I fixed my TIMEOUT issues. Turns out the example size was fine, but BiocParallel was using SnowParam with a SOCK cluster on Windows and that was slowing down my examples and tests ~8x. Any idea why the overhead is so high?

I register SerialParam before running the parallelized examples in the package now, but wondered if there is a more 'elegant' way to accomplish the same result, such as setting global settings for examples in a package?

I would avoid running parallel computations in the examples. Please keep it to one core. The examples are not supposed to be resource intensive. They're meant to demonstrate the use of the function with a small dataset. Remember that the package will be checked along with a number of packages. If all other packages are using multiple CPUs, they will significantly strain the build system.

The remaining warning on Windows seems like a false positive due to bad file names in the auto-generated docs:

Found the following significant warnings:
  Rd warning: cannot open file 'C:/Users/pkgbuild/packagebuilder/workers/jobs/1881/PDATK_20210224192007/PDATK.buildbin-libdir/00LOCK-PDATK/00new/PDATK/help/models<-,SurvivalModel,SimpleList-method.html': Invalid argument
  Rd warning: cannot open file 'C:/Users/pkgbuild/packagebuilder/workers/jobs/1881/PDATK_20210224192007/PDATK.buildbin-libdir/00LOCK-PDATK/00new/PDATK/help/models<-.html': Invalid argument
  Rd warning: cannot open file 'C:/Users/pkgbuild/packagebuilder/workers/jobs/1881/PDATK_20210224192007/PDATK.buildbin-libdir/00LOCK-PDATK/00new/PDATK/help/validationData<-,SurvivalModel,CohortList-method.html': Invalid argument
  Rd warning: cannot open file 'C:/Users/pkgbuild/packagebuilder/workers/jobs/1881/PDATK_20210224192007/PDATK.buildbin-libdir/00LOCK-PDATK/00new/PDATK/help/validationData<-.html': Invalid argument
  Rd warning: cannot open file 'C:/Users/pkgbuild/packagebuilder/workers/jobs/1881/PDATK_20210224192007/PDATK.buildbin-libdir/00LOCK-PDATK/00new/PDATK/help/validationStats<-,SurvivalModel,data.frame-method.html': Invalid argument
  Rd warning: cannot open file 'C:/Users/pkgbuild/packagebuilder/workers/jobs/1881/PDATK_20210224192007/PDATK.buildbin-libdir/00LOCK-PDATK/00new/PDATK/help/validationStats<-.html': Invalid argument
See 'C:/Users/pkgbuild/packagebuilder/workers/jobs/1881/PDATK_20210224192007/PDATK.Rcheck/00install.out' for details.

Yes, you can ignore those warnings on Windows. They're related to how the package is checked by the BBS / SPB.

BTW, I'm not sure how the package is building and checking without pushing to git.biocoductor.org. Can you update the package there? cc: @lshep

bioc-issue-bot commented 3 years ago

Received a valid push; starting a build. Commits are:

98b608c Update: version bump for Bioc build

bioc-issue-bot commented 3 years ago

Received a valid push on git.bioconductor.org; starting a build for commit id: 98b608c8ff84fc711a05d69c8dcd7ea28f6cb533

ChristopherEeles commented 3 years ago

Hi Marcel @LiNk-NY,

I added the Bioconductor upstream and pushed there. I was also confused when I didn't have to push to Bioconductor for the build to start.

Let me know if you need anything further for the submission.

I also have a suggestion/request. For my parallel examples, I actually just used BiocParallel with the default settings, so I never explicitly parallelized any of the code, I just pass dots to my bplapply calls so the user can use BPPARAM. Given this fact, it seems that the default backend on Windows server is SnowParam with a SOCK cluster. Maybe it is worthwhile for you guys to register SerialParam as the back-end by default to prevent the issues you mentioned above?

Best, Chris

bioc-issue-bot commented 3 years ago

Dear Package contributor,

This is the automated single package builder at bioconductor.org.

Your package has been built on Linux, Mac, and Windows.

On one or more platforms, the build results were: "WARNINGS". This may mean there is a problem with the package that you need to fix. Or it may mean that there is a problem with the build system itself.

Please see the build report for more details. This link will be active for 21 days.

Remember: if you submitted your package after July 7th, 2020, when making changes to your repository push to git@git.bioconductor.org:packages/PDATK to trigger a new build. A quick tutorial for setting up remotes and pushing to upstream can be found here.

bioc-issue-bot commented 3 years ago

Dear Package contributor,

This is the automated single package builder at bioconductor.org.

Your package has been built on Linux, Mac, and Windows.

On one or more platforms, the build results were: "WARNINGS". This may mean there is a problem with the package that you need to fix. Or it may mean that there is a problem with the build system itself.

Please see the build report for more details. This link will be active for 21 days.

Remember: if you submitted your package after July 7th, 2020, when making changes to your repository push to git@git.bioconductor.org:packages/PDATK to trigger a new build. A quick tutorial for setting up remotes and pushing to upstream can be found here.

ChristopherEeles commented 3 years ago

Hi Marcel @LiNk-NY,

I have responded to all review comments.

Please let me know if you need anything further for the submission.

Best, Chris

LiNk-NY commented 3 years ago

Hi Christopher, @ChristopherEeles

Sorry for the delay in getting back to you. Thanks for making those changes.

I added the Bioconductor upstream and pushed there. I was also confused when I didn't have to push to Bioconductor for the build to start.

Perhaps you had set up a webhook? We are phasing out this functionality and it should not work in the future.

I also have a suggestion/request. For my parallel examples, I actually just used BiocParallel with the default settings, so I never explicitly parallelized any of the code, I just pass dots to my bplapply calls so the user can use BPPARAM. Given this fact, it seems that the default backend on Windows server is SnowParam with a SOCK cluster. Maybe it is worthwhile for you guys to register SerialParam as the back-end by default to prevent the issues you mentioned above?

Perhaps @mtmorgan can comment on this but I would recommend that you make SerialParam default in your code to avoid greedy computations (which you have done already).

Best regards, Marcel

bioc-issue-bot commented 3 years ago

Your package has been accepted. It will be added to the Bioconductor nightly builds.

Thank you for contributing to Bioconductor!

vjcitn commented 3 years ago

The master branch of your GitHub repository has been added to Bioconductor's git repository.

To use the git.bioconductor.org repository, we need an 'ssh' key to associate with your github user name. If your GitHub account already has ssh public keys (https://github.com/ChristopherEeles.keys is not empty), then no further steps are required. Otherwise, do the following:

  1. Add an SSH key to your github account
  2. Submit your SSH key to Bioconductor

See further instructions at

https://bioconductor.org/developers/how-to/git/

for working with this repository. See especially

https://bioconductor.org/developers/how-to/git/new-package-workflow/ https://bioconductor.org/developers/how-to/git/sync-existing-repositories/

to keep your GitHub and Bioconductor repositories in sync.

Your package will be included in the next nigthly 'devel' build (check-out from git at about 6 pm Eastern; build completion around 2pm Eastern the next day) at

https://bioconductor.org/checkResults/

(Builds sometimes fail, so ensure that the date stamps on the main landing page are consistent with the addition of your package). Once the package builds successfully, you package will be available for download in the 'Devel' version of Bioconductor using BiocManager::install("PDATK"). The package 'landing page' will be created at

https://bioconductor.org/packages/PDATK

If you have any questions, please contact the bioc-devel mailing list (https://stat.ethz.ch/mailman/listinfo/bioc-devel); this issue will not be monitored further.