Bioconductor / Contributions

Contribute Packages to Bioconductor
134 stars 33 forks source link

optimalFlow #1260

Closed HristoInouzhe closed 4 years ago

HristoInouzhe commented 5 years ago

Update the following URL to point to the GitHub repository of the package you wish to submit to Bioconductor

Confirm the following by editing each check box to '[x]'

I am familiar with the essential aspects of Bioconductor software management, including:

For help with submitting your package, please subscribe and post questions to the bioc-devel mailing list.

bioc-issue-bot commented 5 years ago

Hi @HristoInouzhe

Thanks for submitting your package. We are taking a quick look at it and you will hear back from us soon.

The DESCRIPTION file for this package is:

Package: optimalFlowData
Type: Package
Title: optimalFlowData
Version: 0.1.0
Author: Hristo Inouzhe <hristo.inouzhe@gmail.com>
Maintainer: Hristo Inouzhe <hristo.inouzhe@gmail.com>
Description: Data files used as examples and for testing of the software provided in the optimalFlow package.
License: Artistic-2.0
Encoding: UTF-8
LazyData: true
Depends: R (>= 3.6.1)
Suggests: knitr, BiocStyle, rmarkdown
VignetteBuilder: knitr
biocViews: ExperimentData, PackageTypeData, ImmunoOncologyData, FlowCytometryData
HristoInouzhe commented 5 years ago

AdditionalPackage: https://github.com/HristoInouzhe/optimalFlow

bioc-issue-bot commented 5 years ago

Can't build unless issue is open and '2. review in progress' label is present, or issue is closed and 'TESTING' label is present.

bioc-issue-bot commented 5 years ago

Dear @HristoInouzhe ,

You (or someone) has already posted that repository to our tracker.

See https://github.com/Bioconductor/Contributions/issues/1260

You cannot post the same repository more than once.

If you would like this repository to be linked to issue number: 1260, Please contact a Bioconductor Core Member. I am closing this issue.

lshep commented 5 years ago

You will receive a build report shortly. Please set up your webhook so that future version bumps on the package will trigger an automatic rebuild.

bioc-issue-bot commented 5 years ago

Dear Package contributor,

This is the automated single package builder at bioconductor.org.

Your package has been built on Linux, Mac, and Windows.

On one or more platforms, the build results were: "ERROR". This may mean there is a problem with the package that you need to fix. Or it may mean that there is a problem with the build system itself.

Please see the build report for more details.

lshep commented 5 years ago

Now that the data package has built once - Could you please run the AdditionalPackage: https://github.com/HristoInouzhe/optimalFlow line again - this will ensure it gets in our database correctly and a build report for that package will be generated also.

HristoInouzhe commented 5 years ago

AdditionalPackage: https://github.com/HristoInouzhe/optimalFlow

bioc-issue-bot commented 5 years ago

Hi @HristoInouzhe,

Starting build on additional package https://github.com/HristoInouzhe/optimalFlow.

IMPORTANT: Please read the instructions for setting up a push hook on your repository, or further changes to your additional package repository will NOT trigger a new build.

The DESCRIPTION file of this additional package is:

Package: optimalFlow
Type: Package
Title: optimalFlow
Version: 0.1.0
Author: Hristo Inouzhe <hristo.inouzhe@gmail.com>
Maintainer: Hristo Inouzhe <hristo.inouzhe@gmail.com>
Description: Optimal-transport techniques applied to supervised flow cytometry gating.
License: Artistic-2.0
Encoding: UTF-8
LazyData: true
Depends:
  dplyr,
  optimalFlowData,
  rlang (>= 0.4.0)
Imports:
  transport,
  parallel,
  Rfast,
  robustbase,
  dbscan,
  randomForest,
  foreach,
  graphics,
  doParallel,
  stats
Suggests: knitr, BiocStyle, ellipse, rmarkdown
VignetteBuilder: knitr
biocViews: Software, FlowCytometry, Technology
bioc-issue-bot commented 5 years ago

Dear Package contributor,

This is the automated single package builder at bioconductor.org.

Your package has been built on Linux, Mac, and Windows.

On one or more platforms, the build results were: "ERROR". This may mean there is a problem with the package that you need to fix. Or it may mean that there is a problem with the build system itself.

Please see the build report for more details.

HristoInouzhe commented 5 years ago

Sorry, I missed to change the package version. Also I will change the R version dependance. I have placed the web-hooks so if I understand correctly that will trigger a new build, is that so?

Thanks a lot for the attention.

HristoInouzhe commented 5 years ago

I mean when I do a new push, that will trigger the new build.

lshep commented 5 years ago

Yes if you have the webhooks in place (you'll need to do it for each package) - When you do a version bump in the description and commit and push up the change it will trigger a new build

bioc-issue-bot commented 5 years ago

Received a valid push; starting a build. Commits are:

4e8abf0 18.09.19 version correction c491761 Merge pull request #5 from HristoInouzhe/provision...

bioc-issue-bot commented 5 years ago

Dear Package contributor,

This is the automated single package builder at bioconductor.org.

Your package has been built on Linux, Mac, and Windows.

Congratulations! The package built without errors or warnings on all platforms.

Please see the build report for more details.

bioc-issue-bot commented 5 years ago

Received a valid push; starting a build. Commits are:

fd4e42c 18.09.19 version correction 6b74d23 Merge pull request #7 from HristoInouzhe/provision...

bioc-issue-bot commented 5 years ago

Dear Package contributor,

This is the automated single package builder at bioconductor.org.

Your package has been built on Linux, Mac, and Windows.

Congratulations! The package built without errors or warnings on all platforms.

Please see the build report for more details.

hpages commented 4 years ago

Thanks @HristoInouzhe for your submission!

optimalFlow and optimalFlowData will both need some significant improvements before we can add them to Bioconductor. See below for the details.

Please let me know or ask on the bioc-devel mailing list if you have any question or concern about this.

Thanks again.

Regards, H.

  1. optimalFlow exports and documents 36 functions but it's not clear that all of them are relevant for the end user (e.g. w2dist() and voteLabelTransfer()), that is, they don't seem to be used in a typical workflow and it's not clear by looking at their man page when they should be used. Please make sure to only export and document functions that are intended to be called by the end user. In particular internal helpers (i.e. functions used internally by your package) don't need to be exported and documented.

  2. The flow cytometry datasets provided by the optimalFlowData package are stored as data frames with numeric columns. However the values in these columns are actually integer values so integer vectors should be used instead of numeric vectors. Also a factor rather than a character vector should be used for the last column. These 2 changes not only reflect the nature of the data but they will also reduce the memory footprint of the datasets by about half. Generally speaking, developpers should carefully pick up the best data structure for the data they need to represent.

  3. There is no real benefit in storing these datasets as tibbles instead of standard data frames. Besides, tibble objects are implemented in the tibble package so it doesn't make sense to use them in optimalFlowData without having optimalFlowData depend on tibble. Generally speaking it's better to stick to base R data structures unless there is a good reason for using something more specialized. An immediate advantage of sticking to base R data structures is to reduce dependencies.

  4. Long statements with many repetitions like the one you use in the vignette to build the input of optimalFlowTemplates() (database object) should be avoided. You're basically applying manually the same transformation to 15 datasets so the right thing to do here is to use a loop. For example:

    buildDatabase <- function(dataset_names, population_ids) {
        envir <- as.environment("package:optimalFlowData")
        datasets <- mget(dataset_names, envir=envir)
        lapply(datasets,
            function(dset)
                dset[dset$`Population ID (name)` %in% population_ids, ])
    }
    
    database <- buildDatabase(
        dataset_names=paste0("Cytometry", c(2:5, 7:9, 12:17, 19, 21)),
        population_ids=c("Monocytes", "CD4+CD8-", "Mature SIg Kappa", "TCRgd-"))
  5. You use the same long statement in the example section of various man pages to build the input of optimalFlowTemplates(). One important principle in software engineering is the DRY (Don't Repeat Yourself) principle (note that this principle applies not only to code repetition but also to redundancies in the documentation). To avoid the repetition, you could define buildDatabase() in the optimalFlowData package and use it in the vignette and man pages to build the input of optimalFlowTemplates().

  6. Please explain in the vignette and man pages why you exclude some datasets when building the input of optimalFlowTemplates() (e.g. Cytometry1, Cytometry6, Cytometry10, etc... are excluded). Also please explain why you restrict the datasets to some population IDs only.

  7. Before calling optimalFlowTemplates() in the vignette you say: "In this case we are looking for 5 distinct groups of cytometries". Please explain. In particular, how would the user know this when she's working with her own data?

  8. Error messages tend to be obscure and not helpful:

    database <- buildDatabase(dataset_names=paste0("Cytometry", 18:21),
                              population_ids=c("TCRgd-", "CD56dim"))
    templates.optimalFlow <- optimalFlowTemplates(database, templates.number=5)
    # Error in cutree(cytos.hcl, k = templates.number) : 
    #   elements of 'k' must be between 1 and 4
  9. Some warning messages should not be ignored as they probably reveal some problem with the code:

    templates.optimalFlow <- optimalFlowTemplates(database, cov.estimation="robust", templates.number=3)
    # [1] "step 1: 1.67360424995422 secs"
    # [1] "step 2: 0.219858884811401 secs"
    # [1] "Execution time: 1.89381980895996 secs"
    # Warning messages:
    # 1: In if (is.na(rob_est)) { :
    #   the condition has length > 1 and only the first element will be used
    # 2: In if (is.na(rob_est)) { :
    #   the condition has length > 1 and only the first element will be used
    # 3: In if (is.na(rob_est)) { :
    #   the condition has length > 1 and only the first element will be used
    # 4: In if (is.na(rob_est)) { :
    #   the condition has length > 1 and only the first element will be used
    # 5: In if (is.na(rob_est)) { :
    #   the condition has length > 1 and only the first element will be used
    # 6: In if (is.na(rob_est)) { :
    #   the condition has length > 1 and only the first element will be used
    # 7: In if (is.na(rob_est)) { :
    #   the condition has length > 1 and only the first element will be used
    # 8: In if (is.na(rob_est)) { :
    #   the condition has length > 1 and only the first element will be used
  10. In ?optimalFlowTemplates the Value section says:

    templates: A list of the consensus clusterings for every group in the
              partition of the database.
    
    clustering: Clustering of the input partitions.
    
    database.elliptical: Means, covariances and weights of the clusters in
              the input partitions.

    You need to provide more details.

    First of all you should make it clear that the function returns a list (your users should not have to guess). This is typically done by starting the Value section with something like A list with components:. For example:

    A list with components:
    
      templates: A list of the consensus clusterings for every group in the
                 partition of the database.
    
      clustering: Clustering of the input partitions.
    
      database.elliptical: Means, covariances and weights of the clusters in
                the input partitions.

    Note that proper formatting can be achieved by using \itemize in the man page source e.g.:

    A list with components:
    \itemize{
        \item{templates:}{...}
        \item{clustering:}{...}
        \item{database.elliptical:}{...}
    }

    You also need to provide a lot more details about each component. In particular the user needs to know how the data is organized within each component. For example the database.elliptical component is itself a list with 2 more nested levels of lists! And it seems that the leaves of this tree-like structure can be numeric or character vectors or matrices. How could the user possibly make any use of all this data without knowing what it is and how it's organized? Please understand that when in your vignette you do things like

    templates.optimalFlow$database.elliptical[[3]][[1]]$cov[c(4,3),][,c(4,3)]

    or

    templates.optimalFlow$database.elliptical[[3]][[1]]$mean[c(4,3)]

    you extract some values from the database.elliptical component but the user has no way to know what you are extarcting exactly and why you are extracting it. In other words, and to put it bluntly, this is all very obscure business from an end-user point of view. It should not be.

  11. Related to the above issue: All the plotting code in the vignette seems very ad-hoc i.e. it seems to have been fine-tuned to work with the data at hand. What's going to happen when the user tries to use this on her own data? Is this code going to work as-is? Probably not. So the ad-hoc code produces nice plots for the vignette but won't that useful when the user will need to plot her own data. Please provide plotting tools that are re-usable, ideally by defining plotting functions that take the output of optimalFlowTemplates() as input.

  12. About doing things like cov[c(4,3),][,c(4,3)]. Why would you subset first by row, then by column when you can do both at once with cov[c(4,3), c(4,3)] (or even better with cov[4:3, 4:3])?

  13. Please use a left-arrow (<-) instead of = for assignments everywhere in your code (including in your man page examples and vignette). Make sure to put a space before and after the left-arrow e.g. x <- 4:3 instead of x<-4:3.

  14. Always put a space after the comma (,) to clearly separate arguments in your function calls. Note that this also applies to the Usage section of your man pages (see for example the Usage section in ?optimalFlowTemplates where the lack of space after the comma hurts readibility e.g. alpha.cov = 0.85,equal.weights.template = TRUE).

  15. Use proper indentation (again, this also applies to the Usage section of your man pages). For example the Usage section in ?optimalFlowTemplates is hard to read because the additional lines of the multi-line expression are not indented:

    optimalFlowTemplates(database, database.names = NULL, cov.estimation = "standard",
    alpha.cov = 0.85,equal.weights.template = TRUE, hclust.method = "complete",
    templates.number = NA,minPts = 2, eps = 1, consensus.method = "pooling",
    barycenters.number = 37,bar.repetitions = 40, alpha.bar = 0.05,
    bar.ini.method = "plus-plus", consensus.minPts = 3, cl.paral = 1)

    Same thing in your examples:

    templates.optimalFlow = optimalFlowTemplates(database = database, templates.number = 5,
    cl.paral = 1)
hpages commented 4 years ago

Hi @HristoInouzhe , do you intend to follow up on this submission? Thanks! H.

HristoInouzhe commented 4 years ago
Yes, I intend to follow through the package. To be honest I just saw the first email with the proposed corrections. It got lost in the crazy ammount of daily emails.  I will try to address all the corrections in the shortes amount of time.  Thanks a lot for the attention and the succint improvements suggestions. Best regards, Hristo. Enviado desde Correo para Windows 10 De: Hervé PagèsEnviado: 19 December 2019 12:39Para: Bioconductor/ContributionsCC: HristoInouzhe; MentionAsunto: Re: [Bioconductor/Contributions] optimalFlow (#1260) Hi @HristoInouzhe , do you intend to follow up on this submission? Thanks! H.—You are receiving this because you were mentioned.Reply to this email directly, view it on GitHub, or unsubscribe. 
lshep commented 4 years ago

We are closing this issue for inactivity. Please comment back here when you are ready to move forward with the review and have the updated package ready for re-review.

bioc-issue-bot commented 4 years ago

This issue is being closed because there has been no progress for an extended period of time. You may reopen the issue when you have the time to actively participate in the review / submission process. Please also keep in mind that a package accepted to Bioconductor requires a commitment on your part to ongoing maintenance.

Thank you for interest in Bioconductor.

HristoInouzhe commented 4 years ago

Hi. I have done a complete revision of the package optimalFlow including the sugestions and adressing the problems indicated by the reviewer. I have the new version on github. I would like to reopen this issue for a reconsideration for BioConductor. Thanks a lot for the interest and the hard work put into revision and management.

hpages commented 4 years ago

Hi @HristoInouzhe,

Please bump the version of the package in order to trigger a new build. Thanks!

H.

bioc-issue-bot commented 4 years ago

Received a valid push; starting a build. Commits are:

14df9db bumping to version 0.99.1

bioc-issue-bot commented 4 years ago

Received a valid push; starting a build. Commits are:

efff194 bumping to version 0.99.1

bioc-issue-bot commented 4 years ago

Dear Package contributor,

This is the automated single package builder at bioconductor.org.

Your package has been built on Linux, Mac, and Windows.

On one or more platforms, the build results were: "skipped, ERROR". This may mean there is a problem with the package that you need to fix. Or it may mean that there is a problem with the build system itself.

Please see the build report for more details.

bioc-issue-bot commented 4 years ago

Received a valid push; starting a build. Commits are:

dde8f45 bump in version

bioc-issue-bot commented 4 years ago

Dear Package contributor,

This is the automated single package builder at bioconductor.org.

Your package has been built on Linux, Mac, and Windows.

On one or more platforms, the build results were: "WARNINGS". This may mean there is a problem with the package that you need to fix. Or it may mean that there is a problem with the build system itself.

Please see the build report for more details.

bioc-issue-bot commented 4 years ago

Dear Package contributor,

This is the automated single package builder at bioconductor.org.

Your package has been built on Linux, Mac, and Windows.

On one or more platforms, the build results were: "WARNINGS". This may mean there is a problem with the package that you need to fix. Or it may mean that there is a problem with the build system itself.

Please see the build report for more details.

bioc-issue-bot commented 4 years ago

Received a valid push; starting a build. Commits are:

938d309 version bump

bioc-issue-bot commented 4 years ago

Received a valid push; starting a build. Commits are:

b4a9018 version bump

bioc-issue-bot commented 4 years ago

Dear Package contributor,

This is the automated single package builder at bioconductor.org.

Your package has been built on Linux, Mac, and Windows.

On one or more platforms, the build results were: "WARNINGS". This may mean there is a problem with the package that you need to fix. Or it may mean that there is a problem with the build system itself.

Please see the build report for more details.

bioc-issue-bot commented 4 years ago

Dear Package contributor,

This is the automated single package builder at bioconductor.org.

Your package has been built on Linux, Mac, and Windows.

On one or more platforms, the build results were: "WARNINGS". This may mean there is a problem with the package that you need to fix. Or it may mean that there is a problem with the build system itself.

Please see the build report for more details.

bioc-issue-bot commented 4 years ago

Received a valid push; starting a build. Commits are:

2b1f02f eliminate dependancy and version bump

bioc-issue-bot commented 4 years ago

Dear Package contributor,

This is the automated single package builder at bioconductor.org.

Your package has been built on Linux, Mac, and Windows.

On one or more platforms, the build results were: "WARNINGS". This may mean there is a problem with the package that you need to fix. Or it may mean that there is a problem with the build system itself.

Please see the build report for more details.

bioc-issue-bot commented 4 years ago

Received a valid push; starting a build. Commits are:

784200e dependency and bump

bioc-issue-bot commented 4 years ago

Dear Package contributor,

This is the automated single package builder at bioconductor.org.

Your package has been built on Linux, Mac, and Windows.

On one or more platforms, the build results were: "WARNINGS". This may mean there is a problem with the package that you need to fix. Or it may mean that there is a problem with the build system itself.

Please see the build report for more details.

bioc-issue-bot commented 4 years ago

Received a valid push; starting a build. Commits are:

890d67f R version 4.0 and bump

bioc-issue-bot commented 4 years ago

Dear Package contributor,

This is the automated single package builder at bioconductor.org.

Your package has been built on Linux, Mac, and Windows.

On one or more platforms, the build results were: "WARNINGS". This may mean there is a problem with the package that you need to fix. Or it may mean that there is a problem with the build system itself.

Please see the build report for more details.

HristoInouzhe commented 4 years ago

Hi, I'm getting a problem that I've not been able to solve. I get the folowing Warning when BiocCheck is applied:

If I remove the R dependancy in the Description, R CMD Build adds automatically a dependancy on R (>= 3.5.0) and I get the same Warning in BiocCheck.

I'm a bit confused, what should I do?

Thanks a lot for the attention.

lshep commented 4 years ago

If you contain data than ignore this warning.

bioc-issue-bot commented 4 years ago

Received a valid push; starting a build. Commits are:

e13a7b5 bump in version

bioc-issue-bot commented 4 years ago

Dear Package contributor,

This is the automated single package builder at bioconductor.org.

Your package has been built on Linux, Mac, and Windows.

On one or more platforms, the build results were: "skipped, ERROR". This may mean there is a problem with the package that you need to fix. Or it may mean that there is a problem with the build system itself.

Please see the build report for more details.

HristoInouzhe commented 4 years ago

Hi again.

The build report gives an error on optimalFlow when running R CMD CHECK un Ubuntu with R 4.0.0:

base::assign(".ptime", proc.time(), pos = "CheckExEnv")

Name: labelTransfer

Title: labelTransfer

Aliases: labelTransfer

** Examples

data.example <- data.frame(v1 = c(rnorm(50, 2, 1), rnorm(50, -2, 1)),

  • v2 = c(rnorm(50, 2, 1), rnorm(50, -2, 1)), id = c(rep(0, 50), rep(1, 50))) test.labels <- c(rep('a', 50), rep('b', 50)) labelTransfer(data.example, data.example[, 1:2], test.labels) free(): invalid pointer Aborted (core dumped)

However, I'm not able to obtain this error on my local machines, neither on Ubuntu 18.10 with R 3.6.1 or in Windows 10 and R 3.6.3. In both, when I do R CMD Check I get 0 ERRORS and 0 WARNINGS.

When I run the example alone it also doesn't give any errors.

Can I get more info about the error, or some guidance for solving the issue?

Thanks a lot and I'm sorry for the inconvenience.

bioc-issue-bot commented 4 years ago

Received a valid push; starting a build. Commits are:

20b573d bump

bioc-issue-bot commented 4 years ago

Dear Package contributor,

This is the automated single package builder at bioconductor.org.

Your package has been built on Linux, Mac, and Windows.

On one or more platforms, the build results were: "skipped, ERROR". This may mean there is a problem with the package that you need to fix. Or it may mean that there is a problem with the build system itself.

Please see the build report for more details.

bioc-issue-bot commented 4 years ago

Received a valid push; starting a build. Commits are:

e012658 version bump after rnorm

bioc-issue-bot commented 4 years ago

Dear Package contributor,

This is the automated single package builder at bioconductor.org.

Your package has been built on Linux, Mac, and Windows.

On one or more platforms, the build results were: "skipped, ERROR". This may mean there is a problem with the package that you need to fix. Or it may mean that there is a problem with the build system itself.

Please see the build report for more details.

bioc-issue-bot commented 4 years ago

Received a valid push; starting a build. Commits are:

d3daa89 bump after base