Closed HristoInouzhe closed 4 years ago
Hi @HristoInouzhe
Thanks for submitting your package. We are taking a quick look at it and you will hear back from us soon.
The DESCRIPTION file for this package is:
Package: optimalFlowData
Type: Package
Title: optimalFlowData
Version: 0.1.0
Author: Hristo Inouzhe <hristo.inouzhe@gmail.com>
Maintainer: Hristo Inouzhe <hristo.inouzhe@gmail.com>
Description: Data files used as examples and for testing of the software provided in the optimalFlow package.
License: Artistic-2.0
Encoding: UTF-8
LazyData: true
Depends: R (>= 3.6.1)
Suggests: knitr, BiocStyle, rmarkdown
VignetteBuilder: knitr
biocViews: ExperimentData, PackageTypeData, ImmunoOncologyData, FlowCytometryData
AdditionalPackage: https://github.com/HristoInouzhe/optimalFlow
Can't build unless issue is open and '2. review in progress' label is present, or issue is closed and 'TESTING' label is present.
Dear @HristoInouzhe ,
You (or someone) has already posted that repository to our tracker.
See https://github.com/Bioconductor/Contributions/issues/1260
You cannot post the same repository more than once.
If you would like this repository to be linked to issue number: 1260, Please contact a Bioconductor Core Member. I am closing this issue.
You will receive a build report shortly. Please set up your webhook so that future version bumps on the package will trigger an automatic rebuild.
Dear Package contributor,
This is the automated single package builder at bioconductor.org.
Your package has been built on Linux, Mac, and Windows.
On one or more platforms, the build results were: "ERROR". This may mean there is a problem with the package that you need to fix. Or it may mean that there is a problem with the build system itself.
Please see the build report for more details.
Now that the data package has built once - Could you please run the AdditionalPackage: https://github.com/HristoInouzhe/optimalFlow
line again - this will ensure it gets in our database correctly and a build report for that package will be generated also.
AdditionalPackage: https://github.com/HristoInouzhe/optimalFlow
Hi @HristoInouzhe,
Starting build on additional package https://github.com/HristoInouzhe/optimalFlow.
IMPORTANT: Please read the instructions for setting up a push hook on your repository, or further changes to your additional package repository will NOT trigger a new build.
The DESCRIPTION file of this additional package is:
Package: optimalFlow
Type: Package
Title: optimalFlow
Version: 0.1.0
Author: Hristo Inouzhe <hristo.inouzhe@gmail.com>
Maintainer: Hristo Inouzhe <hristo.inouzhe@gmail.com>
Description: Optimal-transport techniques applied to supervised flow cytometry gating.
License: Artistic-2.0
Encoding: UTF-8
LazyData: true
Depends:
dplyr,
optimalFlowData,
rlang (>= 0.4.0)
Imports:
transport,
parallel,
Rfast,
robustbase,
dbscan,
randomForest,
foreach,
graphics,
doParallel,
stats
Suggests: knitr, BiocStyle, ellipse, rmarkdown
VignetteBuilder: knitr
biocViews: Software, FlowCytometry, Technology
Dear Package contributor,
This is the automated single package builder at bioconductor.org.
Your package has been built on Linux, Mac, and Windows.
On one or more platforms, the build results were: "ERROR". This may mean there is a problem with the package that you need to fix. Or it may mean that there is a problem with the build system itself.
Please see the build report for more details.
Sorry, I missed to change the package version. Also I will change the R version dependance. I have placed the web-hooks so if I understand correctly that will trigger a new build, is that so?
Thanks a lot for the attention.
I mean when I do a new push, that will trigger the new build.
Yes if you have the webhooks in place (you'll need to do it for each package) - When you do a version bump in the description and commit and push up the change it will trigger a new build
Dear Package contributor,
This is the automated single package builder at bioconductor.org.
Your package has been built on Linux, Mac, and Windows.
Congratulations! The package built without errors or warnings on all platforms.
Please see the build report for more details.
Dear Package contributor,
This is the automated single package builder at bioconductor.org.
Your package has been built on Linux, Mac, and Windows.
Congratulations! The package built without errors or warnings on all platforms.
Please see the build report for more details.
Thanks @HristoInouzhe for your submission!
optimalFlow and optimalFlowData will both need some significant improvements before we can add them to Bioconductor. See below for the details.
Please let me know or ask on the bioc-devel mailing list if you have any question or concern about this.
Thanks again.
Regards, H.
optimalFlow exports and documents 36 functions but it's not clear that all of them are relevant for the end user (e.g. w2dist()
and voteLabelTransfer()
), that is, they don't seem to be used in a typical workflow and it's not clear by looking at their man page when they should be used. Please make sure to only export and document functions that are intended to be called by the end user. In particular internal helpers (i.e. functions used internally by your package) don't need to be exported and documented.
The flow cytometry datasets provided by the optimalFlowData package are stored as data frames with numeric columns. However the values in these columns are actually integer values so integer vectors should be used instead of numeric vectors. Also a factor rather than a character vector should be used for the last column. These 2 changes not only reflect the nature of the data but they will also reduce the memory footprint of the datasets by about half. Generally speaking, developpers should carefully pick up the best data structure for the data they need to represent.
There is no real benefit in storing these datasets as tibbles instead of standard data frames. Besides, tibble objects are implemented in the tibble package so it doesn't make sense to use them in optimalFlowData without having optimalFlowData depend on tibble. Generally speaking it's better to stick to base R data structures unless there is a good reason for using something more specialized. An immediate advantage of sticking to base R data structures is to reduce dependencies.
Long statements with many repetitions like the one you use in the vignette to build the input of optimalFlowTemplates()
(database
object) should be avoided. You're basically applying manually the same transformation to 15 datasets so the right thing to do here is to use a loop. For example:
buildDatabase <- function(dataset_names, population_ids) {
envir <- as.environment("package:optimalFlowData")
datasets <- mget(dataset_names, envir=envir)
lapply(datasets,
function(dset)
dset[dset$`Population ID (name)` %in% population_ids, ])
}
database <- buildDatabase(
dataset_names=paste0("Cytometry", c(2:5, 7:9, 12:17, 19, 21)),
population_ids=c("Monocytes", "CD4+CD8-", "Mature SIg Kappa", "TCRgd-"))
You use the same long statement in the example section of various man pages to build the input of optimalFlowTemplates()
. One important principle in software engineering is the DRY (Don't Repeat Yourself) principle (note that this principle applies not only to code repetition but also to redundancies in the documentation). To avoid the repetition, you could define buildDatabase()
in the optimalFlowData package and use it in the vignette and man pages to build the input of optimalFlowTemplates()
.
Please explain in the vignette and man pages why you exclude some datasets when building the input of optimalFlowTemplates()
(e.g. Cytometry1
, Cytometry6
, Cytometry10
, etc... are excluded). Also please explain why you restrict the datasets to some population IDs only.
Before calling optimalFlowTemplates()
in the vignette you say: "In this case we are looking for 5 distinct groups of cytometries". Please explain. In particular, how would the user know this when she's working with her own data?
Error messages tend to be obscure and not helpful:
database <- buildDatabase(dataset_names=paste0("Cytometry", 18:21),
population_ids=c("TCRgd-", "CD56dim"))
templates.optimalFlow <- optimalFlowTemplates(database, templates.number=5)
# Error in cutree(cytos.hcl, k = templates.number) :
# elements of 'k' must be between 1 and 4
Some warning messages should not be ignored as they probably reveal some problem with the code:
templates.optimalFlow <- optimalFlowTemplates(database, cov.estimation="robust", templates.number=3)
# [1] "step 1: 1.67360424995422 secs"
# [1] "step 2: 0.219858884811401 secs"
# [1] "Execution time: 1.89381980895996 secs"
# Warning messages:
# 1: In if (is.na(rob_est)) { :
# the condition has length > 1 and only the first element will be used
# 2: In if (is.na(rob_est)) { :
# the condition has length > 1 and only the first element will be used
# 3: In if (is.na(rob_est)) { :
# the condition has length > 1 and only the first element will be used
# 4: In if (is.na(rob_est)) { :
# the condition has length > 1 and only the first element will be used
# 5: In if (is.na(rob_est)) { :
# the condition has length > 1 and only the first element will be used
# 6: In if (is.na(rob_est)) { :
# the condition has length > 1 and only the first element will be used
# 7: In if (is.na(rob_est)) { :
# the condition has length > 1 and only the first element will be used
# 8: In if (is.na(rob_est)) { :
# the condition has length > 1 and only the first element will be used
In ?optimalFlowTemplates
the Value
section says:
templates: A list of the consensus clusterings for every group in the
partition of the database.
clustering: Clustering of the input partitions.
database.elliptical: Means, covariances and weights of the clusters in
the input partitions.
You need to provide more details.
First of all you should make it clear that the function returns a list (your users should not have to guess). This is typically done by starting the Value
section with something like A list with components:
. For example:
A list with components:
templates: A list of the consensus clusterings for every group in the
partition of the database.
clustering: Clustering of the input partitions.
database.elliptical: Means, covariances and weights of the clusters in
the input partitions.
Note that proper formatting can be achieved by using \itemize
in the man page source e.g.:
A list with components:
\itemize{
\item{templates:}{...}
\item{clustering:}{...}
\item{database.elliptical:}{...}
}
You also need to provide a lot more details about each component. In particular the user needs to know how the data is organized within each component. For example the database.elliptical
component is itself a list with 2 more nested levels of lists! And it seems that the leaves of this tree-like structure can be numeric or character vectors or matrices. How could the user possibly make any use of all this data without knowing what it is and how it's organized? Please understand that when in your vignette you do things like
templates.optimalFlow$database.elliptical[[3]][[1]]$cov[c(4,3),][,c(4,3)]
or
templates.optimalFlow$database.elliptical[[3]][[1]]$mean[c(4,3)]
you extract some values from the database.elliptical
component but the user has no way to know what you are extarcting exactly and why you are extracting it. In other words, and to put it bluntly, this is all very obscure business from an end-user point of view. It should not be.
Related to the above issue: All the plotting code in the vignette seems very ad-hoc i.e. it seems to have been fine-tuned to work with the data at hand. What's going to happen when the user tries to use this on her own data? Is this code going to work as-is? Probably not. So the ad-hoc code produces nice plots for the vignette but won't that useful when the user will need to plot her own data. Please provide plotting tools that are re-usable, ideally by defining plotting functions that take the output of optimalFlowTemplates()
as input.
About doing things like cov[c(4,3),][,c(4,3)]
. Why would you subset first by row, then by column when you can do both at once with cov[c(4,3), c(4,3)]
(or even better with cov[4:3, 4:3]
)?
Please use a left-arrow (<-
) instead of =
for assignments everywhere in your code (including in your man page examples and vignette). Make sure to put a space before and after the left-arrow e.g. x <- 4:3
instead of x<-4:3
.
Always put a space after the comma (,
) to clearly separate arguments in your function calls. Note that this also applies to the Usage
section of your man pages (see for example the Usage
section in ?optimalFlowTemplates
where the lack of space after the comma hurts readibility e.g. alpha.cov = 0.85,equal.weights.template = TRUE
).
Use proper indentation (again, this also applies to the Usage
section of your man pages). For example the Usage
section in ?optimalFlowTemplates
is hard to read because the additional lines of the multi-line expression are not indented:
optimalFlowTemplates(database, database.names = NULL, cov.estimation = "standard",
alpha.cov = 0.85,equal.weights.template = TRUE, hclust.method = "complete",
templates.number = NA,minPts = 2, eps = 1, consensus.method = "pooling",
barycenters.number = 37,bar.repetitions = 40, alpha.bar = 0.05,
bar.ini.method = "plus-plus", consensus.minPts = 3, cl.paral = 1)
Same thing in your examples:
templates.optimalFlow = optimalFlowTemplates(database = database, templates.number = 5,
cl.paral = 1)
Hi @HristoInouzhe , do you intend to follow up on this submission? Thanks! H.
We are closing this issue for inactivity. Please comment back here when you are ready to move forward with the review and have the updated package ready for re-review.
This issue is being closed because there has been no progress for an extended period of time. You may reopen the issue when you have the time to actively participate in the review / submission process. Please also keep in mind that a package accepted to Bioconductor requires a commitment on your part to ongoing maintenance.
Thank you for interest in Bioconductor.
Hi. I have done a complete revision of the package optimalFlow including the sugestions and adressing the problems indicated by the reviewer. I have the new version on github. I would like to reopen this issue for a reconsideration for BioConductor. Thanks a lot for the interest and the hard work put into revision and management.
Hi @HristoInouzhe,
Please bump the version of the package in order to trigger a new build. Thanks!
H.
Received a valid push; starting a build. Commits are:
14df9db bumping to version 0.99.1
Received a valid push; starting a build. Commits are:
efff194 bumping to version 0.99.1
Dear Package contributor,
This is the automated single package builder at bioconductor.org.
Your package has been built on Linux, Mac, and Windows.
On one or more platforms, the build results were: "skipped, ERROR". This may mean there is a problem with the package that you need to fix. Or it may mean that there is a problem with the build system itself.
Please see the build report for more details.
Received a valid push; starting a build. Commits are:
dde8f45 bump in version
Dear Package contributor,
This is the automated single package builder at bioconductor.org.
Your package has been built on Linux, Mac, and Windows.
On one or more platforms, the build results were: "WARNINGS". This may mean there is a problem with the package that you need to fix. Or it may mean that there is a problem with the build system itself.
Please see the build report for more details.
Dear Package contributor,
This is the automated single package builder at bioconductor.org.
Your package has been built on Linux, Mac, and Windows.
On one or more platforms, the build results were: "WARNINGS". This may mean there is a problem with the package that you need to fix. Or it may mean that there is a problem with the build system itself.
Please see the build report for more details.
Received a valid push; starting a build. Commits are:
938d309 version bump
Received a valid push; starting a build. Commits are:
b4a9018 version bump
Dear Package contributor,
This is the automated single package builder at bioconductor.org.
Your package has been built on Linux, Mac, and Windows.
On one or more platforms, the build results were: "WARNINGS". This may mean there is a problem with the package that you need to fix. Or it may mean that there is a problem with the build system itself.
Please see the build report for more details.
Dear Package contributor,
This is the automated single package builder at bioconductor.org.
Your package has been built on Linux, Mac, and Windows.
On one or more platforms, the build results were: "WARNINGS". This may mean there is a problem with the package that you need to fix. Or it may mean that there is a problem with the build system itself.
Please see the build report for more details.
Received a valid push; starting a build. Commits are:
2b1f02f eliminate dependancy and version bump
Dear Package contributor,
This is the automated single package builder at bioconductor.org.
Your package has been built on Linux, Mac, and Windows.
On one or more platforms, the build results were: "WARNINGS". This may mean there is a problem with the package that you need to fix. Or it may mean that there is a problem with the build system itself.
Please see the build report for more details.
Received a valid push; starting a build. Commits are:
784200e dependency and bump
Dear Package contributor,
This is the automated single package builder at bioconductor.org.
Your package has been built on Linux, Mac, and Windows.
On one or more platforms, the build results were: "WARNINGS". This may mean there is a problem with the package that you need to fix. Or it may mean that there is a problem with the build system itself.
Please see the build report for more details.
Received a valid push; starting a build. Commits are:
890d67f R version 4.0 and bump
Dear Package contributor,
This is the automated single package builder at bioconductor.org.
Your package has been built on Linux, Mac, and Windows.
On one or more platforms, the build results were: "WARNINGS". This may mean there is a problem with the package that you need to fix. Or it may mean that there is a problem with the build system itself.
Please see the build report for more details.
Hi, I'm getting a problem that I've not been able to solve. I get the folowing Warning when BiocCheck is applied:
If I remove the R dependancy in the Description, R CMD Build adds automatically a dependancy on R (>= 3.5.0) and I get the same Warning in BiocCheck.
I'm a bit confused, what should I do?
Thanks a lot for the attention.
If you contain data than ignore this warning.
Received a valid push; starting a build. Commits are:
e13a7b5 bump in version
Dear Package contributor,
This is the automated single package builder at bioconductor.org.
Your package has been built on Linux, Mac, and Windows.
On one or more platforms, the build results were: "skipped, ERROR". This may mean there is a problem with the package that you need to fix. Or it may mean that there is a problem with the build system itself.
Please see the build report for more details.
Hi again.
The build report gives an error on optimalFlow when running R CMD CHECK un Ubuntu with R 4.0.0:
base::assign(".ptime", proc.time(), pos = "CheckExEnv")
Name: labelTransfer
Title: labelTransfer
Aliases: labelTransfer
** Examples
data.example <- data.frame(v1 = c(rnorm(50, 2, 1), rnorm(50, -2, 1)),
- v2 = c(rnorm(50, 2, 1), rnorm(50, -2, 1)), id = c(rep(0, 50), rep(1, 50))) test.labels <- c(rep('a', 50), rep('b', 50)) labelTransfer(data.example, data.example[, 1:2], test.labels) free(): invalid pointer Aborted (core dumped)
However, I'm not able to obtain this error on my local machines, neither on Ubuntu 18.10 with R 3.6.1 or in Windows 10 and R 3.6.3. In both, when I do R CMD Check I get 0 ERRORS and 0 WARNINGS.
When I run the example alone it also doesn't give any errors.
Can I get more info about the error, or some guidance for solving the issue?
Thanks a lot and I'm sorry for the inconvenience.
Received a valid push; starting a build. Commits are:
20b573d bump
Dear Package contributor,
This is the automated single package builder at bioconductor.org.
Your package has been built on Linux, Mac, and Windows.
On one or more platforms, the build results were: "skipped, ERROR". This may mean there is a problem with the package that you need to fix. Or it may mean that there is a problem with the build system itself.
Please see the build report for more details.
Received a valid push; starting a build. Commits are:
e012658 version bump after rnorm
Dear Package contributor,
This is the automated single package builder at bioconductor.org.
Your package has been built on Linux, Mac, and Windows.
On one or more platforms, the build results were: "skipped, ERROR". This may mean there is a problem with the package that you need to fix. Or it may mean that there is a problem with the build system itself.
Please see the build report for more details.
Received a valid push; starting a build. Commits are:
d3daa89 bump after base
Update the following URL to point to the GitHub repository of the package you wish to submit to Bioconductor
Confirm the following by editing each check box to '[x]'
[ x] I understand that by submitting my package to Bioconductor, the package source and all review commentary are visible to the general public.
[ x] I have read the Bioconductor Package Submission instructions. My package is consistent with the Bioconductor Package Guidelines.
[x ] I understand that a minimum requirement for package acceptance is to pass R CMD check and R CMD BiocCheck with no ERROR or WARNINGS. Passing these checks does not result in automatic acceptance. The package will then undergo a formal review and recommendations for acceptance regarding other Bioconductor standards will be addressed.
[ x] My package addresses statistical or bioinformatic issues related to the analysis and comprehension of high throughput genomic data.
[ x] I am committed to the long-term maintenance of my package. This includes monitoring the support site for issues that users may have, subscribing to the bioc-devel mailing list to stay aware of developments in the Bioconductor community, responding promptly to requests for updates from the Core team in response to changes in R or underlying software.
I am familiar with the essential aspects of Bioconductor software management, including:
For help with submitting your package, please subscribe and post questions to the bioc-devel mailing list.