Closed GhislainFievet closed 1 year ago
Hi @GhislainFievet
Thanks for submitting your package. We are taking a quick look at it and you will hear back from us soon.
The DESCRIPTION file for this package is:
Package: adverSCarial
Title: What the Package Does (One Line, Title Case)
Version: 0.99.0
Authors@R:
person("Ghislain", "FIEVET", , "ghislain.fievet@gmail.com", role = c("aut", "cre"),
comment = c(ORCID = "0000-0002-0337-7327"))
Description:
adverSCarial is an R Package designed for generating and analyzing the vulnerability of scRNA-seq
classifiers to adversarial attacks. The package is versatile and provides a format for integrating
any type of classifier. It offers functions for studying and generating two types of attacks,
min change attack and max change attack. The min change attack involves making a small modification
to the input to alter the classification. The max change attack involves making a large modification
to the input without changing its classification.
The package provides a comprehensive solution for evaluating the robustness of scRNA-seq classifiers
against adversarial attacks.
License: MIT + file LICENSE
Encoding: UTF-8
Roxygen: list(markdown = TRUE)
RoxygenNote: 7.2.3
biocViews: Software, SingleCell, Transcriptomics, Classification
Suggests: knitr, RUnit, BiocGenerics, TENxPBMCData
Imports: gtools, stringr, randomForest
VignetteBuilder: knitr
Author: Ghislain FIEVET <ghislain.fievet@gmail.com>
Maintainer: Ghislain FIEVET <ghislain.fievet@gmail.com>
Please look at the Title field in your DESCRIPTION. Revise and bump the version and push to your repo.
Your vignette is called "doc.Rmd"; please use a meaningful filename for your vignette -- eventually there may be multiple such. It includes shorttitle: "Short title for headers"
... do not use the template without reading through. Please run R CMD check on your package before bumping again and verify that there are no ERRORs.
Hi, thank you for your messages.
I modified what was asked and I ran the R CMD check command with 0 error and 0 warning.
The tool I submit performs a parameter search, and building the vignette can take up to an hour. In my case I had to run "ulimit -s unlimited" before the check to raise the stack size limit.
Thanks for this comment. We have to be able to check the package in a modest amount of time. There is a longtests protocol. You should sharply constrain the search in your vignette, or precompute and use a saved run. Right now I am seeing
--- re-building ‘adverSCarial.Rmd’ using knitr
Error: processing vignette 'adverSCarial.Rmd' failed with diagnostics:
C stack usage 7969824 is too close to the limit
--- failed re-building ‘adverSCarial.Rmd’
SUMMARY: processing the following file failed:
‘adverSCarial.Rmd’
Error: Vignette re-building failed.
Execution halted
on a machine that can build all submissions. Please check carefully and follow the contributor guidelines. Thanks
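A minimal sketch of the "precompute and use a saved run" suggestion, in base R. The function slowSearch and the file name are hypothetical stand-ins, not part of adverSCarial; in a real package the saved file would more likely live under inst/extdata and be located with system.file().

```r
# Hypothetical expensive computation; stands in for a long parameter search.
slowSearch <- function(n) {
    vapply(seq_len(n), function(i) sqrt(i), numeric(1))
}

# Hypothetical cache location; tempdir() is used only to keep the sketch runnable.
cacheFile <- file.path(tempdir(), "precomputedSearch.rds")

# Run once, outside the vignette build, and save the result.
if (!file.exists(cacheFile)) {
    saveRDS(slowSearch(1000), cacheFile)
}

# Inside the vignette: load the saved run instead of recomputing it.
searchResult <- readRDS(cacheFile)
length(searchResult)
```

The vignette then only pays the cost of reading one file, whatever the original search took.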
Thank you for the advice :-) I constrained the search for one vignette and precomputed the others. There is no memory limit issue anymore, and the R CMD check command runs in 4 minutes. I have updated my GitHub repo.
What is the next step? Do I have to handle the push to git@git.bioconductor.org:packages/
Best regards.
Ghislain
Until you pass precheck, the package is not on git.bioconductor.org yet. Once a reviewer is assigned then yes you would need to push there to trigger new builds. For now just pushing to your github and commenting back here for us to look at it again is sufficient
Ok, thank you!
A reviewer has been assigned to your package. Learn what to expect during the review process.
IMPORTANT: Please read this documentation for setting up remotes to push to git.bioconductor.org. It is required to push a version bump to git.bioconductor.org to trigger a new build.
Bioconductor utilized your GitHub SSH keys for git.bioconductor.org access. To manage keys and future access you may want to activate your Bioconductor Git Credentials Account
Dear Package contributor,
This is the automated single package builder at bioconductor.org.
Your package has been built on Linux, Mac, and Windows.
On one or more platforms, the build results were: "ERROR, WARNINGS". This may mean there is a problem with the package that you need to fix. Or it may mean that there is a problem with the build system itself.
Please see the build report for more details. This link will be active for 21 days.
Remember: if you submitted your package after July 7th, 2020,
when making changes to your repository push to
git@git.bioconductor.org:packages/adverSCarial
to trigger a new build.
A quick tutorial for setting up remotes and pushing to upstream can be found here.
Received a valid push on git.bioconductor.org; starting a build for commit id: 5c78b7c36611c5e6b49a952e0991de34aeb9e45a
Dear Package contributor,
This is the automated single package builder at bioconductor.org.
Your package has been built on Linux, Mac, and Windows.
On one or more platforms, the build results were: "WARNINGS". This may mean there is a problem with the package that you need to fix. Or it may mean that there is a problem with the build system itself.
Please see the build report for more details. This link will be active for 21 days.
Received a valid push on git.bioconductor.org; starting a build for commit id: 687c1b4af9b41e373b2556829e3c0fd5e5601924
Dear Package contributor,
This is the automated single package builder at bioconductor.org.
Your package has been built on Linux, Mac, and Windows.
On one or more platforms, the build results were: "WARNINGS". This may mean there is a problem with the package that you need to fix. Or it may mean that there is a problem with the build system itself.
Please see the build report for more details. This link will be active for 21 days.
Received a valid push on git.bioconductor.org; starting a build for commit id: 3a28f0ae5d5f1d2096cee4e8a8c0304228abfad5
Dear Package contributor,
This is the automated single package builder at bioconductor.org.
Your package has been built on Linux, Mac, and Windows.
On one or more platforms, the build results were: "WARNINGS". This may mean there is a problem with the package that you need to fix. Or it may mean that there is a problem with the build system itself.
Please see the build report for more details. This link will be active for 21 days.
Received a valid push on git.bioconductor.org; starting a build for commit id: 3887f60ea728f31b9700e7ee222398590cd9c629
Dear Package contributor,
This is the automated single package builder at bioconductor.org.
Your package has been built on Linux, Mac, and Windows.
Congratulations! The package built without errors or warnings on all platforms.
Please see the build report for more details. This link will be active for 21 days.
The package is for perturbation analysis of classification by single cell RNA data.
The adverSCarial.Rmd vignette uses a TENxPBMCData data set and compiles slowly on my old laptop computer with 4 GB RAM. Please see the Memory section of the developer's guide. Vignettes should be able to run quickly even on 32-bit hardware (mine is 64-bit).
List creation is inefficient. For example
modifications <- list()
modifications[[1]] <- list("perc1")
modifications[[2]] <- list("perc99")
This should be done efficiently as modifications <- c("perc1", "perc99"). Please read Pre-allocate and Fill. It also shouldn't be a list because all of the elements have the same data type. A character vector is sufficient.
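To make the difference concrete, here is a small base R comparison of the two constructions. The variable name modificationsList is introduced only for this illustration.

```r
# Growing a nested list element by element, as in the reviewed code.
modificationsList <- list()
modificationsList[[1]] <- list("perc1")
modificationsList[[2]] <- list("perc99")

# Creating the same values in one call as a character vector.
modifications <- c("perc1", "perc99")

class(modificationsList)  # "list"
class(modifications)      # "character"

# Flattening the nested list recovers the plain character vector.
identical(unlist(modificationsList), modifications)
```

Both objects hold the same two strings; the vector form needs one call and supports vectorised operations such as paste() or %in% directly.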
The code in the vignette overView_analysis.Rmd is not executed. The output is hard-coded. Please make all code chunks in this vignette executable. Same for advRandWalkMinChange.Rmd and adapt_classifier.Rmd.
Use camelCase for parameters and variables, not snake_case. Please read Variable Names.
Because there is more than one vignette and it is not clear in which order they should be read, a README.md file should be part of the repository to guide users through them. Please see README File for what to include.
The first vignette which the user should read needs to contain biological motivation at the beginning of it. Why should readers care about min change attack and max change attack? Is it only a theoretical argument or is it modelling some real-world problem about single cell RNA data?
The package needs interoperability with existing Bioconductor classes. It currently only accepts base data structures such as matrix and data.frame as input. At a minimum it should accept SingleCellExperiment and DataFrame. Also, there is no input validity checking by functions.
@param exprs a matrix or dataframe of numeric RNA expression
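As an illustration of what input validity checking could look like, here is a hedged base R sketch. The helper checkExprsInput is hypothetical and not part of adverSCarial, and it assumes cells are in rows with one cluster label per row.

```r
# Hypothetical helper; not part of adverSCarial.
# Assumes cells in rows and genes in columns.
checkExprsInput <- function(exprs, clusters) {
    if (!is.matrix(exprs) && !is.data.frame(exprs)) {
        stop("'exprs' must be a matrix or a data.frame")
    }
    isNumericColumn <- vapply(as.data.frame(exprs), is.numeric, logical(1))
    if (!all(isNumericColumn)) {
        stop("'exprs' must contain only numeric values")
    }
    if (length(clusters) != nrow(exprs)) {
        stop("'clusters' must have one entry per row of 'exprs'")
    }
    invisible(TRUE)
}

exprs <- matrix(c(1, 0, 2, 3), nrow = 2,
                dimnames = list(NULL, c("CD4", "CD8A")))
checkExprsInput(exprs, clusters = c("T cell", "B cell"))
```

Calling such a helper at the top of each user-facing function would turn a confusing downstream failure into an early, readable error message.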
It is unclear what function MClassifier does. The title of documentation is "Example cell type classifier for the pbmc3k dataset" but the parameters are exprs, clusters, target. Why is this function described as working on only one data set? Couldn't it work on any clustered data set?
In many cases the parameter descriptions do not match their example values. For example, function minChangeOverview has excl_genes = c() and genes = c(). The definitions of these parameters are "a list of genes to exclude from the analysis" and "a list of genes in case you want to limit the analysis on a subset of genes" and Examples section has genes <- c("CD4", "CD8A"). Note that
> class(genes)
[1] "character"
Please don't use list and character vector interchangeably. They are different types in R language.
All function documentation files lack a Details section. Most of them should have such a section with a paragraph explaining how a function works. For example, MClassifier's documentation doesn't explain the classification algorithm used. Is it k-Nearest Neighbours? Is it Nearest Shrunken Centroids? The reader doesn't know how classification works and how limited it is.
for loops are not to be used in Bioconductor packages. For example, for (cell_type in unique(clusters)). Please refer to Vectorize of developer's guide. Please convert into vapply loops.
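A sketch of such a conversion on toy data. The cluster labels, the expression values and the per-cluster mean are all invented for illustration; the actual loop body in the package is not shown in this thread.

```r
# Toy data: hypothetical cluster labels and one expression value per cell.
clusters <- c("B cell", "T cell", "B cell", "NK cell", "T cell")
expression <- c(1.0, 4.0, 3.0, 5.0, 6.0)

# Loop version: grows the result on each iteration.
meansLoop <- c()
for (cellType in unique(clusters)) {
    meansLoop <- c(meansLoop, mean(expression[clusters == cellType]))
}

# vapply version: one call, with the return type declared up front.
cellTypes <- unique(clusters)
meansVapply <- vapply(
    cellTypes,
    function(cellType) mean(expression[clusters == cellType]),
    numeric(1)
)
meansVapply
```

Because the input is a character vector, vapply also names the result by cell type, which the loop version does not.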
Don't use 1:something to create sequences. For example, 1:round(1 / step_change_ratio). It should be rewritten as seq_len(round(1 / step_change_ratio)).
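The reason for this rule shows up when the upper bound is zero. A short base R demonstration, where stepChangeRatio is a hypothetical value chosen for the example:

```r
n <- 0
1:n          # counts down and yields two elements: 1 0
seq_len(n)   # integer(0), so a loop or vapply over it does nothing

# Keeping the length of the original expression while using seq_len.
stepChangeRatio <- 0.2   # hypothetical value
steps <- seq_len(round(1 / stepChangeRatio))
steps
```

With 1:n, code iterating over an empty input silently runs twice with invalid indices; seq_len avoids that case.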
adverscarial.R has 1197 lines and almost all of the functions in the package are inside of it. Please put one function into one file, unless there are helper functions directly related to a main function. Please refer to Organise Functions into Files.
Thank you for the review :-) A lot of useful advice, I'm on it.
Received a valid push on git.bioconductor.org; starting a build for commit id: de76c4e2cc244fe2d80347f3981b7ff53291bc15
Dear Package contributor,
This is the automated single package builder at bioconductor.org.
Your package has been built on Linux, Mac, and Windows.
On one or more platforms, the build results were: "ERROR". This may mean there is a problem with the package that you need to fix. Or it may mean that there is a problem with the build system itself.
Please see the build report for more details. This link will be active for 21 days.
Received a valid push on git.bioconductor.org; starting a build for commit id: c4fabe468e76b2693ce78879e161810d676539f4
Dear Package contributor,
This is the automated single package builder at bioconductor.org.
Your package has been built on Linux, Mac, and Windows.
On one or more platforms, the build results were: "WARNINGS". This may mean there is a problem with the package that you need to fix. Or it may mean that there is a problem with the build system itself.
Please see the build report for more details. This link will be active for 21 days.
Received a valid push on git.bioconductor.org; starting a build for commit id: 3b2b3cfeba724d968cade353e6599ac408d2b2d5
Dear Package contributor,
This is the automated single package builder at bioconductor.org.
Your package has been built on Linux, Mac, and Windows.
On one or more platforms, the build results were: "WARNINGS". This may mean there is a problem with the package that you need to fix. Or it may mean that there is a problem with the build system itself.
Please see the build report for more details. This link will be active for 21 days.
Received a valid push on git.bioconductor.org; starting a build for commit id: c8b0ed7e34e5dbe64c42f8726d6eadb2e005cd35
Dear Package contributor,
This is the automated single package builder at bioconductor.org.
Your package has been built on Linux, Mac, and Windows.
Congratulations! The package built without errors or warnings on all platforms.
Please see the build report for more details. This link will be active for 21 days.
Received a valid push on git.bioconductor.org; starting a build for commit id: c5daf5c69778aa3b57e9ee0865057067c39c7cd1
Dear Package contributor,
This is the automated single package builder at bioconductor.org.
Your package has been built on Linux, Mac, and Windows.
On one or more platforms, the build results were: "WARNINGS". This may mean there is a problem with the package that you need to fix. Or it may mean that there is a problem with the build system itself.
Please see the build report for more details. This link will be active for 21 days.
Received a valid push on git.bioconductor.org; starting a build for commit id: 227bc773db223c643ae77e6ea1b7ed77430e1234
Dear Package contributor,
This is the automated single package builder at bioconductor.org.
Your package has been built on Linux, Mac, and Windows.
Congratulations! The package built without errors or warnings on all platforms.
Please see the build report for more details. This link will be active for 21 days.
Hi,
I modified the package according to the review.
A few points:
Instead of modifications <- c("perc1", "perc99") as suggested, I have used modifications <- list(c("perc1"), c("perc99")). This enables users to provide more advanced parameters, such as list(c("perc1"), c("full_row_fct", myFct)). I have included detailed instructions on how to construct these parameters in the vignettes and documentation.
I added support for the DataFrame data structure to all functions, but not SingleCellExperiment. This is because classifiers use different parts of SingleCellExperiment objects, such as the raw or normalized data, ENTREZ or gene symbols. To avoid confusion, users should provide the input data matrix directly.
Best regards.
Ghislain
Code still widely uses snake_case, e.g. tests_grid, gene_ind, modif_ind, exprs_temp, row_results, etc.
There still remain unnecessary for loops. For example:
row_results <- c()
for (gene_ind in seq_len(length(genes)))
{
...
row_results <- c(row_results, paste(modifications[[modif_ind]], collapse = " "))
}
This is ideally suited to vapply. Also, seq_len(length(genes)) is unnecessary and could simply be seq_along(genes).
Similarly, cbind at the end of a for loop to incrementally grow a table is not efficient.
df_result <- data.frame(todel = unique(clusters))
for (modif_ind in seq_len(length(modifications)))
{
...
df_result <- cbind(df_result, attacks_length)
}
Consider how to use vapply and do.call(cbind, listOfResultingDataFrames) instead.
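One possible shape for that rewrite, sketched with lapply for the per-modification columns and a single do.call(cbind, ...) at the end. The function countAttacks is a hypothetical stand-in for the elided loop body, and its arithmetic is meaningless beyond producing one number per cell type.

```r
clusters <- c("B cell", "T cell", "B cell", "NK cell")
modifications <- list(c("perc1"), c("perc99"))

# Hypothetical stand-in for the per-modification computation.
countAttacks <- function(modification, cellTypes) {
    nchar(modification[1]) + seq_along(cellTypes)
}

cellTypes <- unique(clusters)

# Build every column first, one list element per modification.
columns <- lapply(modifications, function(modification) {
    countAttacks(modification, cellTypes)
})
names(columns) <- vapply(modifications, paste, character(1), collapse = " ")

# Bind all columns in one call instead of growing the table in the loop.
dfResult <- data.frame(do.call(cbind, columns), row.names = cellTypes)
dfResult
```

This also removes the need for a placeholder column such as todel, since the table is created once with its final shape.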
I didn't mean that the input should accept DataFrame. DataFrame is not used for gene expression data because the table will not have multiple variable types but they will all be a single numeric type. A few of your return values are data.frame and that would be suited to DataFrame for more compact display. For example #' @return data.frame results of the classification of all the grid combinations. is a return type of plain data.frame.
The reasoning about not including SingleCellExperiment support is unclear. Which classifiers use raw, unnormalised data for classification? I don't think it makes sense to do that if there are batch effects or different numbers of RNA-seq reads per sample. The documentation is also vague: "#' @param classifier a classifier in the suitable format" what is "the suitable format" precisely? Please provide information in Details section. There's some information in the vignette but it is not clear:
A classifier function has to be formated as follow to be used with adverSCarial:
classifier = function(expr, clusters, target){ c("cell type", trust_value) }
What is trust_value? Please define the output vector clearly.
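For readers following along, here is a toy function with the quoted signature. Everything beyond the signature is an assumption made for illustration: the marker rule, the cells-in-rows orientation, and the reading of the second element as a confidence value between 0 and 1, which the thread has not defined at this point.

```r
# Hypothetical classifier following the signature quoted from the vignette.
# expr: numeric matrix, cells in rows and genes in columns (an assumption).
# clusters: character vector with one cluster label per cell.
# target: the cluster whose cells are classified.
toyClassifier <- function(expr, clusters, target) {
    targetCells <- expr[clusters == target, , drop = FALSE]
    meanMarker <- mean(targetCells[, "CD4"])
    if (meanMarker > 1) {
        c("T cell", 0.9)        # c() coerces 0.9 to the string "0.9"
    } else {
        c("UNDETERMINED", 0.1)
    }
}

expr <- matrix(c(2, 3, 0, 0, 1, 1, 1, 1), nrow = 4,
               dimnames = list(NULL, c("CD4", "CD8A")))
clusters <- c("a", "a", "b", "b")
toyClassifier(expr, clusters, "a")
```

Note that combining a label and a number with c() returns a character vector, so the numeric part arrives as text. That is one reason the output format benefits from an explicit definition in the documentation.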
matrixFromSCE but this function could be used internally at the beginning of user-facing functions if the user specifies a SingleCellExperiment object to them.
Ok, thank you, I'll correct these points.
Regards.
Received a valid push on git.bioconductor.org; starting a build for commit id: 66ca8536198f9875ba4450e96bdf487beb00649c
Dear Package contributor,
This is the automated single package builder at bioconductor.org.
Your package has been built on Linux, Mac, and Windows.
On one or more platforms, the build results were: "ERROR". This may mean there is a problem with the package that you need to fix. Or it may mean that there is a problem with the build system itself.
Please see the build report for more details. This link will be active for 21 days.
Received a valid push on git.bioconductor.org; starting a build for commit id: 3b4fc2cd640991f9a533b7268596ea2d729a2208
Dear Package contributor,
This is the automated single package builder at bioconductor.org.
Your package has been built on Linux, Mac, and Windows.
On one or more platforms, the build results were: "WARNINGS". This may mean there is a problem with the package that you need to fix. Or it may mean that there is a problem with the build system itself.
Please see the build report for more details. This link will be active for 21 days.
Received a valid push on git.bioconductor.org; starting a build for commit id: 15ed90e2e746c662f69f3c524796f5ce32cd8736
Dear Package contributor,
This is the automated single package builder at bioconductor.org.
Your package has been built on Linux, Mac, and Windows.
On one or more platforms, the build results were: "WARNINGS". This may mean there is a problem with the package that you need to fix. Or it may mean that there is a problem with the build system itself.
Please see the build report for more details. This link will be active for 21 days.
Received a valid push on git.bioconductor.org; starting a build for commit id: ad5a48e5e1c9fc16821ce0a17c7b260e4488bbc9
Dear Package contributor,
This is the automated single package builder at bioconductor.org.
Your package has been built on Linux, Mac, and Windows.
On one or more platforms, the build results were: "WARNINGS". This may mean there is a problem with the package that you need to fix. Or it may mean that there is a problem with the build system itself.
Please see the build report for more details. This link will be active for 21 days.
Received a valid push on git.bioconductor.org; starting a build for commit id: 1b5579958040240c8ad393271311e995225d2ce7
Dear Package contributor,
This is the automated single package builder at bioconductor.org.
Your package has been built on Linux, Mac, and Windows.
On one or more platforms, the build results were: "WARNINGS". This may mean there is a problem with the package that you need to fix. Or it may mean that there is a problem with the build system itself.
Please see the build report for more details. This link will be active for 21 days.
Received a valid push on git.bioconductor.org; starting a build for commit id: fc11a933c2bc9a4987b3d70b6a1eaf97649eee47
Dear Package contributor,
This is the automated single package builder at bioconductor.org.
Your package has been built on Linux, Mac, and Windows.
Congratulations! The package built without errors or warnings on all platforms.
Please see the build report for more details. This link will be active for 21 days.
Hi, here are my corrections.
do.call(cbind, myList) with lapply
rowResults and exprsTemp, I considered it as side effect, is it ok?
minChangeOverView, maxChangeOverview, advGridMinChange and advRandWalkMinChange
trust_value (which I renamed score)
About the SingleCellExperiment and the matrixFromSCE:
I would like my tool to be versatile. Users should be able to easily adapt any classifier to the appropriate format.
The CHETAH classifier takes an SCE as input, easily created from raw data, even though it uses normalized data to avoid batch effects. It is easy to adapt the CHETAH classifier to take a raw data matrix as input.
input <- SingleCellExperiment(assays = list(counts = input_counts),
reducedDims = SimpleList(TSNE = input_tsne))
## Run CHETAH
input <- CHETAHclassifier(input = input, ref_cells = reference)
The scType classifier takes as input a scaled matrix.
es.max = sctype_score(scRNAseqData = pbmc[["RNA"]]@scale.data, scaled = TRUE,
gs = gs_list$gs_positive, gs2 = gs_list$gs_negative)
In every case it is easy to provide a data matrix and make the classification from it.
It is also more convenient for my other functions. Let's say I accept SCE as input for advMinChange:
pbmc3k from TENxPBMCData comes with ENTREZ id as default, whereas I need gene symbols for the classifier.
It seems a lot easier to handle a matrix of data.
I made the matrixFromSCE
function for one specific situation, to show as an example in the vignette. But users should make their own functions to get the right matrix from the objects they use, SCE, Seurat, or raw data from the sequencing.
Update the following URL to point to the GitHub repository of the package you wish to submit to Bioconductor
Confirm the following by editing each check box to '[x]'
[x] I understand that by submitting my package to Bioconductor, the package source and all review commentary are visible to the general public.
[x] I have read the Bioconductor Package Submission instructions. My package is consistent with the Bioconductor Package Guidelines.
[x] I understand Bioconductor Package Naming Policy and acknowledge Bioconductor may retain use of package name.
[x] I understand that a minimum requirement for package acceptance is to pass R CMD check and R CMD BiocCheck with no ERROR or WARNINGS. Passing these checks does not result in automatic acceptance. The package will then undergo a formal review and recommendations for acceptance regarding other Bioconductor standards will be addressed.
[x] My package addresses statistical or bioinformatic issues related to the analysis and comprehension of high throughput genomic data.
[x] I am committed to the long-term maintenance of my package. This includes monitoring the support site for issues that users may have, subscribing to the bioc-devel mailing list to stay aware of developments in the Bioconductor community, responding promptly to requests for updates from the Core team in response to changes in R or underlying software.
[x] I am familiar with the Bioconductor code of conduct and agree to abide by it.
I am familiar with the essential aspects of Bioconductor software management, including:
For questions/help about the submission process, including questions about the output of the automatic reports generated by the SPB (Single Package Builder), please use the #package-submission channel of our Community Slack. Follow the link on the home page of the Bioconductor website to sign up.