Closed zhengxwen closed 1 year ago
Hi @zhengxwen
Thanks for submitting your package. We are taking a quick look at it and you will hear back from us soon.
The DESCRIPTION file for this package is:
Package: SCArray.sat
Type: Package
Title: Large-scale single-cell RNA-seq data analysis using GDS files and
Seurat
Version: 0.99.0
Date: 2023-02-03
Depends: methods, SCArray (>= 1.7.8), SeuratObject (>= 4.0), Seurat (>= 4.0)
Imports: S4Vectors, utils, stats, BiocGenerics, gdsfmt, DelayedArray,
BiocSingular, SummarizedExperiment
Suggests: RUnit, knitr, markdown, rmarkdown
Authors@R: c(person("Xiuwen", "Zheng", role=c("aut", "cre"), email=
"xiuwen.zheng@abbvie.com", comment=c(ORCID="0000-0002-1390-0708")),
person("Seurat contributors", role="ctb",
comment="for the classes and methods defined in Seurat"))
Description: Extends the Seurat classes and functions to support GDS files as
a DelayedArray backend for data representation. It introduces a new
SCArrayAssay class (derived from the Seurat Assay), which wraps raw
counts, normalized expressions and scaled data matrix based on
DelayedMatrix. It is designed to integrate seamlessly with the
SeuratObject and Seurat packages to provide common data analysis, with
the optimized algorithms for GDS data files. Compared with Seurat,
SCArray.sat significantly reduces the memory usage and can be applied
to very large datasets.
License: GPL-3
VignetteBuilder: knitr
ByteCompile: TRUE
biocViews: DataRepresentation, DataImport, SingleCell, RNASeq
A reviewer has been assigned to your package. Learn what to expect during the review process.
IMPORTANT: Please read this documentation for setting up remotes to push to git.bioconductor.org. It is required to push a version bump to git.bioconductor.org to trigger a new build.
Bioconductor utilized your github ssh-keys for git.bioconductor.org access. To manage keys and future access you may want to active your Bioconductor Git Credentials Account
Dear Package contributor,
This is the automated single package builder at bioconductor.org.
Your package has been built on Linux, Mac, and Windows.
On one or more platforms, the build results were: "ERROR". This may mean there is a problem with the package that you need to fix. Or it may mean that there is a problem with the build system itself.
Please see the build report for more details. This link will be active for 21 days.
Remember: if you submitted your package after July 7th, 2020,
when making changes to your repository push to
git@git.bioconductor.org:packages/SCArray.sat
to trigger a new build.
A quick tutorial for setting up remotes and pushing to upstream can be found here.
Received a valid push on git.bioconductor.org; starting a build for commit id: a18442a69989758a95ff813270856728ec5b6fdd
The only error is
ERROR: Maintainer must add package name to Watched Tags on the
support site; Edit your Support Site User Profile to add Watched
Tags.
To fix this error, I have added "SCArray.sat" to my watched tags, and run "BiocCheck" locally (no error shows). I just bumped the version to trigger a new build.
Dear Package contributor,
This is the automated single package builder at bioconductor.org.
Your package has been built on Linux, Mac, and Windows.
On one or more platforms, the build results were: "WARNINGS". This may mean there is a problem with the package that you need to fix. Or it may mean that there is a problem with the build system itself.
Please see the build report for more details. This link will be active for 21 days.
Remember: if you submitted your package after July 7th, 2020,
when making changes to your repository push to
git@git.bioconductor.org:packages/SCArray.sat
to trigger a new build.
A quick tutorial for setting up remotes and pushing to upstream can be found here.
Thank you for submitting scArray.sat to Bioconductor, @zhengxwen. I hope to begin my review of the package next week, but it may not be until the following that I can get started due to other commitments. In the meantime, if you have any questions then please post them here.
Cheers, Pete
A few pre-review questions, all rather related:
Thanks for these pre-review questions. They are very helpful.
1) SCArray.sat extends the Seurat classes and functions to support Genomic Data Structure (GDS) files as a DelayedArray backend for data representation. It relies on the implementation of GDS-based DelayedMatrix in the SCArray package to represent single cell RNA-seq data. The common optimized algorithms leveraging GDS-based and single cell-specific DelayedMatrix (SC_GDSMatrix) are implemented in the SCArray package.
SCArray.sat introduces a new SCArrayAssay class (derived from the Seurat Assay), which wraps raw counts, normalized expressions and scaled data matrix based on GDS-specific DelayedMatrix (SC_GDSMatrix). It is designed to integrate seamlessly with the Seurat package to provide common data analysis in the SeuratObject-based workflow. Some of the Seurat-specific methods are re-implemented in SCArray.sat to manipulate GDS files.
2)
Yes, the SingleCellExperiment-based workflows are fully supported by the SCArray package, while SCArray.sat is designed to support and extend the SeuratObject-based workflows. The Seurat object can be converted to a SingleCellExperiment object, via as.SingleCellExperiment()
.
3) “sat” stands for “Seurat”. I feel that “SCArray.Seurat” is too long for the package name, and it is longer than 92% of the Bioconductor software packages.
I hope that it is clear now.
Hi @zhengxwen,
Thank you for submitting SCArray.sat to Bioconductor.
Overall, the package is in good shape and close to being ready for acceptance.
Regarding the package name, I'd recommend going with SCArray.Seurat over SCArray.sat; I don't know of anyone else who abbreviates 'Seurat' to 'sat', which means people won't know the connection of SCArray.sat based on its name. To me, having 3 extra characters to ensure people know what the package is seems a tiny price to pay. But of course, the choice of name is yours to make.
In my review below I have separated the issues into Required and Recommended points that I would ask you to address before the package can be accepted. Would you please provide line-by-line comments to my initial review so that I know what changes I'm looking for in my re-review.
Cheers, Pete
HDF5Array::saveHDF5SummarizedExperiment()
needed? See the section 'Difference between saveHDF5SummarizedExperiment() and saveRDS()' of ?HDF5Array::saveHDF5SummarizedExperiment()
for a full explanation of why it's needed for HDF5Array-based SingleCellExperiment objects. But briefly, if someone saves a SC_GDSMatrix_backed Seurat object to disk with saveRDS()
and then copies that .rds
file to another computer and tries to open it, then I expect there will be an error because they won't also have copied the necessary .gds
file (or it might be in a different location even if they do know to copy it). This would also apply to SCArray and it's SC_GDSMatrix_backed SingleCellExperiment objects.set.seed()
before steps involving random number generation (e.g., RunPCA()
and runUMAP()
) to ensure reproducibility.D100
, D250
, D500
, and Dfull
, but these aren't defined. Please include the code used to generate these objects.tempdir()
/tempfile()
when calling saveRDS()
. You can add a comment that a user running this funtion as part of a 'real' analysis should specify an appropriate location for the file.README.md
is broken because scGetAssayGDS()
doesn't exist. Please fix.suppressPackageStartupMessages({
library(Seurat)
library(SCArray.sat)
})
# an input GDS file with raw counts
fn <- system.file("extdata", "example.gds", package="SCArray")
fn
#> [1] "/Library/Frameworks/R.framework/Versions/4.3-arm64/Resources/library/SCArray/extdata/example.gds"
a <- scGetAssayGDS(fn)
#> Error in scGetAssayGDS(fn): could not find function "scGetAssayGDS"
BugReports
field to the DESCRIPTION
(https://contributions.bioconductor.org/description.html#description-bugreport)ByteCompile: TRUE
from the DESCRPTION
; it necessary because it's the default.show()
-ing a SCArrayFileClass. This seems to be from use of crayon::blurred()
in gdsfmt::print.gds.class()
. I guess 'crayon::blurred()` is doing exactly what the function name says, but I'd suggest changing this to a font that can be read clearly on most systems.SCTransform(d)
gives an error that may be confusing to new users. While it may be tricky (if not impossible) to inject a SCArray.sat-specific error when a Seurat function is applied that SCArray.sat does not support, this table featured prominently in the documentation could at least set the expectations for a new user.suppressPackageStartupMessages({
library(Seurat)
library(SCArray)
library(SCArray.sat)
})
fn <- system.file("extdata", "example.gds", package="SCArray")
d <- scNewSeuratGDS(fn)
#> Input: /Library/Frameworks/R.framework/Versions/4.3-arm64/Resources/library/SCArray/extdata/example.gds
#> counts: 1000 x 850
SCTransform(d)
#> Calculating cell attributes from input UMI matrix: log_umi
#> Error in row_gmean(umi, eps = gmean_eps): matrix x needs to be of class matrix or dgCMatrix
inst/CITATION
file if applicable (https://contributions.bioconductor.org/citation.html).Received a valid push on git.bioconductor.org; starting a build for commit id: 38d463bf7e58a3b5c1343e8dee9de10008e5266d
Dear Package contributor,
This is the automated single package builder at bioconductor.org.
Your package has been built on Linux, Mac, and Windows.
On one or more platforms, the build results were: "WARNINGS, ERROR". This may mean there is a problem with the package that you need to fix. Or it may mean that there is a problem with the build system itself.
Please see the build report for more details. This link will be active for 21 days.
Remember: if you submitted your package after July 7th, 2020,
when making changes to your repository push to
git@git.bioconductor.org:packages/SCArray.sat
to trigger a new build.
A quick tutorial for setting up remotes and pushing to upstream can be found here.
I am not able to fix the WARNING and ERROR in the build (commit id: 38d463bf7e58a3b5c1343e8dee9de10008e5266d):
WARNING: R CMD check exceeded 10 min requirement
ERROR:
there is no package called 'BiocGenerics'
package SCArray required by SCArray.sat could not be found
Hi Peter,
Thank you for your detailed comments, and they are very helpful.
I will stick to the package name "SCArray.sat", and thanks for the understanding.
Here are my responses:
It is not necessary. SCArray::scConvGDS()
can save the count data (or other data matrices) to a GDS file. We usually start from the raw count data, then create a GDS file, load it as a SingleCellExperiment
object or a SeuratObject
object. When calling saveRDS()
, the data of GDS-based DelayedArray are not saved in the RDS file. I explained it in the section "Save SCArrayAssay" of the vignette, and remind users to keep the GDS and RDS files in the same directory or the same relative path. If the corresponing GDS file is not found when manipulating the DelayedArray object in the RDS file, an error will show, like:
Error in gdsfmt:::.reopen(x@gds) :
Can not open file 'example.gds'. No such file or directory
set.seed(42)
was added to the vignette and README.md.
The following codes were added to the vignett:
library(SCArray)
library(SCArray.sat)
sce <- scExperiment("1M_sc_neurons.gds") # load the full 1.3M cells
# D100 dataset
scConvGDS(sce[, 1:1e5], "1M_sc_neurons_d100.gds") # save to a GDS
# in-memory Seurat object
obj <- scMemory(scNewSeuratGDS("1M_sc_neurons_d100.gds"))
saveRDS(obj, "1M_sc_neurons_d100_seuratobj.rds") # save to a RDS
# D250 dataset
scConvGDS(sce[, 1:2.5e5], "1M_sc_neurons_d250.gds")
obj <- scMemory(scNewSeuratGDS("1M_sc_neurons_d250.gds"))
saveRDS(obj, "1M_sc_neurons_d250_seuratobj.rds")
# D500 dataset
scConvGDS(sce[, 1:5e5], "1M_sc_neurons_d500.gds")
obj <- scMemory(scNewSeuratGDS("1M_sc_neurons_d500.gds"))
saveRDS(obj, "1M_sc_neurons_d500_seuratobj.rds")
# Dfull dataset
scConvGDS(sce, "1M_sc_neurons_dfull.gds")
(CPU: Intel Xeon Gold 6248 @2.50GHz, RAM: 176GB)
is added to the vignette.
"test.rds"
is replaced by tempfile(fileext=".rds")
in the vignette.
It is fixed now.
BugReports: https://github.com/zhengxwen/SCArray.sat/issues
is added to DESCRIPTION.
A section "Installation" is added to the vignette.
ByteCompile: TRUE
is removed from DESCRIPTION.
I seldom use RStudio, and I am not sure when RStudio started using the blur effect for crayon::blurred()
. It is fixed in the most recent gdsfmt_1.35.7. Or users can disable crayon terminal output by options(gds.crayon=FALSE)
.
BiocStyle is used in the vignette now.
A section "List of supported functions" is add to the vignette. As described in the help file of SCTransform
, it calls sctransform::vst
(i.e., the function in another R pacakge rather than Seurat & SeuratObject). Not all of the functions can directly support SCArrayAssay
. To increase the compatibility of SCArray.sat, contacting with the Seurat package maintainers would be also helpful. Before any additional feature support, SCArray.sat should be deposited in a permanent and public repository. In addition, the SCArrayAssay
object can be downgraded to the regular Assay
, see the section "Downgrade SCArrayAssay".
The file inst/CITATION
is added.
Hi @zhengxwen,
Thank you for your response to the initial review. I'm now happy to accept SCArray.sat into Bioconductor. Congratulations and thank you for your contribution!
Is something like HDF5Array::saveHDF5SummarizedExperiment() needed?
It is not necessary. ...
What you describe is exactly why something like a function like HDF5Array::saveHDF5SummarizedExperiment()
(e.g., SCArray.sat::saveGDSSeuratObejct()
) is necessary or at least very helpful for users.
It can make it easier for a user to share a GDS-backed Seurat object with someone else, or just to move it to a new location.
It's not a requirement for acceptance, but I think it would make a useful addition.
Sorry, before I can formally accept SCArray.sat could you please bump the version and push to git.bioconductor.org to trigger a new build.
The most recent build on the Single Package Builder (SPB) resulted in an error on the Linux builder (nebbiolo1
) - I suspect it was a transient error and that a simply re-running it will resolve the issue.
The package builds and checks fine on my system (Mac M1) but it would be good to confirm it passes on other systems using the SPB.
Received a valid push on git.bioconductor.org; starting a build for commit id: d3e75e18bc3f4895f3302f65bc246ed99c0c2c63
Dear Package contributor,
This is the automated single package builder at bioconductor.org.
Your package has been built on Linux, Mac, and Windows.
On one or more platforms, the build results were: "WARNINGS". This may mean there is a problem with the package that you need to fix. Or it may mean that there is a problem with the build system itself.
Please see the build report for more details. This link will be active for 21 days.
Remember: if you submitted your package after July 7th, 2020,
when making changes to your repository push to
git@git.bioconductor.org:packages/SCArray.sat
to trigger a new build.
A quick tutorial for setting up remotes and pushing to upstream can be found here.
Your package has been accepted. It will be added to the Bioconductor nightly builds.
Thank you for contributing to Bioconductor!
Reviewers for Bioconductor packages are volunteers from the Bioconductor community. If you are interested in becoming a Bioconductor package reviewer, please see Reviewers Expectations.
Thanks, @zhengxwen!
I'm happy to accept the package despite the WARNING: check time exceeded 10 min
on merida1
- the latest build is 49 seconds over the check time there but only takes 5 mins on nebbiolo1
.
Please keep an eye on builds once it enters the regular BioC builds and see if there's anything you can do to reduce the time taken to check the package.
The default branch of your GitHub repository has been added to Bioconductor's git repository as branch devel.
To use the git.bioconductor.org repository, we need an 'ssh' key to associate with your github user name. If your GitHub account already has ssh public keys (https://github.com/zhengxwen.keys is not empty), then no further steps are required. Otherwise, do the following:
See further instructions at
https://bioconductor.org/developers/how-to/git/
for working with this repository. See especially
https://bioconductor.org/developers/how-to/git/new-package-workflow/ https://bioconductor.org/developers/how-to/git/sync-existing-repositories/
to keep your GitHub and Bioconductor repositories in sync.
Your package will be included in the next nigthly 'devel' build (check-out from git at about 6 pm Eastern; build completion around 2pm Eastern the next day) at
https://bioconductor.org/checkResults/
(Builds sometimes fail, so ensure that the date stamps on the main landing page are consistent with the addition of your package). Once the package builds successfully, you package will be available for download in the 'Devel' version of Bioconductor using BiocManager::install("SCArray.sat")
. The package 'landing page' will be created at
https://bioconductor.org/packages/SCArray.sat
If you have any questions, please contact the bioc-devel mailing list (https://stat.ethz.ch/mailman/listinfo/bioc-devel); this issue will not be monitored further.
Update the following URL to point to the GitHub repository of the package you wish to submit to Bioconductor
Confirm the following by editing each check box to '[x]'
[x] I understand that by submitting my package to Bioconductor, the package source and all review commentary are visible to the general public.
[x] I have read the Bioconductor Package Submission instructions. My package is consistent with the Bioconductor Package Guidelines.
[x] I understand Bioconductor Package Naming Policy and acknowledge Bioconductor may retain use of package name.
[x] I understand that a minimum requirement for package acceptance is to pass R CMD check and R CMD BiocCheck with no ERROR or WARNINGS. Passing these checks does not result in automatic acceptance. The package will then undergo a formal review and recommendations for acceptance regarding other Bioconductor standards will be addressed.
[x] My package addresses statistical or bioinformatic issues related to the analysis and comprehension of high throughput genomic data.
[x] I am committed to the long-term maintenance of my package. This includes monitoring the support site for issues that users may have, subscribing to the bioc-devel mailing list to stay aware of developments in the Bioconductor community, responding promptly to requests for updates from the Core team in response to changes in R or underlying software.
[x] I am familiar with the Bioconductor code of conduct and agree to abide by it.
I am familiar with the essential aspects of Bioconductor software management, including:
For questions/help about the submission process, including questions about the output of the automatic reports generated by the SPB (Single Package Builder), please use the #package-submission channel of our Community Slack. Follow the link on the home page of the Bioconductor website to sign up.