Bioconductor / Contributions

Contribute Packages to Bioconductor

SpotClean #2637

Closed zijianni closed 2 years ago

zijianni commented 2 years ago

Update the following URL to point to the GitHub repository of the package you wish to submit to Bioconductor

Confirm the following by editing each check box to '[x]'

I am familiar with the essential aspects of Bioconductor software management, including:

For questions/help about the submission process, including questions about the output of the automatic reports generated by the SPB (Single Package Builder), please use the #package-submission channel of our Community Slack. Follow the link on the home page of the Bioconductor website to sign up.

bioc-issue-bot commented 2 years ago

Hi @zijianni

Thanks for submitting your package. We are taking a quick look at it and you will hear back from us soon.

The DESCRIPTION file for this package is:

Package: SpotClean
Version: 0.99.3
Date: 2022/4/11
Title: SpotClean adjusts for spot swapping in spatial transcriptomics data
Authors@R: c(
    person("Zijian", "Ni", role = c("aut", "cre"), email = "zni25@wisc.edu"), 
    person("Christina", "Kendziorski", role="ctb"))
Depends: 
    R (>= 4.0.0),
Imports: 
    stats,
    methods,
    utils,
    dplyr,
    S4Vectors,
    SummarizedExperiment,
    Matrix,
    rhdf5,
    ggplot2,
    grid,
    readbitmap,
    rjson,
    tibble,
    viridis,
    grDevices,
    RColorBrewer,
    Seurat
Suggests: 
    testthat (>= 2.1.0),
    knitr,
    BiocStyle,
    rmarkdown,
    R.utils
biocViews:
    DataImport,
    RNASeq,
    Sequencing,
    GeneExpression,
    Spatial,
    Transcriptomics,
    Preprocessing
Description: 
    SpotClean is a computational method to adjust for spot swapping in spatial 
    transcriptomics data. Recent spatial transcriptomics experiments utilize 
    slides containing thousands of spots with spot-specific barcodes that bind 
    mRNA. Ideally, unique molecular identifiers at a spot measure spot-specific 
    expression, but this is often not the case due to bleed from nearby spots, 
    an artifact we refer to as spot swapping. SpotClean is able to estimate the 
    contamination rate in observed data and decontaminate the spot swapping 
    effect, thus increase the sensitivity and precision of downstream analyses. 
License: GPL-3
NeedsCompilation: yes
VignetteBuilder: knitr
Encoding: UTF-8
SystemRequirements: C++11
RoxygenNote: 7.1.2
URL: https://github.com/zijianni/SpotClean
BugReports: https://github.com/zijianni/SpotClean/issues

drisso commented 2 years ago

Dear @zijianni,

this looks like a really interesting and useful package! Thanks for contributing it to Bioconductor!

Looking at your code, I've noticed that you have a custom reader function for 10X Visium data and that you use SummarizedExperiment to store the data, storing the spatial information inside the metadata slot. Do I understand correctly?

I wanted to make sure that you are aware of the SpatialExperiment package, which is our attempt at extending SummarizedExperiment to store spatial transcriptomic data.

In that package, we provide the read10xVisium function to read 10X Visium data. This function returns a SpatialExperiment object that stores all the data that you need for your package (including the image). It shouldn't be much work to restructure the package to use SpatialExperiment and this will ensure interoperability between your and other spatial transcriptomic packages. For instance, it will allow users to visualize the data using the ggspavis package.

We are very open to engaging with the community and getting feedback from developers on the SpatialExperiment class.

Tagging @lmweber @HelenaLC @drighelli who are actually developing SpatialExperiment in case they have further input.

Anyway, thanks for considering this.

zijianni commented 2 years ago

Hi @drisso et al,

Thanks for the message! I do know about SpatialExperiment and I'm happy to extend my package so that SpotClean can be applied directly to the SpatialExperiment class.

One reason I hesitated to use read10xVisium() is that it makes overly strong assumptions about the structure of the folders and files. In practice, I don't see very consistent naming and structure in Visium output files, due either to different spaceranger versions or to customized naming. As a result, I always have to rearrange the files manually so that read10xVisium() can recognize them.

I'm working on making our main function SpotClean() runnable on a SpatialExperiment object constructed using read10xVisium(). Happy to add more features if the community finds SpotClean useful in the future.

drighelli commented 2 years ago

Hi @zijianni ,

thanks for taking into consideration using the SpatialExperiment class for your package.

Taking into account your comments on the read10xVisium() function, would you mind opening a new issue in our repository describing the problems you encountered? This would be very useful for us to further improve our class.

Thanks, Dario

bioc-issue-bot commented 2 years ago

A reviewer has been assigned to your package. Learn what to expect during the review process.

IMPORTANT: Please read this documentation for setting up remotes to push to git.bioconductor.org. It is required to push a version bump to git.bioconductor.org to trigger a new build.

Bioconductor utilized your GitHub ssh-keys for git.bioconductor.org access. To manage keys and future access you may want to activate your Bioconductor Git Credentials Account

bioc-issue-bot commented 2 years ago

Dear Package contributor,

This is the automated single package builder at bioconductor.org.

Your package has been built on Linux, Mac, and Windows.

On one or more platforms, the build results were: "ERROR". This may mean there is a problem with the package that you need to fix. Or it may mean that there is a problem with the build system itself.

Please see the build report for more details. This link will be active for 21 days.

Remember: if you submitted your package after July 7th, 2020, when making changes to your repository push to git@git.bioconductor.org:packages/SpotClean to trigger a new build. A quick tutorial for setting up remotes and pushing to upstream can be found here.

bioc-issue-bot commented 2 years ago

Received a valid push on git.bioconductor.org; starting a build for commit id: 239dd931c7d599563cdf82122b529a2650675443

bioc-issue-bot commented 2 years ago

Dear Package contributor,

This is the automated single package builder at bioconductor.org.

Your package has been built on Linux, Mac, and Windows.

On one or more platforms, the build results were: "ERROR". This may mean there is a problem with the package that you need to fix. Or it may mean that there is a problem with the build system itself.

Please see the build report for more details. This link will be active for 21 days.

Remember: if you submitted your package after July 7th, 2020, when making changes to your repository push to git@git.bioconductor.org:packages/SpotClean to trigger a new build. A quick tutorial for setting up remotes and pushing to upstream can be found here.

bioc-issue-bot commented 2 years ago

Received a valid push on git.bioconductor.org; starting a build for commit id: 3e8fefbff7fde421d37193ee4aeb589c43545f1e

bioc-issue-bot commented 2 years ago

Dear Package contributor,

This is the automated single package builder at bioconductor.org.

Your package has been built on Linux, Mac, and Windows.

Congratulations! The package built without errors or warnings on all platforms.

Please see the build report for more details. This link will be active for 21 days.

Remember: if you submitted your package after July 7th, 2020, when making changes to your repository push to git@git.bioconductor.org:packages/SpotClean to trigger a new build. A quick tutorial for setting up remotes and pushing to upstream can be found here.

zijianni commented 2 years ago

> Hi @zijianni ,
>
> thanks for taking into consideration using the SpatialExperiment class for your package.
>
> Taking into account your comments on the read10xVisium() function, would you mind opening a new issue in our repository describing the problems you encountered? This would be very useful for us to further improve our class.
>
> Thanks, Dario

Hi @drighelli ,

Actually, the issue is not with read10xVisium() itself. read10xVisium() works very well on output taken directly from the Spaceranger pipeline, where we have the raw/filtered data matrix in folders as well as in HDF5 format, and the spatial folder containing tissue images etc. This is the use case read10xVisium() was designed for.

I've encountered many datasets whose files are not organized like Spaceranger output, so read10xVisium() cannot find the files it expects in the specified directory. Usually I have to load the count matrix and the spatial information separately and then combine them, which is more flexible but more complicated. That's why my package has two functions, Read10xRaw() and Read10xSlide(), for loading data and a third, CreateSlide(), for merging them into one object.

Best, Zijian
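
To make the "load separately, then combine" workflow described above concrete, here is a minimal sketch. It does not use SpotClean's own readers: the toy count matrix, the `slide` table, and its column names are assumptions made up for illustration. It relies only on the Matrix and SummarizedExperiment packages listed in the DESCRIPTION, and it keeps the spatial table in the metadata slot, as mentioned earlier in the thread.

```r
library(Matrix)
library(SummarizedExperiment)

# Toy count matrix: 3 genes x 4 spots (values are invented for illustration).
counts <- Matrix(
    c(5, 0, 2, 1,
      0, 3, 0, 0,
      1, 1, 4, 0),
    nrow = 3, byrow = TRUE, sparse = TRUE,
    dimnames = list(paste0("gene", 1:3), paste0("spot", 1:4))
)

# Toy spatial table, deliberately in a different barcode order than the matrix.
# Column names are hypothetical, not a documented file layout.
slide <- data.frame(
    barcode  = paste0("spot", c(2, 1, 4, 3)),
    tissue   = c(1, 1, 0, 0),
    imagerow = c(10, 10, 20, 20),
    imagecol = c(20, 10, 20, 10)
)

# Align the spatial table to the matrix columns before combining.
slide <- slide[match(colnames(counts), slide$barcode), ]
stopifnot(identical(slide$barcode, colnames(counts)))

# Store counts as an assay and the spatial table in the metadata slot.
se <- SummarizedExperiment(
    assays   = list(counts = counts),
    metadata = list(slide = slide)
)
```

The explicit `match()` step is the part that a fixed-layout reader handles implicitly; when files arrive separately, barcode order cannot be assumed to agree.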

bioc-issue-bot commented 2 years ago

Received a valid push on git.bioconductor.org; starting a build for commit id: 05708aca5d53c429627d5e0500ee2ea97dacec04

bioc-issue-bot commented 2 years ago

Dear Package contributor,

This is the automated single package builder at bioconductor.org.

Your package has been built on Linux, Mac, and Windows.

Congratulations! The package built without errors or warnings on all platforms.

Please see the build report for more details. This link will be active for 21 days.

Remember: if you submitted your package after July 7th, 2020, when making changes to your repository push to git@git.bioconductor.org:packages/SpotClean to trigger a new build. A quick tutorial for setting up remotes and pushing to upstream can be found here.

zijianni commented 2 years ago

Hi @lazappi ,

What's the expected timeline for reviewing this package? I'm leaving my current institution by the end of this month. It would be great if we could make some progress before then. Thanks!

Best, Zijian

lazappi commented 2 years ago

Hi @zijianni

Sorry I was on leave last week and I wasn't quite able to get to this before I left. I will try to find time this week.

lazappi commented 2 years ago

Hi @zijianni

Thanks for submitting SpotClean 🎉! Below is my review of your package. Please reply here if anything is unclear or needs any further explanation.

What next?

Please address the comments as best as you can. When you are ready for me to check the package again please reply to let me know with a summary of changes you have made or any other responses.

Luke

Review

Key: 🚨 Required ⚠️ Recommended 🟢 Optional ❓ Question

General package development

DESCRIPTION file

NAMESPACE file

Documentation

Vignette

Man pages

Code

R

bioc-issue-bot commented 2 years ago

Received a valid push on git.bioconductor.org; starting a build for commit id: afef6de7a63837f4d8679409943e4633898b0891

bioc-issue-bot commented 2 years ago

Dear Package contributor,

This is the automated single package builder at bioconductor.org.

Your package has been built on Linux, Mac, and Windows.

Congratulations! The package built without errors or warnings on all platforms.

Please see the build report for more details. This link will be active for 21 days.

Remember: if you submitted your package after July 7th, 2020, when making changes to your repository push to git@git.bioconductor.org:packages/SpotClean to trigger a new build. A quick tutorial for setting up remotes and pushing to upstream can be found here.

zijianni commented 2 years ago

Hi @lazappi Thanks for your careful review of my package! Please see below my point-by-point response. I’ve addressed your concerns to the best of my ability. Let me know if anything is unclear. Further suggestions are definitely welcome.

Hi @zijianni

Thanks for submitting SpotClean 🎉! Below is my review of your package. Please reply here if anything is unclear or needs any further explanation.

What next?

Please address the comments as best as you can. When you are ready for me to check the package again please reply to let me know with a summary of changes you have made or any other responses.

Luke

Review

Key: 🚨 Required ⚠️ Recommended 🟢 Optional ❓ Question

General package development

  • [x] 🚨 Please address as many of the BiocCheck notes in the build report as possible

I’ve addressed as many of the BiocCheck notes as I can.

For the remaining notes:

DESCRIPTION file

  • [x] 🚨 Please update the depended R version to 4.2

Done.

  • [x] 🚨 You have C++11 listed in SystemRequirements but I couldn't see any C++ code. Please remove this if it isn't needed.

I’ve removed the C++ requirement.

NAMESPACE file

  • [x] 🚨 Please use lowerCamelCase for function names, UpperCamelCase is reserved for objects

Below are the function name changes to comply with the lowerCamelCase rule:

Documentation

Vignette

  • [x] 🚨 Please add a table of contents to the vignette

Done.

  • [x] 🚨 All code chunks in the vignette should be executed

Now only 3 chunks in the Quick Start section are not executed:

Only 1 chunk in the Detailed Steps section is not executed:

  • [x] 🚨 Please include a motivation for submitting the package to Bioconductor in the Introduction section

Done.

  • [x] 🚨 Please mention related or alternative packages in the Introduction section

Done.

Man pages

  • [x] 🟢 It is recommended to add a package man page

Added a package man page.

  • [x] 🟢 Rather than using comments you can use Roxygen tags to document internal functions and mark them as internal

Thanks for the suggestion! I will try to address them in future updates.

Code

R

  • [x] ⚠️ It is recommended to test that function arguments are valid in exported functions

I’ve added more code to validate function arguments for the major functions in the package.
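
As an illustration of what argument validation in an exported function can look like, here is a minimal base-R sketch. The function `keepTopGenes()` and its parameters are hypothetical and are not part of SpotClean; the point is only the pattern of checking type, length, and range up front and failing with an informative message.

```r
# Hypothetical exported function, used only to illustrate input checks.
keepTopGenes <- function(count_mat, n_genes = 2L) {
    # Check the main input before doing any work.
    if (!is.matrix(count_mat) || !is.numeric(count_mat)) {
        stop("'count_mat' must be a numeric matrix.")
    }
    # Check that the tuning argument is a single, non-missing, positive number.
    if (!is.numeric(n_genes) || length(n_genes) != 1L ||
        is.na(n_genes) || n_genes < 1) {
        stop("'n_genes' must be a single positive number.")
    }
    n_genes <- min(as.integer(n_genes), nrow(count_mat))
    # Keep the genes with the largest total counts.
    top <- order(rowSums(count_mat), decreasing = TRUE)[seq_len(n_genes)]
    count_mat[top, , drop = FALSE]
}

counts <- matrix(
    c(5, 0, 2,
      0, 3, 0,
      1, 1, 4),
    nrow = 3, byrow = TRUE,
    dimnames = list(paste0("gene", 1:3), paste0("spot", 1:3))
)

top <- keepTopGenes(counts, n_genes = 2)

# Invalid input fails early with a clear message instead of a cryptic error.
msg <- tryCatch(
    keepTopGenes("not a matrix"),
    error = function(e) conditionMessage(e)
)
```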

Now my package only imports necessary ggplot2 functions via importFrom instead of importing the whole ggplot2 package.

lazappi commented 2 years ago

  • There is no runnable example in convert_to_seurat.R since this function requires external image-related files (the sizes of which can be large). Since this is not a major function in my package, I don't think it is worth sacrificing the size of the package only to make this single example runnable. Instead, I’m instructing users to correctly specify the path to image-related files by themselves.

This function needs to be tested in some way, currently it doesn't have a runnable example, successful tests or is run in the vignette. There are a few options for how to do this. You could include some example data in this package (I understand your concerns about size but it should be possible to come up with a small example), you could reuse example data in another package (maybe {SpatialExperiment} or {Seurat}) or you could use {BiocFileCache} to download a public dataset to test on.

There is also a small typo in tissue_lowres_iamge.png in the function documentation 😸

  • [x] 🚨 All code chunks in the vignette should be executed

Now only 3 chunks in the Quick Start section are not executed:

  • The first one is the illustration of package installation.
  • The second one is a summary of the Detailed Steps section containing exactly the same code. In order to save package build time, this chunk is set to not be evaluated.

I would still be concerned that this will not be kept up to date if there were any future changes to these functions. It should be mentioned in the text that this is not run and the steps are explained in the later sections.

  • The third one is the illustration of the application of SpotClean on the SpatialExperiment class. Again, more details are in the Detailed Steps section. In order to save package build time, this chunk is set to not be evaluated.

Why have you chosen to base your package around SummarizedExperiment rather than SpatialExperiment? It seems like SpatialExperiment would be the natural fit for this kind of data and would make maintenance easier as you could reuse functionality in existing packages.

Now my package only imports necessary ggplot2 functions via importFrom instead of importing the whole ggplot2 package.

👍🏻 What I was specifically referring to is using .data$imagerow rather than statements like imagerow <- imagecol <- barcode <- NULL to avoid warnings about undeclared variables.
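
The difference between the two approaches can be sketched as follows. The data frame and the `plotSpots()` helper are hypothetical and exist only for illustration; the column names `imagerow` and `imagecol` are taken from the comment above. In a script the `.data` pronoun works as shown, while inside a package it is usually imported explicitly (for example with `importFrom(rlang, .data)` in the NAMESPACE).

```r
library(ggplot2)

# Toy spot coordinates (values are invented for illustration).
slide <- data.frame(
    imagerow = c(10, 10, 20, 20),
    imagecol = c(10, 20, 10, 20)
)

# Hypothetical plotting helper. Referring to columns through the .data pronoun
# tells R CMD check that imagerow and imagecol are columns of the data, so no
# dummy assignments such as `imagerow <- imagecol <- NULL` are needed.
plotSpots <- function(slide) {
    ggplot(slide, aes(x = .data$imagecol, y = .data$imagerow)) +
        geom_point()
}

p <- plotSpots(slide)
```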

bioc-issue-bot commented 2 years ago

Received a valid push on git.bioconductor.org; starting a build for commit id: dc3a1f12459c73ed97946ae570f1dca182d46673

bioc-issue-bot commented 2 years ago

Dear Package contributor,

This is the automated single package builder at bioconductor.org.

Your package has been built on Linux, Mac, and Windows.

Congratulations! The package built without errors or warnings on all platforms.

Please see the build report for more details. This link will be active for 21 days.

Remember: if you submitted your package after July 7th, 2020, when making changes to your repository push to git@git.bioconductor.org:packages/SpotClean to trigger a new build. A quick tutorial for setting up remotes and pushing to upstream can be found here.

zijianni commented 2 years ago

Hi @lazappi Thanks again for your additional comments! Please see updates below. Let me know if you have further questions and comments.

  • There is no runnable example in convert_to_seurat.R since this function requires external image-related files (the sizes of which can be large). Since this is not a major function in my package, I don't think it is worth sacrificing the size of the package only to make this single example runnable. Instead, I’m instructing users to correctly specify the path to image-related files by themselves.

This function needs to be tested in some way, currently it doesn't have a runnable example, successful tests or is run in the vignette. There are a few options for how to do this. You could include some example data in this package (I understand your concerns about size but it should be possible to come up with a small example), you could reuse example data in another package (maybe {SpatialExperiment} or {Seurat}) or you could use {BiocFileCache} to download a public dataset to test on.

I’ve put some image-related files to inst/extdata. Now this function has runnable examples as well as unit tests.

Given that I’ve added raw image-related files, I’ve discarded the use of data/mbrain_slide_info.rda and updated all the examples, tests, and vignette accordingly.

There is also a small typo in tissue_lowres_iamge.png in the function documentation 😸

Fixed. Thanks!

  • [x] 🚨 All code chunks in the vignette should be executed

Now only 3 chunks in the Quick Start section are not executed:

  • The first one is the illustration of package installation.
  • The second one is a summary of the Detailed Steps section containing exactly the same code. In order to save package build time, this chunk is set to not be evaluated.

I would still be concerned that this will not be kept up to date if there were any future changes to these functions. It should be mentioned in the text that this is not run and the steps are explained in the later sections.

I’ve further reduced non-runnable chunks and added notes to clarify that in the vignette.

  • The third one is the illustration of the application of SpotClean on the SpatialExperiment class. Again, more details are in the Detailed Steps section. In order to save package build time, this chunk is set to not be evaluated.

Why have you chosen to base your package around SummarizedExperiment rather than SpatialExperiment? It seems like SpatialExperiment would be the natural fit for this kind of data and would make maintenance easier as you could reuse functionality in existing packages.

To be honest, I was not aware that SpatialExperiment is becoming an officially supported and encouraged Bioconductor package when I wrote my package. I also have additional concerns about using it, as discussed earlier in this thread.

That said, SummarizedExperiment and SpatialExperiment share many properties, and there are many examples of converting between them (and SingleCellExperiment). I hope this won’t be a big problem for future maintenance.

Now my package only imports necessary ggplot2 functions via importFrom instead of importing the whole ggplot2 package.

👍🏻 What I was specifically referring to is using .data$imagerow rather than statements like imagerow <- imagecol <- barcode <- NULL to avoid warnings about undeclared variables.

Thanks! I’ve made updates to incorporate that.

lazappi commented 2 years ago

Hi @zijianni. I am happy with this now so will mark it as accepted. Congratulations on getting the package into Bioconductor 🎉! It can take a couple of days to be picked up by the build system but then it should be available in Bioconductor devel.

bioc-issue-bot commented 2 years ago

Your package has been accepted. It will be added to the Bioconductor nightly builds.

Thank you for contributing to Bioconductor!

Reviewers for Bioconductor packages are volunteers from the Bioconductor community. If you are interested in becoming a Bioconductor package reviewer, please see Reviewers Expectations.

zijianni commented 2 years ago

Thanks again @lazappi for your time reviewing my package!

lshep commented 2 years ago

The master branch of your GitHub repository has been added to Bioconductor's git repository.

To use the git.bioconductor.org repository, we need an 'ssh' key to associate with your github user name. If your GitHub account already has ssh public keys (https://github.com/zijianni.keys is not empty), then no further steps are required. Otherwise, do the following:

  1. Add an SSH key to your github account
  2. Submit your SSH key to Bioconductor

See further instructions at

https://bioconductor.org/developers/how-to/git/

for working with this repository. See especially

https://bioconductor.org/developers/how-to/git/new-package-workflow/
https://bioconductor.org/developers/how-to/git/sync-existing-repositories/

to keep your GitHub and Bioconductor repositories in sync.

Your package will be included in the next nightly 'devel' build (check-out from git at about 6 pm Eastern; build completion around 2 pm Eastern the next day) at

https://bioconductor.org/checkResults/

(Builds sometimes fail, so ensure that the date stamps on the main landing page are consistent with the addition of your package). Once the package builds successfully, your package will be available for download in the 'Devel' version of Bioconductor using BiocManager::install("SpotClean"). The package 'landing page' will be created at

https://bioconductor.org/packages/SpotClean

If you have any questions, please contact the bioc-devel mailing list (https://stat.ethz.ch/mailman/listinfo/bioc-devel); this issue will not be monitored further.