Bioconductor / Contributions

Contribute Packages to Bioconductor
131 stars 33 forks source link

SingleCellAlleleExperiment #3339

Closed Jonas-Schuck closed 2 months ago

Jonas-Schuck commented 3 months ago

Update the following URL to point to the GitHub repository of the package you wish to submit to Bioconductor

Confirm the following by editing each check box to '[x]'

I am familiar with the essential aspects of Bioconductor software management, including:

For questions/help about the submission process, including questions about the output of the automatic reports generated by the SPB (Single Package Builder), please use the #package-submission channel of our Community Slack. Follow the link on the home page of the Bioconductor website to sign up.

bioc-issue-bot commented 3 months ago

Hi @Jonas-Schuck

Thanks for submitting your package. We are taking a quick look at it and you will hear back from us soon.

The DESCRIPTION file for this package is:

Package: SingleCellAlleleExperiment
Title: S4 Class for Single Cell Data with Allele and Functional Levels for Immune Genes
Version: 0.99.0
Authors@R: 
  c(
  person("Jonas", "Schuck", email="jschuckdev@gmail.com", role = c("aut", "cre"),
comment = c(ORCID = "0009-0003-5705-4579")),
  person("Ahmad", "Al Ajami", role = c("aut"),
comment = c(ORCID = "0009-0006-5615-7447")),
  person("Federico", "Marini", role = c("aut"),
comment = c(ORCID = "0000-0003-3252-7758")),
  person("Katharina", "Imkeller", role = c("aut"),
comment = c(ORCID = "0000-0002-5177-0852")))
Description: 
  Defines a S4 class that is based on SingleCellExperiment. In addition to the usual gene layer
  the object can also store data for immune genes such as HLAs, Igs and KIRs at allele and functional level.
  The package is part of a workflow named single-cell ImmunoGenomic Diversity (scIGD), that firstly incorporates allele-aware quantification data for immune genes.
  This new data can then be used with the here implemented data structure and functionalities for further data handling and data analysis.
License: MIT + file LICENSE
Encoding: UTF-8
Roxygen: list(markdown = TRUE)
RoxygenNote: 7.3.1
Depends:
  SingleCellExperiment
Imports:
  SummarizedExperiment,
  BiocParallel,
  DelayedArray,
  scuttle,
  methods,
  utils,
  Matrix,
  S4Vectors,
  stats,
  ggplot2,
  org.Hs.eg.db,
  AnnotationDbi,
  BiocGenerics,
  DropletUtils
Suggests:
  scaeData,
  knitr,
  rmarkdown,
  scran,
  scater,
  patchwork,
  BiocStyle,
  testthat (>= 3.0.0)
biocViews: DataRepresentation, Infrastructure, SingleCell, Transcriptomics,
  GeneExpression, Genetics, ImmunoOncology, DataImport
URL: https://github.com/AGImkeller/SingleCellAlleleExperiment
BugReports: https://github.com/AGImkeller/SingleCellAlleleExperiment/issues
VignetteBuilder: knitr
Config/testthat/edition: 3
Jonas-Schuck commented 3 months ago

@AGImkeller @federicomarini @ahmadalajami

lshep commented 3 months ago

Just a reminder, because this package utilizes scaeData that is also under review we would not be able to review this package until scaeData is accepted and on the builders since this was not submitted as an additionalpackage to the scaeData submission.

bioc-issue-bot commented 3 months ago

Your package has been added to git.bioconductor.org to continue the pre-review process. A build report will be posted shortly. Please fix any ERROR and WARNING in the build report before a reviewer is assigned or provide a justification on why you feel the ERROR or WARNING should be granted an exception.

IMPORTANT: Please read this documentation for setting up remotes to push to git.bioconductor.org. All changes should be pushed to git.bioconductor.org moving forward. It is required to push a version bump to git.bioconductor.org to trigger a new build report.

Bioconductor utilized your github ssh-keys for git.bioconductor.org access. To manage keys and future access you may want to active your Bioconductor Git Credentials Account

bioc-issue-bot commented 3 months ago

Dear Package contributor,

This is the automated single package builder at bioconductor.org.

Your package has been built on the Bioconductor Single Package Builder.

On one or more platforms, the build results were: "skipped, ERROR". This may mean there is a problem with the package that you need to fix. Or it may mean that there is a problem with the build system itself.

Please see the build report for more details.

The following are build products from R CMD build on the Single Package Builder: ERROR before build products produced.

Links above active for 21 days.

Remember: if you submitted your package after July 7th, 2020, when making changes to your repository push to git@git.bioconductor.org:packages/SingleCellAlleleExperiment to trigger a new build. A quick tutorial for setting up remotes and pushing to upstream can be found here.

bioc-issue-bot commented 3 months ago

Received a valid push on git.bioconductor.org; starting a build for commit id: bf86234a4900285d0856d77017f355812f71f9df

bioc-issue-bot commented 3 months ago

Dear Package contributor,

This is the automated single package builder at bioconductor.org.

Your package has been built on the Bioconductor Single Package Builder.

On one or more platforms, the build results were: "skipped, ERROR". This may mean there is a problem with the package that you need to fix. Or it may mean that there is a problem with the build system itself.

Please see the build report for more details.

The following are build products from R CMD build on the Single Package Builder: ERROR before build products produced.

Links above active for 21 days.

Remember: if you submitted your package after July 7th, 2020, when making changes to your repository push to git@git.bioconductor.org:packages/SingleCellAlleleExperiment to trigger a new build. A quick tutorial for setting up remotes and pushing to upstream can be found here.

bioc-issue-bot commented 3 months ago

Received a valid push on git.bioconductor.org; starting a build for commit id: f98032cdf4022fcb68a5cb75d8fecb236c52d89b

bioc-issue-bot commented 3 months ago

Dear Package contributor,

This is the automated single package builder at bioconductor.org.

Your package has been built on the Bioconductor Single Package Builder.

On one or more platforms, the build results were: "skipped, ERROR". This may mean there is a problem with the package that you need to fix. Or it may mean that there is a problem with the build system itself.

Please see the build report for more details.

The following are build products from R CMD build on the Single Package Builder: ERROR before build products produced.

Links above active for 21 days.

Remember: if you submitted your package after July 7th, 2020, when making changes to your repository push to git@git.bioconductor.org:packages/SingleCellAlleleExperiment to trigger a new build. A quick tutorial for setting up remotes and pushing to upstream can be found here.

bioc-issue-bot commented 3 months ago

Received a valid push on git.bioconductor.org; starting a build for commit id: e8fc21951177b4cdac2e56003938869f897237ac

bioc-issue-bot commented 3 months ago

Dear Package contributor,

This is the automated single package builder at bioconductor.org.

Your package has been built on the Bioconductor Single Package Builder.

On one or more platforms, the build results were: "WARNINGS". This may mean there is a problem with the package that you need to fix. Or it may mean that there is a problem with the build system itself.

Please see the build report for more details.

The following are build products from R CMD build on the Single Package Builder: Linux (Ubuntu 22.04.3 LTS): SingleCellAlleleExperiment_0.99.3.tar.gz

Links above active for 21 days.

Remember: if you submitted your package after July 7th, 2020, when making changes to your repository push to git@git.bioconductor.org:packages/SingleCellAlleleExperiment to trigger a new build. A quick tutorial for setting up remotes and pushing to upstream can be found here.

Jonas-Schuck commented 3 months ago

Hello @lshep , since scaeData is now integrated after the successful nightly build, we performed a version bump here to re-trigger the building process. Sadly, we got a WARNING as we don’t quite scratch the max allowed time of CMD CHECK. Currently it lies at 10:58 minutes. Would this be an issue for the review process to begin, while we figure out a way to squeeze the check time a little more (if really necessary, as its already quite close to the time limit)? We would be happy for your advice on how to proceed here, thanks a lot in advance!

bioc-issue-bot commented 2 months ago

A reviewer has been assigned to your package for an indepth review. Please respond accordingly to any further comments from the reviewer.

HelenaLC commented 2 months ago

In the interest of time, I am posting a first round of comments here to get the ball rolling... Specifically, please try and address comments related to dependencies and class development, also considering the corresponding guidelines - the current implementation "feels" like a set of functions rather than an S4 class. This is not bad, per se. However, defining a reader and subsetting of these data for an SCE would be sufficient, plus a couple checks that the rowData match your expectation - unless the overhead of defining a class properly is met.

If there're any questions, points that aren't clear etc., fell free to ask/discuss here - cheers!

NAMESPACE

docs

code

# e.g., this...
get_nigenes <- function(scae) {
  subset_rows <- stats::complete.cases(rowData(scae)$NI_I, rowData(scae)$Quant_type)
  agenes <- scae[subset_rows & rowData(scae)$NI_I == "NI" & rowData(scae)$Quant_type == "G", ]
  return(agenes)
}

# ...could be
rd <- rowData(scae)
subset_rows <- stats::complete.cases(rd$NI_I, rd$Quant_type)
agenes <- scae[subset_rows & rd$NI_I == "NI" & rd$Quant_type == "G", ]

# or...
rd <- rowData(scae)
ni <- rd$NI_I
qt <- rd$Quant_type
subset_rows <- stats::complete.cases(ni, qt)
agenes <- scae[subset_rows & ni == "NI" & qt == "G", ]

# another example:
rowData(final_scae[rownames(final_scae) %in% uniqs])$NI_I <- "I"
rowData(final_scae[rownames(final_scae) %in% uniqs])$Quant_type <- "F"

# ...vs.
idx <- rownames(final_scae) %in% uniqs
rowData(final_scae)$NI_I[idx] <- "I"
rowData(final_scae)$Quant_type[idx] <- "F"
Jonas-Schuck commented 2 months ago

Thanks for your timely review @HelenaLC, we are looking into this and will update/change the package accordingly!

bioc-issue-bot commented 2 months ago

Received a valid push on git.bioconductor.org; starting a build for commit id: 23439e16399516af6f9e5d7e437375c4712cfa07

bioc-issue-bot commented 2 months ago

Dear Package contributor,

This is the automated single package builder at bioconductor.org.

Your package has been built on the Bioconductor Single Package Builder.

Congratulations! The package built without errors or warnings on all platforms.

Please see the build report for more details.

The following are build products from R CMD build on the Single Package Builder: Linux (Ubuntu 22.04.3 LTS): SingleCellAlleleExperiment_0.99.4.tar.gz

Links above active for 21 days.

Remember: if you submitted your package after July 7th, 2020, when making changes to your repository push to git@git.bioconductor.org:packages/SingleCellAlleleExperiment to trigger a new build. A quick tutorial for setting up remotes and pushing to upstream can be found here.

Jonas-Schuck commented 2 months ago

Hello @HelenaLC , please find below our comments and changes. We appreciate your review and the points you raised.

In the interest of time, I am posting a first round of comments here to get the ball rolling... Specifically, please try and address comments related to dependencies and class development, also considering the corresponding guidelines - the current implementation "feels" like a set of functions rather than an S4 class. This is not bad, per se. However, defining a reader and sub-setting of these data for an SCE would be sufficient, plus a couple checks that the rowData match your expectation - unless the overhead of defining a class properly is met.

This is a very good point. We agree that the package structure was not clear enough and the definition of a novel data structure was impaired by additional (and default) “convenient” functionalities. We decided to follow your suggestion and focused on the implementation of a S4 class. We think that implementing a multi-layered data structure, more specifically, aggregating information about groups of expression features via an ontology approach and keeping those multiple layers accessible throughout the process, is a valuable addition to Bioconductor. In the future, this structure can also be useful for other biological applications, where features have different levels of meaningful annotation (e.g. splicing variant, tRNAs, …)

NAMESPACE

  • [ ] since the package's purpose is to implement an S4 class, I'd strongly encourage working towards a minimal number of dependencies. Any visualizations, processing etc. could be demoed in the vignette/examples with the corresponding libraries under Suggests: (see here for details). The motivation here is that other packages dedicated to analysis and visualization etc. may only need the class definition/constructor/accessor/replacement methods, but needn't have additional dependencies. Specifically,

Thanks for bringing this up. By focusing on the core functionality of our S4 object we reduced the number of dependencies significantly. Find more info in the following points.

  • there's no reason to depend on any data packages; these can be attached for use in tests, examples, the vignette, but can go under Suggests:

The scaeData package already was listed under Suggests:. If we missed other data packages, please let us know.

  • there's no need to depend on ggplot2; the plot could be generated in the vignette, but SCAE-specific visualizations should be dispatched in a separate package (or made optional by suggesting the package)

This is fixed. ggplot2 as well as DropletUtils went to Suggests:. This makes the potential filtering based on a computed knee plot optional. If the read in function read_allele_counts() is used with the parameter filter_mode=”yes”, data is still filtered on the inflection point of the corresponding knee plot. But instead of plotting it, all necessary information is passed to the metadata(scae)[[knee_info’]]` slot, ready to be plotted later on.

  • scuttle also seems rather superfluous; for demonstration, logcounts could be computed in the vignette, using this or any other method, but suggesting the package should suffice (making that computation optional in the function is also possible when suggesting the package)

This is also optional now with scuttle moved to Suggests:. But toggled on by default. The reason behind this is the following: If the user would miss the part about how to properly normalize the multi-layered data, this would result in falsely normalized data. As the assays in the SCAE data structure are extended (more rows are added to the raw data counts) it is necessary to compute libraryFactors only on the raw data layers. Otherwise the Size Factors would not be correct (immune features would contribute to the size factor multiple times). We appreciate the point you raised to make this optional, but we still would like to keep it as a default (but optional) functionality, which still offers experienced users to perform a different normalization method later on.

Additionally, we also moved all packages related to computing new gene symbols under suggests, making this an optional feature as well.

We think our newly implemented changes allow us to keep things simple on default, having only few dependencies for the implementation of the core data structure, but still provide us with the option to use more advanced functionalities if necessary/wanted.

docs

  • [ ] SingleCellAlleleExperiment.Rd is missing a runnable example; since the class is what "it's all about", could also demo (in code and/or comments) some aspects of how the object compares to an SCE; have a look at the SCE doc for inspiration, or SPE for more extended examples

Thanks for raising this. We added an example and exported the Constructor.

  • [ ] in the vignette, consider linking to external packages using BiocStyle functions (CRANPkg/BiocPkg() for packages distributed through GitHub/CRAN/Bioc)

This was changed accordingly. Additionally, new sections can be found in the vignette, giving information about the (now) optional functionalities.

code

  • [ ] please have a look at these guidelines regarding class development and try to follow them -- e.g., code and script organization

Changes to code and script organisation were done. Now there is a file for each exported function, which also includes their respective internal functions. One additional file for misc functions, currently containing the show method. Inspiration taken from SCE/SPE.

  • [ ] related, the class is missing a validity method and, potentially, appropriate replacement methods (e.g., something like rowData(scae)$NI_I <- 'xyz' would break lots of things and setting NI_I/Quant_Type to NULL gives basically a plain SCE, I think?); validity and setters need to assure the object cannot be "broken"

That’s a very good point. Thanks for bringing this to our attention. We added some validity checks to make sure that handling the rowData is done in a more robust manner now. We are aware that this might not catch every single point on how one might break the object, but we are having an eye on this now, willing to extend this if additional relevant cases come up. Also a setter for the rowData was defined, testing for the validity checks.

  • [ ] related, it'd be nice if there was a simple show method to distinguish a SCAE object from a SCE; printed in the console, it's hard to tell the difference - e.g., could print the number NI/I features or ??? ...something that you feel is worth highlighting (if anything?)

Thanks for bringing this up. We defined an extension to the SCE show method, giving information about the content of the newly defined layers.

  • [ ] being the only exported getter, it'd be great if scae_subset (and/or the functions being called by it) implemented appropriate validity checks on input arguments and structure (of the input SCAE)

We incorporated validity checks for the output of scae_subset as well as a check (for input type and class type) for the chosen subset input. Please find the changes in the scaeSubset.R file.

  • [ ] a general suggestion to better stick to the recommend 80-character line limit, improve legibility, and make your life easier: consider defining intermediate variables when, say, the same series of symbols is used repeatedly in a function, e.g.:

Thanks for the examples. We made changes to the code to better stick to the 80-character line limit and increase legibility.

This is it for now. We look forward to hearing back from you!

HelenaLC commented 2 months ago

Thanks for all that, big improvement overall! I have 2 outstanding comments (+ 1 very minor one) that I feel should be addressed; especially correct implementation of mandatory/optional (Imports/Suggests:) packages. Packages under Suggests: will not be automatically installed. So running functions with missing dependencies will currently fail even when part of the execution is optional - you need to check for this and execute accordingly (details below)!

Jonas-Schuck commented 2 months ago

Thanks for your second round of comments! We will address them and come back to you early next week.

HelenaLC commented 2 months ago

Sure! Note that the package has to be in my Wed April 24th to be added to the 3.19 release - so the earlier you push & pass the better, just in case 🙏

bioc-issue-bot commented 2 months ago

Received a valid push on git.bioconductor.org; starting a build for commit id: a94efb11da5fd0b6cdc336689c4f9c25c99ebfee

Jonas-Schuck commented 2 months ago

Hello @HelenaLC, please find below our comments and changes.

Thanks for all that, big improvement overall! I have 2 outstanding comments (+ 1 very minor one) that I feel should be addressed; especially correct implementation of mandatory/optional (Imports/Suggests:) packages. Packages under Suggests: will not be automatically installed. So running functions with missing dependencies will currently fail even when part of the execution is optional - you need to check for this and execute accordingly (details below)!

  • minor comment on show method: are there cases where the number of included features differs from nrow(scae)? If yes, all good. If no, I find the line "Including a total of N features:" redundant.

Thanks for the advice. We removed redundancy and now give information about how many immune-related features the SCAE-object contains. image

  • invalid replacement (e.g., rowData(scae)$NI_I <- NULL) now give an error and return the object in it's original form, however, it'd be desirable to give a more meaningful message here (currently, the message from the validity method is given). E.g., in SpatialExperiment, the must-have colData is retained silently or a message is given why a replacement is invalid (vs. an "invalid object" error); see here

Thanks for providing the SPE example! We implemented silent retention and extended the error messages when setting invalid values. More specifically:

  • When performing rowData(scae)<-NULL we silently retain the NI_I and Quant_type column (inspired by SPE).
  • Additional error messages and more detailed information inside the setter functions rather than having all checks in the validity method.

Examples: image image

  • any calls to packages under Suggests: should check that the package is available and either skip that functionality (for optional things) or give a meaningful message to the user; see here. E.g., SingleCellAlleleExperiment() will just fail when scuttle or DelayedArray are unavailable, but it should still work without normalization & creating a delayed matrix, if it truly is "optional"? If not, these packages would be necessary for core functionality (class construction) and there should be under Imports: instead. Vice versa, something like get_ncbi_org() cannot work without org.Hs.eg.db and AnnotationDbi, so the user should be told so.

Thank you, this is indeed an important point. We had the checks already set in the read-in function read_allele_counts() but missed to also check in the SCAE-Constructor. Now all packages that are related to optional functionalities (e.g. scuttle, DropletUtils, org.Hs.eg.db, AnnotationDbi) should be checked for installation before using the functionality and give a corresponding message if they’re not installed. You can now find the corresponding function in the validity.R file and/or in the first few lines of code of the SingleCellAlleleExperiment() and read_allele_counts() functions.

See below for examples:

image

Thank you again for your valuable comments. We think that the package improved a lot throughout the review!

bioc-issue-bot commented 2 months ago

Dear Package contributor,

This is the automated single package builder at bioconductor.org.

Your package has been built on the Bioconductor Single Package Builder.

Congratulations! The package built without errors or warnings on all platforms.

Please see the build report for more details.

The following are build products from R CMD build on the Single Package Builder: Linux (Ubuntu 22.04.3 LTS): SingleCellAlleleExperiment_0.99.5.tar.gz

Links above active for 21 days.

Remember: if you submitted your package after July 7th, 2020, when making changes to your repository push to git@git.bioconductor.org:packages/SingleCellAlleleExperiment to trigger a new build. A quick tutorial for setting up remotes and pushing to upstream can be found here.

bioc-issue-bot commented 2 months ago

Your package has been accepted. It will be added to the Bioconductor nightly builds.

Thank you for contributing to Bioconductor!

Reviewers for Bioconductor packages are volunteers from the Bioconductor community. If you are interested in becoming a Bioconductor package reviewer, please see Reviewers Expectations.

lshep commented 2 months ago

The default branch of your GitHub repository has been added to Bioconductor's git repository as branch devel.

To use the git.bioconductor.org repository, we need an 'ssh' key to associate with your github user name. If your GitHub account already has ssh public keys (https://github.com/Jonas-Schuck.keys is not empty), then no further steps are required. Otherwise, do the following:

  1. Add an SSH key to your github account
  2. Submit your SSH key to Bioconductor

See further instructions at

https://bioconductor.org/developers/how-to/git/

for working with this repository. See especially

https://bioconductor.org/developers/how-to/git/new-package-workflow/ https://bioconductor.org/developers/how-to/git/sync-existing-repositories/

to keep your GitHub and Bioconductor repositories in sync.

Your package will be included in the next nigthly 'devel' build (check-out from git at about 6 pm Eastern; build completion around 2pm Eastern the next day) at

https://bioconductor.org/checkResults/

(Builds sometimes fail, so ensure that the date stamps on the main landing page are consistent with the addition of your package). Once the package builds successfully, you package will be available for download in the 'Devel' version of Bioconductor using BiocManager::install("SingleCellAlleleExperiment"). The package 'landing page' will be created at

https://bioconductor.org/packages/SingleCellAlleleExperiment

If you have any questions, please contact the bioc-devel mailing list (https://stat.ethz.ch/mailman/listinfo/bioc-devel); this issue will not be monitored further.