Bioconductor / Contributions

Contribute Packages to Bioconductor

biotidy #3478

Open Yunuuuu opened 2 months ago

Yunuuuu commented 2 months ago

Update the following URL to point to the GitHub repository of the package you wish to submit to Bioconductor

Confirm the following by editing each check box to '[x]'

I am familiar with the essential aspects of Bioconductor software management, including:

For questions/help about the submission process, including questions about the output of the automatic reports generated by the SPB (Single Package Builder), please use the #package-submission channel of our Community Slack. Follow the link on the home page of the Bioconductor website to sign up.

bioc-issue-bot commented 2 months ago

Hi @Yunuuuu

Thanks for submitting your package. We are taking a quick look at it and you will hear back from us soon.

The DESCRIPTION file for this package is:

Package: biotidy
Title: Tidy utils for Bioinformatic objects
Version: 0.99.0
Authors@R: 
    person("Yun", "Peng", , "yunyunp96@163.com", role = c("aut", "cre"), comment = c(ORCID = "0000-0003-2801-3332"))
Description: This is a collection of utility functions that allow to bring 
    Bioinformatic objects like SummarizedExperiment and Seurat into tidy(verse) framework.
License: MIT + file LICENSE
Encoding: UTF-8
Roxygen: list(markdown = TRUE)
RoxygenNote: 7.3.2
biocViews: AssayDomain, RNASeq, GeneExpression, Sequencing, SingleCell
Imports: 
    data.table,
    rlang (>= 1.1.0),
    stats
Suggests: 
    Biobase,
    cli,
    knitr,
    methods,
    rmarkdown,
    SeuratObject,
    SingleCellExperiment,
    SummarizedExperiment
VignetteBuilder: knitr
lgatto commented 2 months ago

Chiming in here, as I was browsing recent submissions. We typically ask authors to describe other relevant packages already in Bioconductor in the vignette introduction. In this particular case, I didn't see any mention of, or comparison with, the Bioconductor tidyomics project.

Yunuuuu commented 2 months ago

Thank you for your comment. I have briefly reviewed the tidyomics project, which primarily uses a pipe-based workflow for managing bioinformatic objects. Note that biotidy serves a different purpose from tidyomics: it provides methods for extracting a data frame from bioinformatic objects, similar to how the broom::tidy function operates on statistical model objects.
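To make the broom::tidy analogy concrete, here is a minimal base-R sketch of that kind of extraction. It assumes a plain genes × cells count matrix and a per-cell metadata data.frame as stand-ins for an object's assay and column data; the function name `tidy_cells()` is hypothetical and is not biotidy's API.

```r
# Hypothetical extractor (not biotidy's code): turn a genes x cells matrix plus
# per-cell metadata into a single data.frame with one row per cell.
tidy_cells <- function(counts, cell_meta) {
  # Cells must line up between the matrix columns and the metadata rows.
  stopifnot(identical(colnames(counts), rownames(cell_meta)))
  expr <- as.data.frame(t(counts), check.names = FALSE)
  cbind(expr, cell_meta)
}

counts <- matrix(
  c(15L, 0L, 23L, 12L, 0L, 0L), nrow = 2,
  dimnames = list(c("Gene0001", "Gene0002"), c("Cell001", "Cell002", "Cell003"))
)
cell_meta <- data.frame(
  Treatment = c("treat2", "treat1", "treat1"),
  row.names = colnames(counts)
)

# One row per cell; columns are Gene0001, Gene0002, Treatment.
res <- tidy_cells(counts, cell_meta)
res
```

Like `broom::tidy()`, the result is an ordinary data.frame, so it can be passed directly to any tidyverse or base-R function.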

lshep commented 2 months ago

If this is not already in the vignette, I think it would be useful to add this distinction so that it is clear to users.

Yunuuuu commented 2 months ago

Thanks for your suggestion; I have added it to the vignette.

bioc-issue-bot commented 1 month ago

Your package has been added to git.bioconductor.org to continue the pre-review process. A build report will be posted shortly. Please fix any ERROR and WARNING in the build report before a reviewer is assigned or provide a justification on why you feel the ERROR or WARNING should be granted an exception.

IMPORTANT: Please read this documentation for setting up remotes to push to git.bioconductor.org. All changes should be pushed to git.bioconductor.org moving forward. It is required to push a version bump to git.bioconductor.org to trigger a new build report.

Bioconductor used your GitHub SSH keys for git.bioconductor.org access. To manage keys and future access, you may want to activate your Bioconductor Git Credentials Account.

bioc-issue-bot commented 1 month ago

Dear Package contributor,

This is the automated single package builder at bioconductor.org.

Your package has been built on the Bioconductor Single Package Builder.

On one or more platforms, the build results were: "WARNINGS". This may mean there is a problem with the package that you need to fix. Or it may mean that there is a problem with the build system itself.

Please see the build report for more details.

The following are build products from R CMD build on the Single Package Builder:
macOS 12.7.1 Monterey: biotidy_0.99.0.tar.gz
Linux (Ubuntu 22.04.3 LTS): biotidy_0.99.0.tar.gz

The links above are active for 21 days.

Remember: if you submitted your package after July 7th, 2020, when making changes to your repository push to git@git.bioconductor.org:packages/biotidy to trigger a new build. A quick tutorial for setting up remotes and pushing to upstream can be found here.

Yunuuuu commented 1 month ago

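The package uses S3 methods, and the packages that provide the supported object classes (such as SummarizedExperiment, SingleCellExperiment and SeuratObject) are listed under Suggests rather than Imports, which leads to the warning. In my opinion, S3 methods are the most suitable approach for this package.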
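A minimal sketch of the general pattern described above, assuming a hypothetical generic `per_cell_df()`: the generic belongs to the package itself, and a method for a class owned by a Suggests package only checks for, and uses, that package when the method is actually called. In a real package the method would also be registered in NAMESPACE with `S3method(per_cell_df, SummarizedExperiment)` (roxygen2 can generate this via `@export`). This illustrates the S3 technique in general, not biotidy's actual code.

```r
# Hypothetical generic owned by the package itself (name is illustrative).
per_cell_df <- function(x, ...) UseMethod("per_cell_df")

# Fallback for classes that have no method.
per_cell_df.default <- function(x, ...) {
  stop("no per_cell_df() method for class '", class(x)[1L], "'", call. = FALSE)
}

# Method for a class from a Suggests package: dispatch only needs the class
# name, so SummarizedExperiment is checked for and used only at call time.
per_cell_df.SummarizedExperiment <- function(x, assay = 1L, ...) {
  if (!requireNamespace("SummarizedExperiment", quietly = TRUE)) {
    stop("package 'SummarizedExperiment' is required for this method",
         call. = FALSE)
  }
  expr <- as.data.frame(t(as.matrix(SummarizedExperiment::assay(x, assay))),
                        check.names = FALSE)
  meta <- as.data.frame(SummarizedExperiment::colData(x))
  cbind(expr, meta)
}

# Calling the generic on an unsupported object falls through to the default.
msg <- tryCatch(per_cell_df(1:3), error = conditionMessage)
msg
```

The trade-off is that a missing Suggests package is reported at run time rather than at install time, which is why the method checks with `requireNamespace()` before calling into it.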

bioc-issue-bot commented 1 month ago

A reviewer has been assigned to your package for an in-depth review. Please respond accordingly to any further comments from the reviewer.

PeteHaitch commented 3 weeks ago

Hi @Yunuuuu,

Before I go any further with my review: you state in the vignette that "The inspiration for biotidy came from the functionality of the scuttle::makePerCellDF". Why, then, does biotidy::makePerCellDF(mocked_sce) give completely different output from scuttle::makePerCellDF(mocked_sce)?

library(biotidy)
mocked_sce <- mockSCE()
a <- biotidy::makePerCellDF(mocked_sce)
b <- scuttle::makePerCellDF(mocked_sce)
dim(a)
#> [1]  200 2103
dim(b)
#> [1] 200   3
a[1:5, 1:3]
#>         Gene0001 Gene0002 Gene0003
#> Cell001       15        0        0
#> Cell002       23       12       17
#> Cell003        0        0      347
#> Cell004       20       66       26
#> Cell005        2        0       62
b[1:5, 1:3]
#>         Mutation_Status Cell_Cycle Treatment
#> Cell001        positive         G0    treat2
#> Cell002        negative          S    treat1
#> Cell003        negative        G2M    treat1
#> Cell004        positive          S    treat1
#> Cell005        negative         G0    treat1
Session info

``` r
sessioninfo::session_info()
#> ─ Session info ───────────────────────────────────────────────────────────────
#>  setting  value
#>  version  R version 4.4.1 (2024-06-14)
#>  os       macOS Sonoma 14.5
#>  system   aarch64, darwin20
#>  ui       X11
#>  language (EN)
#>  collate  en_US.UTF-8
#>  ctype    en_US.UTF-8
#>  tz       Australia/Melbourne
#>  date     2024-08-14
#>  pandoc   3.2 @ /usr/local/bin/ (via rmarkdown)
#>
#> ─ Packages ───────────────────────────────────────────────────────────────────
#>  package              * version date (UTC) lib source
#>  abind                  1.4-5   2016-07-21 [1] CRAN (R 4.4.0)
#>  beachmat               2.21.5  2024-07-26 [1] Bioconductor 3.20 (R 4.4.1)
#>  Biobase                2.65.0  2024-05-04 [1] Bioconductor 3.20 (R 4.4.0)
#>  BiocGenerics           0.51.0  2024-05-04 [1] Bioconductor 3.20 (R 4.4.0)
#>  BiocParallel           1.39.0  2024-05-04 [1] Bioconductor 3.20 (R 4.4.0)
#>  biotidy              * 0.99.0  2024-08-14 [1] Bioconductor
#>  cli                    3.6.3   2024-06-21 [1] CRAN (R 4.4.0)
#>  codetools              0.2-20  2024-03-31 [1] CRAN (R 4.4.1)
#>  crayon                 1.5.3   2024-06-20 [1] CRAN (R 4.4.0)
#>  DelayedArray           0.31.11 2024-08-04 [1] Bioconductor 3.20 (R 4.4.1)
#>  digest                 0.6.36  2024-06-23 [1] CRAN (R 4.4.0)
#>  evaluate               0.24.0  2024-06-10 [1] CRAN (R 4.4.0)
#>  fastmap                1.2.0   2024-05-15 [1] CRAN (R 4.4.0)
#>  fs                     1.6.4   2024-04-25 [1] CRAN (R 4.4.0)
#>  GenomeInfoDb           1.41.1  2024-05-24 [1] Bioconductor 3.20 (R 4.4.0)
#>  GenomeInfoDbData       1.2.12  2024-03-28 [1] Bioconductor
#>  GenomicRanges          1.57.1  2024-06-12 [1] Bioconductor 3.20 (R 4.4.1)
#>  glue                   1.7.0   2024-01-09 [1] CRAN (R 4.4.0)
#>  htmltools              0.5.8.1 2024-04-04 [1] CRAN (R 4.4.0)
#>  httr                   1.4.7   2023-08-15 [1] CRAN (R 4.4.0)
#>  IRanges                2.39.2  2024-07-17 [1] Bioconductor 3.20 (R 4.4.1)
#>  jsonlite               1.8.8   2023-12-04 [1] CRAN (R 4.4.0)
#>  knitr                  1.48    2024-07-07 [1] CRAN (R 4.4.0)
#>  lattice                0.22-6  2024-03-20 [1] CRAN (R 4.4.1)
#>  lifecycle              1.0.4   2023-11-07 [1] CRAN (R 4.4.0)
#>  Matrix                 1.7-0   2024-04-26 [1] CRAN (R 4.4.1)
#>  MatrixGenerics         1.17.0  2024-05-04 [1] Bioconductor 3.20 (R 4.4.0)
#>  matrixStats            1.3.0   2024-04-11 [1] CRAN (R 4.4.0)
#>  R6                     2.5.1   2021-08-19 [1] CRAN (R 4.4.0)
#>  Rcpp                   1.0.13  2024-07-17 [1] CRAN (R 4.4.0)
#>  reprex                 2.1.1   2024-07-06 [1] CRAN (R 4.4.0)
#>  rlang                  1.1.4   2024-06-04 [1] CRAN (R 4.4.0)
#>  rmarkdown              2.27    2024-05-17 [1] CRAN (R 4.4.0)
#>  rstudioapi             0.16.0  2024-03-24 [1] CRAN (R 4.4.0)
#>  S4Arrays               1.5.7   2024-08-06 [1] Bioconductor 3.20 (R 4.4.1)
#>  S4Vectors              0.43.2  2024-07-17 [1] Bioconductor 3.20 (R 4.4.1)
#>  scuttle                1.15.2  2024-07-17 [1] Bioconductor 3.20 (R 4.4.1)
#>  sessioninfo            1.2.2   2021-12-06 [1] CRAN (R 4.4.0)
#>  SingleCellExperiment   1.27.2  2024-05-24 [1] Bioconductor 3.20 (R 4.4.0)
#>  SparseArray            1.5.31  2024-08-04 [1] Bioconductor 3.20 (R 4.4.1)
#>  SummarizedExperiment   1.35.1  2024-06-28 [1] Bioconductor 3.20 (R 4.4.1)
#>  UCSC.utils             1.1.0   2024-05-04 [1] Bioconductor 3.20 (R 4.4.0)
#>  withr                  3.0.1   2024-07-31 [1] CRAN (R 4.4.0)
#>  xfun                   0.46    2024-07-18 [1] CRAN (R 4.4.0)
#>  XVector                0.45.0  2024-05-04 [1] Bioconductor 3.20 (R 4.4.0)
#>  yaml                   2.3.10  2024-07-26 [1] CRAN (R 4.4.0)
#>  zlibbioc               1.51.1  2024-06-05 [1] Bioconductor 3.20 (R 4.4.0)
#>
#>  [1] /Library/Frameworks/R.framework/Versions/4.4-arm64/Resources/library
#>
#> ──────────────────────────────────────────────────────────────────────────────
```
PeteHaitch commented 3 weeks ago

Also, the examples produce way too much output.

library(biotidy)
# First example from ?biotidy::makePerCellDF
mocked_se <- mockSE()
makePerCellDF(mocked_se)
#>         Gene0001 Gene0002 Gene0003 Gene0004 Gene0005 Gene0006 Gene0007 Gene0008
#> Cell001        0       55       34        0       16      129      310     1266
#> 
#> <thousands of lines of output excluded>
#>  [ reached 'max' / getOption("max.print") -- omitted 151 rows ]

This is either causing or masking the cause of the failure when I run R CMD check biotidy_0.99.0.tar.gz on my system (macOS):


cat /Users/peter/GitHub/biotidy/biotidy.Rcheck/00check.log
* using log directory ‘/Users/peter/GitHub/biotidy/biotidy.Rcheck’
* using R version 4.4.1 (2024-06-14)
* using platform: aarch64-apple-darwin20
* R was compiled by
    Apple clang version 14.0.0 (clang-1400.0.29.202)
    GNU Fortran (GCC) 12.2.0
* running under: macOS Sonoma 14.5
* using session charset: UTF-8
* checking for file ‘biotidy/DESCRIPTION’ ... OK
* this is package ‘biotidy’ version ‘0.99.0’
* package encoding: UTF-8
* checking package namespace information ... OK
* checking package dependencies ... OK
* checking if this is a source package ... OK
* checking if there is a namespace ... OK
* checking for executable files ... OK
* checking for hidden files and directories ... OK
* checking for portable file names ... OK
* checking for sufficient/correct file permissions ... OK
* checking whether package ‘biotidy’ can be installed ... OK
* checking installed package size ... OK
* checking package directory ... OK
* checking ‘build’ directory ... OK
* checking DESCRIPTION meta-information ... OK
* checking top-level files ... OK
* checking for left-over files ... OK
* checking index information ... OK
* checking package subdirectories ... OK
* checking code files for non-ASCII characters ... OK
* checking R files for syntax errors ... OK
* checking whether the package can be loaded ... OK
* checking whether the package can be loaded with stated dependencies ... OK
* checking whether the package can be unloaded cleanly ... OK
* checking whether the namespace can be loaded with stated dependencies ... OK
* checking whether the namespace can be unloaded cleanly ... OK
* checking dependencies in R code ... OK
* checking S3 generic/method consistency ... OK
* checking replacement functions ... OK
* checking foreign function calls ... OK
* checking R code for possible problems ... OK
* checking Rd files ... OK
* checking Rd metadata ... OK
* checking Rd cross-references ... OK
* checking for missing documentation entries ... OK
* checking for code/documentation mismatches ... OK
* checking Rd \usage sections ... OK
* checking Rd contents ... OK
* checking for unstated dependencies in examples ... OK
* checking installed files from ‘inst/doc’ ... OK
* checking files in ‘vignettes’ ... OK
* checking examples ... ERROR
Running examples in ‘biotidy-Ex.R’ failed
The error most likely occurred in:

> ### Name: makePerCellDF
> ### Title: Create a per-cell data.frame
> ### Aliases: makePerCellDF makePerCellDF.SummarizedExperiment
> ###   makePerCellDF.SingleCellExperiment makePerCellDF.ExpressionSet
> ###   makePerCellDF.Seurat
> 
> ### ** Examples
> 
> # SummarizedExperiment method
> mocked_se <- mockSE()
> makePerCellDF(mocked_se)
        Gene0001 Gene0002 Gene0003 Gene0004 Gene0005 Gene0006 Gene0007 Gene0008
Cell001        0       18        0     1116       17     1572     1146       14
<thousands of lines of output excluded>
> # SingleCellExperiment method
> mocked_sce <- mockSCE()
* checking for unstated dependencies in vignettes ... OK
* checking package vignettes ... OK
* checking re-building of vignette outputs ... OK
* checking PDF version of manual ... OK
* DONE
Status: 1 ERROR
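One way to keep example output short is to print the shape and a small corner of the result rather than the whole table. The sketch below uses a random base-R matrix as a stand-in for `mockSE()` (the 2,000 × 200 size and Poisson counts are assumptions); in the package's own examples, the same slicing could be applied to the object returned by `makePerCellDF(mocked_se)`.

```r
# Stand-in for a mocked dataset: 2,000 genes x 200 cells of Poisson counts.
set.seed(42)
counts <- matrix(
  rpois(2000 * 200, lambda = 10), nrow = 2000,
  dimnames = list(sprintf("Gene%04d", 1:2000), sprintf("Cell%03d", 1:200))
)

# One row per cell, one column per gene.
per_cell <- as.data.frame(t(counts))

# Show only the dimensions and a 5 x 3 corner instead of the full table.
dim(per_cell)
per_cell[1:5, 1:3]
```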
Yunuuuu commented 2 weeks ago

Thank you for taking the time to review my package.

For added convenience, biotidy introduces several enhancements compared to scuttle:

# Note: scuttle puts the column metadata first, followed by the gene expression values, whereas biotidy puts the gene expression values first
b[1:5, 1:5]
#>         Mutation_Status Cell_Cycle Treatment Gene0001 Gene0002
#> Cell001        positive         G0    treat2      376        0
#> Cell002        negative         G1    treat1      409       19
#> Cell003        negative         G1    treat2      274        0
#> Cell004        negative        G2M    treat2      463        0
#> Cell005        positive          S    treat2      785        2
setequal(names(a), names(b))
#> [1] TRUE
identical(a, b[names(a)])
#> [1] TRUE

Created on 2024-08-25 with reprex v2.1.0
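The check above can be reproduced without either package. In this base-R sketch, `a` and `b` are hypothetical stand-ins built from the same values (taken from the output above), with `a` in expression-first order and `b` in metadata-first order; reordering `b` by `names(a)` recovers `a`, which is what `identical(a, b[names(a)])` tests.

```r
# Shared inputs: genes x cells counts and per-cell metadata.
counts <- matrix(
  c(376L, 0L, 409L, 19L, 274L, 0L), nrow = 2,
  dimnames = list(c("Gene0001", "Gene0002"), c("Cell001", "Cell002", "Cell003"))
)
meta <- data.frame(
  Mutation_Status = c("positive", "negative", "negative"),
  Cell_Cycle = c("G0", "G1", "G1"),
  row.names = colnames(counts)
)
expr <- as.data.frame(t(counts))

a <- cbind(expr, meta)  # expression values first (biotidy-style order)
b <- cbind(meta, expr)  # metadata first (scuttle-style order)

setequal(names(a), names(b))  # same set of columns ...
identical(a, b[names(a)])     # ... and same contents once b is reordered
```

In other words, the two layouts differ only in column order, which is a choice of default rather than a difference in content.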

bioc-issue-bot commented 2 weeks ago

Received a valid push on git.bioconductor.org; starting a build for commit id: c9f21840b203c6013741bff9dbad192f332ba050

bioc-issue-bot commented 2 weeks ago

Dear Package contributor,

This is the automated single package builder at bioconductor.org.

Your package has been built on the Bioconductor Single Package Builder.

Congratulations! The package built without errors or warnings on all platforms.

Please see the build report for more details.

The following are build products from R CMD build on the Single Package Builder: Linux (Ubuntu 22.04.3 LTS): biotidy_0.99.1.tar.gz

The links above are active for 21 days.

Remember: if you submitted your package after July 7th, 2020, when making changes to your repository push to git@git.bioconductor.org:packages/biotidy to trigger a new build. A quick tutorial for setting up remotes and pushing to upstream can be found here.

PeteHaitch commented 1 week ago

My broader point is that, since you were inspired by an existing Bioconductor function and are using the same function name, a user has a reasonable expectation that the default output of your new function is identical to that of the old function[^1]. A simple alternative is to use different names for your functions (e.g., SEToDFByCell() and SEToDFByFeature()?) and to acknowledge the inspiration provided by scuttle::makePerCellDF() and scuttle::makePerFeatureDF() in the documentation.

I'd also note that what you call "enhancements compared to scuttle" are what I'd call a different choice of defaults, because it's not clear to me that one is better than the other. In fact, I'd argue that producing a data.frame with all features is not a good default, because it produces so much output and allocates a lot of memory in doing so.
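To put rough numbers on the memory concern, here is a back-of-the-envelope sketch, assuming every value is stored as an 8-byte double and ignoring per-column overhead. The 2,000 × 200 size is close to the mocked data in the reprex above (200 rows, 2,103 columns); the 30,000 × 100,000 size is an assumed example of a larger single-cell dataset, not a figure from this thread.

```r
# Bytes needed for the values of a dense cells x features table
# (ignores per-column and row-name overhead).
dense_bytes <- function(n_features, n_cells, bytes_per_value = 8) {
  n_features * n_cells * bytes_per_value
}

# Mock-sized data: 2,000 features x 200 cells stored as doubles.
dense_bytes(2000, 200) / 2^20    # about 3.05 MiB

# Assumed larger single-cell dataset: 30,000 genes x 100,000 cells.
dense_bytes(30000, 1e5) / 2^30   # about 22.4 GiB

# Cross-check the small case against an actual data.frame of doubles; the
# measured size is slightly larger because each column carries its own header.
mock_df <- as.data.frame(matrix(0, nrow = 200, ncol = 2000))
as.numeric(object.size(mock_df)) / 2^20
```

The cost grows with features × cells, so a default that includes every feature scales poorly on real data, whereas selecting a handful of features keeps the table small.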

[^1]: This expectation would be stronger if makePerCellDF() were a generic function, which it is not, but in general I think that if you are mimicking an existing function, your function should have the same defaults to avoid this confusion.