campbio / decontX

Methods for decontamination of single cell data
MIT License
26 stars 1 forks source link

Bioc review issues #9

Closed yuan-yin-truly closed 9 months ago

yuan-yin-truly commented 10 months ago

Package 'decontX' Review

Thank you for submitting your package to Bioconductor. The package passed check and build. It is in pretty good shape. And it is a complex package. I will leave comments for multiple times. But before my next comments, please try to answer the comments line by line when you are ready for a second review. Code: Note: please consider; Important: must be addressed.

The DESCRIPTION file

  • [x] Important: R version should be no less than 4.3
  • [x] Important: At least three sentences in Description.

The NAMESPACE file

  • [ ] Note: Function names use camelCase or snake_case and do not include ..

    • in line 3 export("decontXcounts<-")
    • in line 13 exportMethods("decontXcounts<-")

General package development

  • [ ] FYI: I am not sure Seurat will fix this issue or not. This is not related to this package. The legacy packages maptools, rgdal, and rgeos, underpinning the sp package, which was just loaded, will retire in October 2023. Please refer to R-spatial evolution reports for details, especially https://r-spatial.org/r/2023/05/15/evolution4.html. It may be desirable to make the sf package available; package maintainers should consider adding sf to Suggests:. The sp package is now running under evolution status 2 (status 2 uses the sf package in place of rgdal)

R code

  • [ ] NOTE: no direct slot access with @ or slot() - accessors implemented and used.

    • In file R/stan_helpers.R:

    • at line 28 found ' val <- stan_vb_output@sim$est'

  • [ ] Important: vapply instead of sapply.

    • In file R/stanmodels.R:

    • at line 10 found 'stanmodels <- sapply(stanmodels, function(model_name) {'

  • [x] Important: No paste in message(), message, stop

    • In file R/celda_functions.R:

    • at line 19 found ' message(paste(..., sep = sep))'

  • [ ] Note: Will it better to use writeLines instead of cat?

    • In file R/celda_functions.R:

    • at line 14 found ' cat(paste(..., "\n", sep = sep),'

  • [ ] NOTE: :: is not suggested in source code unless you can make sure all the packages are imported. Some people think it is better to keep ::. However please note that you need to manully double check the import items when you make any change in the DESCRIPTION file during development. My recommendation is to remove one or two repeats to force the dependency check.
  • [x] NOTE: 1:n is not suggested in source code. Use seq_along or seq.int instead.

    • In file R/plot_decontPro.R:

    • at line 53 found ' breaks = c(1, 5, 10 ^ (1:4))) +'

    • at line 162 found ' breaks = c(1, 5, 10 ^ (1:4))) +'

  • [ ] NOTE: Vectorize: for loops present, try to replace them by *apply funcitons.

    • In file R/celda_functions.R:

    • at line 111 found ' for (i in seq_along(features)) {'

    • In file R/decon.R:

    • at line 184 found ' for (i in batchIndex) {'

    • at line 430 found ' for (bat in batchIndex) {'

    • at line 1340 found ' for (i in seq(K)) {'

    • at line 1343 found ' for (j in setdiff(seq(K), i)) {'

    • at line 1359 found ' for (i in seq(K)) {'

    • In file R/plot_decontPro.R:

    • at line 38 found ' for (i in seq_along(features)) {'

    • at line 140 found ' for (i in seq_along(features)) {'

    • In file R/plot_decontx.R:

    • at line 192 found ' for (i in seq_along(assayName)) {'

    • at line 365 found ' for (i in seq_along(assayName)) {'

    • at line 506 found ' for (i in seq_along(groupClusters)) {'

  • [x] Important: Remove unused code.

    • In file R/decon.R:

    • at line 654 found ' # nG <- nrow(counts)'

    • at line 851 found ' # ll = sum( t(counts) log( (1-conP )geneDist[z,] + conP * conDist[z, ] +'

    • at line 852 found ' # 1e-20 ) ) # when dist_mat are K x G matrices'

    • at line 864 found ' # ll <- sum(t(counts) log(theta t(cellDist) +'

    • at line 865 found ' # (1 - theta) * t(bgDist) + 1e-20))'

  • [ ] Important: Please consider to add drop=FALSE to avoid the reduction of dimension for matrices and arrays. Ignore this if using datatable.

    • In file R/celda_functions.R:

    • at line 97 found ' search <- SummarizedExperiment::rowData(x)[, by]'

    • at line 102 found ' search <- x[, by]'

    • In file R/decon.R:

    • at line 189 found ' tempUMAP[result$runParams$batch == i, ] <- result$estimates[[i]]$UMAP'

    • at line 452 found ' countsBat <- counts[, batch == bat]'

    • at line 453 found ' bgBat <- countsBackground[, batchBackground == bat]'

    • at line 540 found ' estRmat <- estRmat[, !(allCol %in% subCol)]'

    • at line 544 found ' estRmat <- estRmat[, allCol]'

    • at line 853 found ' ll <- sum(Matrix::t(counts) log(theta t(phi)[z, ] +'

    • at line 854 found ' (1 - theta) * t(eta)[z, ] + 1e-20))'

    • at line 866 found ' ll <- sum(t(counts) log(theta t(phi)[cbZ, ] +'

    • at line 867 found ' (1 - theta) * t(eta)[globalZ, ] + 1e-20))'

    • at line 889 found ' logPr <- log(t(phi)[z, ] + 1e-20) + log(theta + 1e-20)'

    • at line 890 found ' logPc <- log(t(eta)[z, ] + 1e-20) + log(1 - theta + 1e-20)'

    • at line 932 found ' logPr <- log(t(phi)[cbZ, ] + 1e-20) + log(theta + 1e-20)'

    • at line 934 found ' log(t(eta)[globalZ, ] + 1e-20) + log(1 - theta + 1e-20)'

    • at line 1342 found ' phi[i, ix] <- max(phi[i, ])'

    • at line 1344 found ' phi[j, ix] <- 0'

    • at line 1351 found ' stats::rmultinom(1, size = rNbyC[i], prob = phi[z[i], ])'

    • at line 1370 found ' stats::rmultinom(1, size = cNbyC[i], prob = eta[, z[i]])'

    • In file R/plot_decontPro.R:

    • at line 42 found ' df <- data.frame(con = counts[feature, ],'

    • at line 43 found ' decon = decontaminated_counts[feature, ])'

    • at line 145 found ' con = counts[feature, ],'

    • at line 146 found ' decon = decontaminated_counts[feature, ],'

    • In file R/plot_decontx.R:

    • at line 60 found ' df <- df[!naIx, ]'

    • at line 212 found ' counts <- x[geneMarkerIndex, ]'

    • at line 383 found ' counts <- x[geneMarkerIndex, ]'

    • at line 468 found ' z <- SummarizedExperiment::colData(x)[, z]'

    • at line 511 found ' x <- x[, !na.ix]'

    • In file R/stan_helpers.R:

    • at line 72 found ' unscaled_rates <- t(r_est[dat$cell_type, ])'

  • [ ] NOTE: Functional programming: code repetition.

    • repetition in .cDCalcEMbgDecontamination and .cDCalcEMDecontamination

    • in .cDCalcEMbgDecontamination

      • line 17: phi <- celda::normalizeCounts(phiUnnormalized, normalize = "proportion",
      • line 18: pseudocountNormalize = 1e-20)
      • line 19: eta <- celda::normalizeCounts(etaUnnormalized, normalize = "proportion",
      • line 20: pseudocountNormalize = 1e-20)
      • line 21: return(list(estRmat = estRmat, theta = theta, phi = phi,
      • line 22: eta = eta, delta = deltaV2))
    • in .cDCalcEMDecontamination

      • line 18: phi <- celda::normalizeCounts(rnGByK, normalize = "proportion",
      • line 19: pseudocountNormalize = 1e-20)
      • line 20: eta <- celda::normalizeCounts(cnGByK, normalize = "proportion",
      • line 21: pseudocountNormalize = 1e-20)
      • line 22: return(list(estRmat = estRmat, theta = theta, phi = phi,
      • line 23: eta = eta, delta = deltaV2))
    • repetition in .colSumByGroupInteger and .colSumByGroupNumeric

    • in .colSumByGroupInteger

      • line 1: group, K)
      • line 2:{
      • line 3: group <- factor(group, levels = seq(K))
      • line 4: res <- .Call("_colSumByGroup", x, group)
      • line 5: return(res)
    • in .colSumByGroupNumeric

      • line 1: group, K)
      • line 2:{
      • line 3: group <- factor(group, levels = seq(K))
      • line 4: res <- .Call("_colSumByGroup_numeric", x, group)
      • line 5: return(res)
    • repetition in plotDecontXMarkerExpression and plotDecontXMarkerPercentage

    • in plotDecontXMarkerExpression

      • line 5: legend <- "none"
      • line 6: temp <- .processPlotDecontXMarkerInupt(x = x, z = z,
      • line 7: markers = markers, groupClusters = groupClusters,
      • line 8: by = by, exactMatch = exactMatch)
      • line 9: x <- temp$x
      • line 10: z <- temp$z
      • line 11: geneMarkerIndex <- temp$geneMarkerIndex
      • line 12: groupClusters <- temp$groupClusters
      • line 13: xlab <- temp$xlab
      • line 14: if (inherits(x, "SingleCellExperiment")) {
      • line 15: df.list <- list()
      • line 16: for (i in seq_along(assayName)) {
      • line 17: counts <- SummarizedExperiment::assay(x[geneMarkerIndex,
      • line 18: ], assayName[i])
      • line 25: if (length(assayName) > 1) {
      • line 26: legend <- "right"
      • line 27: }
      • line 28: }
      • line 29: else {
      • line 30: counts <- x[geneMarkerIndex, ]
    • in plotDecontXMarkerPercentage

      • line 10: }
      • line 11: temp <- .processPlotDecontXMarkerInupt(x = x, z = z,
      • line 12: markers = markers, groupClusters = groupClusters,
      • line 13: by = by, exactMatch = exactMatch)
      • line 14: x <- temp$x
      • line 15: z <- temp$z
      • line 16: geneMarkerIndex <- temp$geneMarkerIndex
      • line 17: geneMarkerCellTypeIndex <- temp$geneMarkerCellTypeIndex
      • line 18: groupClusters <- temp$groupClusters
      • line 19: xlab <- temp$xlab
      • line 20: if (inherits(x, "SingleCellExperiment")) {
      • line 21: df.list <- list()
      • line 22: for (i in seq_along(assayName)) {
      • line 23: counts <- SummarizedExperiment::assay(x[geneMarkerIndex,
      • line 24: ], assayName[i])
      • line 31: if (length(assayName) > 1) {
      • line 32: legend <- "right"
      • line 33: }
      • line 34: }
      • line 35: else {
      • line 36: counts <- x[geneMarkerIndex, ]

Documentation

  • [x] Important: Vignette should have an Installation section.

    • rmd file vignettes/decontPro.Rmd
    • rmd file vignettes/decontX.Rmd
  • [x] Important: Please include Bioconductor installation instructions using BiocManager.

    • rmd file vignettes/decontPro.Rmd
    • rmd file vignettes/decontX.Rmd
  • [ ] Note: Vignette includes motivation for submitting to Bioconductor as part of the abstract/intro of the main vignette.

    • rmd file vignettes/decontPro.Rmd
    • rmd file vignettes/decontX.Rmd
  • [ ] To avoid the '* NOTE: Consider adding runnable examples to man pages that document exported objects.' in BiocCheck, could you please add runnable examples to
    • decontXcounts.Rd
  • [ ] NOTE: Consider adding a NEWS file, so your package news will be included in Bioconductor release announcements.
yuan-yin-truly commented 10 months ago

Rstan seem to have some issue with R 4.3.0. Revisit this later.

(2023-10-14) Resolved

yuan-yin-truly commented 10 months ago

drop=FALSE should not be added in R/plot_decontPro.R, as we need a vector from matrix subsetting.