SydneyBioX / scMerge

Statistical approach for removing unwanted variation from multiple single-cell datasets
https://sydneybiox.github.io/scMerge/

Allowance of specifying the colname for "batch" annotation #26

Closed by mvfki 4 years ago

mvfki commented 4 years ago

Hi,

This is definitely not a bug, just a small suggestion for a possible improvement.

At line 149 of scMerge::scMerge(), the code requires a column named exactly "batch" to be present. I can easily rename the column I want to use to "batch" before passing the object to your function. In practice, though, if I write a wrapper function around your method, I don't think that is really acceptable, because the user would get back an object with unexpected modifications. If I insisted on doing it anyway, I would have to back up either the original SCE object or the original column name. So why not select the column with double brackets and add an extra argument that lets people specify the annotation name?
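To make the suggestion concrete, here is a minimal sketch of the double-bracket lookup. It is not scMerge's actual code: `get_batch` is a hypothetical helper, and a plain data.frame stands in for the colData of an SCE object so that the example runs in base R.

```r
# Hypothetical helper, not part of scMerge: fetch the batch annotation by a
# user-supplied column name instead of a hard-coded "batch".
get_batch <- function(col_data, batch_name = "batch") {
  if (!batch_name %in% colnames(col_data)) {
    stop("Column '", batch_name, "' not found in the column annotations.")
  }
  # Double-bracket selection works with any column name held in a variable
  col_data[[batch_name]]
}

# A plain data.frame standing in for colData(sce); assumption for illustration
annot <- data.frame(data_name = factor(c("batch2", "batch2", "batch3")))

# The caller's column names stay untouched; only the lookup name changes
batches <- get_batch(annot, batch_name = "data_name")
```

With this pattern the caller's object is never renamed, so nothing needs to be backed up or restored afterwards.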

Sorry for bothering.

kevinwang09 commented 4 years ago

Thanks for the suggestion. I will work on incorporating this for the next release.

kevinwang09 commented 4 years ago

Hi @mvfki, this was implemented a while ago. Below is the updated behaviour of the feature you requested: you can now supply a batch_name argument instead of being forced to have a column called "batch". If you are happy with it, please close the issue.

library(scMerge)
library(SingleCellExperiment)
#> Loading required package: SummarizedExperiment
#> Loading required package: GenomicRanges
#> Loading required package: stats4
#> Loading required package: BiocGenerics
#> Loading required package: parallel
#> 
#> Attaching package: 'BiocGenerics'
#> The following objects are masked from 'package:parallel':
#> 
#>     clusterApply, clusterApplyLB, clusterCall, clusterEvalQ,
#>     clusterExport, clusterMap, parApply, parCapply, parLapply,
#>     parLapplyLB, parRapply, parSapply, parSapplyLB
#> The following objects are masked from 'package:stats':
#> 
#>     IQR, mad, sd, var, xtabs
#> The following objects are masked from 'package:base':
#> 
#>     anyDuplicated, append, as.data.frame, basename, cbind, colnames,
#>     dirname, do.call, duplicated, eval, evalq, Filter, Find, get, grep,
#>     grepl, intersect, is.unsorted, lapply, Map, mapply, match, mget,
#>     order, paste, pmax, pmax.int, pmin, pmin.int, Position, rank,
#>     rbind, Reduce, rownames, sapply, setdiff, sort, table, tapply,
#>     union, unique, unsplit, which, which.max, which.min
#> Loading required package: S4Vectors
#> Warning: package 'S4Vectors' was built under R version 3.6.3
#> 
#> Attaching package: 'S4Vectors'
#> The following object is masked from 'package:base':
#> 
#>     expand.grid
#> Loading required package: IRanges
#> Loading required package: GenomeInfoDb
#> Warning: package 'GenomeInfoDb' was built under R version 3.6.3
#> Loading required package: Biobase
#> Welcome to Bioconductor
#> 
#>     Vignettes contain introductory material; view with
#>     'browseVignettes()'. To cite Bioconductor, see
#>     'citation("Biobase")', and for packages 'citation("pkgname")'.
#> Loading required package: DelayedArray
#> Warning: package 'DelayedArray' was built under R version 3.6.3
#> Loading required package: matrixStats
#> 
#> Attaching package: 'matrixStats'
#> The following objects are masked from 'package:Biobase':
#> 
#>     anyMissing, rowMedians
#> Loading required package: BiocParallel
#> 
#> Attaching package: 'DelayedArray'
#> The following objects are masked from 'package:matrixStats':
#> 
#>     colMaxs, colMins, colRanges, rowMaxs, rowMins, rowRanges
#> The following objects are masked from 'package:base':
#> 
#>     aperm, apply, rowsum
## Load the example data
data('example_sce', package = 'scMerge')
## Load the previously computed stably expressed genes
data('segList_ensemblGeneID', package = 'scMerge')

## Copy the batch labels into a column that is not called "batch",
## then keep only that column so that no "batch" column remains
example_sce$data_name = example_sce$batch
colData(example_sce) = colData(example_sce)[,"data_name", drop = FALSE]
colData(example_sce)
#> DataFrame with 200 rows and 1 column
#>                         data_name
#>                          <factor>
#> ola_mES_a2i_2_48.counts    batch2
#> ola_mES_2i_2_75.counts     batch2
#> ola_mES_lif_2_68.counts    batch2
#> ola_mES_a2i_2_42.counts    batch2
#> ola_mES_2i_2_66.counts     batch2
#> ...                           ...
#> ola_mES_2i_3_17.counts     batch3
#> ola_mES_lif_3_27.counts    batch3
#> ola_mES_2i_3_21.counts     batch3
#> ola_mES_a2i_3_49.counts    batch3
#> ola_mES_2i_3_54.counts     batch3

## Run scMerge, pointing it at the batch column via batch_name
sce_mESC <- scMerge(
  sce_combine = example_sce,
  ctl = segList_ensemblGeneID$mouse$mouse_scSEG,
  kmeansK = c(3, 3),
  assay_name = 'scMerge',
  batch_name = "data_name"
)
#> Dimension of the replicates mapping matrix: 
#> [1] 200   3
#> Step 2: Performing RUV normalisation. This will take minutes to hours.
#> scMerge complete!

Created on 2020-05-30 by the reprex package (v0.3.0)
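With batch_name available, a wrapper of the kind described in the original request no longer needs to rename anything. The following is only a rough sketch: `run_scmerge` and its argument names are hypothetical, and it assumes the scMerge and SummarizedExperiment packages are installed. The scMerge() arguments are the ones used in the reprex above.

```r
# Hypothetical wrapper, not part of scMerge: forwards the caller's own batch
# column name through batch_name, so the input object is never modified.
run_scmerge <- function(sce, batch_col, ctl, kmeansK, assay_name = "scMerge") {
  annot_names <- colnames(SummarizedExperiment::colData(sce))
  if (!batch_col %in% annot_names) {
    stop("Column '", batch_col, "' not found in colData(sce).")
  }
  scMerge::scMerge(
    sce_combine = sce,
    ctl = ctl,
    kmeansK = kmeansK,
    assay_name = assay_name,
    batch_name = batch_col
  )
}
```

For the example data above, the call would be `run_scmerge(example_sce, "data_name", segList_ensemblGeneID$mouse$mouse_scSEG, c(3, 3))`.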

mvfki commented 4 years ago

Cool. It looks great now. Thanks for working on it!