Statistical approach for removing unwanted variation from multiple single-cell datasets
Allowance of specifying the colname for "batch" annotation #26

mvfki commented 4 years ago


This is definitely not a bug, just a small suggestion for a possible improvement.

In `scMerge::scMerge()` (line 149), the code requires a column called exactly "batch" to be present. I can easily rename the column I want to use to "batch" before passing the object to your function. In practice, though, if I write a wrapper function around your method, I don't think it is okay for users to get their objects modified unexpectedly. If I insist on renaming, I'd have to back up either the original SCE object or the original column name. Why not use double-bracket column selection, with an extra argument that lets people specify the annotation name?
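A minimal sketch of the suggested pattern, in base R. `get_batch()` is a hypothetical helper, not part of scMerge, and a plain `data.frame` stands in for `colData(sce)`. The caller names the annotation column and `[[` pulls it out, so the input object is never renamed or modified.

```r
# Hypothetical helper (not part of scMerge): look up the batch annotation
# by a caller-supplied column name instead of a hard-coded "batch" column.
get_batch <- function(col_data, batch_name = "batch") {
  if (!batch_name %in% colnames(col_data)) {
    stop("Column '", batch_name, "' not found in colData")
  }
  # Double-bracket selection returns the column itself; col_data is untouched
  col_data[[batch_name]]
}

# Usage with a plain data.frame standing in for colData(sce)
cd <- data.frame(data_name = factor(c("batch2", "batch2", "batch3")))
get_batch(cd, batch_name = "data_name")
#> [1] batch2 batch2 batch3
#> Levels: batch2 batch3
```

Because `[[` and `colnames()` also work on the S4 `DataFrame` returned by `colData()`, the same lookup should carry over to a `SingleCellExperiment` without renaming anything.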

Sorry to bother you.

kevinwang09 commented 4 years ago

Thanks for the suggestion, I will work on this to incorporate it for the next release.

kevinwang09 commented 4 years ago

Hi @mvfki, this was implemented a while ago. Here is the updated behaviour of the feature you requested: you can now supply a `batch_name` argument instead of being forced to have a column called "batch". If you are happy with it, please close the issue.

```r
library(SingleCellExperiment)
library(scMerge)

## Loading example data
data('example_sce', package = 'scMerge')
## Previously computed stably expressed genes
data('segList_ensemblGeneID', package = 'scMerge')

## Running the example data with minimal inputs
## Store the batch labels under a custom column name and keep only that column
example_sce$data_name = example_sce$batch
colData(example_sce) = colData(example_sce)[, "data_name", drop = FALSE]
colData(example_sce)
#> DataFrame with 200 rows and 1 column
#>                         data_name
#>                          <factor>
#> ola_mES_a2i_2_48.counts    batch2
#> ola_mES_2i_2_75.counts     batch2
#> ola_mES_lif_2_68.counts    batch2
#> ola_mES_a2i_2_42.counts    batch2
#> ola_mES_2i_2_66.counts     batch2
#> ...                           ...
#> ola_mES_2i_3_17.counts     batch3
#> ola_mES_lif_3_27.counts    batch3
#> ola_mES_2i_3_21.counts     batch3
#> ola_mES_a2i_3_49.counts    batch3
#> ola_mES_2i_3_54.counts     batch3

## Point scMerge at the custom column via batch_name
sce_mESC <- scMerge(sce_combine = example_sce,
                    ctl = segList_ensemblGeneID$mouse$mouse_scSEG,
                    kmeansK = c(3, 3),
                    assay_name = 'scMerge',
                    batch_name = "data_name")
#> Dimension of the replicates mapping matrix: 
#> [1] 200   3
#> Step 2: Performing RUV normalisation. This will take minutes to hours.
#> scMerge complete!
```

mvfki commented 4 years ago

Cool. It looks great now. Thanks for working on it!