Open onahman opened 1 year ago
Hm. The motivation here is that muscat is designed for multi-sample multi-group data, so multiple samples (replicates) are expected. Having only one sample will not allow any of the differential testing functions to work (as there'll be only one sample per group). May I ask what the long-game is here? I.e., are you only interested in the simulation perhaps?
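To make the "one sample per group" point concrete, here is a minimal base-R sketch. It does not use muscat, and the two pseudobulk values are made-up numbers for a single hypothetical gene. It only illustrates the general statistical issue: with one observation per group there is no within-group variability to estimate.

```r
# Toy pseudobulk values for one gene, one sample per group
# (hypothetical numbers, for illustration only).
y <- c(A = 10, B = 20)
group <- factor(c("A", "B"))

# The variance of a single observation is undefined.
var_a <- var(y["A"])

# A two-group linear model fits both points exactly,
# so no residual degrees of freedom remain for a test.
fit <- lm(unname(y) ~ group)
df_res <- df.residual(fit)

print(var_a)   # NA
print(df_res)  # 0
```

Any test that relies on sample-level replication runs into the same limitation. Methods that treat cells as the unit of replication behave differently, but that is a separate modelling choice.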
Hi,

Thank you for your reply! I'll explain. I would like my generated data to have several cell types and two different samples/groups. But my reference data doesn't have replicates; rather, the different samples can't be treated as identical: they come from different cancer patients, and cancer is known to be very heterogeneous. I did some batch correction prior to the simulation, but the genes that should not be DE still have quite different means. I want to avoid that and reduce the noise, hence my decision to use one sample.

I am generating the data and later want to test for DE genes, but not via your provided framework. So you could say I am mainly interested in the simulation, but I do expect the genes to be differentially expressed. Do you still think I will encounter a problem? I am not sure I completely understand which differential testing functions will no longer work: only the ones in your package, or in general?

Thanks again for your help,
Ornit
(In reply to Helena L. Crowell's comment of Fri, Dec 16, 2022, 2:19 PM, quoted above: https://github.com/HelenaLC/muscat/issues/113#issuecomment-1354681526)
I am looking at these lines of code from prepSim:
    if (!is.null(min_size)) {
        n_cells <- table(x$cluster_id, x$sample_id)
        n_cells <- .filter_matrix(n_cells, n = min_size)
        if (ncol(n_cells) == 1)
            stop("Current 'min_size' retains only 1 sample,\nbut",
                " mean-dispersion estimation requires at least 2.")
        if (verbose)
            message(sprintf("- %s/%s subpopulations and %s/%s samples retained.",
                nrow(n_cells), nlevels(x$cluster_id), ncol(n_cells),
                nlevels(x$sample_id)))
        x <- .filter_sce(x, rownames(n_cells), colnames(n_cells))
    }
I think the check
    if (ncol(n_cells) == 1)
might not be required; a check on each row that more than one cell remains after filtering could replace it. The estimation did work even with only one dataset. Am I missing something? I am trying to avoid batch effects and would prefer to use only one dataset rather than several, if possible.
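The per-row check suggested above could look roughly like the following base-R sketch. The helper `check_rows` and the toy labels are hypothetical and not part of muscat. The sketch only shows the shape of such a check on a subpopulation-by-sample count table. It says nothing about whether the downstream mean-dispersion estimation is statistically sound with a single sample.

```r
# Toy metadata: three subpopulations, a single sample (hypothetical labels).
cluster_id <- factor(c("T", "T", "T", "B", "B", "NK"))
sample_id  <- factor(rep("patient1", 6))

# Subpopulation-by-sample cell counts, built as in the quoted snippet.
n_cells <- table(cluster_id, sample_id)

# Hypothetical per-row check: stop if any subpopulation has fewer than 2 cells.
check_rows <- function(n_cells) {
    too_few <- rowSums(n_cells) < 2
    if (any(too_few))
        stop("Subpopulation(s) with fewer than 2 cells: ",
            paste(rownames(n_cells)[too_few], collapse = ", "))
    invisible(TRUE)
}

# "NK" has a single cell, so the check fails; capture the message instead.
res <- tryCatch(check_rows(n_cells), error = function(e) conditionMessage(e))

# Dropping "NK" leaves only subpopulations that pass.
ok <- check_rows(n_cells[c("B", "T"), , drop = FALSE])
```

With this variant, a single-sample table is accepted as long as every retained subpopulation keeps at least two cells, which is the behaviour asked about here.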