Open 83years opened 1 year ago
I have a work-around from the tidySingleCellExperiment package.
> sce <- filter(sce, condition != "Control")
> table(sce$condition)
       Group1       Control Healthy_Donor        Group2
      1332146             0        600000        305666
I have noticed that filterSCE() may drop some data if there are some empty entries in the initial metadata spreadsheet.
Can't reproduce this currently... Could you post your metadata table, i.e., metadata(sce)$experiment_info? As opposed to the tidy solution, which only subsets based on the colData, filterSCE is specifically designed to work within the CATALYST framework. E.g., it will also ensure the metadata are in sync: the experiment_info will be recomputed after filtering (using the internal CATALYST:::.get_ei()), and cluster_codes will also be adjusted. The only thing I can imagine going wrong here is something with the sample_ids, which are needed for the experiment_info to be set up correctly.
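Since the suspicion falls on the sample_ids and on empty entries in the metadata spreadsheet, a quick check of that table before building the SCE may help narrow things down. The following is a minimal base-R sketch, not part of CATALYST: the data frame md, its values, and the helper is_empty are hypothetical and only stand in for the real metadata spreadsheet.

```r
# Hypothetical metadata table, as it might be read from a spreadsheet.
md <- data.frame(
  file_name = c("a.fcs", "b.fcs", "c.fcs", "d.fcs"),
  sample_id = c("s1", "s2", "", NA),
  condition = c("Group1", "Control", "Group2", "Healthy_Donor"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: flag entries that are NA or empty/whitespace-only.
is_empty <- function(x) is.na(x) | trimws(as.character(x)) == ""

# Number of empty entries per column.
n_empty <- vapply(md, function(col) sum(is_empty(col)), integer(1))

# Rows with at least one empty entry.
bad_rows <- which(Reduce(`|`, lapply(md, is_empty)))

# Assumption: in a per-file table, sample_ids should be non-empty and unique.
ok_ids <- !any(is_empty(md$sample_id)) && !anyDuplicated(md$sample_id)
```

If n_empty is non-zero for sample_id, or ok_ids is FALSE, the metadata table is worth fixing before looking further into filterSCE().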
Sorry it's been a while since I encountered the issue. But one thing that seems to work for me is to replace ei() in filterSCE() with this modified function, which still relies on sample_ids().
ei2 <- function(sce, meta = c("patient_id"))
{
    # Make a data frame of sample_ids
    ns <- table(sample_id = sample_ids(sce))
    df <- as.data.frame(ns)
    m <- match(df$sample_id, sce$sample_id)
    # Extract metadata
    for (i in meta) df[[i]] <- sce[[i]][m]
    return(df)
}
Yeah, sure, that works in certain cases. The issue is that match will return the first matching entry, and there is no guarantee that sample_ids match uniquely to what's in the SCE's colData. This could result in discrepancies that may go unnoticed. E.g., cells from a given sample_id will have different cluster_ids. That's what's "special" about how .get_ei() constructs the metadata: only variables that match uniquely to sample_ids will be retained.
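To make the point about match concrete, here is a small base-R sketch. The data frame cd and its values are made up and stand in for an SCE's colData. The helper is_unique is hypothetical: it only mimics the idea of keeping variables that map uniquely to sample_id and is not the code of .get_ei().

```r
# Toy stand-in for colData: one row per cell (hypothetical values).
cd <- data.frame(
  sample_id  = c("s1", "s1", "s1", "s2", "s2"),
  patient_id = c("p1", "p1", "p1", "p2", "p2"),
  cluster_id = c("A", "B", "B", "A", "C"),
  stringsAsFactors = FALSE
)

# match() returns the index of the FIRST hit for each sample_id ...
ids <- unique(cd$sample_id)
m <- match(ids, cd$sample_id)

# ... so a cell-level variable is silently reduced to its first value per sample.
first_cluster <- cd$cluster_id[m]

# Hypothetical helper: TRUE if a variable takes exactly one value per sample_id.
is_unique <- function(v, by) {
  all(tapply(v, by, function(x) length(unique(x))) == 1)
}

# Keep only the variables that are constant within each sample.
vars <- setdiff(names(cd), "sample_id")
keep <- vars[vapply(vars, function(v) is_unique(cd[[v]], cd$sample_id), logical(1))]

# Sample-level table built only from those variables.
ei <- unique(cd[, c("sample_id", keep), drop = FALSE])
rownames(ei) <- NULL
```

In this toy case cluster_id varies within a sample, so match would report only the first cluster seen for each sample, whereas the uniqueness check drops cluster_id and keeps patient_id.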
Hi Helena,
I hope this finds you well.
I've just updated to the latest version and I think I may have spotted a bug. My dataset is made up of 4 conditions and I want to filter out the "Control" condition before running the clustering and DR.
No matter what I negatively select for, I always get the Group1 data. The length of the data also remains the same when using sce1@colData, with non-Group1 data being replaced by NAs. When I positively select condition == "Control" I get "table of extent 0", with all the data replaced by NAs. Is this a bug or am I doing something wrong?