Closed ahy1221 closed 6 years ago
Code?
Sorry about that, here is the code:
library(scran)
library(scater)
library(BiocParallel)

# Filter genes on average count
rowData(sce)$AVE <- calcAverage(sce)
sce <- sce[rowData(sce)$AVE > 1, ]

# Deconvolution size factors with pre-clustering
clusters <- quickCluster(sce, min.size = 15, method = "igraph", min.mean = 1)
sce <- computeSumFactors(sce, sizes = seq(10, 30, 5),
                         clusters = clusters, sf.out = FALSE, positive = TRUE)
sce <- normalize(sce)

#==== fitting HVGs
var.fit <- trendVar(sce, parametric = TRUE, block = sce$donor, use.spikes = FALSE)
dec.var <- decomposeVar(sce, fit = var.fit)
dec.var <- dec.var[complete.cases(dec.var), ]
dec.var <- dec.var[order(dec.var$bio, decreasing = TRUE), ]
hvg <- rownames(dec.var[dec.var$bio > 0.1 & dec.var$FDR < 0.1, ])

# Split log-counts by donor for MNN correction
logcount.mt.list <- lapply(unique(sce$donor), function(d) {
    logcounts(sce)[, sce$donor == d]
})
names(logcount.mt.list) <- unique(sce$donor)

mnn.batch.bydonor <- mnnCorrect(logcount.mt.list$D20171109, logcount.mt.list$D20170412,
                                logcount.mt.list$D20170327, logcount.mt.list$D20170322,
                                logcount.mt.list$D20170227, logcount.mt.list$D20170222,
                                k = 20,
                                subset.row = hvg,
                                pc.approx = TRUE,
                                BPPARAM = MulticoreParam(workers = 6))
Then I checked whether there are any NaN or Inf values in each cell:
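A minimal base-R sketch of such a per-cell check, assuming the log-expression values are in an ordinary matrix `mat` (standing in for `logcounts(sce)`):

```r
# Count non-finite entries (NA, NaN, Inf, -Inf) per cell (column)
count_bad_per_cell <- function(mat) {
    # is.finite() is FALSE for NA, NaN, Inf and -Inf alike
    colSums(!is.finite(mat))
}

mat <- cbind(c(1, 2, Inf), c(0, NaN, 3), c(1, 1, 1))
count_bad_per_cell(mat)  # 1 1 0
```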
I think this issue may be more about the data, so just let me know if you need more details.
Do you have all-zero cells?
There are no all-zero cells in this dataset, because I did the QC based on total_features.
However, there are genes expressed in only one cell. I am not sure whether that is the cause.
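Both of these properties are easy to verify directly on the raw count matrix. A toy sketch in base R, with `counts_mat` standing in for `counts(sce)`:

```r
# Genes x cells toy count matrix
counts_mat <- matrix(c(0, 5, 0,
                       0, 0, 2,
                       3, 1, 4), nrow = 3, byrow = TRUE)

all_zero_cells <- colSums(counts_mat) == 0      # cells with no counts at all
genes_one_cell <- rowSums(counts_mat > 0) == 1  # genes detected in exactly one cell

sum(all_zero_cells)  # 0
sum(genes_one_cell)  # 2
```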
Check whether the logcount.mt.list$D20171109, etc. entries have NA values, and whether the colSums(logcount.mt.list$D20171109[hvg,]) are non-zero. If those two things are fine, then I don't know what the problem is. I'll have to actually look at the data.
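The two checks above could be sketched like this in base R, using a toy matrix in place of logcount.mt.list$D20171109 and a hypothetical `hvg` vector:

```r
# Toy log-expression matrix with gene names, standing in for one
# entry of logcount.mt.list; `hvg` stands in for the HVG set.
set.seed(42)
mat <- matrix(runif(20, min = 0.1, max = 5), nrow = 4,
              dimnames = list(paste0("g", 1:4), NULL))
hvg <- c("g1", "g3")

anyNA(mat)                    # check 1: should be FALSE
all(colSums(mat[hvg, ]) > 0)  # check 2: should be TRUE (no all-zero HVG columns)
```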
Thank you very much. I found that there are many Inf values in the logcounts matrix for some cells. Is there something wrong in the normalization processing?
I'm not sure how you managed to get infinite but non-NA values from normalize. Even if you had negative size factors, you should have gotten NAs. The only way to get infinite normalized values would be to have size factors of exactly zero, which should not be possible.
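The mechanism described above is easy to see in base R: dividing a count by a size factor of exactly zero yields Inf after log-transformation. A toy sketch, with `sf` standing in for `sizeFactors(sce)`:

```r
# Toy size factors: one healthy, one small, one zero, one negative
sf <- c(1.2, 0.8, 0, -0.1)

which(sf == 0)  # 3: cells whose normalized values become Inf
which(sf < 0)   # 4: negative size factors, which give NAs instead

log2(1 / sf[3] + 1)  # Inf: a count divided by a zero size factor
</code>
```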
Also, I don't recommend going below a sizes of 20. And I don't think that the normalization will be stable if you have fewer than 100 cells per level of clusters.
Thanks for your reply. You are right that the issue comes from size factors of zero. I am redoing the normalization procedure and it seems to work fine now.
Regarding the sizes for deconvolution: if I use the default settings, there are warnings saying In .computeSumFactors(assay(x, i = assay.type), subset.row = subset.row, : not enough cells in at least one cluster for some 'sizes'. Would that hurt?
The warnings are trying to protect you by telling you that you don't have enough cells. Setting the sizes to avoid the warnings doesn't really fix the problem. The alternative is to not pre-cluster, and use all cells for pooling in computeSumFactors; this will improve precision at the cost of increasing bias.
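The intuition behind pooling can be illustrated in miniature with base R: summing counts across cells in a pool sharply reduces the fraction of zeros, which is why pooled size factors are estimable even when individual cell profiles are too sparse. Toy Poisson counts only, not the actual deconvolution algorithm:

```r
# 100 genes x 10 cells of sparse counts (mean 0.2 per gene per cell)
set.seed(1)
counts_mat <- matrix(rpois(1000, lambda = 0.2), nrow = 100)

frac_zero_single <- mean(counts_mat[, 1] == 0)  # zeros in one cell (~80%)
pooled <- rowSums(counts_mat)                   # pool all 10 cells
frac_zero_pooled <- mean(pooled == 0)           # zeros in the pool (~14%)

frac_zero_pooled < frac_zero_single  # TRUE
```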
For an arbitrary data set, I can't say whether this is better or worse than your current approach. It's not a problem I've really encountered. For small data sets, the cells are usually very homogeneous, so I didn't need to pre-cluster, and for large data sets, I've got enough cells such that clustering can be done. If you have the worst-case scenario of few cells from a heterogeneous population, you can imagine that there is very little information that can be meaningfully shared between cells.
Thank you very much! The Inf and NaN values were both caused by low-quality cells. After redoing the QC to filter out those cells, everything looks fine now. We can close this issue.
I found that after MNN correction, some genes were assigned NaN or Inf values. Such values impeded further MNN correction (in the nested setting). It seems that these NaN or Inf values are generated by zero-count genes. I am wondering what a reasonable way to handle these NaN or Inf values would be.
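As a stopgap before the root cause is fixed (the thread above traces it to low-quality cells and zero size factors, which re-doing QC resolved), one can simply drop genes whose expression matrix contains any non-finite value before running the next correction step. A base-R sketch, with `mat` standing in for a logcounts matrix:

```r
# Toy matrix: one gene row contaminated with Inf/NaN
mat <- rbind(ok1 = c(1, 2, 3),
             bad = c(1, Inf, NaN),
             ok2 = c(0, 0, 1))

keep <- rowSums(!is.finite(mat)) == 0       # genes with only finite values
mat_clean <- mat[keep, , drop = FALSE]

rownames(mat_clean)  # "ok1" "ok2"
```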