amitfrish / scBio

Single Cell Genomics for Enhancing Cell Composition Inference from Bulk Genomics Data

QuantifyCellTypes returns NaN for some cells but not all #1

Closed shayanhoss closed 5 years ago

shayanhoss commented 5 years ago

Hello,

I am trying to test CPM by deconvoluting a single cell-type population, similar to the example dataset in the package. However, when I do this, the predicted values (res_abs$predicted) are NaN for some of the single-cell reference profiles, which makes res_abs$cellTypePredictions NaN as well.

print(dim(sc[,oneCellType])) 
print(length(scLabels[oneCellType])) 
print(dim(bulkReduced)) 
print(dim(scCellSpace[oneCellType,]))

[1] 54352    36
[1] 36
[1] 54352    49
[1] 36  2
oneCellType <- which(scLabels == "mesangial cell")
res_abs <- CPM(SCData = sc[,oneCellType],
           SCLabels = scLabels[oneCellType],
           BulkData = bulkReduced,
           cellSpace = scCellSpace[oneCellType,],
           no_cores = 14,
           quantifyTypes = T)

head(res_abs$predicted)

                3m_mesangial cell_Kidney_H16-MAA000752-3_10_M-1-1 3m_mesangial cell_Kidney_M20-MAA000922-3_9_M-1-1 3m_mesangial cell_Kidney_N8-MAA000752-3_10_M-1-1
9.0m_Kidney_A5_384Bulk_Plate2                                                 NaN                                              NaN                                     -0.003052391
21.0m_Kidney_B14_384Bulk_Plate1                                               NaN                                              NaN                                      0.004517717
6.0m_Kidney_B22_384Bulk_Plate1                                                NaN                                              NaN                                      0.015728009
15.0m_Kidney_B4_384Bulk_Plate1                                                NaN                                              NaN                                      0.003875112
21.0m_Kidney_B9_384Bulk_Plate3                                                NaN                                              NaN                                      0.020527078
9.0m_Kidney_C16_384Bulk_Plate1                                                NaN                                              NaN                                     -0.014798537

This wasn't happening when I used CPM on a different data set earlier this week. I thought it might be an issue of incorrect slicing of the single-cell labels, but I checked every input and they all appear correct.
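To see which reference cells are affected, a quick base-R check along these lines may help. It is a minimal sketch on a made-up matrix standing in for res_abs$predicted; the object name predicted and the row and column names are placeholders, not output from the run above.

```r
# Toy stand-in for res_abs$predicted: rows are bulk samples, columns are
# single-cell reference profiles. All values are made up for illustration.
predicted <- matrix(
  c(NaN, NaN, -0.003,
    NaN, NaN,  0.005,
    NaN, NaN,  0.016),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("bulk1", "bulk2", "bulk3"),
                  c("cellA", "cellB", "cellC"))
)

# is.nan() has no data.frame method, so coerce to a matrix first.
predicted <- as.matrix(predicted)

# Count NaN entries per reference cell (column).
nan_per_cell <- colSums(is.nan(predicted))

# Reference cells whose predictions are NaN in every bulk sample.
all_nan_cells <- colnames(predicted)[nan_per_cell == nrow(predicted)]
print(all_nan_cells)
```

With the real result, replacing the toy matrix with res_abs$predicted should list the reference cells that never received a prediction.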

amitfrish commented 5 years ago

It seems like some of the cells were not selected in each iteration. I would guess that your cell space is distributed quite non-uniformly, so some regions are much denser and the cells in them are selected less often. Usually minSelection = 5 is enough to select all cells, but we didn't test all types of cell spaces, so yours might be different. Try increasing this value and see if you get a different outcome. Also, you have only 36 cells, which is really low, and you might not be able to rely on this cell space at all. For this number of cells, I would recommend lowering modelSize to 10 and neighborhoodSize to 3 or 4. Let me know if this changes anything.
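The suggestion above could be tried roughly as follows. This is a sketch, not a tested recipe: the parameter names modelSize and neighborhoodSize are taken from the reply, the input objects are the ones from the original post, and the call is skipped unless both scBio and those objects are available. minSelection could be raised in the same way; no specific value was suggested for it, so it is left at its default here.

```r
# Values suggested in the reply above for a 36-cell reference; the parameter
# names come from that reply, so confirm them with ?CPM for your scBio version.
small_ref_args <- list(modelSize = 10, neighborhoodSize = 3)

# Objects from the original post; the call is skipped if any are missing.
needed <- c("sc", "scLabels", "bulkReduced", "scCellSpace")
have_inputs <- all(vapply(needed, exists, logical(1)))

if (have_inputs && requireNamespace("scBio", quietly = TRUE)) {
  oneCellType <- which(scLabels == "mesangial cell")
  base_args <- list(SCData = sc[, oneCellType],
                    SCLabels = scLabels[oneCellType],
                    BulkData = bulkReduced,
                    cellSpace = scCellSpace[oneCellType, ],
                    no_cores = 14,
                    quantifyTypes = TRUE)
  res_abs <- do.call(scBio::CPM, c(base_args, small_ref_args))
  # Check whether any NaN values remain after the change.
  print(anyNA(res_abs$predicted))
}
```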

shayanhoss commented 5 years ago

Changing the model and neighborhood sizes did seem to fix that issue. But I'm running the algorithm on my subsetted single cells, so I'm effectively only feeding it those 36 references for mapping. I'm using the UMAP coordinates as my cell space in this example, and for the most part, cells of the same type are well localized and structured in UMAP space.

amitfrish commented 5 years ago

Well, CPM was designed to handle hundreds or thousands of cells for each cell type, so the default parameters are 50 cells in total (from all cell types combined) in each iteration and 10 cells in each neighborhood. Of course, these parameters won't work for a single cell type with only 36 cells. There is no problem at all with using UMAP.
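One way to catch this before running CPM is a small pre-flight check that compares the size settings with the number of reference cells. The helper below is hypothetical and not part of scBio; its defaults mirror the 50 and 10 mentioned above, and the rule it applies (a size setting should not exceed the number of available cells) is an assumption drawn from this thread rather than documented behavior.

```r
# Hypothetical helper (not part of scBio): reports size settings that are
# larger than the number of reference cells available.
check_cpm_sizes <- function(n_cells, modelSize = 50, neighborhoodSize = 10) {
  problems <- character(0)
  if (modelSize > n_cells) {
    problems <- c(problems,
                  sprintf("modelSize (%s) exceeds the %s available cells",
                          modelSize, n_cells))
  }
  if (neighborhoodSize > n_cells) {
    problems <- c(problems,
                  sprintf("neighborhoodSize (%s) exceeds the %s available cells",
                          neighborhoodSize, n_cells))
  }
  problems
}

# Defaults against the 36-cell reference from this thread.
print(check_cpm_sizes(36))

# Settings suggested earlier in the thread.
print(check_cpm_sizes(36, modelSize = 10, neighborhoodSize = 3))
```

An empty result only means the sizes fit within the reference; it does not guarantee that every cell will be selected.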