cnio-bu / beyondcell

Beyondcell is a computational methodology for identifying tumour cell subpopulations with distinct drug responses in single-cell RNA-seq and Spatial Transcriptomics data.

Add no.missing argument to bcScore #113

Open mj-jimenez opened 1 year ago

mj-jimenez commented 1 year ago

Currently, bcScore takes an argument expr.thres that sets the minimum proportion of signature genes that must be expressed for each cell and signature pair in order to compute a BCS. For pairs that do not meet this threshold, bcScore returns NaN. However, it might be useful to avoid missing values (for regression, for example). Since a normalized BCS = 0 means that the enrichment direction is unknown, I think NaN could be replaced by 0 when no.missing = TRUE. The default value for no.missing should be FALSE so that current behavior is not altered.
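To make the proposed semantics concrete, here is a minimal sketch on a plain numeric matrix. It is not beyondcell code: `replace_missing_bcs` is a hypothetical helper, and the toy matrix only stands in for a normalized BCS matrix (signatures in rows, cells in columns).

```r
# Hypothetical helper illustrating the proposed no.missing behaviour.
# It is NOT part of beyondcell; it only works on a plain numeric matrix.
replace_missing_bcs <- function(scores, no.missing = FALSE) {
  stopifnot(is.matrix(scores), is.numeric(scores))
  if (no.missing) {
    # is.na() is TRUE for both NA and NaN, so both are set to 0
    scores[is.na(scores)] <- 0
  }
  scores
}

# Toy normalized BCS matrix: sigB could not be scored in either cell
toy <- matrix(c(0.8, NaN, -0.3, NaN), nrow = 2,
              dimnames = list(c("sigA", "sigB"), c("cell1", "cell2")))

unchanged <- replace_missing_bcs(toy)                  # default keeps NaN
filled <- replace_missing_bcs(toy, no.missing = TRUE)  # NaN becomes 0
```

With the default, the matrix is returned untouched, which matches the requirement that existing behavior does not change unless the user opts in.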

mj-jimenez commented 1 year ago

When computing BCS in datasets containing tumour and TME cells with the pre-loaded drug collections, it is possible that TME cells have a lot of missing values. These drug collections were computed using cancer cell lines, thus the signature genes are biased towards cancer and TME cells might not express those.

If the proportion of signature genes expressed in a cell falls below expr.thres, beyondcell assigns a NaN value to that cell and signature pair. We may end up with a lot of missing values for a specific cell subtype, and imputation may not be possible. For this reason, I would suggest adding a no.missing argument to bcScore. Moreover, if many NaN values are associated with a group of cells, we may encounter problems in the downstream analysis (see #120). Thus, contrary to what I said previously, I would make no.missing = TRUE by default.
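One way to check whether missing values concentrate in a particular cell group is to compare the NaN fraction per group. The sketch below uses a toy matrix and a toy annotation vector; `nan_fraction_by_group` is a hypothetical helper, not a beyondcell function, and the group labels are made up for illustration.

```r
# Hypothetical helper: mean fraction of missing scores per cell group.
# scores: numeric matrix, signatures in rows and cells in columns.
# groups: one label per cell (column), in the same order as the columns.
nan_fraction_by_group <- function(scores, groups) {
  stopifnot(is.matrix(scores), ncol(scores) == length(groups))
  per_cell <- colMeans(is.na(scores))  # fraction of missing scores per cell
  tapply(per_cell, groups, mean)       # average that fraction within groups
}

# Toy data: two tumour cells fully scored, two TME cells mostly missing
scores <- matrix(c(0.5, -0.2, 0.1,
                   0.3,  0.4, -0.6,
                   NaN,  NaN, 0.2,
                   NaN,  NaN, NaN),
                 nrow = 3,
                 dimnames = list(paste0("sig", 1:3), paste0("cell", 1:4)))
groups <- c("tumour", "tumour", "TME", "TME")

nan_by_group <- nan_fraction_by_group(scores, groups)
```

A large gap between groups would suggest that the missing values come from the signature bias described above rather than from low-quality cells in general.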

We will address this behaviour in a future release, but for now I would recommend this workaround:

# Beyondcell object with a lot of NaN values
bcobj <- bcScore(sc, SSc, expr.thres = 0.1)

# Filter out spots with a high percentage of NAs
bcobj.filtered <- bcSubset(bcobj, nan.cells = 0.95)

# Replace NAs by 0s
bcobj.filtered@normalized[is.na(bcobj.filtered@normalized)] <- 0
bcobj.recomputed <- bcRecompute(bcobj.filtered, slot = "normalized")
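The filter-then-fill order of the workaround above can be illustrated on a plain matrix. This is a sketch under assumptions, not beyondcell's implementation: `drop_nan_cells` is a hypothetical stand-in for the filtering step, and whether the real threshold comparison is inclusive is an assumption here.

```r
# Hypothetical stand-in for the filtering step (NOT beyondcell's code):
# keep cells whose fraction of missing scores is at most max.nan.
# Whether the real comparison is <= or < is an assumption.
drop_nan_cells <- function(scores, max.nan = 0.95) {
  stopifnot(is.matrix(scores))
  keep <- colMeans(is.na(scores)) <= max.nan
  scores[, keep, drop = FALSE]
}

# Toy matrix: cell1 is half missing, cell2 fully missing, cell3 complete
toy <- matrix(c(0.5, NaN,
                NaN, NaN,
                0.1, -0.4),
              nrow = 2,
              dimnames = list(c("sigA", "sigB"),
                              c("cell1", "cell2", "cell3")))

# Step 1: drop cells that are almost entirely missing (cell2 here)
kept <- drop_nan_cells(toy, max.nan = 0.95)

# Step 2: only then replace the remaining missing scores by 0
kept[is.na(kept)] <- 0
```

Filtering has to come first: once the missing scores are replaced by 0, the per-cell NaN fraction can no longer be computed, so cells with essentially no information would stay in the object with all-zero scores.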