carmonalab / UCell

Gene set scoring for single-cell data
GNU General Public License v3.0

Question about scoring levels #38

Open jcshuy opened 5 months ago

jcshuy commented 5 months ago

Hi, I'm interested in using UCell to view changes in a gene set of interest between two conditions in a Seurat dataset. However, this particular dataset was sequenced a little late, so some of the proteins that are transcriptionally downregulated between conditions may not actually show up as downregulated.

We were hoping to use some of the parameters in UCell (along with running SmoothKNN) to offset any potential imbalances. In particular, we plan to lower the threshold (which I believe would increase the inclusion of important genes) and set w_neg = 1.5 to mark them as more significant. Ideally, we would then show significance in changes between conditions and by cluster.
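To make the role of w_neg concrete, here is a minimal Python sketch of how a weight on the negative gene set could enter a signed signature score. It assumes the combined score is the positive-set score minus w_neg times the negative-set score, floored at zero. That is my reading of how UCell treats signed signatures, not a copy of the package code. The function name `combine_signed_scores` and the example values are hypothetical.

```python
def combine_signed_scores(pos_score, neg_score, w_neg=1.0):
    """Combine scores for the positive and negative gene sets of one cell.

    Assumption: combined = pos - w_neg * neg, floored at 0.
    This is an illustrative sketch, not the UCell implementation.
    """
    return max(0.0, pos_score - w_neg * neg_score)


# Hypothetical per-cell scores for the up- and down-regulated gene sets.
pos, neg = 0.5, 0.2

score_default = combine_signed_scores(pos, neg, w_neg=1.0)
score_heavier = combine_signed_scores(pos, neg, w_neg=1.5)

# A larger w_neg penalises cells that still express the "negative" genes.
print(score_default, score_heavier)
```

Under this assumption, raising w_neg only changes the result for signatures that actually contain negative genes. It does not change which genes are included in the ranking.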

I understand it is a bit of a technical question, but do you know if this would be a viable way to compensate for this in the data?

Thank you in advance!

mass-a commented 5 months ago

Hello, I am not sure I fully understand the question. Do you mean that the gene sets of interest are lowly expressed, so you would like to tune the parameters to increase sensitivity to these lowly expressed genes? In that case, the parameter that may help you is maxRank. It sets the number of top-ranked genes included in the ranking for each cell (1500 by default). If your dataset has sufficient depth (how many genes are detected on average per cell?), you may try increasing maxRank so that genes with low but non-zero expression can also contribute to the signature scores. Does that make sense?
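To illustrate the effect of maxRank, below is a simplified Python sketch of a rank-based, UCell-style score for a single cell. It assumes the score is 1 - U / (n * maxRank), where U is the sum of the signature genes' ranks minus n(n+1)/2, and that every gene ranked beyond maxRank is treated as rank maxRank + 1. Ties are broken by gene order for simplicity. The function name `ucell_like_score` and the toy data are hypothetical, and this is not the package's own code.

```python
import numpy as np


def ucell_like_score(expr, gene_idx, max_rank=1500):
    """Simplified rank-based signature score for one cell (illustrative only)."""
    expr = np.asarray(expr, dtype=float)
    # Rank genes by decreasing expression: rank 1 is the highest-expressed gene.
    order = np.argsort(-expr, kind="stable")
    ranks = np.empty(len(expr), dtype=float)
    ranks[order] = np.arange(1, len(expr) + 1)
    # Assumption: genes ranked beyond max_rank all collapse to max_rank + 1.
    ranks = np.minimum(ranks, max_rank + 1)
    n = len(gene_idx)
    u = ranks[gene_idx].sum() - n * (n + 1) / 2
    return 1.0 - u / (n * max_rank)


# Toy data: 10 genes, signature = genes 5 and 6.
signature = [5, 6]
cell_a = [10, 9, 8, 7, 6, 5, 4, 3, 2, 1]  # signature genes at ranks 6 and 7
cell_b = [10, 9, 8, 7, 6, 1, 2, 5, 4, 3]  # signature genes at ranks 10 and 9

# With a small max_rank both cells look identical: all signature ranks are capped.
a_low = ucell_like_score(cell_a, signature, max_rank=4)
b_low = ucell_like_score(cell_b, signature, max_rank=4)

# With a larger max_rank the lowly ranked genes contribute and the cells separate.
a_high = ucell_like_score(cell_a, signature, max_rank=10)
b_high = ucell_like_score(cell_b, signature, max_rank=10)

print(a_low, b_low, a_high, b_high)
```

In this toy case the two cells are indistinguishable when maxRank is 4 and differ when it is 10, which is the sensitivity gain described above. Whether a larger maxRank helps on real data depends on how many genes are reliably detected per cell.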