YosefLab / Hotspot

https://hotspot.readthedocs.io/en/latest/
MIT License

NB-GAM fitted values #31

Closed: pavsol closed this issue 1 year ago

pavsol commented 1 year ago

Hi, I am running Hotspot on data from a Slingshot and tradeSeq analysis, which represents a developmental lineage with assigned pseudotime values. Hotspot successfully identified the most informative genes and gene modules; however, the module score plots look quite noisy. Out of curiosity, I reran the same analysis using the predicted values extracted from tradeSeq's fitGAM() function, which fits an NB-GAM model for each gene as described in Van den Berge et al. [2019]. The scores now look better (though I have not done any deeper evaluation so far). See the figures below.

My question is whether it is valid to use the fitted values as input to Hotspot, or whether doing so violates any of Hotspot's assumptions.

Best, Pavel

Raw counts used as input: [image] [image]

NB-GAM fitted values used as input: [image] [image]

fidelram commented 1 year ago

Any comment from the developers?

pavsol commented 1 year ago

Adding the email reply from @deto and closing the issue:

I'm not too familiar with the details of NB-GAM, but from glancing at the paper, it looks like it's a predictive model for the gene expression after fitting trajectory inference splines? If so, then yes, it would violate Hotspot assumptions as Hotspot assumes that, for a given gene, the detection rate in each cell is independent (or rather, this is the null distribution assumption). Thinking about it intuitively, the NB-GAM model is going to force similar cells to have similar values for a gene's expression - and since this is what Hotspot tests for, probably nearly all genes would show up as significant. If you're just trying to group genes into modules and you've already fit a model like this NB-GAM, you might not need Hotspot - how well does just clustering your modeled genes work?
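
The intuition in the reply above can be checked with a small simulation. The sketch below is illustrative only and makes several assumptions that are not part of the thread: cells are ordered along a one-dimensional pseudotime, a moving average stands in for the NB-GAM fit, and a lag-1 autocorrelation with a permutation test stands in for Hotspot's statistic (Hotspot itself uses a KNN graph and its own null model). All function names are hypothetical. The simulated gene has no real structure, yet smoothing alone makes it look strongly autocorrelated.

```python
import random


def moving_average(values, half_window):
    """Smooth values ordered by pseudotime (crude stand-in for a fitted spline)."""
    n = len(values)
    smoothed = []
    for i in range(n):
        lo = max(0, i - half_window)
        hi = min(n, i + half_window + 1)
        window = values[lo:hi]
        smoothed.append(sum(window) / len(window))
    return smoothed


def neighbor_autocorrelation(values):
    """Lag-1 autocorrelation between each cell and its successor in pseudotime."""
    n = len(values)
    mean = sum(values) / n
    centered = [v - mean for v in values]
    denom = sum(c * c for c in centered)
    num = sum(centered[i] * centered[i + 1] for i in range(n - 1))
    return num / denom


def permutation_p_value(values, n_perm, rng):
    """One-sided p-value under the null that cells are exchangeable."""
    observed = neighbor_autocorrelation(values)
    shuffled = list(values)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if neighbor_autocorrelation(shuffled) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)


rng = random.Random(0)
n_cells = 300

# A gene with no structure: counts are drawn independently for every cell.
raw = [rng.randint(0, 10) for _ in range(n_cells)]

# "Fitted" values: each cell is replaced by an average over its pseudotime neighbors.
fitted = moving_average(raw, half_window=10)

raw_ac = neighbor_autocorrelation(raw)
fitted_ac = neighbor_autocorrelation(fitted)
raw_p = permutation_p_value(raw, n_perm=200, rng=rng)
fitted_p = permutation_p_value(fitted, n_perm=200, rng=rng)

print(f"raw:    autocorrelation={raw_ac:.3f}, permutation p={raw_p:.3f}")
print(f"fitted: autocorrelation={fitted_ac:.3f}, permutation p={fitted_p:.3f}")
```

Because the smoothed values of neighboring cells share most of their averaging window, their autocorrelation is high by construction, and the permutation test rejects the null even though the underlying gene is pure noise. This is the same mechanism by which model-fitted values would be expected to make nearly every gene appear significant.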