YosefLab / Hotspot

https://hotspot.readthedocs.io/en/latest/
MIT License

Running hotspot on integrated data #32

Open bbergsneider opened 1 year ago

bbergsneider commented 1 year ago

I am looking to run Hotspot to discover transcriptional modules in a dataset that integrates data from more than 20 samples. All of the Hotspot vignettes I have found online analyze data from a single sample, so I am wondering whether Hotspot supports integrated data analysis. If so, which gene count matrix and model would you recommend using?

My data has been integrated using the standard Seurat integration pipeline, which includes (1) normalizing and identifying variable features for each dataset independently, (2) selecting 2000 variable integration features, and (3) running Seurat's IntegrateData command. I then re-scaled the integrated values for the entire integrated dataset. The integrated data is stored in "data" in the Seurat object, whereas the scaled data is stored in "scale.data". I then converted the Seurat object to an anndata object, preserving the separation between "data" and "scale.data".

I am currently running Hotspot using "data" (that is, the integrated data before re-scaling) as my gene count matrix and "normal" as my model type. Is this correct? Would you recommend using a different gene count matrix or model type? Thank you.
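
For readers who want to reproduce the setup described above, here is a minimal sketch. It is not a maintainer recommendation. It assumes the converted anndata object keeps the integrated values in a layer named "data" and a PCA embedding in `obsm["X_pca"]`; both names are guesses about how the conversion stored them. The helper names (`summarize_matrix`, `looks_like_raw_counts`, `run_hotspot_sketch`) are hypothetical. The first two helpers only check whether a matrix still looks like raw counts (non-negative integers), which integrated or scaled values generally do not.

```python
def summarize_matrix(rows):
    """Summarize a dense cells-by-genes matrix given as a list of lists."""
    values = [v for row in rows for v in row]
    return {
        "n_values": len(values),
        "has_negative": any(v < 0 for v in values),
        "all_integer": all(float(v).is_integer() for v in values),
    }


def looks_like_raw_counts(rows):
    """Heuristic only: raw counts are non-negative integers."""
    summary = summarize_matrix(rows)
    return summary["all_integer"] and not summary["has_negative"]


def run_hotspot_sketch(adata, layer_key="data", model="normal",
                       latent_obsm_key="X_pca", n_neighbors=30):
    """Run Hotspot autocorrelation with the settings from the question.

    Assumptions (not confirmed by the thread): the layer name "data" and
    the obsm key "X_pca" exist in `adata`. Whether model="normal" suits
    integrated data is exactly the open question of this issue.
    """
    import hotspot  # imported lazily so the helpers above work without it

    hs = hotspot.Hotspot(
        adata,
        layer_key=layer_key,
        model=model,
        latent_obsm_key=latent_obsm_key,
    )
    hs.create_knn_graph(weighted_graph=False, n_neighbors=n_neighbors)
    hs_results = hs.compute_autocorrelations()
    return hs, hs_results
```

Calling `looks_like_raw_counts` on a small sample of the chosen matrix before running Hotspot is a quick way to confirm which matrix survived the Seurat-to-anndata conversion.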

TdzBAS commented 1 month ago

I am also interested in an answer to this.