buenrostrolab / FigR

Functional Inference of Gene Regulation
https://buenrostrolab.github.io/FigR/
MIT License

Should dimensionality reduction be repeated for each subset of a dataset? #27

Open · compbiologist opened this issue 1 year ago

compbiologist commented 1 year ago

Hi,

I am interested in running FigR separately on each cell type in my dataset. Should I rerun the Signac LSI dimensionality reduction and recompute the nearest-neighbor graph for each cell type before running FigR, or should I reuse the values obtained on the entire dataset?

vkartha commented 1 year ago

Hi there, we only recommend running FigR per cell type if your data has an additional source of variability that is of interest (e.g. disease and healthy cells, multiple conditions, etc.). This is because FigR is correlation-based. In theory, if you have enough cells spanning multiple cell types, FigR should be able to detect associations where the ATAC and RNA signals are correlated but still specific to particular groups of cells across the dataset (e.g. cell types). If you do want to run it within a cell type, then yes, I would recommend making sure the neighborhood reflects that subset as well. To do that, perform the dimensionality reduction and kNN estimation within the cell type rather than on the whole dataset, which includes other cells.
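To make "dimensionality reduction / kNN estimation within the cell type" concrete, here is a minimal Python sketch using only NumPy. It is not the Signac or FigR API. The LSI is a simplified assumption in the spirit of Signac's TF-IDF plus truncated SVD, and the brute-force kNN stands in for whatever neighbor search you actually use. The function names `lsi`, `knn` and `per_celltype_neighbors` are hypothetical. The point it shows is that the embedding and neighbor graph are recomputed from only the subset's cells, so every neighbor belongs to that cell type.

```python
import numpy as np


def lsi(counts, n_components=10):
    """Simplified LSI (assumption, not Signac's exact code): TF-IDF, then SVD.

    counts: cells x peaks array of ATAC counts.
    """
    counts = np.asarray(counts, dtype=float)
    # Term frequency: normalize each cell by its total counts.
    tf = counts / np.maximum(counts.sum(axis=1, keepdims=True), 1e-12)
    # Inverse document frequency: down-weight peaks open in many cells.
    idf = counts.shape[0] / np.maximum(counts.sum(axis=0), 1e-12)
    tfidf = np.log1p(tf * idf * 1e4)
    # Uncentered SVD (LSI); keep the top components as the cell embedding.
    u, s, _ = np.linalg.svd(tfidf, full_matrices=False)
    k = min(n_components, s.size)
    return u[:, :k] * s[:k]


def knn(embedding, k):
    """Brute-force k nearest neighbors (Euclidean), excluding each cell itself."""
    diff = embedding[:, None, :] - embedding[None, :, :]
    d = (diff ** 2).sum(axis=-1)
    np.fill_diagonal(d, np.inf)  # a cell is not its own neighbor
    return np.argsort(d, axis=1)[:, :k]


def per_celltype_neighbors(counts, cell_types, target, n_components=10, k=5):
    """Recompute LSI and kNN using only cells of `target` (hypothetical helper).

    Returns neighbor indices expressed in the original (whole-dataset) cell order.
    """
    idx = np.flatnonzero(np.asarray(cell_types) == target)
    emb = lsi(np.asarray(counts)[idx], n_components)  # embedding of the subset only
    local_nn = knn(emb, k)  # indices into the subset
    return idx[local_nn]  # map back to whole-dataset indices


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    counts = rng.poisson(0.3, size=(60, 200))  # toy data: 60 cells x 200 peaks
    types = np.array(["A"] * 30 + ["B"] * 30)
    print(per_celltype_neighbors(counts, types, "B", n_components=5, k=4)[:3])
```

A kNN computed on the whole-dataset embedding could place cells of other types in a "B" cell's neighborhood. Recomputing within the subset rules that out by construction. In Signac-style workflows the first LSI component is often excluded because it tends to track sequencing depth. That step is omitted here for brevity.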

compbiologist commented 1 year ago

Hi, thanks for your response. If I run FigR on the entire sample, is there a way to later decipher cell-type-specific TF-DORC associations?

JoreVW commented 1 year ago

Hi @vkartha, just to be clear: is it best to run FigR separately on each condition in the dataset, e.g. a disease condition and a control condition? Or, after running it on the complete dataset, can you still compare the two conditions statistically and see whether certain DORC-gene-TF correlations (networks) are significantly more or less present in either condition?