using an alternative view of the data, e.g. a nearest neighbors approach, or smoothed data
methods ideas:
clustering smoothed marker gene expression could be a very popular path.
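A minimal sketch of the smoothing step, assuming "smoothed" means averaging each cell's marker expression with that of its nearest neighbors. The function name `knn_smooth`, the brute-force neighbor search, and the choice of `k` are illustrative assumptions, not an existing implementation. Python is used only for illustration. The smoothed values could then go into any standard clustering method.

```python
import math

def knn_smooth(coords, expr, k=3):
    """Average each cell's marker expression with its k nearest neighbors.

    coords: list of per-cell coordinates (spatial or reduced-dimension).
    expr:   list of per-cell marker expression vectors (lists of floats).
    Returns a new list of smoothed expression vectors.
    """
    n = len(coords)
    smoothed = []
    for i in range(n):
        # Brute-force neighbor search: fine for a sketch, too slow for real data.
        order = sorted(range(n), key=lambda j: math.dist(coords[i], coords[j]))
        neighborhood = order[:k + 1]  # the cell itself plus its k nearest neighbors
        smoothed.append([
            sum(expr[j][g] for j in neighborhood) / len(neighborhood)
            for g in range(len(expr[i]))
        ])
    return smoothed
```

For real datasets the neighbor search would need a proper index (e.g. a k-d tree) rather than the all-pairs loop shown here.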
using sketching algorithms on reduced-dimension data would also work. PCA, UMAP, or maybe an autoencoder would be the best options for the reduction. We can save computational time by asking users to supply their own reduced-dimension data. Ask Zach R for advice about sketching, or see https://github.com/Nanostring-Biostats/InSituType/blob/main/R/geoSketch.R
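To make the sketching idea concrete, here is a simplified grid-based stand-in written in the spirit of geometric sketching: bin cells on a grid over the reduced dimensions, then sample round-robin across occupied bins so rare regions are not swamped by dense ones. This is not the code in geoSketch.R. The function name `grid_sketch`, the `n_bins` default, and the round-robin rule are all assumptions made for illustration, and Python is used only as a neutral language.

```python
import random
from collections import defaultdict

def grid_sketch(embedding, n_sample, n_bins=10, seed=0):
    """Pick up to n_sample cell indices spread evenly over occupied grid bins.

    embedding: list of per-cell coordinates in reduced-dimension space.
    """
    rng = random.Random(seed)
    n_dims = len(embedding[0])
    lows = [min(p[d] for p in embedding) for d in range(n_dims)]
    highs = [max(p[d] for p in embedding) for d in range(n_dims)]

    def bin_of(point):
        # Map each coordinate to an integer bin index in [0, n_bins - 1].
        key = []
        for d in range(n_dims):
            span = highs[d] - lows[d]
            frac = (point[d] - lows[d]) / span if span > 0 else 0.0
            key.append(min(int(frac * n_bins), n_bins - 1))
        return tuple(key)

    bins = defaultdict(list)
    for i, p in enumerate(embedding):
        bins[bin_of(p)].append(i)
    for members in bins.values():
        rng.shuffle(members)

    # Round-robin over bins so sparse regions are represented alongside dense ones.
    chosen = []
    n_sample = min(n_sample, len(embedding))
    while len(chosen) < n_sample:
        for members in bins.values():
            if members and len(chosen) < n_sample:
                chosen.append(members.pop())
    return chosen
```

A plain grid scales poorly beyond a handful of dimensions, so this version would only be sensible on the first few PCs or a UMAP embedding.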
code organization:
I think we want a convenient function to run up-front to define these cohorts. E.g. "defineLeidenCohorts" or "defineMarkerBasedCohorts". Then its output gets fed directly into insitutype.
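One possible shape for such an up-front function, as a rough sketch only. The thresholding rule, the argument names, and the label format are assumptions made for illustration; the notes above do not specify how marker-based cohorts would actually be defined. Python is used only for illustration, and a real version would presumably live in R alongside insitutype.

```python
def define_marker_based_cohorts(marker_expr, thresholds):
    """Return one cohort label per cell from which markers exceed their cutoff.

    marker_expr: dict mapping marker name -> list of per-cell (ideally smoothed) values.
    thresholds:  dict mapping marker name -> cutoff for calling the marker positive
                 (hypothetical rule, chosen here for simplicity).
    """
    markers = sorted(thresholds)
    n_cells = len(next(iter(marker_expr.values())))
    cohorts = []
    for i in range(n_cells):
        positive = [m for m in markers if marker_expr[m][i] > thresholds[m]]
        # Cells with no positive marker share a single catch-all cohort.
        cohorts.append("+".join(positive) if positive else "none")
    return cohorts
```

The point of the design is that the function returns a single vector with one cohort label per cell, which is the only thing the downstream call would need to receive.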
good test datasets:
We're looking for datasets where we know what we should be seeing and where there are known challenges.
talk to Dan re: Tregs vs. CD4s
ask Mark re: macrophages in a cancer TMA
ask Evelyn, Claire and Lidan for datasets they know well. We should include a brain dataset.