Implementing Molecular Cross Validation

dburkhardt commented 5 years ago

Is your feature request related to a problem? Please describe.
Currently, MAGIC tends to oversmooth data when using automatic t selection and graph fitting parameters.

Describe the solution you'd like
Implement Molecular Cross Validation (https://www.biorxiv.org/content/10.1101/786269v1)

Additional context
Basic code flow:

1. Split the counts in each cell into x1 and x2 (disjoint sets of molecules)
2. Build the graph
   1. Library size normalize x1
   2. PCA
   3. Build the graph with a given knn and t
   4. Create the diffusion operator, D
3. Apply the diffusion operator to the library size normalized x1
4. Multiply D(libnorm(x1)) by the library sizes of x2
5. Calculate the Poisson loss against x2: λ - k ln(λ), where λ is the rescaled prediction and k is the observed count in x2
6. Repeat for various knn and t
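
The flow above can be sketched with NumPy alone. This is a rough illustration, not MAGIC's implementation: the function names (`split_counts`, `libnorm`, `diffusion_operator`, `poisson_loss`, `mcv_loss`) are hypothetical, the split uses binomial thinning, and a plain symmetrized kNN graph stands in for MAGIC's actual kernel, which is an assumption made to keep the example self-contained.

```python
import numpy as np

def split_counts(counts, p=0.5, rng=None):
    """Binomially thin integer counts so that x1 + x2 == counts."""
    rng = np.random.default_rng(rng)
    x1 = rng.binomial(counts, p)
    return x1, counts - x1

def libnorm(x):
    """Scale each cell (row) to sum to 1; all-zero rows stay zero."""
    size = x.sum(axis=1, keepdims=True)
    return x / np.maximum(size, 1)

def diffusion_operator(data, knn):
    """Row-stochastic operator from a symmetrized kNN graph.

    Assumption: a binary kNN graph with self-loops, not MAGIC's kernel.
    """
    sq = (data ** 2).sum(axis=1)
    dist = sq[:, None] + sq[None, :] - 2 * data @ data.T
    # knn + 1 nearest cells per row, which normally includes the cell itself
    nearest = np.argsort(dist, axis=1)[:, : knn + 1]
    adj = np.zeros_like(dist)
    np.put_along_axis(adj, nearest, 1.0, axis=1)
    np.fill_diagonal(adj, 1.0)      # guarantee self-loops
    adj = np.maximum(adj, adj.T)    # symmetrize
    return adj / adj.sum(axis=1, keepdims=True)

def poisson_loss(lam, k, eps=1e-10):
    """Mean of lam - k * ln(lam), i.e. Poisson NLL without the ln(k!) term."""
    return float(np.mean(lam - k * np.log(lam + eps)))

def mcv_loss(counts, knn, t, n_pca=20, seed=0):
    """Held-out Poisson loss for one (knn, t) setting."""
    x1, x2 = split_counts(counts, rng=seed)
    x1_norm = libnorm(x1.astype(float))
    # PCA via SVD of the centered, normalized training half
    centered = x1_norm - x1_norm.mean(axis=0)
    u, s, _ = np.linalg.svd(centered, full_matrices=False)
    pcs = u[:, :n_pca] * s[:n_pca]
    D = diffusion_operator(pcs, knn)
    denoised = np.linalg.matrix_power(D, t) @ x1_norm
    # rescale the denoised profile to the library sizes of the held-out half
    lam = denoised * x2.sum(axis=1, keepdims=True)
    return poisson_loss(lam, x2)

# Sweep over knn and t on synthetic counts and keep the lowest loss.
counts = np.random.default_rng(0).poisson(2.0, size=(60, 30))
losses = {(knn, t): mcv_loss(counts, knn, t)
          for knn in (3, 5, 10) for t in (1, 2, 4)}
best_knn, best_t = min(losses, key=losses.get)
```

Using the same seed for every (knn, t) pair keeps the x1/x2 split fixed, so differences in loss come from the parameters rather than from the split.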

scottgigante commented 5 years ago

@dburkhardt I have some thoughts / materials on this courtesy of @batson and @jamestwebber. Happy for you to actually implement it of course :)

MAGIC Sweep: https://github.com/czbiohub/molecular-cross-validation/blob/master/src/molecular_cross_validation/scripts/magic_sweep.py

jamestwebber commented 5 years ago

Talked to @dburkhardt about this today while he was here. The magic_sweep script is the most directly applicable one for this, but I'll throw in the newly added mcv_sweep module and Grid Search vignette notebook as additional resources.

The GridSearchMCV class should work with a little plumbing, but it can't do anything clever with caching and so it'll be a lot slower than a more carefully engineered solution.
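
One kind of caching that could help, sketched under assumptions: for a fixed graph, the result at each t can be reached by applying D repeatedly, so a sweep over t costs max(t) multiplications rather than a fresh matrix power per value. The helper `diffuse_over_t` below is hypothetical and is not taken from GridSearchMCV or magic_sweep.

```python
import numpy as np

def diffuse_over_t(D, x, t_values):
    """Return {t: D^t @ x} for each requested t, reusing earlier products."""
    wanted = set(t_values)
    results = {}
    current = x
    for step in range(1, max(wanted) + 1):
        current = D @ current          # one more diffusion step
        if step in wanted:
            results[step] = current
    return results

# Example with a random row-stochastic matrix standing in for the operator.
rng = np.random.default_rng(0)
A = rng.random((6, 6))
D = A / A.sum(axis=1, keepdims=True)
x = rng.random((6, 3))
smoothed = diffuse_over_t(D, x, [1, 3, 5])
```

The graph itself only depends on knn, so in a knn-by-t grid it would only need rebuilding when knn changes.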

KrishnaswamyLab / MAGIC

Implementing Molecular Cross Validation #170