davisidarta / topometry

Systematically learn and evaluate manifolds from high-dimensional data
https://topometry.readthedocs.io/en/latest/
MIT License

Input parameter `X` in Diffusor.transform #1

Closed by parashardhapola 2 years ago

parashardhapola commented 3 years ago

Hi @davisidarta,

Great work! I think this package will really help expand how we are using the KNN graph structures of single-cell datasets.

I was taking a closer look at the `Diffusor` class and noticed that you don't actually use the parameter `X` in `Diffusor.transform`. Does this mean that the data can only be self-transformed? Or am I missing something?
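As a minimal illustration of the pattern I mean (a hypothetical, simplified transformer, not TopOMetry's actual code): `transform()` accepts `X` to satisfy the scikit-learn API but only ever uses the graph stored during `fit()`.

```python
import numpy as np
from sklearn.base import BaseEstimator, TransformerMixin

class SelfTransformOnly(BaseEstimator, TransformerMixin):
    """Hypothetical transformer whose transform() ignores its X argument."""

    def fit(self, X, y=None):
        # Store an affinity graph built from the training data
        # (placeholder for a real kNN/kernel graph).
        self.graph_ = X @ X.T
        return self

    def transform(self, X):
        # `X` is accepted to match the sklearn signature, but never used:
        # the output depends only on the graph fitted above.
        eigvals, eigvecs = np.linalg.eigh(self.graph_)
        return eigvecs[:, -2:]  # top components of the fitted graph
```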

Best, Parashar

davisidarta commented 3 years ago

Hi @parashardhapola ! Thank you for your interest in TopOMetry and for your kind words! :)

> I think this package will really help expand how we are using the KNN graph structures of single-cell datasets.

So I hope! TopOMetry stores graphs and decomposes them into new, dimensionality-reduced bases within the TopOGraph object. I'm developing this entirely on my own while on hospital rounds, so I'm sorry the fuzzy and cknn transformers are not so well documented, although you'll find extensive docstrings in the code.

> I was taking a closer look at the `Diffusor` class and noticed that you don't actually use the parameter `X` in `Diffusor.transform`. Does this mean that the data can only be self-transformed? Or am I missing something?

That's a consequence of using scikit-learn transformers as a base, but I can change it to be optional. The `transform()` step essentially performs an adaptive eigendecomposition of the kernel or the transition matrix, so it only needs these graphs to operate, not the full data. As you've noticed, this means transformation requires a fitted affinity graph. Although a Nyström out-of-sample extension is possible, it is not yet implemented, as I'm still considering possible extension methods using landmarks.
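For intuition, here is a rough sketch of that step, assuming a fitted symmetric sparse kernel matrix `K` (the function name and signature are illustrative, not TopOMetry's API): the decomposition operates on the graph alone, with no reference to the raw data.

```python
import numpy as np
from scipy.sparse import csr_matrix
from scipy.sparse.linalg import eigsh

def decompose_fitted_graph(K: csr_matrix, n_components: int = 10):
    """Eigendecompose a symmetric kernel/transition graph.

    Once the affinity graph has been fitted, the original data matrix
    is no longer needed to obtain the reduced basis.
    """
    # Largest-magnitude eigenpairs of the (sparse, symmetric) graph.
    eigvals, eigvecs = eigsh(K, k=n_components, which="LM")
    order = np.argsort(eigvals)[::-1]  # sort by decreasing eigenvalue
    return eigvals[order], eigvecs[:, order]
```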

TopOMetry is in its very early days, and I'm considering adding new features that others would find useful. Is an out-of-sample extension of the graph-learning process what you have in mind, so that Scarf could apply it iteratively over in-memory batches?
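For reference, a Nyström-style extension would look roughly like the sketch below. To be clear, this is not implemented in TopOMetry; the name `nystrom_extend` and the argument layout are assumptions for illustration only.

```python
import numpy as np

def nystrom_extend(K_new, eigvecs, eigvals):
    """Project new samples onto an already-fitted eigenbasis (sketch).

    K_new   : (n_new, n_fitted) kernel between new points and the fitted
              (e.g., landmark) points.
    eigvecs : (n_fitted, k) eigenvectors of the fitted graph.
    eigvals : (k,) corresponding eigenvalues.
    """
    # Standard Nyström formula: psi_new = K_new @ U @ Lambda^{-1}
    return K_new @ eigvecs / eigvals[np.newaxis, :]
```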