davisidarta / topometry

Systematically learn and evaluate manifolds from high-dimensional data
https://topometry.readthedocs.io/en/latest/
MIT License


TopOMetry - Topologically Optimized geoMetry

TopOMetry is a high-level Python library to explore data topology through manifold learning. It is compatible with scikit-learn, meaning most of its operators can be easily pipelined.

Its main idea is to approximate the Laplace-Beltrami Operator (LBO). This is done by learning properly weighted similarity graphs and their Laplacian and diffusion operators. By definition, the eigenfunctions of these operators describe all underlying data topology in a set of orthonormal eigenbases (classically named the spectral or diffusion components). New topological operators are then learned from these eigenbases and can be used for clustering and graph-layout optimization (visualization).
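To make the idea above concrete, here is a minimal NumPy sketch of one classical way to obtain diffusion components: build a Gaussian similarity graph with a locally adaptive bandwidth, normalize it into a diffusion operator, and take its leading non-trivial eigenvectors. This is an illustration under stated assumptions, not TopOMetry's implementation or API; the function name `diffusion_components`, the bandwidth rule and the toy data are all hypothetical choices made for the example.

```python
import numpy as np


def diffusion_components(X, n_components=2, k=10):
    """Leading non-trivial eigenpairs of a diffusion operator built on X.

    Illustrative only: dense Gaussian kernel with a per-point bandwidth set
    to the distance to the k-th nearest neighbor (an assumption of this sketch).
    """
    # Pairwise squared Euclidean distances, clipped at zero for round-off.
    sq = np.sum(X ** 2, axis=1)
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * X @ X.T, 0.0)

    # Column 0 of each sorted row is the point itself, so column k is the
    # k-th nearest neighbor.
    sigma = np.sqrt(np.sort(d2, axis=1)[:, k])

    # Locally scaled Gaussian similarities; symmetric by construction.
    W = np.exp(-d2 / (sigma[:, None] * sigma[None, :]))
    np.fill_diagonal(W, 0.0)

    # Symmetric normalization D^{-1/2} W D^{-1/2}. It has the same
    # eigenvalues as the random-walk (diffusion) matrix D^{-1} W.
    d_inv_sqrt = 1.0 / np.sqrt(W.sum(axis=1))
    S = W * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]

    # eigh returns eigenvalues in ascending order; reverse to descending.
    evals, evecs = np.linalg.eigh(S)
    evals, evecs = evals[::-1], evecs[:, ::-1]

    # Map back to right eigenvectors of D^{-1} W and drop the trivial
    # first pair (eigenvalue 1, constant eigenvector).
    psi = evecs * d_inv_sqrt[:, None]
    return evals[1:n_components + 1], psi[:, 1:n_components + 1]


# Toy data: a noisy circle embedded in three dimensions.
rng = np.random.default_rng(0)
theta = rng.uniform(0.0, 2.0 * np.pi, 200)
X = np.column_stack([np.cos(theta), np.sin(theta), 0.01 * rng.normal(size=200)])

evals, comps = diffusion_components(X, n_components=2, k=10)
print(evals.shape, comps.shape)
```

The dense kernel keeps the sketch short but scales quadratically with the number of samples; large datasets call for sparse k-nearest-neighbor graphs and sparse eigensolvers instead.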

There are many different ways to computationally approximate the LBO. TopOMetry tests a wide array of possible algorithmic combinations, combines them with existing graph-layout algorithms, and scores them afterwards. This way, users do not have to choose a fixed method a priori, and can instead decide what works best for each use case. It also includes various utilities for scoring the performance of similarity kernels and dimensional reductions of high-dimensional data, methods for estimating intrinsic dimensionalities (global and local), and an implementation of the Riemann metric to qualitatively visualize distortions in 2-D embeddings.
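As a rough illustration of what scoring a dimensional reduction can look like, the sketch below computes a generic neighborhood-overlap score: the average fraction of each point's k nearest neighbors in the original space that remain among its k nearest neighbors in the embedding. This is a common, simple criterion and is not necessarily one of the scores TopOMetry computes; the function names `knn_indices` and `knn_preservation` and the toy data are assumptions of the example.

```python
import numpy as np


def knn_indices(X, k):
    """Indices of the k nearest neighbors of every row of X (self excluded)."""
    sq = np.sum(X ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * X @ X.T
    np.fill_diagonal(d2, np.inf)  # a point is never its own neighbor
    return np.argsort(d2, axis=1)[:, :k]


def knn_preservation(X_high, X_low, k=10):
    """Mean fraction of k-nearest neighbors shared between two spaces."""
    hi = knn_indices(X_high, k)
    lo = knn_indices(X_low, k)
    overlap = [len(set(a) & set(b)) for a, b in zip(hi, lo)]
    return float(np.mean(overlap)) / k


# Toy data: a 2-D latent structure linearly lifted to 5-D, plus small noise.
rng = np.random.default_rng(0)
Z = rng.normal(size=(300, 2))
X = Z @ rng.normal(size=(2, 5)) + 0.01 * rng.normal(size=(300, 5))

# Candidate A: projection onto the first two principal axes (via SVD).
Xc = X - X.mean(axis=0)
_, _, Vt = np.linalg.svd(Xc, full_matrices=False)
emb_pca = Xc @ Vt[:2].T

# Candidate B: the same coordinates with rows shuffled, which destroys
# the correspondence between samples and embedding positions.
emb_shuffled = emb_pca[rng.permutation(len(emb_pca))]

score_pca = knn_preservation(X, emb_pca, k=10)
score_shuffled = knn_preservation(X, emb_shuffled, k=10)
print(score_pca, score_shuffled)
```

Scores range from 0 to 1, with higher values meaning that local neighborhoods survive the reduction; comparing candidates on the same data and the same k is what makes the numbers meaningful.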

For more information, see the manuscript.

Single-cell analysis

TopOMetry was designed to handle large-scale data matrices containing extreme sample diversity, such as those generated from high-throughput single-cell experiments. It includes wrappers to deal with AnnData objects using scanpy and integrates well with tools in the scverse Python suite for single-cell analysis.


Documentation

Installation can be quickly done with pip:

pip install topometry

Further installation instructions (such as optional dependencies), information about the implemented methods, tutorials, and a detailed API reference are available in the documentation.


Contributing

Contributions are very welcome! If you're interested in adding a new feature, just let me know in the Issues section.


License

MIT License


Citation

If you use TopOMetry for your work, please cite the manuscript:

@article {Sidarta-Oliveira2022.03.14.484134,
    author = {Sidarta-Oliveira, Davi and Velloso, Licio A},
    title = {A comprehensive dimensional reduction framework to learn single-cell phenotypic topology uncovers T cell diversity},
    elocation-id = {2022.03.14.484134},
    year = {2022},
    doi = {10.1101/2022.03.14.484134},
    publisher = {Cold Spring Harbor Laboratory},
    URL = {https://www.biorxiv.org/content/early/2022/03/17/2022.03.14.484134},
    eprint = {https://www.biorxiv.org/content/early/2022/03/17/2022.03.14.484134.full.pdf},
    journal = {bioRxiv}
}