biovault / nptsne

nptsne is a NumPy-compatible Python binary package that offers a number of APIs for fast t-SNE calculation.
Apache License 2.0

Support non-Euclidean distance for TextureTsne #11

Open. VPetukhov opened this issue 3 years ago.

VPetukhov commented 3 years ago

Hi, I work with high-dimensional data (hundreds of dimensions), and it requires using cosine distance. However, TextureTsne has Euclidean distance hardcoded. Could you please make it configurable?
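
For context on the request: on L2-normalized vectors, squared Euclidean distance equals 2(1 - cosine similarity), so an exact Euclidean k-NN search over normalized rows returns the same neighbours as a cosine k-NN search. The NumPy sketch below checks this on random stand-in data. It is a general illustration under that assumption, not nptsne code. The distance values themselves still differ from true cosine distances, so any perplexity-based affinities computed from them would not be identical.

```python
import numpy as np


def l2_normalize(x: np.ndarray) -> np.ndarray:
    """Scale each row of x to unit Euclidean length."""
    return x / np.linalg.norm(x, axis=1, keepdims=True)


def cosine_distances(x: np.ndarray) -> np.ndarray:
    """Pairwise cosine distance, 1 - cos(a, b), between the rows of x."""
    unit = l2_normalize(x)
    return 1.0 - unit @ unit.T


def euclidean_distances(x: np.ndarray) -> np.ndarray:
    """Pairwise Euclidean distance between the rows of x (brute force)."""
    diff = x[:, None, :] - x[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))


def knn_indices(dist: np.ndarray, k: int) -> np.ndarray:
    """Indices of the k nearest neighbours per row, skipping the point itself."""
    return np.argsort(dist, axis=1)[:, 1 : k + 1]


rng = np.random.default_rng(0)
data = rng.normal(size=(50, 300))  # random stand-in for high-dimensional data

cos_d = cosine_distances(data)
euc_d = euclidean_distances(l2_normalize(data))

# For unit vectors ||a - b||^2 = 2 * (1 - cos(a, b)), so neighbour rankings agree.
identity_holds = np.allclose(euc_d ** 2, 2.0 * cos_d)
same_neighbours = np.array_equal(knn_indices(cos_d, 10), knn_indices(euc_d, 10))
print(identity_holds, same_neighbours)
```
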

bldrvnlw commented 3 years ago

Hi Viktor, thanks for your input. Yes, it's possible to add this as an option. It's not a complex change, and it's something we already support in other software. I'm busy with other projects right now, but I'll see if I can fit it in this week or next.

VPetukhov commented 3 years ago

That would be great, thank you so much!

bldrvnlw commented 3 years ago

Hi Viktor, because of some other work I haven't done the full release of nptsne with the extra metrics yet, but an alpha of the Python wheels is available for Windows at https://test.pypi.org/project/nptsne/1.2.0a1.dev27/#files if you wish to try it.

Apart from the additions shown below, the interfaces are identical to nptsne 1.1.0.

The docstrings include the following documentation for the metrics features:

```
KnnAlgorithm.get_supported_metrics()

    Get a dict containing KnnDistanceMetric values supported by the KnnAlgorithm.

    Parameters
    ----------
    knn_lib : :class:`KnnAlgorithm`
        The algorithm being queried.

    Example
    -------
    Each algorithm has different support. See the tests below.

    >>> import nptsne
    >>> support = nptsne.KnnAlgorithm.get_supported_metrics(nptsne.KnnAlgorithm.Flann)
    >>> for i in support.items():
    ...     print(i[0])
    Euclidean
    >>> support = nptsne.KnnAlgorithm.get_supported_metrics(nptsne.KnnAlgorithm.Annoy)
    >>> for i in support.items():
    ...     print(i[0])
    Cosine
    Dot
    Euclidean
    Manhattan
    >>> support = nptsne.KnnAlgorithm.get_supported_metrics(nptsne.KnnAlgorithm.HNSW)
    >>> for i in support.items():
    ...     print(i[0])
    Euclidean
    Inner Product
    >>> support["Euclidean"] is nptsne.KnnDistanceMetric.Euclidean
    True
```

And for creating a tsne object with the new metrics:

```
Examples

    Create a TextureTsneExtended wrapper

    >>> import nptsne
    >>> tsne = nptsne.TextureTsneExtended(verbose=True, num_target_dimensions=2, perplexity=35, knn_algorithm=nptsne.KnnAlgorithm.Annoy)
    >>> tsne.verbose
    True
    >>> tsne.num_target_dimensions
    2
    >>> tsne.perplexity
    35
    >>> tsne.knn_algorithm == nptsne.KnnAlgorithm.Annoy
    True
```

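
The get_supported_metrics doctests imply that, of Flann, Annoy and HNSW, only Annoy lists "Cosine". The helper below is a hypothetical sketch (not part of nptsne) that filters a list of algorithms by metric name, given any callable that returns a supported-metrics dict keyed by name, as the docstring describes. It runs here against a stub table copied from the doctest output; the commented lines show how it might be called with the real nptsne API, which is not executed in this sketch.

```python
from typing import Callable, Iterable, List, Mapping, TypeVar

Algo = TypeVar("Algo")


def algorithms_supporting(
    metric_name: str,
    algorithms: Iterable[Algo],
    get_supported_metrics: Callable[[Algo], Mapping[str, object]],
) -> List[Algo]:
    """Return every algorithm whose supported-metrics dict has metric_name as a key."""
    return [algo for algo in algorithms if metric_name in get_supported_metrics(algo)]


# Hypothetical stub reproducing the key sets printed by the doctests;
# the None values stand in for nptsne.KnnDistanceMetric members.
DOCSTRING_SUPPORT = {
    "Flann": {"Euclidean": None},
    "Annoy": {"Cosine": None, "Dot": None, "Euclidean": None, "Manhattan": None},
    "HNSW": {"Euclidean": None, "Inner Product": None},
}

cosine_backends = algorithms_supporting(
    "Cosine", DOCSTRING_SUPPORT, DOCSTRING_SUPPORT.__getitem__
)
print(cosine_backends)  # ['Annoy']

# With nptsne installed, the real API could be passed in instead (not run here):
#   import nptsne
#   algos = [nptsne.KnnAlgorithm.Flann, nptsne.KnnAlgorithm.Annoy, nptsne.KnnAlgorithm.HNSW]
#   algorithms_supporting("Cosine", algos, nptsne.KnnAlgorithm.get_supported_metrics)
```

Passing the lookup as a callable keeps the helper independent of how the metric table is obtained.
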
VPetukhov commented 3 years ago

Thank you @bldrvnlw! I'm actually using Linux, but installing from source probably shouldn't be hard. I'll test it when I have some time!

bldrvnlw commented 3 years ago

Hi again @VPetukhov, TestPyPI now includes the Linux (and macOS) wheels: https://test.pypi.org/project/nptsne/1.2.0a1.dev66/#files

VPetukhov commented 3 years ago

Hi @bldrvnlw, I tested cosine distance and it works great! Plus, it's 10 times faster than UMAP. Thank you so much! Should we close the issue?