YingfanWang / PaCMAP

PaCMAP: Large-scale Dimension Reduction Technique Preserving Both Global and Local Structure
Apache License 2.0

Is there a way of providing user-defined distance matrices to PaCMAP #12

Open JoanaMPereira opened 3 years ago

JoanaMPereira commented 3 years ago

First, I want to congratulate you on PaCMAP. I have been using it for some weeks now and it has been working really nicely with my data.

I wanted to ask: is there a way to use PaCMAP with user-supplied distance matrices? I found a blog post about an R wrapper for PaCMAP, where a metric keyword argument indicates that the input matrix is a distance matrix. But I couldn't find how to do that in Python...

Thanks and best wishes, Joana

hyhuang00 commented 3 years ago

Thank you for your interest in PaCMAP! Yes, the Python version of PaCMAP also supports user-supplied distance matrices. Specifically, you provide the distance matrices used for the nearest neighbor sampling. A code snippet demonstrating this is available here: https://github.com/YingfanWang/PaCMAP#how-to-use-user-specified-nearest-neighbor Let me know if you need more help!
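
For readers starting from a precomputed distance matrix, the sketch below shows one way to extract each point's k nearest neighbors, which is the kind of neighbor information that user-specified nearest neighbor sampling relies on. It is plain Python and does not call PaCMAP; the helper name `knn_from_distance_matrix` and the toy data are illustrative assumptions, not part of the library.

```python
def knn_from_distance_matrix(dist, k):
    """Return, for each point, the indices of its k nearest neighbors.

    `dist` is a square list-of-lists distance matrix. A point is never
    listed as its own neighbor. (Hypothetical helper, not PaCMAP's API.)
    """
    n = len(dist)
    if any(len(row) != n for row in dist):
        raise ValueError("distance matrix must be square")
    if not 0 < k < n:
        raise ValueError("k must satisfy 0 < k < number of points")

    neighbors = []
    for i, row in enumerate(dist):
        # Sort every other point by its distance to point i.
        order = sorted((j for j in range(n) if j != i), key=lambda j: row[j])
        neighbors.append(order[:k])
    return neighbors


# Toy data: four points on a line, with absolute difference as the distance.
points = [0.0, 1.0, 3.0, 7.0]
dist = [[abs(a - b) for b in points] for a in points]

neighbors = knn_from_distance_matrix(dist, k=2)
print(neighbors)  # [[1, 2], [0, 2], [1, 0], [2, 1]]
```

Any custom metric can be used to fill `dist`; only the ranking of distances within each row matters for the neighbor lists.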

RichieHakim commented 2 years ago

I think this needs to be reopened.

As can be seen in sklearn's t-SNE implementation (https://scikit-learn.org/stable/modules/generated/sklearn.manifold.TSNE.html), it is possible to pass ONLY a distance matrix as the input (by setting metric='precomputed'). This allows a user to supply a fully custom distance matrix to the algorithm. The implementation linked here https://github.com/YingfanWang/PaCMAP#how-to-use-user-specified-nearest-neighbor does not allow for this useful functionality, as it still requires the original feature matrix (X) as input to the .fit_transform method.
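
As a point of comparison, here is a minimal sketch of the scikit-learn behavior described above, where the estimator receives only a square distance matrix and never sees the feature matrix. The random data and the choice of Manhattan distance are arbitrary assumptions for illustration. Note that scikit-learn requires a non-PCA initialization (here `init="random"`) when `metric="precomputed"` is used.

```python
import numpy as np
from sklearn.manifold import TSNE
from sklearn.metrics import pairwise_distances

# Toy feature matrix, used only to build a distance matrix.
rng = np.random.default_rng(0)
X = rng.normal(size=(30, 5))

# Any non-negative, square distance matrix could be substituted here.
D = pairwise_distances(X, metric="manhattan")

# With metric="precomputed", fit_transform takes the distances directly.
tsne = TSNE(
    n_components=2,
    metric="precomputed",
    init="random",      # PCA init is not allowed with precomputed distances
    perplexity=5.0,     # must be smaller than the number of samples
    random_state=0,
)
embedding = tsne.fit_transform(D)
print(embedding.shape)  # (30, 2)
```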

hyhuang00 commented 2 years ago

> I think this needs to be reopened.
>
> As can be seen in sklearn's t-sne implementation (https://scikit-learn.org/stable/modules/generated/sklearn.manifold.TSNE.html), it is possible to input ONLY a distance matrix as the input (set metric='precomputed'). This allows a user to specify a fully custom distance matrix as the input to the algorithm. The implementation linked here https://github.com/YingfanWang/PaCMAP#how-to-use-user-specified-nearest-neighbor does not allow for this useful functionality, as it still requires the original feature matrix (X) as input to the .fit_transform method.

In the previous implementation, the pair sampling function could not make use of a custom distance matrix, which leads to the problem you reported. The earlier fix was only intended to bypass the nearest neighbor sampling, which takes considerable time. We have started refactoring many parts of the code and will fix the problem along the lines you suggested. Thank you for your suggestion.

epz0 commented 1 year ago

I've run into this same issue when trying to use PaCMAP. Is there any news on this front? I didn't see anything about this in the release notes for the more recent versions. Thanks for the great work btw!