enjalot / latent-scope

A scientific instrument for investigating latent spaces
MIT License
571 stars, 19 forks

Compare two UMAPs interactively #27

Open · enjalot opened this issue 8 months ago

enjalot commented 8 months ago

I've begun working on a compare page that lets you choose two UMAPs, visualize the difference between them, and transition between which one is shown.

[Screenshot of the compare page, 2024-03-01]

We want to see the biggest changes between UMAPs because they hint at the biggest changes in what the embeddings encode.

Right now this uses an absolute measure of how far each point moves, which is fast and straightforward to implement. It is informative when comparing aligned UMAPs, but a better implementation might look for relative changes in distance: if a point moves far but stays close to its neighbors (because they all moved with it), it shouldn't be highlighted.
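A minimal sketch of the two measures, assuming both layouts are NumPy arrays of shape (n, 2) whose rows refer to the same data points in the same order (true for aligned UMAPs of one dataset). The function names are hypothetical, not part of latent-scope, and the "relative" score here is just one plausible definition: subtract the mean movement of a point's k nearest neighbors (found in the first layout) from the point's own movement, so a cluster that shifts as a block scores near zero.

```python
import numpy as np


def absolute_displacement(a: np.ndarray, b: np.ndarray) -> np.ndarray:
    """Euclidean distance each point moves between layout a and layout b, both shape (n, 2)."""
    return np.linalg.norm(b - a, axis=1)


def relative_displacement(a: np.ndarray, b: np.ndarray, k: int = 10) -> np.ndarray:
    """How far each point moves relative to its k nearest neighbors in layout a.

    Hypothetical definition: own movement minus the mean movement of the neighbors.
    Neighbors are found in the 2-D layout a; the high-dimensional embedding space
    would be another reasonable choice.
    """
    # Full pairwise distance matrix: O(n^2) memory, fine for a sketch, not for large n.
    d = np.linalg.norm(a[:, None, :] - a[None, :, :], axis=-1)
    np.fill_diagonal(d, np.inf)  # a point is not its own neighbor
    neighbors = np.argsort(d, axis=1)[:, :k]  # (n, k) neighbor indices
    moves = b - a  # (n, 2) movement vectors
    neighbor_mean_move = moves[neighbors].mean(axis=1)  # (n, 2)
    return np.linalg.norm(moves - neighbor_mean_move, axis=1)
```

For example, if every point shifts by the same vector, `absolute_displacement` is large everywhere while `relative_displacement` is zero; if a single point moves away on its own, it keeps a high relative score.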

I also want to improve the UI for inspecting the data during this comparison. It's nice that you can select a point and follow it as you flip between the UMAPs, but we should have a tooltip that shows the text on hover for quickly inspecting neighbors. It would also be nice to see a table of the data points that travelled the furthest, sorted by distance. The table view itself is not very well made, as tracked in #6.
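A rough sketch of the data behind a "furthest travelled" table, assuming a list of texts and two (n, 2) coordinate arrays with rows in the same order. `top_movers` is a hypothetical helper, not an existing latent-scope function; it uses the simple absolute distance, and a neighbor-relative score could be swapped in for the distance column without changing the sorting.

```python
import numpy as np


def top_movers(texts, a, b, n=10):
    """Return the n points that moved furthest between layouts a and b.

    Each row is (index, distance, text), sorted by descending distance.
    """
    dist = np.linalg.norm(np.asarray(b, dtype=float) - np.asarray(a, dtype=float), axis=1)
    order = np.argsort(-dist)[:n]  # indices of the largest distances first
    return [(int(i), float(dist[i]), texts[i]) for i in order]


if __name__ == "__main__":
    texts = ["a", "b", "c"]
    a = np.zeros((3, 2))
    b = np.array([[0.0, 1.0], [0.0, 3.0], [0.0, 2.0]])
    for row in top_movers(texts, a, b, n=2):
        print(row)
```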

enjalot commented 8 months ago

I've thought of another view I want for comparing two UMAPs: two side-by-side plots that are linked interactively.

I'd really like to be able to click on a point in one UMAP and then see the list of its 10 nearest neighbors below. Then I'd like to see the nearest neighbors of that same point in the 2nd UMAP (as well as some indication of where the original 10 nearest neighbors landed).
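A minimal sketch of the computation behind that linked view, assuming two (n, 2) coordinate arrays for the same points. `knn` and `compare_neighbors` are hypothetical names. Neighbors are taken in 2-D UMAP space, matching the description above, and the returned `overlap` (the fraction of the original k neighbors that are still neighbors in the second UMAP) is an assumed summary, not something specified in this issue.

```python
import numpy as np


def knn(coords: np.ndarray, i: int, k: int = 10) -> np.ndarray:
    """Indices of the k nearest neighbors of point i in a 2-D layout, excluding i itself."""
    d = np.linalg.norm(coords - coords[i], axis=1)
    d[i] = np.inf  # never return the point itself
    return np.argsort(d, kind="stable")[:k]


def compare_neighbors(a: np.ndarray, b: np.ndarray, i: int, k: int = 10) -> dict:
    """Neighbors of point i in layouts a and b, plus where a's neighbors landed in b."""
    nn_a = knn(a, i, k)
    nn_b = knn(b, i, k)
    kept = np.intersect1d(nn_a, nn_b)  # neighbors shared by both layouts
    return {
        "neighbors_a": nn_a,
        "neighbors_b": nn_b,
        "overlap": len(kept) / k,
        # Coordinates of the original neighbors in the second layout, e.g. for highlighting.
        "original_neighbors_in_b": b[nn_a],
    }
```

In a linked view, clicking point `i` in the first plot would list `neighbors_a`, then highlight `neighbors_b` and `original_neighbors_in_b` in the second plot.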

This would be a way to investigate shifts at a very fine-grained level while still building intuition for how concepts move.

enjalot commented 1 month ago

A major consideration for this feature is that directly comparing UMAPs may be very misleading unless the UMAPs were created via AlignedUMAP. We should limit the choices to only those UMAPs that share an alignment, which would also make the UI for choosing which UMAPs to compare easier.
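A small sketch of how the picker could be restricted, assuming each UMAP's metadata is a dict with an `"id"` and a hypothetical `"alignment"` key that is shared by all UMAPs fit together in one AlignedUMAP run (and absent for standalone UMAPs). Neither the key nor `comparable_umaps` is an existing latent-scope field or function; they only illustrate the filtering idea.

```python
def comparable_umaps(umaps: list, selected_id: str) -> list:
    """IDs of UMAPs that share an alignment with selected_id, excluding itself.

    Assumes a hypothetical "alignment" key in each metadata dict; a UMAP without
    one was not produced by AlignedUMAP, so nothing is offered for comparison.
    """
    by_id = {u["id"]: u for u in umaps}
    group = by_id.get(selected_id, {}).get("alignment")
    if group is None:
        return []  # standalone UMAP: a direct comparison could be misleading
    return [u["id"] for u in umaps if u.get("alignment") == group and u["id"] != selected_id]
```

Once one UMAP is chosen, the second dropdown would only show the IDs this returns, which keeps the comparison UI simple.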