Open wvdvegte opened 1 year ago
We discussed this at today's meeting.
We'd add an option to order the objects based on leaf-ordered clustering. This will help you find the closest instance.
What I would like to extract in an automated way is, for each column, the row ID with the greatest distance and the value of that distance. For other purposes, it may also be useful to get the row ID with the smallest distance with its value, the average distance in each column, etc.
This makes sense, but it doesn't belong in this widget: it is not related to the (visual representation of the) distance matrix. We could have a separate widget that is given a matrix and shows a table with the names of the objects (as Distance Matrix does now) and, for each one, the nearest or the farthest object (the user's choice), together with the distance. The widget would also output this table in case the user wants to save it. (There's not much else one could do with this table.) Does this sound OK?
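To make the proposed table concrete, here is a minimal sketch in plain Python. It is not Orange code and not an existing widget: the names `neighbor_table` and `to_csv`, the `mode` argument, and the sample labels and distances are all hypothetical. It assumes a square matrix with at least two objects and skips the diagonal, so an object is never reported as its own nearest neighbor.

```python
import csv
import io

def neighbor_table(labels, matrix, mode="nearest"):
    """Return one (object, neighbor, distance) row per object.

    mode is "nearest" or "farthest"; the diagonal is skipped.
    """
    if mode not in ("nearest", "farthest"):
        raise ValueError("mode must be 'nearest' or 'farthest'")
    pick = min if mode == "nearest" else max
    rows = []
    for i, label in enumerate(labels):
        # (distance, index) pairs for every other object
        others = [(matrix[i][j], j) for j in range(len(labels)) if j != i]
        dist, j = pick(others)
        rows.append((label, labels[j], dist))
    return rows

def to_csv(rows, mode):
    """Serialize the table so it can be saved, as the widget output would allow."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["object", mode, "distance"])
    writer.writerows(rows)
    return buffer.getvalue()

# Made-up symmetric distance matrix for three objects
labels = ["A", "B", "C"]
matrix = [
    [0.0, 1.0, 4.0],
    [1.0, 0.0, 2.5],
    [4.0, 2.5, 0.0],
]

farthest = neighbor_table(labels, matrix, mode="farthest")
print(to_csv(farthest, "farthest"))
```

Switching `mode` to `"nearest"` gives the other variant of the table; ties are broken by object position, which a real widget might want to handle differently.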
Yes, I think it makes sense. It could be a 'Distance Analysis' or 'Matrix Analysis' widget. How the leaf-ordered clustering will work is not completely clear to me, but I'll give it a try once it's there.
What's your use case?
I performed clustering on a corpus of documents based on t-SNE coordinates. For further analysis, I would like to extract, for each cluster, which other cluster is furthest away, i.e., the most dissimilar. To that end, I computed the average t-SNE x and y coordinates for each cluster using Group By, and then computed Distances based on the coordinates. Based on this, I can create a Distance Matrix.
What I would like to extract in an automated way is, for each column, the row ID with the greatest distance and the value of that distance. For other purposes, it may also be useful to get the row ID with the smallest distance with its value, the average distance in each column, etc.
In 'normal' use of the distance matrix, where each row/column represents a data point, it could also be useful to automatically extract, for each data point, which other data point is furthest away, how far away it is, etc.
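The workflow described here (average the coordinates per cluster, compute distances, then summarize each column) can be sketched outside Orange in a few lines of plain Python. This is only an illustration under stated assumptions: the function names `centroids` and `column_summary` are hypothetical, the coordinates are made up, Euclidean distance is assumed, and the per-column mean excludes the zero on the diagonal, which may differ from what a widget would report.

```python
import math
from collections import defaultdict

def centroids(points):
    """Average the (x, y) coordinates per cluster, like Group By with a mean."""
    sums = defaultdict(lambda: [0.0, 0.0, 0])
    for cluster, x, y in points:
        s = sums[cluster]
        s[0] += x
        s[1] += y
        s[2] += 1
    return {c: (sx / n, sy / n) for c, (sx, sy, n) in sums.items()}

def column_summary(centers):
    """For each cluster: farthest and nearest other cluster, and mean distance."""
    summary = {}
    for name, center in centers.items():
        # Euclidean distances to every other cluster centroid (diagonal skipped)
        dists = {other: math.dist(center, c)
                 for other, c in centers.items() if other != name}
        far = max(dists, key=dists.get)
        near = min(dists, key=dists.get)
        summary[name] = {
            "farthest": (far, dists[far]),
            "nearest": (near, dists[near]),
            "mean": sum(dists.values()) / len(dists),
        }
    return summary

# Made-up t-SNE coordinates: (cluster, x, y)
points = [
    ("C1", -1.0, 0.0), ("C1", 1.0, 0.0),
    ("C2", 2.0, 4.0), ("C2", 4.0, 4.0),
    ("C3", 0.0, 9.0), ("C3", 0.0, 11.0),
]

centers = centroids(points)
for name, stats in column_summary(centers).items():
    print(name, stats)
```

With these sample points the centroids are (0, 0), (3, 4) and (0, 10), so C1's farthest cluster is C3 and its nearest is C2.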
What's your proposed solution?
Several options, from most useful to least useful:
Are there any alternative solutions?