TheScienceMuseum / heritage-connector

Heritage Connector: Transforming text into data to extract meaning and make connections
https://www.sciencemuseumgroup.org.uk/projects/heritage-connector
MIT License

run clustering on KG embeddings #352

Closed kdutia closed 2 years ago

kdutia commented 3 years ago
kdutia commented 2 years ago

Closing this. An analysis of cluster assignments vs curators' category assignments would be interesting later on, if there's time and it would make for an interesting write-up.
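For reference, one simple way to compare cluster assignments with curators' categories is cluster purity: the fraction of clustered records whose category matches the majority category of their cluster. The sketch below is only an illustration and not code from this repo. The function name, the example labels and the convention that unclustered (noise) records are labelled `-1` are all assumptions.

```python
from collections import Counter, defaultdict


def cluster_purity(cluster_labels, category_labels, noise_label=-1):
    """Share of non-noise records that fall in their cluster's majority category.

    `noise_label` is an assumed convention for records left unclustered.
    """
    # Count how often each category appears inside each cluster.
    per_cluster = defaultdict(Counter)
    for cluster, category in zip(cluster_labels, category_labels):
        if cluster == noise_label:
            continue  # noise records are excluded from the score
        per_cluster[cluster][category] += 1

    total = sum(sum(counts.values()) for counts in per_cluster.values())
    if total == 0:
        return 0.0

    # For each cluster, count the records belonging to its most common category.
    majority = sum(counts.most_common(1)[0][1] for counts in per_cluster.values())
    return majority / total


# Hypothetical toy data: six records, two clusters, one noise record.
clusters = [0, 0, 0, 1, 1, -1]
categories = ["photography", "photography", "railways",
              "railways", "railways", "medicine"]
purity = cluster_purity(clusters, categories)
```

A purity close to 1 would suggest the clusters mostly reproduce the curators' categories. Note that purity tends to rise as the number of clusters grows, so it is best read alongside the cluster count.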

kdutia commented 2 years ago

After further tweaking the min_cluster_size and min_samples parameters (see docs), there seems to be no 'better' version of clustering. I think this highlights that we need a more use-case-driven approach to clustering.

In the future, it may be useful to try this again with a separate set of embeddings from which collection categories have been removed as a property, since at the moment the layout roughly reflects the structure of the collections.
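As a rough sketch of that idea, the category property could be filtered out of the knowledge graph triples before the embeddings are retrained. The triple format, the `drop_predicates` helper and the predicate name `hc:category` below are hypothetical and are not taken from the Heritage Connector codebase.

```python
def drop_predicates(triples, excluded):
    """Return the (subject, predicate, object) triples whose predicate is not excluded."""
    excluded = set(excluded)
    return [(s, p, o) for (s, p, o) in triples if p not in excluded]


# Hypothetical triples; the identifiers and predicate names are placeholders.
triples = [
    ("object/1", "hc:category", "Photography"),
    ("object/1", "hc:maker", "person/7"),
    ("object/2", "hc:category", "Railways"),
    ("object/2", "hc:place_made", "place/3"),
]

# Remove the assumed category predicate before retraining the embeddings.
filtered = drop_predicates(triples, {"hc:category"})
```

Embeddings trained on `filtered` would no longer see the category links directly, although categories may still be partly recoverable from correlated properties.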