ecosyste-ms / ost

A curated list of open technology projects to sustain a stable climate, energy supply, biodiversity and natural resources, based on data from https://opensustain.tech
https://ost.ecosyste.ms
GNU Affero General Public License v3.0

Clustering of projects using embeddings #100

Open andrew opened 11 months ago

andrew commented 11 months ago

To help automate discovery of new projects, I'd like to experiment with https://github.com/pgvector/pgvector and embeddings from a large language model to cluster projects together.

My plan is:

andrew commented 11 months ago

Working on this branch: https://github.com/ecosyste-ms/ost/tree/pgvector

andrew commented 11 months ago

Experimental output for project recommendations based on the similarity of readmes: https://gist.github.com/andrew/096b4dc209cbcb8701d6acbf812a4244
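
A minimal sketch of how recommendations like these could be ranked, assuming each README has already been converted into an embedding vector. The project names and three-dimensional vectors are invented for illustration; real embeddings come from a model and have many more dimensions. The branch may well compute this differently, for example inside Postgres using pgvector's cosine distance operator (`<=>`).

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # treat a zero vector as similar to nothing
    return dot / (norm_a * norm_b)


def recommend(target, embeddings, limit=3):
    """Return the `limit` projects whose README embedding is closest to `target`."""
    target_vec = embeddings[target]
    scored = [
        (name, cosine_similarity(target_vec, vec))
        for name, vec in embeddings.items()
        if name != target  # never recommend a project to itself
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:limit]


# Hypothetical toy data: project name -> README embedding.
readme_embeddings = {
    "solar-forecast": [0.9, 0.1, 0.0],
    "pv-simulator": [0.8, 0.2, 0.1],
    "forest-monitor": [0.1, 0.9, 0.2],
    "grid-optimizer": [0.7, 0.0, 0.3],
}

recommendations = recommend("solar-forecast", readme_embeddings, limit=2)
for name, score in recommendations:
    print(f"{name}: {score:.3f}")
```

With this toy data the two energy-related projects rank ahead of the forest one, which is the behaviour you would hope to see on real READMEs.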

andrew commented 10 months ago

Also working on a related item in https://github.com/ecosyste-ms/awesome/issues/3, which should help with classification of projects. In theory, you'll be able to see which lists a project is most similar to.
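
One way the "most similar list" idea could work, as a rough sketch rather than what the linked issue actually implements: average the embeddings of the projects already on each list into a centroid, then rank the lists by cosine similarity between a new project's embedding and each centroid. The list names and vectors below are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def centroid(vectors):
    """Component-wise mean of a non-empty collection of vectors."""
    count = len(vectors)
    return [sum(component) / count for component in zip(*vectors)]


def rank_lists(project_vec, lists):
    """Rank lists by similarity between the project and each list's centroid."""
    scores = {
        name: cosine_similarity(project_vec, centroid(vectors))
        for name, vectors in lists.items()
    }
    return sorted(scores.items(), key=lambda pair: pair[1], reverse=True)


# Hypothetical toy data: list name -> embeddings of projects already on it.
example_lists = {
    "energy-list": [[0.9, 0.1, 0.0], [0.8, 0.2, 0.1]],
    "biodiversity-list": [[0.1, 0.9, 0.2], [0.0, 0.8, 0.3]],
}
new_project = [0.7, 0.2, 0.1]

ranking = rank_lists(new_project, example_lists)
for name, score in ranking:
    print(f"{name}: {score:.3f}")
```

A centroid is the simplest summary of a list; if lists turn out to contain several distinct sub-topics, comparing against the nearest few members of each list instead may classify better.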