argilla-io / argilla-plugins

🔌 Open-source plugins with practical features for Argilla using listeners.
Apache License 2.0

annotate record based on `KNN vector similarity` with annotated data #11

Open davidberenstein1957 opened 1 year ago

davidberenstein1957 commented 1 year ago

KNN is normally used for classification. Assuming un-annotated samples are highly similar to newly annotated examples, it might be interesting to label them based on that similarity.

MVP

from argilla_plugins.programmatic_labelling import knn

# Proposed API: keep the returned plugin instance so that it can be started.
plugin = knn(name="dataset", sim_threshold=0.9)
plugin.start()

Stretch

Filtering variables like `query` could be added to limit the sync.
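To make the idea concrete, here is a minimal sketch of threshold-based nearest-neighbour labelling, independent of Argilla. The function name `knn_label` and its parameters are hypothetical, not part of argilla-plugins. It assumes each record already has a non-zero embedding vector, uses cosine similarity, and only looks at the single nearest annotated neighbour (k=1).

```python
import numpy as np


def knn_label(annotated_vecs, annotated_labels, unlabelled_vecs, sim_threshold=0.9):
    """Return one label (or None) per unlabelled vector.

    Hypothetical helper: a record inherits the label of its most similar
    annotated record only if the cosine similarity reaches sim_threshold.
    """
    a = np.asarray(annotated_vecs, dtype=float)
    u = np.asarray(unlabelled_vecs, dtype=float)

    # L2-normalise rows so that the dot product equals cosine similarity.
    # Assumes no all-zero vectors.
    a = a / np.linalg.norm(a, axis=1, keepdims=True)
    u = u / np.linalg.norm(u, axis=1, keepdims=True)

    sims = u @ a.T                    # shape: (n_unlabelled, n_annotated)
    nearest = sims.argmax(axis=1)     # index of the closest annotated record
    best = sims[np.arange(len(u)), nearest]

    return [
        annotated_labels[i] if s >= sim_threshold else None
        for i, s in zip(nearest, best)
    ]


# Toy data: two annotated records, three un-annotated ones.
annotated = [[1.0, 0.0], [0.0, 1.0]]
labels = ["pos", "neg"]
unlabelled = [[0.99, 0.05], [0.7, 0.7], [0.1, 1.0]]

suggestions = knn_label(annotated, labels, unlabelled, sim_threshold=0.9)
print(suggestions)  # ['pos', None, 'neg']
```

Records that fall below the threshold stay unlabelled (`None`), so only confident matches would be propagated. A plugin would presumably run this inside a listener and write the suggestions back to the dataset.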

davidberenstein1957 commented 1 year ago

This has some overlap with the extend_matrix reasoning.

davidberenstein1957 commented 1 year ago

Loading the top_k most similar records is already possible with rg.load().