Open msminhas93 opened 1 year ago
Hi @msminhas93 I would love to see the feature.
We need to fine-tune what we want to achieve. Users who are able to compute embeddings themselves can already do this via the Python client, e.g. with rg.load("dataset", vector=embedding). However, it might be useful to allow deploying an embedding model alongside Argilla to support this, like weaviate does here or elasticsearch 8.5 does here.
@frascuchon @dvsrepo IMO, this also aligns with https://github.com/argilla-io/argilla/issues/2150
@msminhas93 what would work best for you?
Thank you for responding! I think the Python client is awesome, but for a workflow of rapid searches on custom text inputs followed by bulk annotation with a few deselections, a UI that supports embedding search would be extremely powerful. Also, domain experts can be non-technical, which would limit their ability to run such queries.
I imagine this working much like the new "find similar" feature. However, on the backend, instead of storing only the embeddings, we would also store the encoder, possibly as some kind of config. This could be as simple as the encoder name, or an embed_text function or method (which would have to subclass some default base that handles the other housekeeping) that accepts text as input and returns embeddings.
So when we press enter with embedding search enabled, the callback would run the same logic as the find similar method, but with the encoded input text as the query vector.
An additional slider or UI component to filter the similarity score based on the input threshold would be useful too.
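A minimal sketch of the idea above, under stated assumptions: the encoder is stored behind a small base class exposing embed_text, the typed query is encoded with it, and results are ranked by cosine similarity and cut off at a threshold (what the slider would control). Every name here (BaseEncoder, VocabEncoder, search_by_text) is hypothetical and not part of Argilla's API, and the vocabulary-count encoder is only a toy stand-in for a real embedding model.

```python
import math
from abc import ABC, abstractmethod
from typing import List, Sequence, Tuple


class BaseEncoder(ABC):
    """Hypothetical base class for encoders (not an Argilla class)."""

    @abstractmethod
    def embed_text(self, text: str) -> List[float]:
        """Accept text and return its embedding."""


class VocabEncoder(BaseEncoder):
    """Toy stand-in for a real model: counts words from a fixed vocabulary."""

    def __init__(self, vocab: Sequence[str]):
        self.index = {word: i for i, word in enumerate(vocab)}

    def embed_text(self, text: str) -> List[float]:
        vec = [0.0] * len(self.index)
        for token in text.lower().split():
            if token in self.index:  # out-of-vocabulary tokens are ignored
                vec[self.index[token]] += 1.0
        return vec


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def search_by_text(
    query: str,
    records: Sequence[Tuple[str, Sequence[float]]],
    encoder: BaseEncoder,
    threshold: float = 0.0,
    limit: int = 10,
) -> List[Tuple[float, str]]:
    """Encode the query, score stored vectors, keep hits at or above threshold."""
    query_vec = encoder.embed_text(query)
    scored = [(cosine(query_vec, vec), text) for text, vec in records]
    hits = [pair for pair in scored if pair[0] >= threshold]
    hits.sort(key=lambda pair: pair[0], reverse=True)
    return hits[:limit]


vocab = ["refund", "order", "cancel", "my", "please", "great", "weather", "today"]
encoder = VocabEncoder(vocab)
texts = ["refund my order", "cancel my order please", "great weather today"]
# Vectors are computed once at ingestion time and stored next to the records.
records = [(t, encoder.embed_text(t)) for t in texts]

loose = search_by_text("refund order", records, encoder, threshold=0.3)
strict = search_by_text("refund order", records, encoder, threshold=0.5)
```

With this toy data, raising the threshold from 0.3 to 0.5 drops the weaker match and leaves only the closest record, which is the behaviour a similarity slider would expose.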
@msminhas93 Thanks
> An additional slider or UI component to filter the similarity score based on the input threshold would be useful too.
Great suggestion! Could you mention that suggestion here too?
@msminhas93 better still, could you add a UI-specific issue for this and tag @Amelie-V?
This issue is stale because it has been open for 90 days with no activity.
This issue was closed because it has been inactive for 30 days since being marked as stale.
Revisited some old issues as proposed by Damien Tanner.
Potentially use BM25, as proposed here: https://github.com/argilla-io/argilla/issues/2150
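For context on the BM25 suggestion: BM25 is a lexical ranking function, so it scores records by term overlap with the query rather than by embedding similarity. The following is only a rough sketch of Okapi BM25 scoring with whitespace tokenization and the commonly used defaults k1=1.5 and b=0.75; the function name bm25_scores is made up for illustration, and this is not how Argilla or Elasticsearch implement it.

```python
import math
from collections import Counter
from typing import List, Sequence


def bm25_scores(query: str, docs: Sequence[str], k1: float = 1.5, b: float = 0.75) -> List[float]:
    """Return one BM25 score per document for a whitespace-tokenized query."""
    if not docs:
        return []
    tokenized = [d.lower().split() for d in docs]
    n_docs = len(tokenized)
    avg_len = sum(len(d) for d in tokenized) / n_docs
    # Document frequency: in how many documents each term appears.
    df = Counter(term for d in tokenized for term in set(d))

    scores = []
    for doc in tokenized:
        tf = Counter(doc)
        score = 0.0
        for term in query.lower().split():
            if term not in tf:
                continue
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            # Length normalization; guard against all-empty documents.
            length_ratio = len(doc) / avg_len if avg_len else 0.0
            denom = tf[term] + k1 * (1 - b + b * length_ratio)
            score += idf * tf[term] * (k1 + 1) / denom
        scores.append(score)
    return scores


docs = ["refund my order", "cancel my order please", "great weather today"]
scores = bm25_scores("refund order", docs)
```

Here the first document matches both query terms, the second matches one, and the third matches none, so the scores decrease in that order.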
This issue is stale because it has been open for 90 days with no activity.
Is your feature request related to a problem? Please describe.
There is currently no way in the UI to quickly run an embedding search based on a text query typed in the search bar, which is limiting. This capability would make bulk annotation much more flexible, since you could search for concepts via a custom text query rather than a fixed sample from the dataset.
Describe the solution you'd like
An option in the UI to allow embedding search from the text query. This could be a drop-down with two options: