With vec2text we should be able to get a reasonably useful sentence out of the average embedding of a cluster. This could serve as the cluster label, or perhaps as guidance for the summarization step that generates the label.
https://github.com/jxmorris12/vec2text/
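To make the idea concrete, here is a minimal sketch of the averaging step. The names `cluster_centroid`, `label_cluster` and the `invert` callable are hypothetical and not part of our codebase or of vec2text. Rescaling the mean to unit length is an assumption (the mean of unit-length embeddings is generally shorter than unit length), and the inversion itself is injected so that the sketch runs without a model.

```python
import math
from typing import Callable, List, Sequence

Vector = List[float]


def cluster_centroid(embeddings: Sequence[Sequence[float]], normalize: bool = True) -> Vector:
    """Element-wise mean of a cluster's embeddings, optionally rescaled to unit length."""
    if not embeddings:
        raise ValueError("cluster has no embeddings")
    dim = len(embeddings[0])
    if any(len(e) != dim for e in embeddings):
        raise ValueError("all embeddings must have the same dimension")
    n = len(embeddings)
    mean = [sum(e[i] for e in embeddings) / n for i in range(dim)]
    if normalize:
        # Assumption: the inverter expects vectors on the unit sphere.
        norm = math.sqrt(sum(x * x for x in mean))
        if norm > 0:
            mean = [x / norm for x in mean]
    return mean


def label_cluster(embeddings: Sequence[Sequence[float]], invert: Callable[[Vector], str]) -> str:
    """Turn a cluster into a candidate label by inverting its centroid.

    `invert` is a hypothetical hook. In practice it would presumably wrap a
    vec2text corrector for the embedding model in use; that wiring is not
    shown or tested here.
    """
    return invert(cluster_centroid(embeddings))
```

Passing a stub such as `lambda v: "placeholder"` as `invert` is enough to exercise the plumbing before a real inversion model is wired in.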
There are pre-trained models, for example for OpenAI's text-embedding-ada-002, and perhaps others. Part of this issue might be helping to train inversion models for the other supported embedding models in our list.
One could imagine a new API endpoint that takes in an embedding vector and outputs a sentence. We could also have an alternative summarize script that uses this instead of (or in conjunction with) summarizing. We currently have a description field per cluster which is not really being used; it could be populated with this, or we could add another field.
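As a rough sketch of what such an endpoint's core could look like, independent of any web framework: the function below takes a JSON request body and returns a status code plus a JSON response. The function name `handle_invert_request`, the `embedding` and `text` keys, and the injected `invert` callable are all assumptions made for illustration, not an existing API.

```python
import json
from typing import Callable, List, Tuple


def handle_invert_request(body: str, invert: Callable[[List[float]], str]) -> Tuple[int, str]:
    """Validate a JSON body like {"embedding": [...]} and return (status, JSON response)."""
    try:
        payload = json.loads(body)
    except json.JSONDecodeError:
        return 400, json.dumps({"error": "body must be valid JSON"})

    embedding = payload.get("embedding") if isinstance(payload, dict) else None
    is_number_list = (
        isinstance(embedding, list)
        and len(embedding) > 0
        # bool is a subclass of int in Python, so exclude it explicitly.
        and all(isinstance(x, (int, float)) and not isinstance(x, bool) for x in embedding)
    )
    if not is_number_list:
        return 400, json.dumps({"error": "'embedding' must be a non-empty list of numbers"})

    # `invert` is a hypothetical hook for the embedding-to-text model.
    text = invert([float(x) for x in embedding])
    return 200, json.dumps({"text": text})
```

An alternative summarize script could call the same `invert` hook on each cluster's average embedding and write the result into the cluster's description field, or into a new field if we prefer to keep the two separate.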