AnswerDotAI / RAGatouille

Easily use and train state of the art late-interaction retrieval methods (ColBERT) in any RAG pipeline. Designed for modularity and ease-of-use, backed by research.
Apache License 2.0

How to extract embeddings generated by ColBERT? #216

Open BlueKiji77 opened 3 months ago

BlueKiji77 commented 3 months ago

I am trying to use RAPTOR hierarchical clustering with ColBERTv2, but I need to extract the embeddings generated by the model for the clustering step, which involves a Gaussian Mixture Model.

How do I go about doing that? Any help would be appreciated.

kyirong6 commented 3 months ago

@BlueKiji77 hey. looking to do the same thing. Were you able to come up with a solution for this?

BlueKiji77 commented 3 months ago

You would have to use the CollectionEncoder from the ColBERT repo.
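For anyone landing here with the same RAPTOR use case: ColBERT produces one embedding per token, not one per document, so the encoder output usually needs pooling before it can go into a Gaussian Mixture Model. The sketch below is not taken from the RAGatouille or ColBERT code. It assumes the encoder hands back a flat matrix of token embeddings together with a list of per-document token counts (which is what the CollectionEncoder appears to return, but please verify against your installed version). The function name `pool_token_embeddings` and the random stand-in data are made up for illustration, and mean pooling is only one possible choice.

```python
import numpy as np


def pool_token_embeddings(flat_embs, doclens):
    """Mean-pool a flat (total_tokens, dim) matrix into one unit-length vector per document.

    Assumes every document has at least one token.
    """
    flat_embs = np.asarray(flat_embs, dtype=np.float32)
    if sum(doclens) != flat_embs.shape[0]:
        raise ValueError("doclens must sum to the number of token embeddings")
    # Split points are the cumulative lengths, excluding the final total.
    boundaries = np.cumsum(doclens)[:-1]
    chunks = np.split(flat_embs, boundaries)
    pooled = np.stack([chunk.mean(axis=0) for chunk in chunks])
    # Re-normalize so that every pooled vector has unit L2 norm.
    norms = np.linalg.norm(pooled, axis=1, keepdims=True)
    return pooled / np.clip(norms, 1e-12, None)


# Stand-in data (hypothetical): 3 documents with 3, 5 and 2 tokens, dim 128.
rng = np.random.default_rng(0)
doclens = [3, 5, 2]
flat_embs = rng.normal(size=(sum(doclens), 128))

doc_vectors = pool_token_embeddings(flat_embs, doclens)
print(doc_vectors.shape)  # (3, 128)
```

The resulting `doc_vectors` array has one row per document, so it can be passed to whatever clustering step RAPTOR uses. If the encoder returns a torch tensor, converting it with `.cpu().numpy()` first should be enough for this sketch.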
