dgarnitz / vectorflow

VectorFlow is a high volume vector embedding pipeline that ingests raw data, transforms it into vectors and writes it to a vector DB of your choice.
https://www.getvectorflow.com/
Apache License 2.0

Open Source Embeddings with Queue #44

Closed dgarnitz closed 1 year ago

dgarnitz commented 1 year ago

What

Refactored the worker and the Hugging Face model embeddings so that embedding requests go through a worker and RabbitMQ queue rather than the model functioning like an API.

Why

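More scalable: embedding requests are queued and consumed by workers instead of being served through direct API calls.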
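For readers unfamiliar with the pattern, here is a minimal sketch of the queue-per-model idea. It is not the code in this PR: it uses Python's standard-library `queue.Queue` as an in-memory stand-in for RabbitMQ, and the names `get_queue`, `submit`, `model_worker`, `fake_embed`, and `example-model` are all hypothetical.

```python
import queue
import threading

# In-memory stand-in for a message broker: one FIFO queue per model name.
# Assumption: a real deployment would declare these queues on RabbitMQ.
queues: dict[str, queue.Queue] = {}
results: dict[str, list[float]] = {}


def get_queue(model_name: str) -> queue.Queue:
    """Return the queue for a model, creating it on first use."""
    return queues.setdefault(model_name, queue.Queue())


def fake_embed(text: str) -> list[float]:
    """Hypothetical placeholder for real model inference."""
    return [float(len(text)), float(sum(map(ord, text)) % 97)]


def submit(model_name: str, job_id: str, text: str) -> None:
    """Producer side: enqueue a job instead of calling the model directly."""
    get_queue(model_name).put((job_id, text))


def model_worker(model_name: str) -> None:
    """Consumer side: pull jobs off the model's queue until a sentinel arrives."""
    q = get_queue(model_name)
    while True:
        job = q.get()
        if job is None:  # sentinel value: stop the worker
            q.task_done()
            break
        job_id, text = job
        results[job_id] = fake_embed(text)
        q.task_done()


model = "example-model"
get_queue(model)  # create the queue before the worker starts
worker = threading.Thread(target=model_worker, args=(model,), daemon=True)
worker.start()

submit(model, "job-1", "hello")
submit(model, "job-2", "queue")
get_queue(model).put(None)  # ask the worker to shut down
worker.join()
```

Because producers only enqueue work, more worker processes can be attached to the same queue without changing the code that submits requests, which is roughly where the scalability benefit comes from.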

Verification

Can see that the embedding requests succeed locally in Docker:

image

Can also see the inference occurring in the logs of the model container:

image

RabbitMQ now also creates a queue named after the model:

image

Unit Tests

Existing tests pass in the worker: image

However, there are no tests for the HF inference app, and the HF pathway inside the worker is not covered by existing tests.