dgarnitz / vectorflow

VectorFlow is a high volume vector embedding pipeline that ingests raw data, transforms it into vectors and writes it to a vector DB of your choice.
https://www.getvectorflow.com/
Apache License 2.0
670 stars 47 forks

Fix RabbitMQ bug & add basic retry to worker #92

Closed dgarnitz closed 11 months ago

dgarnitz commented 11 months ago

What

  1. Added logic to retry the entire message broker connection creation process in the extractor, worker, and vdb worker. (This still needs to be added to the hugging_face worker.)
  2. Added logic to the worker to retry batches that have failed. NOTE: this puts the messages back on the embedding queue, which can cause delays if new messages arrive while old ones are being retried. We will likely need to implement a separate retry queue and a worker that moves jobs from that queue back onto the original queue.
  3. Fixed a bug in /embed caused by the telemetry.
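
For readers who have not opened the diff, here is a minimal sketch of the two retry patterns described in items 1 and 2. It is not the code in this PR. All names (`connect_with_retry`, `process_with_requeue`, `MAX_BATCH_RETRIES`, the `retries` field) are hypothetical, and a plain `deque` stands in for the embedding queue so the example runs without a broker. In practice `connect` would wrap whatever call opens the RabbitMQ connection.

```python
import time
from collections import deque
from typing import Callable, TypeVar

T = TypeVar("T")

# Hypothetical limit; the PR does not state how many retries a batch gets.
MAX_BATCH_RETRIES = 3


def connect_with_retry(
    connect: Callable[[], T],
    max_attempts: int = 5,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Re-run the whole connection setup until it succeeds or attempts run out."""
    for attempt in range(1, max_attempts + 1):
        try:
            return connect()
        except Exception:
            if attempt == max_attempts:
                raise  # out of attempts: surface the last error
            # Exponential backoff: base_delay, 2 * base_delay, 4 * base_delay, ...
            sleep(base_delay * 2 ** (attempt - 1))
    raise ValueError("max_attempts must be at least 1")


def process_with_requeue(
    message: dict,
    embed: Callable[[list], None],
    queue: deque,
) -> str:
    """Embed one batch; on failure, put it back on the same queue with a retry count."""
    try:
        embed(message["batch"])
        return "done"
    except Exception:
        retries = message.get("retries", 0)
        if retries >= MAX_BATCH_RETRIES:
            return "failed"  # give up instead of requeueing forever
        # The batch goes to the back of the queue, behind any newer messages.
        queue.append({**message, "retries": retries + 1})
        return "requeued"
```

Because the failed batch is appended to the same queue, it waits behind every message that arrived in the meantime. That is the delay noted in item 2, and it is the reason a separate retry queue may be worth adding.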

Verification

Requests to both /jobs and /embed succeed.

See the GitHub Actions run for test results.