dgarnitz / vectorflow

VectorFlow is a high volume vector embedding pipeline that ingests raw data, transforms it into vectors and writes it to a vector DB of your choice.
https://www.getvectorflow.com/
Apache License 2.0
670 stars, 47 forks

Add validator hook #64

Closed: dgarnitz closed this 1 year ago

dgarnitz commented 1 year ago

What

Add a hook that validates chunks before they are sent to be embedded. One use for this is deduplication.
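To illustrate the deduplication use case, here is a minimal sketch of the logic a validator could apply to the chunks it receives. The function name `dedupe_chunks` and the whitespace normalization are assumptions made for illustration; they are not taken from VectorFlow's code.

```python
import hashlib
from typing import Iterable, List


def dedupe_chunks(chunks: Iterable[str]) -> List[str]:
    """Return chunks with exact duplicates removed, keeping first-seen order.

    Hypothetical helper: a validator endpoint could run this on the chunks
    it receives and return only the survivors.
    """
    seen = set()
    unique = []
    for chunk in chunks:
        # Hash the whitespace-trimmed text so the seen-set stores fixed-size
        # digests rather than full chunk bodies.
        digest = hashlib.sha256(chunk.strip().encode("utf-8")).hexdigest()
        if digest not in seen:
            seen.add(digest)
            unique.append(chunk)
    return unique
```

This only catches exact matches after trimming; near-duplicate detection would need a different technique, such as comparing embeddings or shingles.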

Verification

The chunks are printed inside the mocked validation endpoint. Later, the embeddings are received when they are sent back to the same API, which is also set as the webhook_url.

image

The job succeeds:

image

Timeout

The timeout triggers first in the mock validator API:

image

You can see in the testing client that the upload fails:

image

The logs also confirm that the timeout exception is thrown:

image
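For readers who want to reproduce the timeout behavior, the following is a rough sketch of how a caller might bound the validator request and surface a timeout as a distinct error. It is not VectorFlow's implementation: the `ValidationTimeout` exception, the `validate_chunks` function, the `{"chunks": ...}` request body, and the `valid_chunks` response field are all hypothetical, and the `opener` parameter exists only so the example can be exercised without a network.

```python
import json
import socket
import urllib.error
import urllib.request


class ValidationTimeout(Exception):
    """Raised when the validator does not answer in time (hypothetical)."""


def validate_chunks(chunks, validator_url, timeout_seconds=5.0,
                    opener=urllib.request.urlopen):
    """POST chunks to a validator and return the chunks it accepts.

    The payload and response shapes are assumptions for illustration only.
    """
    body = json.dumps({"chunks": chunks}).encode("utf-8")
    request = urllib.request.Request(
        validator_url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    try:
        with opener(request, timeout=timeout_seconds) as response:
            return json.loads(response.read())["valid_chunks"]
    except (socket.timeout, TimeoutError) as exc:
        # Read timeouts surface directly as a timeout error.
        raise ValidationTimeout(f"no reply within {timeout_seconds}s") from exc
    except urllib.error.URLError as exc:
        # Connect timeouts arrive wrapped in URLError.
        if isinstance(exc.reason, (socket.timeout, TimeoutError)):
            raise ValidationTimeout(f"no reply within {timeout_seconds}s") from exc
        raise
```

Raising a dedicated exception lets the caller mark the upload as failed and log the cause, which matches the failure visible in the testing client and the logs.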