dgarnitz / vectorflow

VectorFlow is a high volume vector embedding pipeline that ingests raw data, transforms it into vectors and writes it to a vector DB of your choice.
https://www.getvectorflow.com/
Apache License 2.0
670 stars 47 forks

Support larger file sizes #36

Closed dgarnitz closed 11 months ago

dgarnitz commented 1 year ago

The default gunicorn timeout is 30 seconds. Even with that extended to 5 minutes, the system would still time out on very large files.
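For reference, the 5-minute extension mentioned above could be expressed in a gunicorn config file roughly like this. This is a minimal sketch: the file name and its use in this project are assumptions, and only the `timeout` setting itself is standard gunicorn.

```python
# gunicorn.conf.py (assumed file name; not taken from the VectorFlow repo)

# Seconds a worker may stay silent before gunicorn kills and restarts it.
# The gunicorn default is 30; 300 is the 5-minute value discussed above.
timeout = 300
```

Raising the value only postpones the failure for sufficiently large files, which is why the issue asks for an architectural change instead.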

The system needs to be rearchitected so that batch creation is done by a separate worker or process from the one handling HTTP requests.
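The idea can be sketched as follows. This is only an illustration of the pattern, not the implementation from the linked PR: `submit_job`, `batch_worker`, `create_batches`, and the in-memory queue and dicts are all hypothetical names, and a real deployment would more likely use a message broker and a database than in-process structures.

```python
import queue
import threading
import uuid

# Hypothetical in-process stand-ins for a message queue and a job store.
job_queue = queue.Queue()
job_status = {}
batches_by_job = {}
_lock = threading.Lock()


def submit_job(data: bytes) -> str:
    """What the HTTP handler would do: record the job, enqueue it, return at once."""
    job_id = str(uuid.uuid4())
    with _lock:
        job_status[job_id] = "queued"
    job_queue.put((job_id, data))
    return job_id  # the request can finish long before batching completes


def create_batches(data: bytes, batch_size: int) -> list[bytes]:
    """Split the payload into fixed-size chunks (the slow part for large files)."""
    return [data[i:i + batch_size] for i in range(0, len(data), batch_size)]


def batch_worker(batch_size: int = 4) -> None:
    """Runs outside the request path and does the batching."""
    while True:
        item = job_queue.get()
        if item is None:  # sentinel: shut the worker down
            job_queue.task_done()
            break
        job_id, data = item
        batches = create_batches(data, batch_size)
        with _lock:
            batches_by_job[job_id] = batches
            job_status[job_id] = "batched"
        job_queue.task_done()


if __name__ == "__main__":
    worker = threading.Thread(target=batch_worker, daemon=True)
    worker.start()
    job_id = submit_job(b"0123456789")  # returns immediately
    job_queue.join()                    # wait here only for the demo
    print(job_status[job_id], len(batches_by_job[job_id]))  # prints: batched 3
    job_queue.put(None)
    worker.join()
```

With this split, the request timeout only has to cover accepting the upload and enqueueing the job. The client would poll the job status (or be notified) rather than hold the connection open while batches are created.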

dgarnitz commented 11 months ago

Resolved by https://github.com/dgarnitz/vectorflow/pull/82