awslabs / project-lakechain

:zap: Cloud-native, AI-powered, document processing pipelines on AWS.
https://awslabs.github.io/project-lakechain/
Apache License 2.0

Feature request: Add connector for FAISS #2

Closed HQarroum closed 5 months ago

HQarroum commented 10 months ago

Use case

We want to add support for a FAISS index for a very low-cost, non-production setup. This would be a new middleware that acts as a storage connector, taking embeddings from other middlewares in a pipeline and storing them in a FAISS index in a given storage location.

Solution/User Experience

It would be possible to use an S3 bucket as a means of low-cost storage. The FAISS storage connector would be based on a Lambda function with a reserved concurrency of 1, so that only one invocation at a time loads the index from the S3 bucket, updates it, and writes it back.
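
To make the load, update, and write-back cycle concrete, here is a minimal sketch in Python. It is only an illustration under stated assumptions, not the connector's actual implementation: the function names (`object_exists`, `add_embeddings`, `update_index`), the `index.faiss` file name, and the choice of a flat L2 index are all hypothetical. The `s3` argument is assumed to be a boto3 S3 client, and the sketch relies on the reserved concurrency of 1 to avoid concurrent writers overwriting each other.

```python
import os
import tempfile


def object_exists(s3, bucket, key):
    """Return True if `key` is present in `bucket` (s3 is a boto3 S3 client)."""
    resp = s3.list_objects_v2(Bucket=bucket, Prefix=key)
    return any(obj["Key"] == key for obj in resp.get("Contents", []))


def add_embeddings(path, vectors, dim):
    """Add vectors to the FAISS index stored at `path`, creating it if absent."""
    # Imported lazily so the S3 round-trip logic can be exercised without FAISS.
    import faiss
    import numpy as np

    if os.path.exists(path):
        index = faiss.read_index(path)
    else:
        # Assumption: a flat (exact search) L2 index is enough for this setup.
        index = faiss.IndexFlatL2(dim)
    index.add(np.asarray(vectors, dtype="float32").reshape(-1, dim))
    faiss.write_index(index, path)


def update_index(s3, bucket, key, vectors, dim, apply=add_embeddings):
    """Load the index from S3, apply the update locally, and write it back."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "index.faiss")
        if object_exists(s3, bucket, key):
            s3.download_file(bucket, key, path)
        apply(path, vectors, dim)
        s3.upload_file(path, bucket, key)
```

Inside a Lambda handler, this could be called as `update_index(boto3.client("s3"), bucket, key, vectors, dim)`, with the bucket, key, and embedding dimension coming from the middleware's configuration. Since the whole index is downloaded and re-uploaded on every update, this only suits small indexes, which matches the low-cost, non-production goal.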

Alternative solutions

No response

HQarroum commented 10 months ago

Would it be possible to list the following items so that we can proceed with a prototype?