While running tests, we found that embedding large documents causes the system to time out. The timeout for the ingestion Lambda is currently set to 300 seconds. Rather than simply increasing it, we would like to split large PDFs into a few predictable parts and process them in parallel. We are also artificially limiting the concurrency of the processor function to 1. We'd love to remove this limit once the locking system for LanceDB is out of beta.
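As a rough illustration of the splitting idea, the sketch below plans fixed-size page ranges for a document and fans them out to a worker. It is only a sketch under stated assumptions: `plan_parts`, `process_in_parallel`, and the `pages_per_part` value are hypothetical names chosen for this example, not part of the project, and a thread pool stands in for what would more likely be separate Lambda invocations per part.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Tuple, TypeVar

# A part is a half-open, zero-based page range: (first_page, end_page_exclusive).
PageRange = Tuple[int, int]
T = TypeVar("T")


def plan_parts(total_pages: int, pages_per_part: int) -> List[PageRange]:
    """Split a document into consecutive page ranges of at most pages_per_part pages.

    Hypothetical helper: the part size is an assumption and would need tuning
    so that one part embeds comfortably within the 300-second timeout.
    """
    if total_pages < 0:
        raise ValueError("total_pages must not be negative")
    if pages_per_part <= 0:
        raise ValueError("pages_per_part must be positive")
    return [
        (start, min(start + pages_per_part, total_pages))
        for start in range(0, total_pages, pages_per_part)
    ]


def process_in_parallel(
    parts: List[PageRange],
    worker: Callable[[PageRange], T],
    max_workers: int = 4,
) -> List[T]:
    """Run worker on every part concurrently and return results in part order.

    Threads are a local stand-in here; in a deployed setup each part would
    probably be handled by its own function invocation.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(worker, parts))


if __name__ == "__main__":
    parts = plan_parts(total_pages=250, pages_per_part=100)
    # Placeholder worker that just counts pages; a real one would embed the range.
    page_counts = process_in_parallel(parts, lambda r: r[1] - r[0])
    print(parts, page_counts)
```

Fixed-size ranges keep each part's processing time roughly predictable, which is the property that matters for staying under the timeout. Note that while the processor function's concurrency stays limited to 1, parts would still be written one at a time, so the parallel speedup depends on lifting that limit.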