aryn-ai / sycamore

🍁 Sycamore is an LLM-powered search and analytics platform for unstructured data.
https://sycamore.readthedocs.io
Apache License 2.0

Debug embedding task id stability #152

Closed eric-anderson closed 1 week ago

eric-anderson commented 10 months ago

In setup_persistent() in aryn-opensearch.sh, re-registering the embedding model group fails and returns the existing id. Conversely, the embedding_task_id step creates a new task id on re-run, but that task then fails.

Figure out whether we can extract the various IDs directly from OpenSearch rather than persisting them to a file, to avoid consistency issues between OpenSearch and the file.
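One possible direction, as a rough sketch only: look the model group up by name through the ML Commons model group search endpoint (`POST /_plugins/_ml/model_groups/_search`) and read the ID from the hit, instead of reading it from a file. The function names, the base URL, the lack of authentication, and the group name passed in are all assumptions for illustration; none of this comes from aryn-opensearch.sh.

```python
import json
import urllib.request


def model_group_search_body(name):
    """Build a search body that matches model groups by name."""
    # A match query may also return near matches, so the caller filters
    # on the exact name afterwards.
    return {"query": {"match": {"name": name}}, "size": 10}


def extract_model_group_id(response, name):
    """Return the _id of the hit whose name equals `name`, or None."""
    for hit in response.get("hits", {}).get("hits", []):
        if hit.get("_source", {}).get("name") == name:
            return hit.get("_id")
    return None


def find_model_group_id(base_url, name):
    """Query OpenSearch for a model group ID (hypothetical helper).

    Assumes an unauthenticated HTTP endpoint such as
    http://localhost:9200; a real cluster may need TLS and credentials.
    """
    request = urllib.request.Request(
        base_url.rstrip("/") + "/_plugins/_ml/model_groups/_search",
        data=json.dumps(model_group_search_body(name)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as reply:
        return extract_model_group_id(json.load(reply), name)
```

If the lookup returns None, setup would register the group; otherwise it would reuse the returned ID, so OpenSearch stays the single source of truth.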

eric-anderson commented 10 months ago

The issue happens specifically in the sp_register_model_group() function. Right now we abort when we get the error, because it means that setup didn't record the persistent id and we can't continue.

eric-anderson commented 1 week ago

We improved this a lot and it seems to work now.