Status: Open. Timi7007 opened this issue 1 month ago.
I've just tried again using the native "AnythingLLM Embedder" with the following non-functional result:
[backend] info: Adding new vectorized document into namespace buffalo-bills
[backend] info: [NativeEmbedder] Initialized
[backend] info: [RecursiveSplitter] Will split with {"chunkSize":1000,"chunkOverlap":20}
[backend] info: Chunks created from document: 76
[backend] info: [NativeEmbedder] Embedded Chunk 1 of 4
[backend] info: [NativeEmbedder] Embedded Chunk 2 of 4
[backend] info: [NativeEmbedder] Embedded Chunk 3 of 4
[backend] info: [NativeEmbedder] Embedded Chunk 4 of 4
[backend] info: Inserting vectorized chunks into LanceDB collection.
[backend] error: addDocumentToNamespace lance error: LanceError(IO): Generic LocalFileSystem error: Unable to copy file from /app/server/storage/lancedb/buffalo-bills.lance/_versions/.tmp_1.manifest_4be321d5-b2ec-4add-9c83-2c258c1669b6 to /app/server/storage/lancedb/buffalo-bills.lance/_versions/1.manifest: Function not implemented (os error 38), /home/build_user/.cargo/registry/src/index.crates.io-6f17d22bba15001f/lance-table-0.12.1/src/io/commit.rs:692:54
[backend] error: Failed to vectorize 2021 Buffalo Bills season - Wikipedia.pdf
[backend] info: Adding new vectorized document into namespace buffalo-bills
[backend] info: [NativeEmbedder] Initialized
[backend] info: [RecursiveSplitter] Will split with {"chunkSize":1000,"chunkOverlap":20}
[backend] info: Chunks created from document: 95
[backend] info: [NativeEmbedder] Embedded Chunk 1 of 4
[backend] info: [NativeEmbedder] Embedded Chunk 2 of 4
[backend] info: [NativeEmbedder] Embedded Chunk 3 of 4
[backend] info: [NativeEmbedder] Embedded Chunk 4 of 4
[backend] info: Inserting vectorized chunks into LanceDB collection.
[backend] error: addDocumentToNamespace Table 'buffalo-bills' was not found
[backend] error: Failed to vectorize 2022 Buffalo Bills season - Wikipedia.pdf
This even seems like separate errors. Please advise.
What does your PDF look like? Clearly there is some external or embedded reference to a table that cannot be parsed out of the document.
I tried again with a plain .txt file containing a single one-sentence line and no special characters. Still the same issue:
[collector] info: -- Working test.txt --
[collector] info: [SUCCESS]: test.txt converted & ready for embedding.
[backend] info: [CollectorApi] Document test.txt uploaded processed and successfully. It is now available in documents.
[backend] info: [Event Logged] - document_uploaded
[backend] info: Adding new vectorized document into namespace testworkspace
[backend] info: [NativeEmbedder] Initialized
[backend] info: [RecursiveSplitter] Will split with {"chunkSize":1000,"chunkOverlap":20}
[backend] info: Chunks created from document: 1
[backend] info: [NativeEmbedder] Embedded Chunk 1 of 1
[backend] info: Inserting vectorized chunks into LanceDB collection.
[backend] error: addDocumentToNamespace lance error: LanceError(IO): Generic LocalFileSystem error: Unable to copy file from /app/server/storage/lancedb/testworkspace.lance/_versions/.tmp_1.manifest_404b8afe-8daf-4083-9a62-785ca4d619a9 to /app/server/storage/lancedb/testworkspace.lance/_versions/1.manifest: Function not implemented (os error 38), /home/build_user/.cargo/registry/src/index.crates.io-6f17d22bba15001f/lance-table-0.12.1/src/io/commit.rs:692:54
[backend] error: Failed to vectorize test.txt
[backend] info: [Event Logged] - workspace_documents_added
[backend] info: Adding new vectorized document into namespace testworkspace
[backend] info: [NativeEmbedder] Initialized
[backend] info: [RecursiveSplitter] Will split with {"chunkSize":1000,"chunkOverlap":20}
[backend] info: Chunks created from document: 1
[backend] info: [NativeEmbedder] Embedded Chunk 1 of 1
[backend] info: Inserting vectorized chunks into LanceDB collection.
[backend] error: addDocumentToNamespace Table 'testworkspace' was not found
[backend] error: Failed to vectorize test.txt
[backend] info: [TELEMETRY SENT] {"event":"documents_embedded_in_workspace","properties":{"LLMSelection":"ollama","Embedder":"native","VectorDbSelection":"lancedb","TTSSelection":"native","runtime":"docker"}}
[backend] info: [Event Logged] - workspace_documents_added
This is your issue. It comes from the LanceDB integration that stores the vectors:
[backend] error: addDocumentToNamespace lance error: LanceError(IO): Generic LocalFileSystem error: Unable to copy file from /app/server/storage/lancedb/testworkspace.lance/_versions/.tmp_1.manifest_404b8afe-8daf-4083-9a62-785ca4d619a9 to /app/server/storage/lancedb/testworkspace.lance/_versions/1.manifest: Function not implemented (os error 38), /home/build_user/.cargo/registry/src/index.crates.io-6f17d22bba15001f/lance-table-0.12.1/src/io/commit.rs:692:54
The issue says you are running in Docker. What does the host OS look like, and is this the official image or a custom build?
So the root cause is this IO error, which causes upserts to fail because tables cannot be written to Lance files.
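To make that reading of the logs concrete, here is a minimal Python sketch that pulls the first `[backend] error:` line per namespace out of a log dump, on the assumption that the later "Table ... was not found" message is a follow-on of the first failed write. The function names and the heuristic are illustrative only and are not part of AnythingLLM.

```python
import re

# Matches "[backend] error: ..." lines in the format shown in the logs above.
ERROR_RE = re.compile(r"^\[backend\] error: (?P<msg>.*)$")
# Matches the line that announces which namespace an embed run targets.
NAMESPACE_RE = re.compile(
    r"Adding new vectorized document into namespace (?P<ns>\S+)"
)


def first_error_per_namespace(log_text):
    """Return {namespace: first error message seen for that namespace}."""
    current_ns = None
    first_errors = {}
    for raw_line in log_text.splitlines():
        line = raw_line.strip()
        ns_match = NAMESPACE_RE.search(line)
        if ns_match:
            current_ns = ns_match.group("ns")
            continue
        err_match = ERROR_RE.match(line)
        if err_match and current_ns is not None:
            # Keep only the earliest error; later ones are likely consequences.
            first_errors.setdefault(current_ns, err_match.group("msg"))
    return first_errors


def is_unsupported_fs_operation(message):
    """Heuristic: the OS reported that a filesystem call is not implemented.

    "os error 38" is ENOSYS on Linux; other platforms number it differently,
    so the text form is checked as well.
    """
    return "os error 38" in message or "Function not implemented" in message
```

Fed the logs from this thread, the first error for `testworkspace` would be the `LanceError(IO)` line rather than the "not found" message that reaches the frontend.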
We have both an x86 and an ARM image available. Typically, trying to run an image built for an incompatible architecture on the host via Docker causes issues like this. Mounting the Docker storage on a network drive can also cause IO operation failures.
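One way to check the two suspects above (architecture and storage mount) is a small probe like the following Python sketch. It is an assumption-laden diagnostic, not an AnythingLLM tool: the thread does not say which filesystem call Lance issues when committing the manifest, so the sketch simply tries a hard link, a copy, and a rename in the target directory. `probe_storage_dir` and `describe` are hypothetical helper names.

```python
import errno
import os
import platform
import shutil
import tempfile


def probe_storage_dir(storage_dir):
    """Try basic file operations inside storage_dir.

    Returns {operation: None on success, or the errno of the failure}.
    """
    results = {}
    with tempfile.TemporaryDirectory(dir=storage_dir) as scratch:
        src = os.path.join(scratch, "probe.src")
        with open(src, "w") as handle:
            handle.write("probe")

        # "rename" runs last because it moves the source file away.
        operations = {
            "hard_link": lambda: os.link(src, os.path.join(scratch, "probe.link")),
            "copy": lambda: shutil.copyfile(src, os.path.join(scratch, "probe.copy")),
            "rename": lambda: os.rename(src, os.path.join(scratch, "probe.renamed")),
        }
        for name, operation in operations.items():
            try:
                operation()
                results[name] = None
            except OSError as exc:
                # Some OSErrors carry no errno; record -1 in that case.
                results[name] = exc.errno or -1
    return results


def describe(results):
    """Render probe results, flagging ENOSYS ("Function not implemented")."""
    lines = [f"host architecture: {platform.machine()}"]
    for name, code in results.items():
        if code is None:
            lines.append(f"{name}: ok")
        elif code == errno.ENOSYS:
            lines.append(f"{name}: not implemented by this filesystem (ENOSYS)")
        else:
            lines.append(f"{name}: failed ({errno.errorcode.get(code, code)})")
    return lines
```

Running `print("\n".join(describe(probe_storage_dir(path))))` with `path` set to the host directory mounted at `/app/server/storage` would show whether any of these operations fails with ENOSYS on that mount, and which architecture the host reports.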
How are you running AnythingLLM?
Docker (local)
What happened?
I'm afraid I'm doing something wrong, as I can't get documents added using the "Save and embed" dialog. Logs show the following:
The "Table 'buffalo-bills' was not found" error gets forwarded to the frontend.
Are there known steps to reproduce?
The vector DB is set to LanceDB (the default) and the embedding provider is Ollama. I've tried different embedding models with the same result.