paul-sonnenschein closed this issue 1 month ago.
I think this can also affect transcripts pulled from the YouTube data connector if the video title includes a "+" character.
Hey! I am still experiencing this issue. I am following exactly the same steps as above.
I tried with .TXT and .PDF files.
I get a warning from LanceDB: invalid ENV settings.
Here is the env file:
```
# Auto-dump ENV from system call on 17:27:25 GMT+0000 (Coordinated Universal Time)
LLM_PROVIDER='ollama'
EMBEDDING_MODEL_PREF='nomic-embed-text:latest'
OLLAMA_BASE_PATH='http://host.docker.internal:11434'
OLLAMA_MODEL_PREF='llama3'
OLLAMA_MODEL_TOKEN_LIMIT='4096'
EMBEDDING_BASE_PATH='http://host.docker.internal:11434'
EMBEDDING_MODEL_MAX_CHUNK_LENGTH='8192'
STORAGE_DIR='/app/server/storage'
SERVER_PORT='3001'
SIG_KEY=
```
Please note that Ollama is working properly.
Here's the log from the server:
```
[backend] info: [EncryptionManager] Loaded existing key & salt for encrypting arbitrary data.
[collector] info: -- Working New Text Document (2).txt --
[collector] info: [SUCCESS]: New Text Document (2).txt converted & ready for embedding.
[backend] info: [CollectorApi] Document New Text Document (2).txt uploaded processed and successfully. It is now available in documents.
[backend] info: [TELEMETRY SENT]
[backend] info: [Event Logged] - document_uploaded
[backend] info: Adding new vectorized document into namespace
[backend] info: [NativeEmbedder] Initialized
[backend] info: [RecursiveSplitter] Will split with
[backend] info: Chunks created from document:
[backend] info: [NativeEmbedder] Embedded Chunk 1 of 4
[backend] info: [NativeEmbedder] Embedded Chunk 2 of 4
[backend] info: [NativeEmbedder] Embedded Chunk 3 of 4
[backend] info: [NativeEmbedder] Embedded Chunk 4 of 4
[backend] info: Inserting vectorized chunks into LanceDB collection.
[backend] error: addDocumentToNamespace
[backend] error: Failed to vectorize
[backend] info: [TELEMETRY SENT]
[backend] info: [Event Logged] - workspace_documents_added
```
Please let me know if you find a fix for this! Thanks!
How are you running AnythingLLM?
Docker (remote machine)
What happened?
AnythingLLM is set up using Docker Compose, with a LocalAI LLM backend, the LanceDB vector database and the built-in embedder.
Creating a new workspace whose name contains the character "+" works without any error. However, attempting to "Save & Embed" an uploaded file in that workspace results in an error message instead of a successful embedding. Embedding in a workspace whose name does not contain "+" reports success.
Expected behavior:
Either workspace creation fails with a suitable error message indicating the unsupported character, or a supported table name is selected automatically.
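A minimal Python sketch of the two expected behaviors, assuming (this is not confirmed in the thread) that LanceDB only accepts table names made of letters, digits, underscores, hyphens and periods; the whitelist below is deliberately conservative (ASCII only). `validate_table_name` and `to_table_name` are hypothetical helpers, not functions from AnythingLLM or LanceDB.

```python
import re

# Hypothetical helpers, not part of AnythingLLM or LanceDB.
# Assumption: a conservative whitelist of ASCII letters, digits,
# underscores, hyphens and periods is accepted as a LanceDB table name.
ALLOWED_CHAR = re.compile(r"[A-Za-z0-9_.-]")
DISALLOWED_RUN = re.compile(r"[^A-Za-z0-9_.-]+")


def validate_table_name(name: str) -> None:
    """Option 1: reject the workspace name up front with a clear error."""
    bad = sorted({ch for ch in name if not ALLOWED_CHAR.fullmatch(ch)})
    if not name or bad:
        raise ValueError(
            f"Workspace name {name!r} cannot be used as a table name; "
            f"unsupported characters: {' '.join(bad) or '(empty name)'}"
        )


def to_table_name(name: str) -> str:
    """Option 2: derive a supported table name automatically.

    Each run of unsupported characters becomes a single underscore,
    leading/trailing underscores are stripped, and an empty result
    falls back to a fixed placeholder name.
    """
    safe = DISALLOWED_RUN.sub("_", name).strip("_")
    return safe or "workspace"


if __name__ == "__main__":
    print(to_table_name("Test+Test"))  # Test_Test
    try:
        validate_table_name("Test+Test")
    except ValueError as err:
        print(err)
```

Option 2 can map different workspace names to the same table (for example "Test+Test" and "Test Test" both become "Test_Test"), so an automatic scheme would also need a collision check; option 1 avoids that at the cost of rejecting some names.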
Error message + Log output:
Error message:
Log excerpt:
Are there known steps to reproduce?
Create a workspace named "Test+Test", upload a file, and click "Save & Embed".