Mintplex-Labs / anything-llm

The all-in-one Desktop & Docker AI application with built-in RAG, AI agents, and more.
https://anythingllm.com
MIT License

Can not embed document #382

Closed netandreus closed 10 months ago

netandreus commented 10 months ago

Problem

I get the error "Error: 1 documents could not be embedded" when I try to embed a document. For testing purposes I used this document:

a_rose_for_emily.txt

Screenshot 2023-11-15 at 11 02 04

Possible source of problems

Stack

Hardware

My installation

I connected LocalAI as the LLM:

Screenshot 2023-11-15 at 11 03 26

and for Embeddings:

Screenshot 2023-11-15 at 11 03 36

After this I verified that chat with the LLM works. Next I set a local Weaviate instance as the vector database:

Screenshot 2023-11-15 at 11 08 53

AnythingLLM is running as a Docker container with this config.

.env

SERVER_PORT=3001
CACHE_VECTORS="true"
VECTOR_DB="lancedb"
STORAGE_DIR="/app/server/storage"
UID='501'
GID='20'
LLM_PROVIDER='localai'
LOCAL_AI_BASE_PATH='http://localhost:8080/v1'
LOCAL_AI_MODEL_PREF='gpt-3.5-turbo-1106'
LOCAL_AI_MODEL_TOKEN_LIMIT=4096
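
One detail worth checking in the config above: the file sets VECTOR_DB to "lancedb", while the telemetry in the logs reports VectorDbSelection: 'weaviate'. This may simply mean the setting chosen in the UI differs from the file, but it is easy to confirm which value the file actually holds. The following is a minimal Python sketch, assuming the file consists of plain KEY=value lines with optional quotes; `parse_env` is a hypothetical helper for illustration and is not part of AnythingLLM.

```python
def parse_env(text: str) -> dict[str, str]:
    """Parse simple KEY=value lines, stripping one pair of matching quotes."""
    values = {}
    for raw in text.splitlines():
        line = raw.strip()
        # Skip blank lines, comments, and anything that is not an assignment.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        value = value.strip()
        if len(value) >= 2 and value[0] == value[-1] and value[0] in "'\"":
            value = value[1:-1]
        values[key.strip()] = value
    return values


# A few lines copied from the .env above.
ENV_TEXT = """\
SERVER_PORT=3001
VECTOR_DB="lancedb"
LLM_PROVIDER='localai'
LOCAL_AI_BASE_PATH='http://localhost:8080/v1'
"""

config = parse_env(ENV_TEXT)
# Value stored in the file; compare it with what the logs report.
print(config["VECTOR_DB"])
```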

Versions

Logs

anything

anything  | Failed to vectorize custom-documents/a-rose-for-emily-dfb1fd53-9716-43b2-9179-192fc4fd7779.json
anything  | [TELEMETRY SENT] {
anything  |   event: 'documents_embedded_in_workspace',
anything  |   distinctId: 'a9e35522-1f21-4ede-8302-1ea2e4db3c71',
anything  |   properties: { LLMSelection: 'localai', VectorDbSelection: 'weaviate' }
anything  | }
anything  | Weaviate::AllNamespace Error: usage error (401): {"code":401,"message":"oidc auth is not configured, please try another auth scheme or set up weaviate with OIDC configured"}
anything  |     at /app/server/node_modules/weaviate-ts-client/dist/index.js:1:9023
anything  |     at process.processTicksAndRejections (node:internal/process/task_queues:95:5)
anything  |     at async Object.allNamespaces (/app/server/utils/vectorDbProviders/weaviate/index.js:115:32)
anything  |     at async Object.totalVectors (/app/server/utils/vectorDbProviders/weaviate/index.js:36:29)
anything  |     at async /app/server/endpoints/system.js:180:27
anything  | Adding new vectorized document into namespace test_workspace
anything  | Chunks created from document: 28
anything  | Error: Could not embed document chunks! This document will not be recorded.
anything  |     at Object.addDocumentToNamespace (/app/server/utils/vectorDbProviders/weaviate/index.js:277:15)
anything  |     at process.processTicksAndRejections (node:internal/process/task_queues:95:5)
anything  |     at async Object.addDocuments (/app/server/models/documents.js:56:26)
anything  |     at async /app/server/endpoints/workspaces.js:128:33
anything  | addDocumentToNamespace Could not embed document chunks! This document will not be recorded.
anything  | Failed to vectorize custom-documents/a-rose-for-emily-dfb1fd53-9716-43b2-9179-192fc4fd7779.json
anything  | [TELEMETRY SENT] {
anything  |   event: 'documents_embedded_in_workspace',
anything  |   distinctId: 'a9e35522-1f21-4ede-8302-1ea2e4db3c71',
anything  |   properties: { LLMSelection: 'localai', VectorDbSelection: 'weaviate' }
anything  | }
anything  | Adding new vectorized document into namespace test_workspace
anything  | Chunks created from document: 28
anything  | Error: Could not embed document chunks! This document will not be recorded.
anything  |     at Object.addDocumentToNamespace (/app/server/utils/vectorDbProviders/weaviate/index.js:277:15)
anything  |     at process.processTicksAndRejections (node:internal/process/task_queues:95:5)
anything  |     at async Object.addDocuments (/app/server/models/documents.js:56:26)
anything  |     at async /app/server/endpoints/workspaces.js:128:33
anything  | addDocumentToNamespace Could not embed document chunks! This document will not be recorded.
anything  | Failed to vectorize custom-documents/a-rose-for-emily-dfb1fd53-9716-43b2-9179-192fc4fd7779.json
anything  | [TELEMETRY SENT] {
anything  |   event: 'documents_embedded_in_workspace',
anything  |   distinctId: 'a9e35522-1f21-4ede-8302-1ea2e4db3c71',
anything  |   properties: { LLMSelection: 'localai', VectorDbSelection: 'weaviate' }
anything  | }

LocalAI

11:04AM DBG Request received:
11:04AM DBG Parameter Config: &{PredictionOptions:{Model:falcon-7b-instruct-q4_0.gguf Language: N:0 TopP:0.65 TopK:40 Temperature:0.9 Maxtokens:0 Echo:false Batch:0 F16:false IgnoreEOS:false RepeatPenalty:0 Keep:0 MirostatETA:0 MirostatTAU:0 Mirostat:0 FrequencyPenalty:0 TFZ:0 TypicalP:0 Seed:0 NegativePrompt: RopeFreqBase:0 RopeFreqScale:0 NegativePromptScale:0 UseFastTokenizer:false ClipSkip:0 Tokenizer:} Name:gpt-3.5-turbo-1106 F16:true Threads:4 Debug:true Roles:map[] Embeddings:true Backend: TemplateConfig:{Chat: ChatMessage: Completion: Edit: Functions:} PromptStrings:[] InputStrings:[A Rose for Emily
(The Forum, 1930) WHEN Miss Emily Grierson died, our whole town went to her funeral: the men
...
(here is ALL content from txt file, without separated to chunks)
...
acrid in the nostrils, we saw a long strand of iron-gray hair.] InputToken:[] functionCallString: functionCallNameString: FunctionsConfig:{DisableNoAction:false NoActionFunctionName: NoActionDescriptionName:} FeatureFlag:map[] LLMConfig:{SystemPrompt: TensorSplit: MainGPU: RMSNormEps:0 NGQA:0 PromptCachePath: PromptCacheAll:false PromptCacheRO:false MirostatETA:0 MirostatTAU:0 Mirostat:0 NGPULayers:1 MMap:false MMlock:false LowVRAM:false Grammar: StopWords:[] Cutstrings:[] TrimSpace:[] ContextSize:2000 NUMA:false LoraAdapter: LoraBase: NoMulMatQ:false DraftModel: NDraft:0 Quantization:} AutoGPTQ:{ModelBaseName: Device: Triton:false UseFastTokenizer:false} Diffusers:{PipelineType: SchedulerType: CUDA:false EnableParameters: CFGScale:0 IMG2IMG:false ClipSkip:0 ClipModel: ClipSubFolder:} Step:0 GRPC:{Attempts:0 AttemptsSleepTime:0} VallE:{AudioPath:}}
11:04AM DBG Model already loaded in memory: falcon-7b-instruct-q4_0.gguf
11:04AM DBG Model 'falcon-7b-instruct-q4_0.gguf' already loaded
[127.0.0.1]:64098 500 - POST /v1/embeddings
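
The last log line shows LocalAI itself answering POST /v1/embeddings with a 500, independently of any Weaviate error. To test the embedding endpoint in isolation, one option is to send a single short input to it directly. The sketch below is a rough illustration, assuming LocalAI accepts the OpenAI-style request body (`model` and `input`) on that route; the base path and model name are taken from the .env above, and `build_embedding_request` is a hypothetical helper. The network call is left commented out so the block runs without a server.

```python
import json
import urllib.request


def build_embedding_request(base_url: str, model: str, text: str) -> urllib.request.Request:
    """Build a POST to the OpenAI-style embeddings route for one short input."""
    body = json.dumps({"model": model, "input": text}).encode("utf-8")
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/embeddings",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Base path and model name as they appear in the .env above.
request = build_embedding_request(
    "http://localhost:8080/v1", "gpt-3.5-turbo-1106", "A Rose for Emily"
)

# Uncomment to send the request against a running LocalAI instance.
# Note that urlopen raises urllib.error.HTTPError on a 4xx/5xx response.
# with urllib.request.urlopen(request, timeout=60) as response:
#     print(response.status)
```

If a short input succeeds while the full document fails, the size of the input relative to the model's configured context (the log shows ContextSize:2000) would be a reasonable next thing to look at.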

timothycarambat commented 10 months ago

You are using some form of OIDC auth for Weaviate, so AnythingLLM cannot connect to your Weaviate instance running locally. Turn off auth for the Weaviate Docker image entirely if you won't be using an API key.

anything | Weaviate::AllNamespace Error: usage error (401): {"code":401,"message":"oidc auth is not configured, please try another auth scheme or set up weaviate with OIDC configured"}
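
The 401 quoted above can be reproduced outside AnythingLLM by calling Weaviate's REST API directly, once without credentials and once with a Bearer token. The sketch below is only an illustration: it assumes the instance exposes the standard `/v1/schema` route, and the URL, port, and key are placeholders rather than values from this thread. `build_schema_request` and `probe` are hypothetical helpers.

```python
import urllib.error
import urllib.request
from typing import Optional


def build_schema_request(base_url: str, api_key: Optional[str] = None) -> urllib.request.Request:
    """Build GET /v1/schema, adding a Bearer token only when an API key is given."""
    headers = {"Authorization": f"Bearer {api_key}"} if api_key else {}
    return urllib.request.Request(base_url.rstrip("/") + "/v1/schema", headers=headers)


def probe(request: urllib.request.Request) -> int:
    """Send the request and return the HTTP status code, including error statuses."""
    try:
        with urllib.request.urlopen(request, timeout=5) as response:
            return response.status
    except urllib.error.HTTPError as err:
        return err.code


# Placeholder address and key; replace with the values of your own instance.
anonymous = build_schema_request("http://localhost:8081")
with_key = build_schema_request("http://localhost:8081", api_key="example-key")

# Uncomment to run against a live Weaviate instance.
# print("anonymous:", probe(anonymous))
# print("with key: ", probe(with_key))
```

Comparing the two status codes shows whether the instance accepts anonymous requests, requests carrying a key, or neither, which narrows down what needs to change in the Weaviate container or in the AnythingLLM settings.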