Closed: UnluqueUJI closed this issue 7 months ago.
I also got this error message.
Could you guys send me your docker run command? I did not get this error with compose, so...
EDIT: I think I see what's wrong here. Let me check.
I used the one that's in the README:
docker run -d \
--name privategpt \
-p 8080:8080/tcp \
3x3cut0r/privategpt:latest
I added the rerank settings to the settings.yaml file. It was missing, and I think that is why the error occurs. Let's wait for the GitHub Action to deploy the new image, then try again (maybe you can try in an hour). Please let me know if it's working then.
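For anyone hitting this before the new image lands: the validator complains that rag.rerank is missing, so the rag block in settings.yaml needs a rerank subsection. A minimal sketch, assuming the defaults from upstream privateGPT 0.5.0 (the rerank model name and values below are illustrative for this image, not confirmed):

rag:
  similarity_top_k: 2
  similarity_value: 0.45
  rerank:
    enabled: false  # reranking off by default
    model: cross-encoder/ms-marco-MiniLM-L-2-v2  # assumed default reranker model
    top_n: 1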
Okay, perfect. I'll try it as soon as possible!
Update: the container starts and runs, but I can't access the UI from the browser at localhost:8080. What am I doing wrong? I left the env variables at their defaults.
Or roughly how long does it take to start up? Maybe I'm checking too early.
Edit: OK, I see it takes a while.
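(In case it helps others: on first start the container downloads several GB of models, so the UI is not reachable right away. A quick way to watch progress and detect readiness, assuming the container name and port from the README command:

docker logs -f privategpt                                              # follow the model downloads and startup
curl -s -o /dev/null -w '%{http_code}\n' http://localhost:8080/health  # prints 200 once the server is up

The /health endpoint is the same one that appears in the uvicorn access log further down this thread.)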
Now I got another error, but this one has no information; it just says Illegal instruction and shuts down. Here's the log:
2024-04-12 12:41:23 10:41:23.195 [INFO ] private_gpt.settings.settings_loader - Starting application with profiles=['default']
2024-04-12 12:41:23 Downloading embedding BAAI/bge-small-en-v1.5
config_sentence_transformers.json: 100%|██████████| 124/124 [00:00<00:00, 598kB/s]
1_Pooling/config.json: 100%|██████████| 190/190 [00:00<00:00, 866kB/s]
.gitattributes: 100%|██████████| 1.52k/1.52k [00:00<00:00, 7.31MB/s]
modules.json: 100%|██████████| 349/349 [00:00<00:00, 1.69MB/s]
config.json: 100%|██████████| 743/743 [00:00<00:00, 2.30MB/s]
README.md: 100%|██████████| 94.8k/94.8k [00:00<00:00, 4.48MB/s]
special_tokens_map.json: 100%|██████████| 125/125 [00:00<00:00, 14.9kB/s]
sentence_bert_config.json: 100%|██████████| 52.0/52.0 [00:00<00:00, 11.2kB/s]
vocab.txt: 100%|██████████| 232k/232k [00:00<00:00, 3.84MB/s]
tokenizer_config.json: 100%|██████████| 366/366 [00:00<00:00, 939kB/s]
tokenizer.json: 100%|██████████| 711k/711k [00:00<00:00, 2.26MB/s]
model.onnx: 100%|██████████| 133M/133M [00:23<00:00, 5.72MB/s]
model.safetensors: 100%|██████████| 133M/133M [00:23<00:00, 5.68MB/s]
pytorch_model.bin: 100%|██████████| 134M/134M [00:23<00:00, 5.74MB/s]
Fetching 14 files: 100%|██████████| 14/14 [00:24<00:00, 1.73s/it]
2024-04-12 12:41:47 Embedding model downloaded!
2024-04-12 12:41:47 Downloading LLM mistral-7b-instruct-v0.2.Q4_K_M.gguf
mistral-7b-instruct-v0.2.Q4_K_M.gguf: 100%|██████████| 4.37G/4.37G [04:33<00:00, 16.0MB/s]
2024-04-12 12:46:21 LLM model downloaded!
2024-04-12 12:46:21 Downloading tokenizer mistralai/Mistral-7B-Instruct-v0.2
tokenizer_config.json: 100%|██████████| 1.46k/1.46k [00:00<00:00, 5.17MB/s]
tokenizer.model: 100%|██████████| 493k/493k [00:00<00:00, 11.4MB/s]
tokenizer.json: 100%|██████████| 1.80M/1.80M [00:00<00:00, 3.43MB/s]
special_tokens_map.json: 100%|██████████| 72.0/72.0 [00:00<00:00, 262kB/s]
2024-04-12 12:46:23 Tokenizer downloaded!
2024-04-12 12:46:23 Setup done
2024-04-12 12:46:24 privategpt version: 0.5.0
2024-04-12 12:46:25 10:46:25.288 [INFO ] private_gpt.settings.settings_loader - Starting application with profiles=['default']
2024-04-12 12:46:44 10:46:44.529 [INFO ] matplotlib.font_manager - generated new fontManager
2024-04-12 12:46:47 10:46:47.353 [INFO ] private_gpt.components.llm.llm_component - Initializing the LLM in mode=llamacpp
2024-04-12 12:46:47 Illegal instruction
OK, strange. I pulled the latest image just now and got no errors at all. I also ran with default parameters:
docker run -d \
--name privategpt \
-p 8080:8080/tcp \
3x3cut0r/privategpt:latest
Here's my log:
2024-04-12 16:02:59 14:02:59.427 [INFO ] private_gpt.settings.settings_loader - Starting application with profiles=['default']
.gitattributes: 100%|██████████| 1.52k/1.52k [00:00<00:00, 26.0MB/s]
1_Pooling/config.json: 100%|██████████| 190/190 [00:00<00:00, 1.03MB/s]
modules.json: 100%|██████████| 349/349 [00:00<00:00, 5.03MB/s]
config.json: 100%|██████████| 743/743 [00:00<00:00, 14.4MB/s]
README.md: 100%|██████████| 94.8k/94.8k [00:00<00:00, 6.60MB/s]
2024-04-12 16:02:59 Downloading embedding BAAI/bge-small-en-v1.5
config_sentence_transformers.json: 100%|██████████| 124/124 [00:00<00:00, 1.67MB/s]
special_tokens_map.json: 100%|██████████| 125/125 [00:00<00:00, 2.15MB/s]
tokenizer_config.json: 100%|██████████| 366/366 [00:00<00:00, 5.39MB/s]
sentence_bert_config.json: 100%|██████████| 52.0/52.0 [00:00<00:00, 339kB/s]
tokenizer.json: 100%|██████████| 711k/711k [00:00<00:00, 3.36MB/s]
vocab.txt: 100%|██████████| 232k/232k [00:00<00:00, 1.28MB/s]
pytorch_model.bin: 100%|██████████| 134M/134M [00:13<00:00, 9.86MB/s]
model.onnx: 100%|██████████| 133M/133M [00:14<00:00, 9.42MB/s]
model.safetensors: 100%|██████████| 133M/133M [00:14<00:00, 9.44MB/s]
Fetching 14 files: 100%|██████████| 14/14 [00:14<00:00, 1.05s/it]
2024-04-12 16:03:14 Embedding model downloaded!
2024-04-12 16:03:14 Downloading LLM mistral-7b-instruct-v0.2.Q4_K_M.gguf
mistral-7b-instruct-v0.2.Q4_K_M.gguf: 100%|██████████| 4.37G/4.37G [02:35<00:00, 28.1MB/s]
2024-04-12 16:05:50 LLM model downloaded!
2024-04-12 16:05:50 Downloading tokenizer mistralai/Mistral-7B-Instruct-v0.2
tokenizer_config.json: 100%|██████████| 1.46k/1.46k [00:00<00:00, 17.3MB/s]
tokenizer.model: 100%|██████████| 493k/493k [00:00<00:00, 31.2MB/s]
tokenizer.json: 100%|██████████| 1.80M/1.80M [00:00<00:00, 3.81MB/s]
special_tokens_map.json: 100%|██████████| 72.0/72.0 [00:00<00:00, 318kB/s]
2024-04-12 16:05:51 Tokenizer downloaded!
2024-04-12 16:05:51 Setup done
2024-04-12 16:05:52 privategpt version: 0.5.0
2024-04-12 16:05:52 14:05:52.357 [INFO ] private_gpt.settings.settings_loader - Starting application with profiles=['default']
2024-04-12 16:05:56 14:05:56.401 [INFO ] matplotlib.font_manager - generated new fontManager
2024-04-12 16:05:57 14:05:57.266 [INFO ] private_gpt.components.llm.llm_component - Initializing the LLM in mode=llamacpp
2024-04-12 16:05:57 llama_model_loader: loaded meta data with 24 key-value pairs and 291 tensors from /home/worker/app/models/mistral-7b-instruct-v0.2.Q4_K_M.gguf (version GGUF V3 (latest))
2024-04-12 16:05:57 llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output.
2024-04-12 16:05:57 llama_model_loader: - kv 0: general.architecture str = llama
2024-04-12 16:05:57 llama_model_loader: - kv 1: general.name str = mistralai_mistral-7b-instruct-v0.2
2024-04-12 16:05:57 llama_model_loader: - kv 2: llama.context_length u32 = 32768
2024-04-12 16:05:57 llama_model_loader: - kv 3: llama.embedding_length u32 = 4096
2024-04-12 16:05:57 llama_model_loader: - kv 4: llama.block_count u32 = 32
2024-04-12 16:05:57 llama_model_loader: - kv 5: llama.feed_forward_length u32 = 14336
2024-04-12 16:05:57 llama_model_loader: - kv 6: llama.rope.dimension_count u32 = 128
2024-04-12 16:05:57 llama_model_loader: - kv 7: llama.attention.head_count u32 = 32
2024-04-12 16:05:57 llama_model_loader: - kv 8: llama.attention.head_count_kv u32 = 8
2024-04-12 16:05:57 llama_model_loader: - kv 9: llama.attention.layer_norm_rms_epsilon f32 = 0.000010
2024-04-12 16:05:57 llama_model_loader: - kv 10: llama.rope.freq_base f32 = 1000000.000000
2024-04-12 16:05:57 llama_model_loader: - kv 11: general.file_type u32 = 15
2024-04-12 16:05:57 llama_model_loader: - kv 12: tokenizer.ggml.model str = llama
2024-04-12 16:05:57 llama_model_loader: - kv 13: tokenizer.ggml.tokens arr[str,32000] = ["<unk>", "<s>", "</s>", "<0x00>", "<...
2024-04-12 16:05:57 llama_model_loader: - kv 14: tokenizer.ggml.scores arr[f32,32000] = [0.000000, 0.000000, 0.000000, 0.0000...
2024-04-12 16:05:57 llama_model_loader: - kv 15: tokenizer.ggml.token_type arr[i32,32000] = [2, 3, 3, 6, 6, 6, 6, 6, 6, 6, 6, 6, ...
2024-04-12 16:05:57 llama_model_loader: - kv 16: tokenizer.ggml.bos_token_id u32 = 1
2024-04-12 16:05:57 llama_model_loader: - kv 17: tokenizer.ggml.eos_token_id u32 = 2
2024-04-12 16:05:57 llama_model_loader: - kv 18: tokenizer.ggml.unknown_token_id u32 = 0
2024-04-12 16:05:57 llama_model_loader: - kv 19: tokenizer.ggml.padding_token_id u32 = 0
2024-04-12 16:05:57 llama_model_loader: - kv 20: tokenizer.ggml.add_bos_token bool = true
2024-04-12 16:05:57 llama_model_loader: - kv 21: tokenizer.ggml.add_eos_token bool = false
2024-04-12 16:05:57 llama_model_loader: - kv 22: tokenizer.chat_template str = {{ bos_token }}{% for message in mess...
2024-04-12 16:05:57 llama_model_loader: - kv 23: general.quantization_version u32 = 2
2024-04-12 16:05:57 llama_model_loader: - type f32: 65 tensors
2024-04-12 16:05:57 llama_model_loader: - type q4_K: 193 tensors
2024-04-12 16:05:57 llama_model_loader: - type q6_K: 33 tensors
2024-04-12 16:05:57 llm_load_vocab: special tokens definition check successful ( 259/32000 ).
2024-04-12 16:05:57 llm_load_print_meta: format = GGUF V3 (latest)
2024-04-12 16:05:57 llm_load_print_meta: arch = llama
2024-04-12 16:05:57 llm_load_print_meta: vocab type = SPM
2024-04-12 16:05:57 llm_load_print_meta: n_vocab = 32000
2024-04-12 16:05:57 llm_load_print_meta: n_merges = 0
2024-04-12 16:05:57 llm_load_print_meta: n_ctx_train = 32768
2024-04-12 16:05:57 llm_load_print_meta: n_embd = 4096
2024-04-12 16:05:57 llm_load_print_meta: n_head = 32
2024-04-12 16:05:57 llm_load_print_meta: n_head_kv = 8
2024-04-12 16:05:57 llm_load_print_meta: n_layer = 32
2024-04-12 16:05:57 llm_load_print_meta: n_rot = 128
2024-04-12 16:05:57 llm_load_print_meta: n_embd_head_k = 128
2024-04-12 16:05:57 llm_load_print_meta: n_embd_head_v = 128
2024-04-12 16:05:57 llm_load_print_meta: n_gqa = 4
2024-04-12 16:05:57 llm_load_print_meta: n_embd_k_gqa = 1024
2024-04-12 16:05:57 llm_load_print_meta: n_embd_v_gqa = 1024
2024-04-12 16:05:57 llm_load_print_meta: f_norm_eps = 0.0e+00
2024-04-12 16:05:57 llm_load_print_meta: f_norm_rms_eps = 1.0e-05
2024-04-12 16:05:57 llm_load_print_meta: f_clamp_kqv = 0.0e+00
2024-04-12 16:05:57 llm_load_print_meta: f_max_alibi_bias = 0.0e+00
2024-04-12 16:05:57 llm_load_print_meta: f_logit_scale = 0.0e+00
2024-04-12 16:05:57 llm_load_print_meta: n_ff = 14336
2024-04-12 16:05:57 llm_load_print_meta: n_expert = 0
2024-04-12 16:05:57 llm_load_print_meta: n_expert_used = 0
2024-04-12 16:05:57 llm_load_print_meta: causal attn = 1
2024-04-12 16:05:57 llm_load_print_meta: pooling type = 0
2024-04-12 16:05:57 llm_load_print_meta: rope type = 0
2024-04-12 16:05:57 llm_load_print_meta: rope scaling = linear
2024-04-12 16:05:57 llm_load_print_meta: freq_base_train = 1000000.0
2024-04-12 16:05:57 llm_load_print_meta: freq_scale_train = 1
2024-04-12 16:05:57 llm_load_print_meta: n_yarn_orig_ctx = 32768
2024-04-12 16:05:57 llm_load_print_meta: rope_finetuned = unknown
2024-04-12 16:05:57 llm_load_print_meta: ssm_d_conv = 0
2024-04-12 16:05:57 llm_load_print_meta: ssm_d_inner = 0
2024-04-12 16:05:57 llm_load_print_meta: ssm_d_state = 0
2024-04-12 16:05:57 llm_load_print_meta: ssm_dt_rank = 0
2024-04-12 16:05:57 llm_load_print_meta: model type = 7B
2024-04-12 16:05:57 llm_load_print_meta: model ftype = Q4_K - Medium
2024-04-12 16:05:57 llm_load_print_meta: model params = 7.24 B
2024-04-12 16:05:57 llm_load_print_meta: model size = 4.07 GiB (4.83 BPW)
2024-04-12 16:05:57 llm_load_print_meta: general.name = mistralai_mistral-7b-instruct-v0.2
2024-04-12 16:05:57 llm_load_print_meta: BOS token = 1 '<s>'
2024-04-12 16:05:57 llm_load_print_meta: EOS token = 2 '</s>'
2024-04-12 16:05:57 llm_load_print_meta: UNK token = 0 '<unk>'
2024-04-12 16:05:57 llm_load_print_meta: PAD token = 0 '<unk>'
2024-04-12 16:05:57 llm_load_print_meta: LF token = 13 '<0x0A>'
2024-04-12 16:05:57 llm_load_tensors: ggml ctx size = 0.11 MiB
2024-04-12 16:05:57 llm_load_tensors: CPU buffer size = 4165.37 MiB
2024-04-12 16:05:57 .................................................................................................
2024-04-12 16:05:57 llama_new_context_with_model: n_ctx = 3904
2024-04-12 16:05:57 llama_new_context_with_model: n_batch = 512
2024-04-12 16:05:57 llama_new_context_with_model: n_ubatch = 512
2024-04-12 16:05:57 llama_new_context_with_model: freq_base = 1000000.0
2024-04-12 16:05:57 llama_new_context_with_model: freq_scale = 1
2024-04-12 16:05:57 llama_kv_cache_init: CPU KV buffer size = 488.00 MiB
2024-04-12 16:05:57 llama_new_context_with_model: KV self size = 488.00 MiB, K (f16): 244.00 MiB, V (f16): 244.00 MiB
2024-04-12 16:05:57 llama_new_context_with_model: CPU output buffer size = 0.12 MiB
2024-04-12 16:05:57 llama_new_context_with_model: CPU compute buffer size = 283.63 MiB
2024-04-12 16:05:57 llama_new_context_with_model: graph nodes = 1030
2024-04-12 16:05:57 llama_new_context_with_model: graph splits = 1
2024-04-12 16:05:57 AVX = 0 | AVX_VNNI = 0 | AVX2 = 0 | AVX512 = 0 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | FMA = 0 | NEON = 1 | ARM_FMA = 1 | F16C = 0 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 0 | SSE3 = 0 | SSSE3 = 0 | VSX = 0 | MATMUL_INT8 = 0 |
2024-04-12 16:05:57 Model metadata: {'tokenizer.chat_template': "{{ bos_token }}{% for message in messages %}{% if (message['role'] == 'user') != (loop.index0 % 2 == 0) %}{{ raise_exception('Conversation roles must alternate user/assistant/user/assistant/...') }}{% endif %}{% if message['role'] == 'user' %}{{ '[INST] ' + message['content'] + ' [/INST]' }}{% elif message['role'] == 'assistant' %}{{ message['content'] + eos_token}}{% else %}{{ raise_exception('Only user and assistant roles are supported!') }}{% endif %}{% endfor %}", 'tokenizer.ggml.add_eos_token': 'false', 'tokenizer.ggml.padding_token_id': '0', 'tokenizer.ggml.unknown_token_id': '0', 'tokenizer.ggml.eos_token_id': '2', 'general.architecture': 'llama', 'llama.rope.freq_base': '1000000.000000', 'llama.context_length': '32768', 'general.name': 'mistralai_mistral-7b-instruct-v0.2', 'tokenizer.ggml.add_bos_token': 'true', 'llama.embedding_length': '4096', 'llama.feed_forward_length': '14336', 'llama.attention.layer_norm_rms_epsilon': '0.000010', 'llama.rope.dimension_count': '128', 'tokenizer.ggml.bos_token_id': '1', 'llama.attention.head_count': '32', 'llama.block_count': '32', 'llama.attention.head_count_kv': '8', 'general.quantization_version': '2', 'tokenizer.ggml.model': 'llama', 'general.file_type': '15'}
2024-04-12 16:05:57 Guessed chat format: mistral-instruct
2024-04-12 16:05:58 14:05:58.041 [INFO ] private_gpt.components.embedding.embedding_component - Initializing the embedding model in mode=huggingface
2024-04-12 16:05:59 14:05:59.478 [INFO ] llama_index.core.indices.loading - Loading all indices.
2024-04-12 16:05:59 14:05:59.478 [INFO ] private_gpt.components.ingest.ingest_component - Creating a new vector store index
Parsing nodes: 0it [00:00, ?it/s]
Generating embeddings: 0it [00:00, ?it/s]
2024-04-12 16:05:59 14:05:59.556 [INFO ] private_gpt.ui.ui - Mounting the gradio UI, at path=/
2024-04-12 16:05:59 14:05:59.587 [INFO ] uvicorn.error - Started server process [387]
2024-04-12 16:05:59 14:05:59.587 [INFO ] uvicorn.error - Waiting for application startup.
2024-04-12 16:05:59 14:05:59.587 [INFO ] uvicorn.error - Application startup complete.
2024-04-12 16:05:59 14:05:59.587 [INFO ] uvicorn.error - Uvicorn running on http://0.0.0.0:8080 (Press CTRL+C to quit)
2024-04-12 16:06:00 14:06:00.618 [INFO ] uvicorn.access - 127.0.0.1:56712 - "GET /health HTTP/1.1" 200
2024-04-12 16:06:05 14:06:05.671 [INFO ] uvicorn.access - 127.0.0.1:57512 - "GET /health HTTP/1.1" 200
But "Illegal instruction" indicates that your CPU does not support some of the instruction-set flags the binary was compiled for, I think. I have had this before. It is not necessarily a problem with my image; it can also be an upstream problem. My image is built with CPU-only flags:
ARG CMAKE_ARGS='-DLLAMA_BLAS=ON -DLLAMA_BLAS_VENDOR="OpenBLAS" -DLLAMA_AVX=OFF -DLLAMA_AVX2=OFF -DLLAMA_F16C=OFF -DLLAMA_FMA=OFF'
On which hardware (CPU) do you guys run the image?
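If you want to compare, you can list which SIMD extensions your host CPU actually supports (standard Linux tooling, nothing specific to this image):

grep -m1 -o 'flags.*' /proc/cpuinfo   # x86: look for avx, avx2, f16c, fma in the output
lscpu | grep -i 'model name'          # shows the CPU model

If a needed flag is mismatched, one option is to rebuild the image locally with matching CMAKE_ARGS; a sketch, assuming the Dockerfile exposes CMAKE_ARGS as a build arg (the ARG line above suggests it does):

docker build --build-arg CMAKE_ARGS='-DLLAMA_AVX=OFF -DLLAMA_AVX2=OFF -DLLAMA_F16C=OFF -DLLAMA_FMA=OFF' -t privategpt:local .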
The latest image now works on my computer. Thanks!
I used docker run for deployment, but on startup the image gave me this error. I don't know exactly why, since it's all inside the Docker environment. Is it supposed to work, or has something changed that breaks it?
2024-04-12 09:15:12 07:15:12.820 [INFO ] private_gpt.settings.settings_loader - Starting application with profiles=['default']
2024-04-12 09:15:12 Traceback (most recent call last):
2024-04-12 09:15:12   File "/home/worker/app/scripts/setup", line 8, in <module>
2024-04-12 09:15:12     from private_gpt.paths import models_path, models_cache_path
2024-04-12 09:15:12   File "/home/worker/app/private_gpt/paths.py", line 4, in <module>
2024-04-12 09:15:12     from private_gpt.settings.settings import settings
2024-04-12 09:15:12   File "/home/worker/app/private_gpt/settings/settings.py", line 434, in <module>
2024-04-12 09:15:12     unsafe_typed_settings = Settings(**unsafe_settings)
2024-04-12 09:15:12                             ^^^^^^^^^^^^^^^^^^^^^^^^^^^
2024-04-12 09:15:12   File "/home/worker/app/.venv/lib/python3.11/site-packages/pydantic/main.py", line 164, in __init__
2024-04-12 09:15:12     __pydantic_self__.__pydantic_validator__.validate_python(data, self_instance=__pydantic_self__)
2024-04-12 09:15:12 pydantic_core._pydantic_core.ValidationError: 1 validation error for Settings
2024-04-12 09:15:12 rag.rerank
2024-04-12 09:15:12 Field required [type=missing, input_value={'similarity_top_k': 2, 'similarity_value': 0.45}, input_type=dict]
2024-04-12 09:15:12 For further information visit https://errors.pydantic.dev/2.5/v/missing
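This is the same rag.rerank validation error from the top of the thread, which was fixed in the latest image, so pulling 3x3cut0r/privategpt:latest again should resolve it. If you are stuck on an older tag, a stopgap sketch is to mount a settings.yaml that includes the rag.rerank section over the one baked into the image (the app root /home/worker/app comes from the traceback above; the exact settings path inside the image is an assumption):

docker run -d \
--name privategpt \
-p 8080:8080/tcp \
-v ./settings.yaml:/home/worker/app/settings.yaml \
3x3cut0r/privategpt:latest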