Closed: lramos7 closed this issue 1 month ago.
Below is an excerpt of the observed error:
[backend] error: TypeError: fetch failed
    at node:internal/deps/undici/undici:12618:11
    at async createOllamaStream (/app/server/node_modules/@langchain/community/dist/utils/ollama.cjs:12:22)
    at async createOllamaChatStream (/app/server/node_modules/@langchain/community/dist/utils/ollama.cjs:61:5)
    at async ChatOllama._streamResponseChunks (/app/server/node_modules/@langchain/community/dist/chat_models/ollama.cjs:399:30)
    at async ChatOllama._streamIterator (/app/server/node_modules/@langchain/core/dist/language_models/chat_models.cjs:82:34)
    at async ChatOllama.transform (/app/server/node_modules/@langchain/core/dist/runnables/base.cjs:382:9)
    at async wrapInputForTracing (/app/server/node_modules/@langchain/core/dist/runnables/base.cjs:258:30)
    at async pipeGeneratorWithSetup (/app/server/node_modules/@langchain/core/dist/utils/stream.cjs:230:19)
    at async StringOutputParser._transformStreamWithConfig (/app/server/node_modules/@langchain/core/dist/runnables/base.cjs:279:26)
    at async StringOutputParser.transform (/app/server/node_modules/@langchain/core/dist/output_parsers/transform.cjs:36:9)
Ollama likely fails to run inference with those settings because you do not have enough VRAM to produce the response. Can you pull Ollama's logs from the failure?
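In the meantime, a quick way to tell whether "TypeError: fetch failed" is a connection-level failure (rather than a failure during generation) is to hit the Ollama HTTP API directly with the same fetch the backend uses. A minimal sketch, assuming Ollama's default endpoint on port 11434 and a llama3.1:70b model tag (both are assumptions, adjust to your setup); save as probe.mjs and run with node:

```ts
// Probe the Ollama server the same way the AnythingLLM backend does.
// If these calls also throw "TypeError: fetch failed", the problem is at the
// connection level (wrong host/port, Docker networking, or a crashed runner),
// not in AnythingLLM itself.
const OLLAMA = process.env.OLLAMA_BASE_URL ?? "http://localhost:11434";

const tags = await fetch(`${OLLAMA}/api/tags`); // lists locally available models
console.log("tags:", tags.status);

const chat = await fetch(`${OLLAMA}/api/chat`, {
  method: "POST",
  headers: { "Content-Type": "application/json" },
  body: JSON.stringify({
    model: "llama3.1:70b", // adjust to the model tag you actually pulled
    messages: [{ role: "user", content: "ping" }],
    stream: false,
  }),
});
console.log("chat:", chat.status, await chat.text());
```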
I am not currently working with a GPU. I use a Nutanix node with an Intel(R) Xeon(R) Gold 6326 CPU @ 2.90GHz, 8 cores, and 90 GB of RAM.
AnythingLLM logs via Docker:
2024-09-25T18:25:52.083124177Z [backend] error: TypeError: fetch failed
2024-09-25T18:25:52.083162402Z at node:internal/deps/undici/undici:12618:11
2024-09-25T18:25:52.083166469Z at async createOllamaStream (/app/server/node_modules/@langchain/community/dist/utils/ollama.cjs:12:22)
2024-09-25T18:25:52.083169781Z at async createOllamaChatStream (/app/server/node_modules/@langchain/community/dist/utils/ollama.cjs:61:5)
2024-09-25T18:25:52.083172845Z at async ChatOllama._streamResponseChunks (/app/server/node_modules/@langchain/community/dist/chat_models/ollama.cjs:399:30)
2024-09-25T18:25:52.083175633Z at async ChatOllama._streamIterator (/app/server/node_modules/@langchain/core/dist/language_models/chat_models.cjs:82:34)
2024-09-25T18:25:52.083178894Z at async ChatOllama.transform (/app/server/node_modules/@langchain/core/dist/runnables/base.cjs:382:9)
2024-09-25T18:25:52.083182157Z at async wrapInputForTracing (/app/server/node_modules/@langchain/core/dist/runnables/base.cjs:258:30)
2024-09-25T18:25:52.083185181Z at async pipeGeneratorWithSetup (/app/server/node_modules/@langchain/core/dist/utils/stream.cjs:230:19)
2024-09-25T18:25:52.083187918Z at async StringOutputParser._transformStreamWithConfig (/app/server/node_modules/@langchain/core/dist/runnables/base.cjs:279:26)
2024-09-25T18:25:52.083190850Z at async StringOutputParser.transform (/app/server/node_modules/@langchain/core/dist/output_parsers/transform.cjs:36:9)
Logs via Ollama:
Sep 25 15:20:49 biogpt01 ollama[82676]: llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output.
Sep 25 15:20:49 biogpt01 ollama[82676]: llama_model_loader: - kv 0: general.architecture str = nomic-bert
Sep 25 15:20:49 biogpt01 ollama[82676]: llama_model_loader: - kv 1: general.name str = nomic-embed-text-v1.5
Sep 25 15:20:49 biogpt01 ollama[82676]: llama_model_loader: - kv 2: nomic-bert.block_count u32 = 12
Sep 25 15:20:49 biogpt01 ollama[82676]: llama_model_loader: - kv 3: nomic-bert.context_length u32 = 2048
Sep 25 15:20:49 biogpt01 ollama[82676]: llama_model_loader: - kv 4: nomic-bert.embedding_length u32 = 768
Sep 25 15:20:49 biogpt01 ollama[82676]: llama_model_loader: - kv 5: nomic-bert.feed_forward_length u32 = 3072
Sep 25 15:20:49 biogpt01 ollama[82676]: llama_model_loader: - kv 6: nomic-bert.attention.head_count u32 = 12
Sep 25 15:20:49 biogpt01 ollama[82676]: llama_model_loader: - kv 7: nomic-bert.attention.layer_norm_epsilon f32 = 0.000000
Sep 25 15:20:49 biogpt01 ollama[82676]: llama_model_loader: - kv 8: general.file_type u32 = 1
Sep 25 15:20:49 biogpt01 ollama[82676]: llama_model_loader: - kv 9: nomic-bert.attention.causal bool = false
Sep 25 15:20:49 biogpt01 ollama[82676]: llama_model_loader: - kv 10: nomic-bert.pooling_type u32 = 1
Sep 25 15:20:49 biogpt01 ollama[82676]: llama_model_loader: - kv 11: nomic-bert.rope.freq_base f32 = 1000.000000
Sep 25 15:20:49 biogpt01 ollama[82676]: llama_model_loader: - kv 12: tokenizer.ggml.token_type_count u32 = 2
Sep 25 15:20:49 biogpt01 ollama[82676]: llama_model_loader: - kv 13: tokenizer.ggml.bos_token_id u32 = 101
Sep 25 15:20:49 biogpt01 ollama[82676]: llama_model_loader: - kv 14: tokenizer.ggml.eos_token_id u32 = 102
Sep 25 15:20:49 biogpt01 ollama[82676]: llama_model_loader: - kv 15: tokenizer.ggml.model str = bert
Sep 25 15:20:49 biogpt01 ollama[82676]: llama_model_loader: - kv 16: tokenizer.ggml.tokens arr[str,30522] = ["[PAD]", "[unused0]", "[unused1]", "...
Sep 25 15:20:49 biogpt01 ollama[82676]: llama_model_loader: - kv 17: tokenizer.ggml.scores arr[f32,30522] = [-1000.000000, -1000.000000, -1000.00...
Sep 25 15:20:49 biogpt01 ollama[82676]: llama_model_loader: - kv 18: tokenizer.ggml.token_type arr[i32,30522] = [3, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, ...
Sep 25 15:20:49 biogpt01 ollama[82676]: llama_model_loader: - kv 19: tokenizer.ggml.unknown_token_id u32 = 100
Sep 25 15:20:49 biogpt01 ollama[82676]: llama_model_loader: - kv 20: tokenizer.ggml.seperator_token_id u32 = 102
Sep 25 15:20:49 biogpt01 ollama[82676]: llama_model_loader: - kv 21: tokenizer.ggml.padding_token_id u32 = 0
Sep 25 15:20:49 biogpt01 ollama[82676]: llama_model_loader: - kv 22: tokenizer.ggml.cls_token_id u32 = 101
Sep 25 15:20:49 biogpt01 ollama[82676]: llama_model_loader: - kv 23: tokenizer.ggml.mask_token_id u32 = 103
Sep 25 15:20:49 biogpt01 ollama[82676]: llama_model_loader: - type f32: 51 tensors
Sep 25 15:20:49 biogpt01 ollama[82676]: llama_model_loader: - type f16: 61 tensors
Sep 25 15:20:49 biogpt01 ollama[82676]: llm_load_vocab: special tokens cache size = 5
Sep 25 15:20:49 biogpt01 ollama[82676]: llm_load_vocab: token to piece cache size = 0.2032 MB
Sep 25 15:20:49 biogpt01 ollama[82676]: llm_load_print_meta: format = GGUF V3 (latest)
Sep 25 15:20:49 biogpt01 ollama[82676]: llm_load_print_meta: arch = nomic-bert
Sep 25 15:20:49 biogpt01 ollama[82676]: llm_load_print_meta: vocab type = WPM
Sep 25 15:20:49 biogpt01 ollama[82676]: llm_load_print_meta: n_vocab = 30522
Sep 25 15:20:49 biogpt01 ollama[82676]: llm_load_print_meta: n_merges = 0
Sep 25 15:20:49 biogpt01 ollama[82676]: llm_load_print_meta: vocab_only = 0
Sep 25 15:20:49 biogpt01 ollama[82676]: llm_load_print_meta: n_ctx_train = 2048
Sep 25 15:20:49 biogpt01 ollama[82676]: llm_load_print_meta: n_embd = 768
Sep 25 15:20:49 biogpt01 ollama[82676]: llm_load_print_meta: n_layer = 12
Sep 25 15:20:49 biogpt01 ollama[82676]: llm_load_print_meta: n_head = 12
Sep 25 15:20:49 biogpt01 ollama[82676]: llm_load_print_meta: n_head_kv = 12
Sep 25 15:20:49 biogpt01 ollama[82676]: llm_load_print_meta: n_rot = 64
Sep 25 15:20:49 biogpt01 ollama[82676]: llm_load_print_meta: n_swa = 0
Sep 25 15:20:49 biogpt01 ollama[82676]: llm_load_print_meta: n_embd_head_k = 64
Sep 25 15:20:49 biogpt01 ollama[82676]: llm_load_print_meta: n_embd_head_v = 64
Sep 25 15:20:49 biogpt01 ollama[82676]: llm_load_print_meta: n_gqa = 1
Sep 25 15:20:49 biogpt01 ollama[82676]: llm_load_print_meta: n_embd_k_gqa = 768
Sep 25 15:20:49 biogpt01 ollama[82676]: llm_load_print_meta: n_embd_v_gqa = 768
Sep 25 15:20:49 biogpt01 ollama[82676]: llm_load_print_meta: f_norm_eps = 1.0e-12
Sep 25 15:20:49 biogpt01 ollama[82676]: llm_load_print_meta: f_norm_rms_eps = 0.0e+00
Sep 25 15:20:49 biogpt01 ollama[82676]: llm_load_print_meta: f_clamp_kqv = 0.0e+00
Sep 25 15:20:49 biogpt01 ollama[82676]: llm_load_print_meta: f_max_alibi_bias = 0.0e+00
Sep 25 15:20:49 biogpt01 ollama[82676]: llm_load_print_meta: f_logit_scale = 0.0e+00
Sep 25 15:20:49 biogpt01 ollama[82676]: llm_load_print_meta: n_ff = 3072
Sep 25 15:20:49 biogpt01 ollama[82676]: llm_load_print_meta: n_expert = 0
Sep 25 15:20:49 biogpt01 ollama[82676]: llm_load_print_meta: n_expert_used = 0
Sep 25 15:20:49 biogpt01 ollama[82676]: llm_load_print_meta: causal attn = 0
Sep 25 15:20:49 biogpt01 ollama[82676]: llm_load_print_meta: pooling type = 1
Sep 25 15:20:49 biogpt01 ollama[82676]: llm_load_print_meta: rope type = 2
Sep 25 15:20:49 biogpt01 ollama[82676]: llm_load_print_meta: rope scaling = linear
Sep 25 15:20:49 biogpt01 ollama[82676]: llm_load_print_meta: freq_base_train = 1000.0
Sep 25 15:20:49 biogpt01 ollama[82676]: llm_load_print_meta: freq_scale_train = 1
Sep 25 15:20:49 biogpt01 ollama[82676]: llm_load_print_meta: n_ctx_orig_yarn = 2048
Sep 25 15:20:49 biogpt01 ollama[82676]: llm_load_print_meta: rope_finetuned = unknown
Sep 25 15:20:49 biogpt01 ollama[82676]: llm_load_print_meta: ssm_d_conv = 0
Sep 25 15:20:49 biogpt01 ollama[82676]: llm_load_print_meta: ssm_d_inner = 0
Sep 25 15:20:49 biogpt01 ollama[82676]: llm_load_print_meta: ssm_d_state = 0
Sep 25 15:20:49 biogpt01 ollama[82676]: llm_load_print_meta: ssm_dt_rank = 0
Sep 25 15:20:49 biogpt01 ollama[82676]: llm_load_print_meta: ssm_dt_b_c_rms = 0
Sep 25 15:20:49 biogpt01 ollama[82676]: llm_load_print_meta: model type = 137M
Sep 25 15:20:49 biogpt01 ollama[82676]: llm_load_print_meta: model ftype = F16
Sep 25 15:20:49 biogpt01 ollama[82676]: llm_load_print_meta: model params = 136.73 M
Sep 25 15:20:49 biogpt01 ollama[82676]: llm_load_print_meta: model size = 260.86 MiB (16.00 BPW)
Sep 25 15:20:49 biogpt01 ollama[82676]: llm_load_print_meta: general.name = nomic-embed-text-v1.5
Sep 25 15:20:49 biogpt01 ollama[82676]: llm_load_print_meta: BOS token = 101 '[CLS]'
Sep 25 15:20:49 biogpt01 ollama[82676]: llm_load_print_meta: EOS token = 102 '[SEP]'
Sep 25 15:20:49 biogpt01 ollama[82676]: llm_load_print_meta: UNK token = 100 '[UNK]'
Sep 25 15:20:49 biogpt01 ollama[82676]: llm_load_print_meta: SEP token = 102 '[SEP]'
Sep 25 15:20:49 biogpt01 ollama[82676]: llm_load_print_meta: PAD token = 0 '[PAD]'
Sep 25 15:20:49 biogpt01 ollama[82676]: llm_load_print_meta: CLS token = 101 '[CLS]'
Sep 25 15:20:49 biogpt01 ollama[82676]: llm_load_print_meta: MASK token = 103 '[MASK]'
Sep 25 15:20:49 biogpt01 ollama[82676]: llm_load_print_meta: LF token = 0 '[PAD]'
Sep 25 15:20:49 biogpt01 ollama[82676]: llm_load_print_meta: max token length = 21
Sep 25 15:20:49 biogpt01 ollama[82676]: llm_load_tensors: ggml ctx size = 0.05 MiB
Sep 25 15:20:49 biogpt01 ollama[82676]: llm_load_tensors: CPU buffer size = 260.86 MiB
Sep 25 15:20:49 biogpt01 ollama[82676]: llama_new_context_with_model: n_ctx = 8192
Sep 25 15:20:49 biogpt01 ollama[82676]: llama_new_context_with_model: n_batch = 512
Sep 25 15:20:49 biogpt01 ollama[82676]: llama_new_context_with_model: n_ubatch = 512
Sep 25 15:20:49 biogpt01 ollama[82676]: llama_new_context_with_model: flash_attn = 0
Sep 25 15:20:49 biogpt01 ollama[82676]: llama_new_context_with_model: freq_base = 1000.0
Sep 25 15:20:49 biogpt01 ollama[82676]: llama_new_context_with_model: freq_scale = 1
Sep 25 15:20:50 biogpt01 ollama[82676]: time=2024-09-25T15:20:50.027-03:00 level=INFO source=server.go:621 msg="waiting for server to become available" status="llm server loading model"
Sep 25 15:20:50 biogpt01 ollama[82676]: time=2024-09-25T15:20:50.027-03:00 level=DEBUG source=server.go:632 msg="model load progress 1.00"
Sep 25 15:20:50 biogpt01 ollama[82676]: llama_kv_cache_init: CPU KV buffer size = 288.00 MiB
Sep 25 15:20:50 biogpt01 ollama[82676]: llama_new_context_with_model: KV self size = 288.00 MiB, K (f16): 144.00 MiB, V (f16): 144.00 MiB
Sep 25 15:20:50 biogpt01 ollama[82676]: llama_new_context_with_model: CPU output buffer size = 0.00 MiB
Sep 25 15:20:50 biogpt01 ollama[82676]: llama_new_context_with_model: CPU compute buffer size = 23.00 MiB
Sep 25 15:20:50 biogpt01 ollama[82676]: llama_new_context_with_model: graph nodes = 453
Sep 25 15:20:50 biogpt01 ollama[82676]: llama_new_context_with_model: graph splits = 1
Sep 25 15:20:50 biogpt01 ollama[131978]: DEBUG [initialize] initializing slots | n_slots=1 tid="138019934359744" timestamp=1727288450
Sep 25 15:20:50 biogpt01 ollama[131978]: DEBUG [initialize] new slot | n_ctx_slot=8192 slot_id=0 tid="138019934359744" timestamp=1727288450
Sep 25 15:20:50 biogpt01 ollama[131978]: INFO [main] model loaded | tid="138019934359744" timestamp=1727288450
Sep 25 15:20:50 biogpt01 ollama[131978]: DEBUG [update_slots] all slots are idle and system prompt is empty, clear the KV cache | tid="138019934359744" timestamp=1727288450
Sep 25 15:20:50 biogpt01 ollama[131978]: DEBUG [process_single_task] slot data | n_idle_slots=1 n_processing_slots=0 task_id=0 tid="138019934359744" timestamp=1727288450
Sep 25 15:20:50 biogpt01 ollama[82676]: time=2024-09-25T15:20:50.278-03:00 level=INFO source=server.go:626 msg="llama runner started in 0.50 seconds"
Sep 25 15:20:50 biogpt01 ollama[82676]: time=2024-09-25T15:20:50.278-03:00 level=DEBUG source=sched.go:462 msg="finished setting up runner" model=/usr/share/ollama/.ollama/models/blobs/sha256-970aa74c0a90ef7482477cf803618e776e173c007bf957f635f1015bfcfef0e6
Sep 25 15:20:50 biogpt01 ollama[131978]: DEBUG [process_single_task] slot data | n_idle_slots=1 n_processing_slots=0 task_id=1 tid="138019934359744" timestamp=1727288450
Sep 25 15:20:50 biogpt01 ollama[131978]: DEBUG [launch_slot_with_data] slot is processing task | slot_id=0 task_id=2 tid="138019934359744" timestamp=1727288450
Sep 25 15:20:50 biogpt01 ollama[131978]: DEBUG [update_slots] kv cache rm [p0, end) | p0=0 slot_id=0 task_id=2 tid="138019934359744" timestamp=1727288450
Sep 25 15:20:50 biogpt01 ollama[131978]: DEBUG [update_slots] slot released | n_cache_tokens=105 n_ctx=8192 n_past=105 n_system_tokens=0 slot_id=0 task_id=2 tid="138019934359744" timestamp=1727288450 truncated=false
Sep 25 15:20:50 biogpt01 ollama[131978]: DEBUG [log_server_request] request | method="POST" params={} path="/embedding" remote_addr="127.0.0.1" remote_port=55334 status=200 tid="138019845703360" timestamp=1727288450
Sep 25 15:20:50 biogpt01 ollama[82676]: [GIN] 2024/09/25 - 15:20:50 | 200 | 696.658073ms | 172.17.0.4 | POST "/api/embeddings"
Sep 25 15:20:50 biogpt01 ollama[82676]: time=2024-09-25T15:20:50.463-03:00 level=DEBUG source=sched.go:466 msg="context for request finished"
Sep 25 15:20:50 biogpt01 ollama[82676]: time=2024-09-25T15:20:50.463-03:00 level=DEBUG source=sched.go:339 msg="runner with non-zero duration has gone idle, adding timer" modelPath=/usr/share/ollama/.ollama/models/blobs/sha256-970aa74c0a90ef7482477cf803618e776e173c007bf957f635f1015bfcfef0e6 duration=5m0s
Sep 25 15:20:50 biogpt01 ollama[82676]: time=2024-09-25T15:20:50.463-03:00 level=DEBUG source=sched.go:357 msg="after processing request finished event" modelPath=/usr/share/ollama/.ollama/models/blobs/sha256-970aa74c0a90ef7482477cf803618e776e173c007bf957f635f1015bfcfef0e6 refCount=0
Sep 25 15:20:50 biogpt01 ollama[82676]: time=2024-09-25T15:20:50.801-03:00 level=DEBUG source=gpu.go:358 msg="updating system memory data" before.total="88.3 GiB" before.free="86.0 GiB" before.free_swap="4.0 GiB" now.total="88.3 GiB" now.free="85.3 GiB" now.free_swap="4.0 GiB"
Sep 25 15:20:50 biogpt01 ollama[82676]: time=2024-09-25T15:20:50.834-03:00 level=DEBUG source=sched.go:826 msg="evaluating if CPU model load will fit in available system memory"
Sep 25 15:20:50 biogpt01 ollama[82676]: time=2024-09-25T15:20:50.834-03:00 level=DEBUG source=memory.go:103 msg=evaluating library=cpu gpu_count=1 available="[85.3 GiB]"
Sep 25 15:20:50 biogpt01 ollama[82676]: time=2024-09-25T15:20:50.835-03:00 level=DEBUG source=sched.go:829 msg="cpu inference mode, model fits in available system memory" model="40.7 GiB" available="85.3 GiB"
Sep 25 15:20:50 biogpt01 ollama[82676]: time=2024-09-25T15:20:50.835-03:00 level=DEBUG source=sched.go:217 msg="cpu mode with available system memory or first model, loading"
Sep 25 15:20:50 biogpt01 ollama[82676]: time=2024-09-25T15:20:50.835-03:00 level=INFO source=server.go:103 msg="system memory" total="88.3 GiB" free="85.3 GiB" free_swap="4.0 GiB"
Sep 25 15:20:50 biogpt01 ollama[82676]: time=2024-09-25T15:20:50.835-03:00 level=DEBUG source=common.go:294 msg="availableServers : found" file=/tmp/ollama3859707938/runners/cpu/ollama_llama_server
Sep 25 15:20:50 biogpt01 ollama[82676]: time=2024-09-25T15:20:50.835-03:00 level=DEBUG source=common.go:294 msg="availableServers : found" file=/tmp/ollama3859707938/runners/cpu_avx/ollama_llama_server
Sep 25 15:20:50 biogpt01 ollama[82676]: time=2024-09-25T15:20:50.835-03:00 level=DEBUG source=common.go:294 msg="availableServers : found" file=/tmp/ollama3859707938/runners/cpu_avx2/ollama_llama_server
Sep 25 15:20:50 biogpt01 ollama[82676]: time=2024-09-25T15:20:50.835-03:00 level=DEBUG source=common.go:294 msg="availableServers : found" file=/tmp/ollama3859707938/runners/cuda_v11/ollama_llama_server
Sep 25 15:20:50 biogpt01 ollama[82676]: time=2024-09-25T15:20:50.835-03:00 level=DEBUG source=common.go:294 msg="availableServers : found" file=/tmp/ollama3859707938/runners/cuda_v12/ollama_llama_server
Sep 25 15:20:50 biogpt01 ollama[82676]: time=2024-09-25T15:20:50.835-03:00 level=DEBUG source=common.go:294 msg="availableServers : found" file=/tmp/ollama3859707938/runners/rocm_v60102/ollama_llama_server
Sep 25 15:20:50 biogpt01 ollama[82676]: time=2024-09-25T15:20:50.835-03:00 level=DEBUG source=memory.go:103 msg=evaluating library=cpu gpu_count=1 available="[85.3 GiB]"
Sep 25 15:20:50 biogpt01 ollama[82676]: time=2024-09-25T15:20:50.836-03:00 level=INFO source=memory.go:326 msg="offload to cpu" layers.requested=-1 layers.model=81 layers.offload=0 layers.split="" memory.available="[85.3 GiB]" memory.gpu_overhead="0 B" memory.required.full="40.7 GiB" memory.required.partial="0 B" memory.required.kv="2.5 GiB" memory.required.allocations="[40.7 GiB]" memory.weights.total="38.4 GiB" memory.weights.repeating="37.6 GiB" memory.weights.nonrepeating="822.0 MiB" memory.graph.full="1.1 GiB" memory.graph.partial="1.1 GiB"
Sep 25 15:20:50 biogpt01 ollama[82676]: time=2024-09-25T15:20:50.836-03:00 level=DEBUG source=common.go:168 msg=extracting runner=cpu payload=linux/amd64/cpu/libggml.so.gz
Sep 25 15:20:50 biogpt01 ollama[82676]: time=2024-09-25T15:20:50.836-03:00 level=DEBUG source=common.go:168 msg=extracting runner=cpu payload=linux/amd64/cpu/libllama.so.gz
Sep 25 15:20:50 biogpt01 ollama[82676]: time=2024-09-25T15:20:50.836-03:00 level=DEBUG source=common.go:168 msg=extracting runner=cpu payload=linux/amd64/cpu/ollama_llama_server.gz
Sep 25 15:20:50 biogpt01 ollama[82676]: time=2024-09-25T15:20:50.836-03:00 level=DEBUG source=common.go:168 msg=extracting runner=cpu_avx payload=linux/amd64/cpu_avx/libggml.so.gz
Sep 25 15:20:50 biogpt01 ollama[82676]: time=2024-09-25T15:20:50.836-03:00 level=DEBUG source=common.go:168 msg=extracting runner=cpu_avx payload=linux/amd64/cpu_avx/libllama.so.gz
Sep 25 15:20:50 biogpt01 ollama[82676]: time=2024-09-25T15:20:50.836-03:00 level=DEBUG source=common.go:168 msg=extracting runner=cpu_avx payload=linux/amd64/cpu_avx/ollama_llama_server.gz
Sep 25 15:20:50 biogpt01 ollama[82676]: time=2024-09-25T15:20:50.836-03:00 level=DEBUG source=common.go:168 msg=extracting runner=cpu_avx2 payload=linux/amd64/cpu_avx2/libggml.so.gz
Sep 25 15:20:50 biogpt01 ollama[82676]: time=2024-09-25T15:20:50.836-03:00 level=DEBUG source=common.go:168 msg=extracting runner=cpu_avx2 payload=linux/amd64/cpu_avx2/libllama.so.gz
Sep 25 15:20:50 biogpt01 ollama[82676]: time=2024-09-25T15:20:50.836-03:00 level=DEBUG source=common.go:168 msg=extracting runner=cpu_avx2 payload=linux/amd64/cpu_avx2/ollama_llama_server.gz
Sep 25 15:20:50 biogpt01 ollama[82676]: time=2024-09-25T15:20:50.836-03:00 level=DEBUG source=common.go:168 msg=extracting runner=cuda_v11 payload=linux/amd64/cuda_v11/libggml.so.gz
Sep 25 15:20:50 biogpt01 ollama[82676]: time=2024-09-25T15:20:50.836-03:00 level=DEBUG source=common.go:168 msg=extracting runner=cuda_v11 payload=linux/amd64/cuda_v11/libllama.so.gz
Sep 25 15:20:50 biogpt01 ollama[82676]: time=2024-09-25T15:20:50.836-03:00 level=DEBUG source=common.go:168 msg=extracting runner=cuda_v11 payload=linux/amd64/cuda_v11/ollama_llama_server.gz
Sep 25 15:20:50 biogpt01 ollama[82676]: time=2024-09-25T15:20:50.836-03:00 level=DEBUG source=common.go:168 msg=extracting runner=cuda_v12 payload=linux/amd64/cuda_v12/libggml.so.gz
Sep 25 15:20:50 biogpt01 ollama[82676]: time=2024-09-25T15:20:50.836-03:00 level=DEBUG source=common.go:168 msg=extracting runner=cuda_v12 payload=linux/amd64/cuda_v12/libllama.so.gz
Sep 25 15:20:50 biogpt01 ollama[82676]: time=2024-09-25T15:20:50.836-03:00 level=DEBUG source=common.go:168 msg=extracting runner=cuda_v12 payload=linux/amd64/cuda_v12/ollama_llama_server.gz
Sep 25 15:20:50 biogpt01 ollama[82676]: time=2024-09-25T15:20:50.836-03:00 level=DEBUG source=common.go:168 msg=extracting runner=rocm_v60102 payload=linux/amd64/rocm_v60102/libggml.so.gz
Sep 25 15:20:50 biogpt01 ollama[82676]: time=2024-09-25T15:20:50.836-03:00 level=DEBUG source=common.go:168 msg=extracting runner=rocm_v60102 payload=linux/amd64/rocm_v60102/libllama.so.gz
Sep 25 15:20:50 biogpt01 ollama[82676]: time=2024-09-25T15:20:50.836-03:00 level=DEBUG source=common.go:168 msg=extracting runner=rocm_v60102 payload=linux/amd64/rocm_v60102/ollama_llama_server.gz
Sep 25 15:20:50 biogpt01 ollama[82676]: time=2024-09-25T15:20:50.837-03:00 level=DEBUG source=common.go:294 msg="availableServers : found" file=/tmp/ollama3859707938/runners/cpu/ollama_llama_server
Sep 25 15:20:50 biogpt01 ollama[82676]: time=2024-09-25T15:20:50.837-03:00 level=DEBUG source=common.go:294 msg="availableServers : found" file=/tmp/ollama3859707938/runners/cpu_avx/ollama_llama_server
Sep 25 15:20:50 biogpt01 ollama[82676]: time=2024-09-25T15:20:50.837-03:00 level=DEBUG source=common.go:294 msg="availableServers : found" file=/tmp/ollama3859707938/runners/cpu_avx2/ollama_llama_server
Sep 25 15:20:50 biogpt01 ollama[82676]: time=2024-09-25T15:20:50.837-03:00 level=DEBUG source=common.go:294 msg="availableServers : found" file=/tmp/ollama3859707938/runners/cuda_v11/ollama_llama_server
Sep 25 15:20:50 biogpt01 ollama[82676]: time=2024-09-25T15:20:50.837-03:00 level=DEBUG source=common.go:294 msg="availableServers : found" file=/tmp/ollama3859707938/runners/cuda_v12/ollama_llama_server
Sep 25 15:20:50 biogpt01 ollama[82676]: time=2024-09-25T15:20:50.837-03:00 level=DEBUG source=common.go:294 msg="availableServers : found" file=/tmp/ollama3859707938/runners/rocm_v60102/ollama_llama_server
Sep 25 15:20:50 biogpt01 ollama[82676]: time=2024-09-25T15:20:50.837-03:00 level=DEBUG source=gpu.go:639 msg="no filter required for library cpu"
Sep 25 15:20:50 biogpt01 ollama[82676]: time=2024-09-25T15:20:50.837-03:00 level=INFO source=server.go:388 msg="starting llama server" cmd="/tmp/ollama3859707938/runners/cpu_avx2/ollama_llama_server --model /usr/share/ollama/.ollama/models/blobs/sha256-a677b4a4b70c45e702b1d600f7905e367733c53898b8be60e3f29272cf334574 --ctx-size 8192 --batch-size 512 --embedding --log-disable --verbose --no-mmap --mlock --numa numactl --parallel 4 --port 33761"
Sep 25 15:20:50 biogpt01 ollama[82676]: time=2024-09-25T15:20:50.837-03:00 level=DEBUG source=server.go:405 msg=subprocess environment="[PATH=/home/service_gpt/anaconda3/condabin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/local/games:/snap/bin LD_LIBRARY_PATH=/tmp/ollama3859707938/runners/cpu_avx2]"
Sep 25 15:20:50 biogpt01 ollama[82676]: time=2024-09-25T15:20:50.838-03:00 level=INFO source=sched.go:449 msg="loaded runners" count=2
Sep 25 15:20:50 biogpt01 ollama[82676]: time=2024-09-25T15:20:50.838-03:00 level=INFO source=server.go:587 msg="waiting for llama runner to start responding"
Sep 25 15:20:50 biogpt01 ollama[82676]: time=2024-09-25T15:20:50.838-03:00 level=INFO source=server.go:621 msg="waiting for server to become available" status="llm server error"
Sep 25 15:20:50 biogpt01 ollama[132011]: INFO [main] build info | build=10 commit="d7b6049" tid="131633349513408" timestamp=1727288450
Sep 25 15:20:50 biogpt01 ollama[132011]: INFO [main] system info | n_threads=8 n_threads_batch=8 system_info="AVX = 1 | AVX_VNNI = 0 | AVX2 = 1 | AVX512 = 0 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | AVX512_BF16 = 0 | FMA = 1 | NEON = 0 | SVE = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 0 | SSE3 = 1 | SSSE3 = 1 | VSX = 0 | MATMUL_INT8 = 0 | LLAMAFILE = 1 | " tid="131633349513408" timestamp=1727288450 total_threads=8
Sep 25 15:20:50 biogpt01 ollama[132011]: INFO [main] HTTP server listening | hostname="127.0.0.1" n_threads_http="7" port="33761" tid="131633349513408" timestamp=1727288450
Sep 25 15:20:50 biogpt01 ollama[82676]: llama_model_loader: loaded meta data with 29 key-value pairs and 724 tensors from /usr/share/ollama/.ollama/models/blobs/sha256-a677b4a4b70c45e702b1d600f7905e367733c53898b8be60e3f29272cf334574 (version GGUF V3 (latest))
Sep 25 15:20:50 biogpt01 ollama[82676]: llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output.
Sep 25 15:20:50 biogpt01 ollama[82676]: llama_model_loader: - kv 0: general.architecture str = llama
Sep 25 15:20:50 biogpt01 ollama[82676]: llama_model_loader: - kv 1: general.type str = model
Sep 25 15:20:50 biogpt01 ollama[82676]: llama_model_loader: - kv 2: general.name str = Meta Llama 3.1 70B Instruct
Sep 25 15:20:50 biogpt01 ollama[82676]: llama_model_loader: - kv 3: general.finetune str = Instruct
Sep 25 15:20:50 biogpt01 ollama[82676]: llama_model_loader: - kv 4: general.basename str = Meta-Llama-3.1
Sep 25 15:20:50 biogpt01 ollama[82676]: llama_model_loader: - kv 5: general.size_label str = 70B
Sep 25 15:20:50 biogpt01 ollama[82676]: llama_model_loader: - kv 6: general.license str = llama3.1
Sep 25 15:20:50 biogpt01 ollama[82676]: llama_model_loader: - kv 7: general.tags arr[str,6] = ["facebook", "meta", "pytorch", "llam...
Sep 25 15:20:50 biogpt01 ollama[82676]: llama_model_loader: - kv 8: general.languages arr[str,8] = ["en", "de", "fr", "it", "pt", "hi", ...
Sep 25 15:20:50 biogpt01 ollama[82676]: llama_model_loader: - kv 9: llama.block_count u32 = 80
Sep 25 15:20:50 biogpt01 ollama[82676]: llama_model_loader: - kv 10: llama.context_length u32 = 131072
Sep 25 15:20:50 biogpt01 ollama[82676]: llama_model_loader: - kv 11: llama.embedding_length u32 = 8192
Sep 25 15:20:50 biogpt01 ollama[82676]: llama_model_loader: - kv 12: llama.feed_forward_length u32 = 28672
Sep 25 15:20:50 biogpt01 ollama[82676]: llama_model_loader: - kv 13: llama.attention.head_count u32 = 64
Sep 25 15:20:50 biogpt01 ollama[82676]: llama_model_loader: - kv 14: llama.attention.head_count_kv u32 = 8
Sep 25 15:20:50 biogpt01 ollama[82676]: llama_model_loader: - kv 15: llama.rope.freq_base f32 = 500000.000000
Sep 25 15:20:50 biogpt01 ollama[82676]: llama_model_loader: - kv 16: llama.attention.layer_norm_rms_epsilon f32 = 0.000010
Sep 25 15:20:50 biogpt01 ollama[82676]: llama_model_loader: - kv 17: general.file_type u32 = 2
Sep 25 15:20:50 biogpt01 ollama[82676]: llama_model_loader: - kv 18: llama.vocab_size u32 = 128256
Sep 25 15:20:50 biogpt01 ollama[82676]: llama_model_loader: - kv 19: llama.rope.dimension_count u32 = 128
Sep 25 15:20:50 biogpt01 ollama[82676]: llama_model_loader: - kv 20: tokenizer.ggml.model str = gpt2
Sep 25 15:20:50 biogpt01 ollama[82676]: llama_model_loader: - kv 21: tokenizer.ggml.pre str = llama-bpe
Sep 25 15:20:50 biogpt01 ollama[82676]: llama_model_loader: - kv 22: tokenizer.ggml.tokens arr[str,128256] = ["!", "\"", "#", "$", "%", "&", "'", ...
Sep 25 15:20:50 biogpt01 ollama[82676]: llama_model_loader: - kv 23: tokenizer.ggml.token_type arr[i32,128256] = [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, ...
Sep 25 15:20:50 biogpt01 ollama[82676]: llama_model_loader: - kv 24: tokenizer.ggml.merges arr[str,280147] = ["Ġ Ġ", "Ġ ĠĠĠ", "ĠĠ ĠĠ", "...
Sep 25 15:20:50 biogpt01 ollama[82676]: llama_model_loader: - kv 25: tokenizer.ggml.bos_token_id u32 = 128000
Sep 25 15:20:50 biogpt01 ollama[82676]: llama_model_loader: - kv 26: tokenizer.ggml.eos_token_id u32 = 128009
Sep 25 15:20:50 biogpt01 ollama[82676]: llama_model_loader: - kv 27: tokenizer.chat_template str = {{- bos_token }}\n{%- if custom_tools ...
Sep 25 15:20:50 biogpt01 ollama[82676]: llama_model_loader: - kv 28: general.quantization_version u32 = 2
Sep 25 15:20:50 biogpt01 ollama[82676]: llama_model_loader: - type f32: 162 tensors
Sep 25 15:20:50 biogpt01 ollama[82676]: llama_model_loader: - type q4_0: 561 tensors
Sep 25 15:20:50 biogpt01 ollama[82676]: llama_model_loader: - type q6_K: 1 tensors
Sep 25 15:20:51 biogpt01 ollama[82676]: time=2024-09-25T15:20:51.089-03:00 level=INFO source=server.go:621 msg="waiting for server to become available" status="llm server loading model"
Sep 25 15:20:51 biogpt01 ollama[82676]: llm_load_vocab: special tokens cache size = 256
Sep 25 15:20:51 biogpt01 ollama[82676]: llm_load_vocab: token to piece cache size = 0.7999 MB
Sep 25 15:20:51 biogpt01 ollama[82676]: llm_load_print_meta: format = GGUF V3 (latest)
Sep 25 15:20:51 biogpt01 ollama[82676]: llm_load_print_meta: arch = llama
Sep 25 15:20:51 biogpt01 ollama[82676]: llm_load_print_meta: vocab type = BPE
Sep 25 15:20:51 biogpt01 ollama[82676]: llm_load_print_meta: n_vocab = 128256
Sep 25 15:20:51 biogpt01 ollama[82676]: llm_load_print_meta: n_merges = 280147
Sep 25 15:20:51 biogpt01 ollama[82676]: llm_load_print_meta: vocab_only = 0
Sep 25 15:20:51 biogpt01 ollama[82676]: llm_load_print_meta: n_ctx_train = 131072
Sep 25 15:20:51 biogpt01 ollama[82676]: llm_load_print_meta: n_embd = 8192
Sep 25 15:20:51 biogpt01 ollama[82676]: llm_load_print_meta: n_layer = 80
Sep 25 15:20:51 biogpt01 ollama[82676]: llm_load_print_meta: n_head = 64
Sep 25 15:20:51 biogpt01 ollama[82676]: llm_load_print_meta: n_head_kv = 8
Sep 25 15:20:51 biogpt01 ollama[82676]: llm_load_print_meta: n_rot = 128
Sep 25 15:20:51 biogpt01 ollama[82676]: llm_load_print_meta: n_swa = 0
Sep 25 15:20:51 biogpt01 ollama[82676]: llm_load_print_meta: n_embd_head_k = 128
Sep 25 15:20:51 biogpt01 ollama[82676]: llm_load_print_meta: n_embd_head_v = 128
Sep 25 15:20:51 biogpt01 ollama[82676]: llm_load_print_meta: n_gqa = 8
Sep 25 15:20:51 biogpt01 ollama[82676]: llm_load_print_meta: n_embd_k_gqa = 1024
Sep 25 15:20:51 biogpt01 ollama[82676]: llm_load_print_meta: n_embd_v_gqa = 1024
Sep 25 15:20:51 biogpt01 ollama[82676]: llm_load_print_meta: f_norm_eps = 0.0e+00
Sep 25 15:20:51 biogpt01 ollama[82676]: llm_load_print_meta: f_norm_rms_eps = 1.0e-05
Sep 25 15:20:51 biogpt01 ollama[82676]: llm_load_print_meta: f_clamp_kqv = 0.0e+00
Sep 25 15:20:51 biogpt01 ollama[82676]: llm_load_print_meta: f_max_alibi_bias = 0.0e+00
Sep 25 15:20:51 biogpt01 ollama[82676]: llm_load_print_meta: f_logit_scale = 0.0e+00
Sep 25 15:20:51 biogpt01 ollama[82676]: llm_load_print_meta: n_ff = 28672
Sep 25 15:20:51 biogpt01 ollama[82676]: llm_load_print_meta: n_expert = 0
Sep 25 15:20:51 biogpt01 ollama[82676]: llm_load_print_meta: n_expert_used = 0
Sep 25 15:20:51 biogpt01 ollama[82676]: llm_load_print_meta: causal attn = 1
Sep 25 15:20:51 biogpt01 ollama[82676]: llm_load_print_meta: pooling type = 0
Sep 25 15:20:51 biogpt01 ollama[82676]: llm_load_print_meta: rope type = 0
Sep 25 15:20:51 biogpt01 ollama[82676]: llm_load_print_meta: rope scaling = linear
Sep 25 15:20:51 biogpt01 ollama[82676]: llm_load_print_meta: freq_base_train = 500000.0
Sep 25 15:20:51 biogpt01 ollama[82676]: llm_load_print_meta: freq_scale_train = 1
Sep 25 15:20:51 biogpt01 ollama[82676]: llm_load_print_meta: n_ctx_orig_yarn = 131072
Sep 25 15:20:51 biogpt01 ollama[82676]: llm_load_print_meta: rope_finetuned = unknown
Sep 25 15:20:51 biogpt01 ollama[82676]: llm_load_print_meta: ssm_d_conv = 0
Sep 25 15:20:51 biogpt01 ollama[82676]: llm_load_print_meta: ssm_d_inner = 0
Sep 25 15:20:51 biogpt01 ollama[82676]: llm_load_print_meta: ssm_d_state = 0
Sep 25 15:20:51 biogpt01 ollama[82676]: llm_load_print_meta: ssm_dt_rank = 0
Sep 25 15:20:51 biogpt01 ollama[82676]: llm_load_print_meta: ssm_dt_b_c_rms = 0
Sep 25 15:20:51 biogpt01 ollama[82676]: llm_load_print_meta: model type = 70B
Sep 25 15:20:51 biogpt01 ollama[82676]: llm_load_print_meta: model ftype = Q4_0
Sep 25 15:20:51 biogpt01 ollama[82676]: llm_load_print_meta: model params = 70.55 B
Sep 25 15:20:51 biogpt01 ollama[82676]: llm_load_print_meta: model size = 37.22 GiB (4.53 BPW)
Sep 25 15:20:51 biogpt01 ollama[82676]: llm_load_print_meta: general.name = Meta Llama 3.1 70B Instruct
Sep 25 15:20:51 biogpt01 ollama[82676]: llm_load_print_meta: BOS token = 128000 '<|begin_of_text|>'
Sep 25 15:20:51 biogpt01 ollama[82676]: llm_load_print_meta: EOS token = 128009 '<|eot_id|>'
Sep 25 15:20:51 biogpt01 ollama[82676]: llm_load_print_meta: LF token = 128 'Ä'
Sep 25 15:20:51 biogpt01 ollama[82676]: llm_load_print_meta: EOT token = 128009 '<|eot_id|>'
Sep 25 15:20:51 biogpt01 ollama[82676]: llm_load_print_meta: max token length = 256
Sep 25 15:20:51 biogpt01 ollama[82676]: llm_load_tensors: ggml ctx size = 0.34 MiB
Sep 25 15:20:51 biogpt01 ollama[82676]: warning: failed to mlock 39961894912-byte buffer (after previously locking 0 bytes): Cannot allocate memory
Sep 25 15:20:51 biogpt01 ollama[82676]: Try increasing RLIMIT_MEMLOCK ('ulimit -l' as root).
Sep 25 15:20:51 biogpt01 ollama[82676]: llm_load_tensors: CPU buffer size = 38110.63 MiB
Sep 25 15:20:51 biogpt01 ollama[82676]: time=2024-09-25T15:20:51.842-03:00 level=DEBUG source=server.go:632 msg="model load progress 0.01"
Sep 25 15:20:52 biogpt01 ollama[82676]: time=2024-09-25T15:20:52.093-03:00 level=DEBUG source=server.go:632 msg="model load progress 0.04"
Sep 25 15:20:52 biogpt01 ollama[82676]: time=2024-09-25T15:20:52.344-03:00 level=DEBUG source=server.go:632 msg="model load progress 0.05"
Sep 25 15:20:52 biogpt01 ollama[82676]: time=2024-09-25T15:20:52.595-03:00 level=DEBUG source=server.go:632 msg="model load progress 0.07"
Sep 25 15:20:52 biogpt01 ollama[82676]: time=2024-09-25T15:20:52.846-03:00 level=DEBUG source=server.go:632 msg="model load progress 0.08"
Sep 25 15:20:53 biogpt01 ollama[82676]: time=2024-09-25T15:20:53.096-03:00 level=DEBUG source=server.go:632 msg="model load progress 0.09"
Sep 25 15:20:53 biogpt01 ollama[82676]: time=2024-09-25T15:20:53.347-03:00 level=DEBUG source=server.go:632 msg="model load progress 0.11"
Sep 25 15:20:53 biogpt01 ollama[82676]: time=2024-09-25T15:20:53.598-03:00 level=DEBUG source=server.go:632 msg="model load progress 0.12"
Sep 25 15:20:53 biogpt01 ollama[82676]: time=2024-09-25T15:20:53.849-03:00 level=DEBUG source=server.go:632 msg="model load progress 0.13"
Sep 25 15:20:54 biogpt01 ollama[82676]: time=2024-09-25T15:20:54.100-03:00 level=DEBUG source=server.go:632 msg="model load progress 0.15"
Sep 25 15:20:54 biogpt01 ollama[82676]: time=2024-09-25T15:20:54.351-03:00 level=DEBUG source=server.go:632 msg="model load progress 0.17"
Sep 25 15:20:54 biogpt01 ollama[82676]: time=2024-09-25T15:20:54.602-03:00 level=DEBUG source=server.go:632 msg="model load progress 0.18"
Sep 25 15:20:54 biogpt01 ollama[82676]: time=2024-09-25T15:20:54.852-03:00 level=DEBUG source=server.go:632 msg="model load progress 0.19"
Sep 25 15:20:55 biogpt01 ollama[82676]: time=2024-09-25T15:20:55.103-03:00 level=DEBUG source=server.go:632 msg="model load progress 0.21"
Sep 25 15:20:55 biogpt01 ollama[82676]: time=2024-09-25T15:20:55.354-03:00 level=DEBUG source=server.go:632 msg="model load progress 0.22"
Sep 25 15:20:55 biogpt01 ollama[82676]: time=2024-09-25T15:20:55.605-03:00 level=DEBUG source=server.go:632 msg="model load progress 0.23"
Sep 25 15:20:55 biogpt01 ollama[82676]: time=2024-09-25T15:20:55.856-03:00 level=DEBUG source=server.go:632 msg="model load progress 0.25"
Sep 25 15:20:56 biogpt01 ollama[82676]: time=2024-09-25T15:20:56.107-03:00 level=DEBUG source=server.go:632 msg="model load progress 0.26"
Sep 25 15:20:56 biogpt01 ollama[82676]: time=2024-09-25T15:20:56.357-03:00 level=DEBUG source=server.go:632 msg="model load progress 0.28"
Sep 25 15:20:56 biogpt01 ollama[82676]: time=2024-09-25T15:20:56.608-03:00 level=DEBUG source=server.go:632 msg="model load progress 0.29"
Sep 25 15:20:56 biogpt01 ollama[82676]: time=2024-09-25T15:20:56.859-03:00 level=DEBUG source=server.go:632 msg="model load progress 0.30"
Sep 25 15:20:57 biogpt01 ollama[82676]: time=2024-09-25T15:20:57.110-03:00 level=DEBUG source=server.go:632 msg="model load progress 0.32"
Sep 25 15:20:57 biogpt01 ollama[82676]: time=2024-09-25T15:20:57.361-03:00 level=DEBUG source=server.go:632 msg="model load progress 0.33"
Sep 25 15:20:57 biogpt01 ollama[82676]: time=2024-09-25T15:20:57.611-03:00 level=DEBUG source=server.go:632 msg="model load progress 0.34"
Sep 25 15:20:57 biogpt01 ollama[82676]: time=2024-09-25T15:20:57.862-03:00 level=DEBUG source=server.go:632 msg="model load progress 0.36"
Sep 25 15:20:58 biogpt01 ollama[82676]: time=2024-09-25T15:20:58.113-03:00 level=DEBUG source=server.go:632 msg="model load progress 0.37"
Sep 25 15:20:58 biogpt01 ollama[82676]: time=2024-09-25T15:20:58.364-03:00 level=DEBUG source=server.go:632 msg="model load progress 0.39"
Sep 25 15:20:58 biogpt01 ollama[82676]: time=2024-09-25T15:20:58.615-03:00 level=DEBUG source=server.go:632 msg="model load progress 0.40"
Sep 25 15:20:58 biogpt01 ollama[82676]: time=2024-09-25T15:20:58.865-03:00 level=DEBUG source=server.go:632 msg="model load progress 0.41"
Sep 25 15:20:59 biogpt01 ollama[82676]: time=2024-09-25T15:20:59.116-03:00 level=DEBUG source=server.go:632 msg="model load progress 0.43"
Sep 25 15:20:59 biogpt01 ollama[82676]: time=2024-09-25T15:20:59.367-03:00 level=DEBUG source=server.go:632 msg="model load progress 0.44"
Sep 25 15:20:59 biogpt01 ollama[82676]: time=2024-09-25T15:20:59.618-03:00 level=DEBUG source=server.go:632 msg="model load progress 0.45"
Sep 25 15:20:59 biogpt01 ollama[82676]: time=2024-09-25T15:20:59.868-03:00 level=DEBUG source=server.go:632 msg="model load progress 0.47"
Sep 25 15:21:00 biogpt01 ollama[82676]: time=2024-09-25T15:21:00.119-03:00 level=DEBUG source=server.go:632 msg="model load progress 0.48"
Sep 25 15:21:00 biogpt01 ollama[82676]: time=2024-09-25T15:21:00.370-03:00 level=DEBUG source=server.go:632 msg="model load progress 0.50"
Sep 25 15:21:00 biogpt01 ollama[82676]: time=2024-09-25T15:21:00.621-03:00 level=DEBUG source=server.go:632 msg="model load progress 0.51"
Sep 25 15:21:00 biogpt01 ollama[82676]: time=2024-09-25T15:21:00.872-03:00 level=DEBUG source=server.go:632 msg="model load progress 0.52"
Sep 25 15:21:01 biogpt01 ollama[82676]: time=2024-09-25T15:21:01.122-03:00 level=DEBUG source=server.go:632 msg="model load progress 0.54"
Sep 25 15:21:01 biogpt01 ollama[82676]: time=2024-09-25T15:21:01.373-03:00 level=DEBUG source=server.go:632 msg="model load progress 0.55"
Sep 25 15:21:01 biogpt01 ollama[82676]: time=2024-09-25T15:21:01.624-03:00 level=DEBUG source=server.go:632 msg="model load progress 0.57"
Sep 25 15:21:01 biogpt01 ollama[82676]: time=2024-09-25T15:21:01.875-03:00 level=DEBUG source=server.go:632 msg="model load progress 0.58"
Sep 25 15:21:02 biogpt01 ollama[82676]: time=2024-09-25T15:21:02.126-03:00 level=DEBUG source=server.go:632 msg="model load progress 0.59"
Sep 25 15:21:02 biogpt01 ollama[82676]: time=2024-09-25T15:21:02.377-03:00 level=DEBUG source=server.go:632 msg="model load progress 0.61"
Sep 25 15:21:02 biogpt01 ollama[82676]: time=2024-09-25T15:21:02.627-03:00 level=DEBUG source=server.go:632 msg="model load progress 0.62"
Sep 25 15:21:02 biogpt01 ollama[82676]: time=2024-09-25T15:21:02.878-03:00 level=DEBUG source=server.go:632 msg="model load progress 0.64"
Sep 25 15:21:03 biogpt01 ollama[82676]: time=2024-09-25T15:21:03.129-03:00 level=DEBUG source=server.go:632 msg="model load progress 0.65"
Sep 25 15:21:03 biogpt01 ollama[82676]: time=2024-09-25T15:21:03.380-03:00 level=DEBUG source=server.go:632 msg="model load progress 0.66"
Sep 25 15:21:03 biogpt01 ollama[82676]: time=2024-09-25T15:21:03.631-03:00 level=DEBUG source=server.go:632 msg="model load progress 0.68"
Sep 25 15:21:03 biogpt01 ollama[82676]: time=2024-09-25T15:21:03.882-03:00 level=DEBUG source=server.go:632 msg="model load progress 0.69"
Sep 25 15:21:04 biogpt01 ollama[82676]: time=2024-09-25T15:21:04.132-03:00 level=DEBUG source=server.go:632 msg="model load progress 0.70"
Sep 25 15:21:04 biogpt01 ollama[82676]: time=2024-09-25T15:21:04.383-03:00 level=DEBUG source=server.go:632 msg="model load progress 0.72"
Sep 25 15:21:04 biogpt01 ollama[82676]: time=2024-09-25T15:21:04.634-03:00 level=DEBUG source=server.go:632 msg="model load progress 0.73"
Sep 25 15:21:04 biogpt01 ollama[82676]: time=2024-09-25T15:21:04.885-03:00 level=DEBUG source=server.go:632 msg="model load progress 0.75"
Sep 25 15:21:05 biogpt01 ollama[82676]: time=2024-09-25T15:21:05.136-03:00 level=DEBUG source=server.go:632 msg="model load progress 0.76"
Sep 25 15:21:05 biogpt01 ollama[82676]: time=2024-09-25T15:21:05.386-03:00 level=DEBUG source=server.go:632 msg="model load progress 0.78"
Sep 25 15:21:05 biogpt01 ollama[82676]: time=2024-09-25T15:21:05.637-03:00 level=DEBUG source=server.go:632 msg="model load progress 0.79"
Sep 25 15:21:05 biogpt01 ollama[82676]: time=2024-09-25T15:21:05.888-03:00 level=DEBUG source=server.go:632 msg="model load progress 0.80"
Sep 25 15:21:06 biogpt01 ollama[82676]: time=2024-09-25T15:21:06.139-03:00 level=DEBUG source=server.go:632 msg="model load progress 0.82"
Sep 25 15:21:06 biogpt01 ollama[82676]: time=2024-09-25T15:21:06.389-03:00 level=DEBUG source=server.go:632 msg="model load progress 0.83"
Sep 25 15:21:06 biogpt01 ollama[82676]: time=2024-09-25T15:21:06.640-03:00 level=DEBUG source=server.go:632 msg="model load progress 0.85"
Sep 25 15:21:06 biogpt01 ollama[82676]: time=2024-09-25T15:21:06.891-03:00 level=DEBUG source=server.go:632 msg="model load progress 0.86"
Sep 25 15:21:07 biogpt01 ollama[82676]: time=2024-09-25T15:21:07.142-03:00 level=DEBUG source=server.go:632 msg="model load progress 0.87"
Sep 25 15:21:07 biogpt01 ollama[82676]: time=2024-09-25T15:21:07.393-03:00 level=DEBUG source=server.go:632 msg="model load progress 0.88"
Sep 25 15:21:07 biogpt01 ollama[82676]: time=2024-09-25T15:21:07.643-03:00 level=DEBUG source=server.go:632 msg="model load progress 0.90"
Sep 25 15:21:07 biogpt01 ollama[82676]: time=2024-09-25T15:21:07.894-03:00 level=DEBUG source=server.go:632 msg="model load progress 0.92"
Sep 25 15:21:08 biogpt01 ollama[82676]: time=2024-09-25T15:21:08.145-03:00 level=DEBUG source=server.go:632 msg="model load progress 0.93"
Sep 25 15:21:08 biogpt01 ollama[82676]: time=2024-09-25T15:21:08.396-03:00 level=DEBUG source=server.go:632 msg="model load progress 0.94"
Sep 25 15:21:08 biogpt01 ollama[82676]: time=2024-09-25T15:21:08.647-03:00 level=DEBUG source=server.go:632 msg="model load progress 0.96"
Sep 25 15:21:08 biogpt01 ollama[82676]: time=2024-09-25T15:21:08.898-03:00 level=DEBUG source=server.go:632 msg="model load progress 0.97"
Sep 25 15:21:09 biogpt01 ollama[82676]: time=2024-09-25T15:21:09.148-03:00 level=DEBUG source=server.go:632 msg="model load progress 0.98"
Sep 25 15:21:09 biogpt01 ollama[82676]: time=2024-09-25T15:21:09.399-03:00 level=DEBUG source=server.go:632 msg="model load progress 0.99"
Sep 25 15:21:09 biogpt01 ollama[82676]: llama_new_context_with_model: n_ctx = 8192
Sep 25 15:21:09 biogpt01 ollama[82676]: llama_new_context_with_model: n_batch = 512
Sep 25 15:21:09 biogpt01 ollama[82676]: llama_new_context_with_model: n_ubatch = 512
Sep 25 15:21:09 biogpt01 ollama[82676]: llama_new_context_with_model: flash_attn = 0
Sep 25 15:21:09 biogpt01 ollama[82676]: llama_new_context_with_model: freq_base = 500000.0
Sep 25 15:21:09 biogpt01 ollama[82676]: llama_new_context_with_model: freq_scale = 1
Sep 25 15:21:09 biogpt01 ollama[82676]: time=2024-09-25T15:21:09.649-03:00 level=DEBUG source=server.go:632 msg="model load progress 1.00"
Sep 25 15:21:09 biogpt01 ollama[82676]: time=2024-09-25T15:21:09.901-03:00 level=DEBUG source=server.go:635 msg="model load completed, waiting for server to become available" status="llm server loading model"
Sep 25 15:21:10 biogpt01 ollama[82676]: llama_kv_cache_init: CPU KV buffer size = 2560.00 MiB
Sep 25 15:21:10 biogpt01 ollama[82676]: llama_new_context_with_model: KV self size = 2560.00 MiB, K (f16): 1280.00 MiB, V (f16): 1280.00 MiB
Sep 25 15:21:10 biogpt01 ollama[82676]: llama_new_context_with_model: CPU output buffer size = 2.08 MiB
Sep 25 15:21:10 biogpt01 ollama[82676]: llama_new_context_with_model: CPU compute buffer size = 1104.01 MiB
Sep 25 15:21:10 biogpt01 ollama[82676]: llama_new_context_with_model: graph nodes = 2566
Sep 25 15:21:10 biogpt01 ollama[82676]: llama_new_context_with_model: graph splits = 1
Sep 25 15:21:11 biogpt01 ollama[132011]: DEBUG [initialize] initializing slots | n_slots=4 tid="131633349513408" timestamp=1727288471
Sep 25 15:21:11 biogpt01 ollama[132011]: DEBUG [initialize] new slot | n_ctx_slot=2048 slot_id=0 tid="131633349513408" timestamp=1727288471
Sep 25 15:21:11 biogpt01 ollama[132011]: DEBUG [initialize] new slot | n_ctx_slot=2048 slot_id=1 tid="131633349513408" timestamp=1727288471
Sep 25 15:21:11 biogpt01 ollama[132011]: DEBUG [initialize] new slot | n_ctx_slot=2048 slot_id=2 tid="131633349513408" timestamp=1727288471
Sep 25 15:21:11 biogpt01 ollama[132011]: DEBUG [initialize] new slot | n_ctx_slot=2048 slot_id=3 tid="131633349513408" timestamp=1727288471
Sep 25 15:21:11 biogpt01 ollama[132011]: INFO [main] model loaded | tid="131633349513408" timestamp=1727288471
Sep 25 15:21:11 biogpt01 ollama[132011]: DEBUG [update_slots] all slots are idle and system prompt is empty, clear the KV cache | tid="131633349513408" timestamp=1727288471
Sep 25 15:21:12 biogpt01 ollama[132011]: DEBUG [process_single_task] slot data | n_idle_slots=4 n_processing_slots=0 task_id=0 tid="131633349513408" timestamp=1727288472
Sep 25 15:21:12 biogpt01 ollama[82676]: time=2024-09-25T15:21:12.043-03:00 level=INFO source=server.go:626 msg="llama runner started in 21.20 seconds"
Sep 25 15:21:12 biogpt01 ollama[82676]: time=2024-09-25T15:21:12.043-03:00 level=DEBUG source=sched.go:462 msg="finished setting up runner" model=/usr/share/ollama/.ollama/models/blobs/sha256-a677b4a4b70c45e702b1d600f7905e367733c53898b8be60e3f29272cf334574
Sep 25 15:21:12 biogpt01 ollama[132011]: DEBUG [process_single_task] slot data | n_idle_slots=4 n_processing_slots=0 task_id=1 tid="131633349513408" timestamp=1727288472
Sep 25 15:21:12 biogpt01 ollama[132011]: DEBUG [log_server_request] request | method="POST" params={} path="/tokenize" remote_addr="127.0.0.1" remote_port=37294 status=200 tid="131633229334208" timestamp=1727288472
Sep 25 15:21:12 biogpt01 ollama[82676]: time=2024-09-25T15:21:12.055-03:00 level=DEBUG source=prompt.go:51 msg="truncating input messages which exceed context length" truncated=2
Sep 25 15:21:12 biogpt01 ollama[82676]: time=2024-09-25T15:21:12.056-03:00 level=DEBUG source=routes.go:1417 msg="chat request" images=0 prompt="<|start_header_id|>system<|end_header_id|>\n\nDada a conversa a seguir, o contexto relevante e uma pergunta de acompanhamento, responda à pergunta atual que o usuário está fazendo. Retorne apenas sua resposta à pergunta com as informações acima, seguindo as instruções do usuário, conforme necessário.\nContext:\n[CONTEXT 0]:\n
What are the specs of the machine running Ollama? It also looks like the content being passed exceeds what the model allows, which will cause issues in Performance Mode because it will do maximum VRAM allocation for the model, and for 70B that is a lot.
A Nutanix node with an Intel(R) Xeon(R) Gold 6326 CPU @ 2.90GHz, 8 cores, and 90 GB of RAM.
Ollama and AnythingLLM are on the same host. I installed Ollama using the standard installation and AnythingLLM in a Docker container.
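For reference, the container reaches the host-side Ollama roughly like this (a sketch; the URL and model tag here are assumptions, and since the defaults work for me, connectivity is evidently fine):

```ts
import { ChatOllama } from "@langchain/community/chat_models/ollama";

// Inside the container "localhost" is the container itself, so the client must
// target the host where Ollama listens on 11434. host.docker.internal works on
// Docker Desktop; on Linux, add --add-host=host.docker.internal:host-gateway
// to docker run, or use the bridge IP (e.g. http://172.17.0.1:11434).
const model = new ChatOllama({
  baseUrl: "http://host.docker.internal:11434", // assumption: adjust to your host address
  model: "llama3.1:70b",                        // assumption: match the tag you pulled
});

const res = await model.invoke("ping"); // returns an AIMessage
console.log(res.content);
```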
You should swap to the 13B model. Trying to run the 70B model on a machine with no GPU is 100% the main issue here: even though your RAM is sufficient to load the model, the pressure is amplified further by "Performance mode", which will take your long prompt and then try to fit all of it into RAM.
If you are doing CPU inference you should typically use more modest models like 13B or 8B. Does this issue persist if you use a smaller model? It should work fine with smaller model sizes.
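For a sense of the memory math: the KV cache grows linearly with both context size and layer count, which is why context-heavy settings hit a 70B model hard. Plugging the figures from your log into llama.cpp's f16 KV sizing reproduces the reported 2560 MiB exactly (a back-of-the-envelope check, not a full accounting of everything Ollama allocates):

```ts
// f16 KV cache: K and V each hold n_ctx * n_embd_k_gqa values per layer,
// at 2 bytes per f16 value. All inputs below are taken from the log lines above.
const n_layer = 80;     // llm_load_print_meta: n_layer
const n_ctx = 8192;     // llama_new_context_with_model: n_ctx
const n_embd_kv = 1024; // llm_load_print_meta: n_embd_k_gqa (= n_embd_v_gqa)
const bytesF16 = 2;

const kvBytes = 2 * n_layer * n_ctx * n_embd_kv * bytesF16; // factor 2 = K plus V
console.log(kvBytes / 2 ** 20, "MiB"); // 2560 MiB, matching "KV self size" in the log
```

On top of the ~37 GiB of weights, every increase in effective context scales this buffer, and prompt-processing time on CPU, proportionally.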
How are you running AnythingLLM?
Docker (local)
What happened?
When I change the values of certain parameters (Max Context Snippets, Document similarity threshold, Performance Mode), I start receiving errors.
I run Llama 3.1 70B locally via Ollama on the host, and on this same host I have AnythingLLM running in Docker.
If I change Max Context Snippets to any value above 4, I start receiving a "fetch failed" error. The same happens if I change the Document similarity threshold to High score (> .75).
If I change Performance Mode from Base to Maximum, the same error occurs.
Leaving the default values in place, everything works normally.
What did you expect to happen?
I would like to customize the values according to my needs and not be stuck with the defaults. I need these customizations to check how the solution will behave for my use case.
I was very excited about AnythingLLM, but seeing small changes like these break it leaves me feeling frustrated and unsure at the moment.
Are there known steps to reproduce?
No response