Mintplex-Labs / anything-llm

The all-in-one Desktop & Docker AI application with built-in RAG, AI agents, and more.
https://anythingllm.com
MIT License

[BUG]: Max Context Snippets, Document similarity threshold, Performance Mode errors when changing them #2371

Closed: lramos7 closed this issue 1 month ago

lramos7 commented 1 month ago

How are you running AnythingLLM?

Docker (local)

What happened?

When I change the values of these parameters (Max Context Snippets, Document similarity threshold, Performance Mode), I start receiving errors.

I run Llama 3.1 70B locally via Ollama on the host, and on this same host I have AnythingLLM running in Docker.

If I change Max Context Snippets to any value above 4, I start receiving a "fetch failed" error. The same happens if I change the Document similarity threshold to High score > .75.

If I change the Performance Mode from Base to Maximum, the same error occurs.

Leaving the default values, everything works normally.
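
For context, this is how the two retrieval settings generally interact in a RAG pipeline (a generic sketch, not AnythingLLM's actual code; the names and numbers are illustrative):

```ts
// Generic RAG snippet selection (illustrative only -- not AnythingLLM's code).
// "Document similarity threshold" filters candidate chunks by score, and
// "Max Context Snippets" caps how many of them get stuffed into the prompt.
interface Chunk {
  text: string;
  score: number; // similarity in [0, 1]; higher = more similar
}

function selectSnippets(candidates: Chunk[], topN: number, minScore: number): Chunk[] {
  return candidates
    .filter((c) => c.score >= minScore) // similarity threshold
    .sort((a, b) => b.score - a.score)
    .slice(0, topN); // max context snippets
}

// Raising topN means more chunk text per prompt, so prompts grow and
// CPU inference slows down accordingly.
const picked = selectSnippets(
  [
    { text: "chunk A", score: 0.9 },
    { text: "chunk B", score: 0.7 },
    { text: "chunk C", score: 0.6 },
  ],
  4,
  0.75,
);
console.log(picked.map((c) => c.text)); // ["chunk A"]
```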

What did you expect to happen?

I would like to customize the values according to my needs rather than being stuck with the defaults. I need these customizations to check how the solution will behave for my use case.

I was very excited about AnythingLLM, but seeing small changes like these break it leaves me frustrated and unsure at the moment.

Are there known steps to reproduce?

No response

lramos7 commented 1 month ago

Below is an excerpt of the error observed:

```
[backend] error: TypeError: fetch failed
    at node:internal/deps/undici/undici:12618:11
    at async createOllamaStream (/app/server/node_modules/@langchain/community/dist/utils/ollama.cjs:12:22)
    at async createOllamaChatStream (/app/server/node_modules/@langchain/community/dist/utils/ollama.cjs:61:5)
    at async ChatOllama._streamResponseChunks (/app/server/node_modules/@langchain/community/dist/chat_models/ollama.cjs:399:30)
    at async ChatOllama._streamIterator (/app/server/node_modules/@langchain/core/dist/language_models/chat_models.cjs:82:34)
    at async ChatOllama.transform (/app/server/node_modules/@langchain/core/dist/runnables/base.cjs:382:9)
    at async wrapInputForTracing (/app/server/node_modules/@langchain/core/dist/runnables/base.cjs:258:30)
    at async pipeGeneratorWithSetup (/app/server/node_modules/@langchain/core/dist/utils/stream.cjs:230:19)
    at async StringOutputParser._transformStreamWithConfig (/app/server/node_modules/@langchain/core/dist/runnables/base.cjs:279:26)
    at async StringOutputParser.transform (/app/server/node_modules/@langchain/core/dist/output_parsers/transform.cjs:36:9)
```
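
One way to isolate whether the failure is in AnythingLLM or in Ollama itself is to replay a similar streaming request directly against the Ollama API with the client-side timeouts disabled. The sketch below is an assumption-laden probe, not AnythingLLM's code: it assumes Node 18+, the `undici` package, and placeholder values for the URL and model tag. Notably, the Ollama log further down in this thread shows the corresponding `POST "/api/chat"` returning 200 after 5m1s, so a "fetch failed" on the AnythingLLM side is consistent with the HTTP client giving up before slow CPU inference finishes.

```ts
// probe-ollama.ts -- minimal sketch (Node 18+; `npm i undici`). OLLAMA_URL and
// MODEL are assumptions; adjust to your setup. From inside the AnythingLLM
// container the host is usually reachable as 172.17.0.1 or host.docker.internal.
import { Agent } from "undici";

const OLLAMA_URL = process.env.OLLAMA_URL ?? "http://172.17.0.1:11434";
const MODEL = process.env.OLLAMA_MODEL ?? "llama3.1:70b";

// undici aborts a request whose headers or body stall for 300 s by default,
// which a 70B model on CPU can easily exceed; 0 disables both timers.
const dispatcher = new Agent({ headersTimeout: 0, bodyTimeout: 0 });

async function probe(prompt: string): Promise<void> {
  const res = await fetch(`${OLLAMA_URL}/api/chat`, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({
      model: MODEL,
      stream: true,
      messages: [{ role: "user", content: prompt }],
    }),
    dispatcher, // Node/undici extension to the fetch options, hence the cast below
  } as RequestInit & { dispatcher: Agent });
  if (!res.ok || !res.body) throw new Error(`HTTP ${res.status}`);
  // Ollama streams newline-delimited JSON chunks; print them as they arrive.
  for await (const chunk of res.body as unknown as AsyncIterable<Uint8Array>) {
    process.stdout.write(new TextDecoder().decode(chunk));
  }
}

// A deliberately long prompt, to mimic stuffing in many context snippets.
probe("Summarize: " + "lorem ipsum ".repeat(1000)).catch((err) =>
  console.error("probe failed:", err),
);
```

If this probe eventually streams a response where AnythingLLM reports "fetch failed", the problem is response latency rather than the Ollama server itself.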

timothycarambat commented 1 month ago

Ollama likely fails to run inference with those settings because you do not have enough VRAM to run the response. Can you pull Ollama's logs on failure?

lramos7 commented 1 month ago

> Ollama likely fails to run inference with those settings because you do not have enough VRAM to run the response. Can you pull Ollama's logs on failure?

I am not currently working with a GPU; this is a Nutanix node with an Intel(R) Xeon(R) Gold 6326 CPU @ 2.90GHz, 8 cores, and 90 GB of RAM.

AnythingLLM logs via Docker:

```
2024-09-25T18:25:52.083124177Z [backend] error: TypeError: fetch failed
2024-09-25T18:25:52.083162402Z     at node:internal/deps/undici/undici:12618:11
2024-09-25T18:25:52.083166469Z     at async createOllamaStream (/app/server/node_modules/@langchain/community/dist/utils/ollama.cjs:12:22)
2024-09-25T18:25:52.083169781Z     at async createOllamaChatStream (/app/server/node_modules/@langchain/community/dist/utils/ollama.cjs:61:5)
2024-09-25T18:25:52.083172845Z     at async ChatOllama._streamResponseChunks (/app/server/node_modules/@langchain/community/dist/chat_models/ollama.cjs:399:30)
2024-09-25T18:25:52.083175633Z     at async ChatOllama._streamIterator (/app/server/node_modules/@langchain/core/dist/language_models/chat_models.cjs:82:34)
2024-09-25T18:25:52.083178894Z     at async ChatOllama.transform (/app/server/node_modules/@langchain/core/dist/runnables/base.cjs:382:9)
2024-09-25T18:25:52.083182157Z     at async wrapInputForTracing (/app/server/node_modules/@langchain/core/dist/runnables/base.cjs:258:30)
2024-09-25T18:25:52.083185181Z     at async pipeGeneratorWithSetup (/app/server/node_modules/@langchain/core/dist/utils/stream.cjs:230:19)
2024-09-25T18:25:52.083187918Z     at async StringOutputParser._transformStreamWithConfig (/app/server/node_modules/@langchain/core/dist/runnables/base.cjs:279:26)
2024-09-25T18:25:52.083190850Z     at async StringOutputParser.transform (/app/server/node_modules/@langchain/core/dist/output_parsers/transform.cjs:36:9)
```

Logs via Ollama:

```

Sep 25 15:20:49 biogpt01 ollama[82676]: llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output. Sep 25 15:20:49 biogpt01 ollama[82676]: llama_model_loader: - kv 0: general.architecture str = nomic-bert Sep 25 15:20:49 biogpt01 ollama[82676]: llama_model_loader: - kv 1: general.name str = nomic-embed-text-v1.5 Sep 25 15:20:49 biogpt01 ollama[82676]: llama_model_loader: - kv 2: nomic-bert.block_count u32 = 12 Sep 25 15:20:49 biogpt01 ollama[82676]: llama_model_loader: - kv 3: nomic-bert.context_length u32 = 2048 Sep 25 15:20:49 biogpt01 ollama[82676]: llama_model_loader: - kv 4: nomic-bert.embedding_length u32 = 768 Sep 25 15:20:49 biogpt01 ollama[82676]: llama_model_loader: - kv 5: nomic-bert.feed_forward_length u32 = 3072 Sep 25 15:20:49 biogpt01 ollama[82676]: llama_model_loader: - kv 6: nomic-bert.attention.head_count u32 = 12 Sep 25 15:20:49 biogpt01 ollama[82676]: llama_model_loader: - kv 7: nomic-bert.attention.layer_norm_epsilon f32 = 0.000000 Sep 25 15:20:49 biogpt01 ollama[82676]: llama_model_loader: - kv 8: general.file_type u32 = 1 Sep 25 15:20:49 biogpt01 ollama[82676]: llama_model_loader: - kv 9: nomic-bert.attention.causal bool = false Sep 25 15:20:49 biogpt01 ollama[82676]: llama_model_loader: - kv 10: nomic-bert.pooling_type u32 = 1 Sep 25 15:20:49 biogpt01 ollama[82676]: llama_model_loader: - kv 11: nomic-bert.rope.freq_base f32 = 1000.000000 Sep 25 15:20:49 biogpt01 ollama[82676]: llama_model_loader: - kv 12: tokenizer.ggml.token_type_count u32 = 2 Sep 25 15:20:49 biogpt01 ollama[82676]: llama_model_loader: - kv 13: tokenizer.ggml.bos_token_id u32 = 101 Sep 25 15:20:49 biogpt01 ollama[82676]: llama_model_loader: - kv 14: tokenizer.ggml.eos_token_id u32 = 102 Sep 25 15:20:49 biogpt01 ollama[82676]: llama_model_loader: - kv 15: tokenizer.ggml.model str = bert Sep 25 15:20:49 biogpt01 ollama[82676]: llama_model_loader: - kv 16: tokenizer.ggml.tokens arr[str,30522] = ["[PAD]", "[unused0]", "[unused1]", "... Sep 25 15:20:49 biogpt01 ollama[82676]: llama_model_loader: - kv 17: tokenizer.ggml.scores arr[f32,30522] = [-1000.000000, -1000.000000, -1000.00... Sep 25 15:20:49 biogpt01 ollama[82676]: llama_model_loader: - kv 18: tokenizer.ggml.token_type arr[i32,30522] = [3, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, ... 
Sep 25 15:20:49 biogpt01 ollama[82676]: llama_model_loader: - kv 19: tokenizer.ggml.unknown_token_id u32 = 100 Sep 25 15:20:49 biogpt01 ollama[82676]: llama_model_loader: - kv 20: tokenizer.ggml.seperator_token_id u32 = 102 Sep 25 15:20:49 biogpt01 ollama[82676]: llama_model_loader: - kv 21: tokenizer.ggml.padding_token_id u32 = 0 Sep 25 15:20:49 biogpt01 ollama[82676]: llama_model_loader: - kv 22: tokenizer.ggml.cls_token_id u32 = 101 Sep 25 15:20:49 biogpt01 ollama[82676]: llama_model_loader: - kv 23: tokenizer.ggml.mask_token_id u32 = 103 Sep 25 15:20:49 biogpt01 ollama[82676]: llama_model_loader: - type f32: 51 tensors Sep 25 15:20:49 biogpt01 ollama[82676]: llama_model_loader: - type f16: 61 tensors Sep 25 15:20:49 biogpt01 ollama[82676]: llm_load_vocab: special tokens cache size = 5 Sep 25 15:20:49 biogpt01 ollama[82676]: llm_load_vocab: token to piece cache size = 0.2032 MB Sep 25 15:20:49 biogpt01 ollama[82676]: llm_load_print_meta: format = GGUF V3 (latest) Sep 25 15:20:49 biogpt01 ollama[82676]: llm_load_print_meta: arch = nomic-bert Sep 25 15:20:49 biogpt01 ollama[82676]: llm_load_print_meta: vocab type = WPM Sep 25 15:20:49 biogpt01 ollama[82676]: llm_load_print_meta: n_vocab = 30522 Sep 25 15:20:49 biogpt01 ollama[82676]: llm_load_print_meta: n_merges = 0 Sep 25 15:20:49 biogpt01 ollama[82676]: llm_load_print_meta: vocab_only = 0 Sep 25 15:20:49 biogpt01 ollama[82676]: llm_load_print_meta: n_ctx_train = 2048 Sep 25 15:20:49 biogpt01 ollama[82676]: llm_load_print_meta: n_embd = 768 Sep 25 15:20:49 biogpt01 ollama[82676]: llm_load_print_meta: n_layer = 12 Sep 25 15:20:49 biogpt01 ollama[82676]: llm_load_print_meta: n_head = 12 Sep 25 15:20:49 biogpt01 ollama[82676]: llm_load_print_meta: n_head_kv = 12 Sep 25 15:20:49 biogpt01 ollama[82676]: llm_load_print_meta: n_rot = 64 Sep 25 15:20:49 biogpt01 ollama[82676]: llm_load_print_meta: n_swa = 0 Sep 25 15:20:49 biogpt01 ollama[82676]: llm_load_print_meta: n_embd_head_k = 64 Sep 25 15:20:49 biogpt01 ollama[82676]: llm_load_print_meta: n_embd_head_v = 64 Sep 25 15:20:49 biogpt01 ollama[82676]: llm_load_print_meta: n_gqa = 1 Sep 25 15:20:49 biogpt01 ollama[82676]: llm_load_print_meta: n_embd_k_gqa = 768 Sep 25 15:20:49 biogpt01 ollama[82676]: llm_load_print_meta: n_embd_v_gqa = 768 Sep 25 15:20:49 biogpt01 ollama[82676]: llm_load_print_meta: f_norm_eps = 1.0e-12 Sep 25 15:20:49 biogpt01 ollama[82676]: llm_load_print_meta: f_norm_rms_eps = 0.0e+00 Sep 25 15:20:49 biogpt01 ollama[82676]: llm_load_print_meta: f_clamp_kqv = 0.0e+00 Sep 25 15:20:49 biogpt01 ollama[82676]: llm_load_print_meta: f_max_alibi_bias = 0.0e+00 Sep 25 15:20:49 biogpt01 ollama[82676]: llm_load_print_meta: f_logit_scale = 0.0e+00 Sep 25 15:20:49 biogpt01 ollama[82676]: llm_load_print_meta: n_ff = 3072 Sep 25 15:20:49 biogpt01 ollama[82676]: llm_load_print_meta: n_expert = 0 Sep 25 15:20:49 biogpt01 ollama[82676]: llm_load_print_meta: n_expert_used = 0 Sep 25 15:20:49 biogpt01 ollama[82676]: llm_load_print_meta: causal attn = 0 Sep 25 15:20:49 biogpt01 ollama[82676]: llm_load_print_meta: pooling type = 1 Sep 25 15:20:49 biogpt01 ollama[82676]: llm_load_print_meta: rope type = 2 Sep 25 15:20:49 biogpt01 ollama[82676]: llm_load_print_meta: rope scaling = linear Sep 25 15:20:49 biogpt01 ollama[82676]: llm_load_print_meta: freq_base_train = 1000.0 Sep 25 15:20:49 biogpt01 ollama[82676]: llm_load_print_meta: freq_scale_train = 1 Sep 25 15:20:49 biogpt01 ollama[82676]: llm_load_print_meta: n_ctx_orig_yarn = 2048 Sep 25 15:20:49 biogpt01 ollama[82676]: llm_load_print_meta: 
rope_finetuned = unknown Sep 25 15:20:49 biogpt01 ollama[82676]: llm_load_print_meta: ssm_d_conv = 0 Sep 25 15:20:49 biogpt01 ollama[82676]: llm_load_print_meta: ssm_d_inner = 0 Sep 25 15:20:49 biogpt01 ollama[82676]: llm_load_print_meta: ssm_d_state = 0 Sep 25 15:20:49 biogpt01 ollama[82676]: llm_load_print_meta: ssm_dt_rank = 0 Sep 25 15:20:49 biogpt01 ollama[82676]: llm_load_print_meta: ssm_dt_b_c_rms = 0 Sep 25 15:20:49 biogpt01 ollama[82676]: llm_load_print_meta: model type = 137M Sep 25 15:20:49 biogpt01 ollama[82676]: llm_load_print_meta: model ftype = F16 Sep 25 15:20:49 biogpt01 ollama[82676]: llm_load_print_meta: model params = 136.73 M Sep 25 15:20:49 biogpt01 ollama[82676]: llm_load_print_meta: model size = 260.86 MiB (16.00 BPW) Sep 25 15:20:49 biogpt01 ollama[82676]: llm_load_print_meta: general.name = nomic-embed-text-v1.5 Sep 25 15:20:49 biogpt01 ollama[82676]: llm_load_print_meta: BOS token = 101 '[CLS]' Sep 25 15:20:49 biogpt01 ollama[82676]: llm_load_print_meta: EOS token = 102 '[SEP]' Sep 25 15:20:49 biogpt01 ollama[82676]: llm_load_print_meta: UNK token = 100 '[UNK]' Sep 25 15:20:49 biogpt01 ollama[82676]: llm_load_print_meta: SEP token = 102 '[SEP]' Sep 25 15:20:49 biogpt01 ollama[82676]: llm_load_print_meta: PAD token = 0 '[PAD]' Sep 25 15:20:49 biogpt01 ollama[82676]: llm_load_print_meta: CLS token = 101 '[CLS]' Sep 25 15:20:49 biogpt01 ollama[82676]: llm_load_print_meta: MASK token = 103 '[MASK]' Sep 25 15:20:49 biogpt01 ollama[82676]: llm_load_print_meta: LF token = 0 '[PAD]' Sep 25 15:20:49 biogpt01 ollama[82676]: llm_load_print_meta: max token length = 21 Sep 25 15:20:49 biogpt01 ollama[82676]: llm_load_tensors: ggml ctx size = 0.05 MiB Sep 25 15:20:49 biogpt01 ollama[82676]: llm_load_tensors: CPU buffer size = 260.86 MiB Sep 25 15:20:49 biogpt01 ollama[82676]: llama_new_context_with_model: n_ctx = 8192 Sep 25 15:20:49 biogpt01 ollama[82676]: llama_new_context_with_model: n_batch = 512 Sep 25 15:20:49 biogpt01 ollama[82676]: llama_new_context_with_model: n_ubatch = 512 Sep 25 15:20:49 biogpt01 ollama[82676]: llama_new_context_with_model: flash_attn = 0 Sep 25 15:20:49 biogpt01 ollama[82676]: llama_new_context_with_model: freq_base = 1000.0 Sep 25 15:20:49 biogpt01 ollama[82676]: llama_new_context_with_model: freq_scale = 1 Sep 25 15:20:50 biogpt01 ollama[82676]: time=2024-09-25T15:20:50.027-03:00 level=INFO source=server.go:621 msg="waiting for server to become available" status="llm server loading model" Sep 25 15:20:50 biogpt01 ollama[82676]: time=2024-09-25T15:20:50.027-03:00 level=DEBUG source=server.go:632 msg="model load progress 1.00" Sep 25 15:20:50 biogpt01 ollama[82676]: llama_kv_cache_init: CPU KV buffer size = 288.00 MiB Sep 25 15:20:50 biogpt01 ollama[82676]: llama_new_context_with_model: KV self size = 288.00 MiB, K (f16): 144.00 MiB, V (f16): 144.00 MiB Sep 25 15:20:50 biogpt01 ollama[82676]: llama_new_context_with_model: CPU output buffer size = 0.00 MiB Sep 25 15:20:50 biogpt01 ollama[82676]: llama_new_context_with_model: CPU compute buffer size = 23.00 MiB Sep 25 15:20:50 biogpt01 ollama[82676]: llama_new_context_with_model: graph nodes = 453 Sep 25 15:20:50 biogpt01 ollama[82676]: llama_new_context_with_model: graph splits = 1 Sep 25 15:20:50 biogpt01 ollama[131978]: DEBUG [initialize] initializing slots | n_slots=1 tid="138019934359744" timestamp=1727288450 Sep 25 15:20:50 biogpt01 ollama[131978]: DEBUG [initialize] new slot | n_ctx_slot=8192 slot_id=0 tid="138019934359744" timestamp=1727288450 Sep 25 15:20:50 biogpt01 ollama[131978]: INFO 
[main] model loaded | tid="138019934359744" timestamp=1727288450 Sep 25 15:20:50 biogpt01 ollama[131978]: DEBUG [update_slots] all slots are idle and system prompt is empty, clear the KV cache | tid="138019934359744" timestamp=1727288450 Sep 25 15:20:50 biogpt01 ollama[131978]: DEBUG [process_single_task] slot data | n_idle_slots=1 n_processing_slots=0 task_id=0 tid="138019934359744" timestamp=1727288450 Sep 25 15:20:50 biogpt01 ollama[82676]: time=2024-09-25T15:20:50.278-03:00 level=INFO source=server.go:626 msg="llama runner started in 0.50 seconds" Sep 25 15:20:50 biogpt01 ollama[82676]: time=2024-09-25T15:20:50.278-03:00 level=DEBUG source=sched.go:462 msg="finished setting up runner" model=/usr/share/ollama/.ollama/models/blobs/sha256-970aa74c0a90ef7482477cf803618e776e173c007bf957f635f1015bfcfef0e6 Sep 25 15:20:50 biogpt01 ollama[131978]: DEBUG [process_single_task] slot data | n_idle_slots=1 n_processing_slots=0 task_id=1 tid="138019934359744" timestamp=1727288450 Sep 25 15:20:50 biogpt01 ollama[131978]: DEBUG [launch_slot_with_data] slot is processing task | slot_id=0 task_id=2 tid="138019934359744" timestamp=1727288450 Sep 25 15:20:50 biogpt01 ollama[131978]: DEBUG [update_slots] kv cache rm [p0, end) | p0=0 slot_id=0 task_id=2 tid="138019934359744" timestamp=1727288450 Sep 25 15:20:50 biogpt01 ollama[131978]: DEBUG [update_slots] slot released | n_cache_tokens=105 n_ctx=8192 n_past=105 n_system_tokens=0 slot_id=0 task_id=2 tid="138019934359744" timestamp=1727288450 truncated=false Sep 25 15:20:50 biogpt01 ollama[131978]: DEBUG [log_server_request] request | method="POST" params={} path="/embedding" remote_addr="127.0.0.1" remote_port=55334 status=200 tid="138019845703360" timestamp=1727288450 Sep 25 15:20:50 biogpt01 ollama[82676]: [GIN] 2024/09/25 - 15:20:50 | 200 | 696.658073ms | 172.17.0.4 | POST "/api/embeddings" Sep 25 15:20:50 biogpt01 ollama[82676]: time=2024-09-25T15:20:50.463-03:00 level=DEBUG source=sched.go:466 msg="context for request finished" Sep 25 15:20:50 biogpt01 ollama[82676]: time=2024-09-25T15:20:50.463-03:00 level=DEBUG source=sched.go:339 msg="runner with non-zero duration has gone idle, adding timer" modelPath=/usr/share/ollama/.ollama/models/blobs/sha256-970aa74c0a90ef7482477cf803618e776e173c007bf957f635f1015bfcfef0e6 duration=5m0s Sep 25 15:20:50 biogpt01 ollama[82676]: time=2024-09-25T15:20:50.463-03:00 level=DEBUG source=sched.go:357 msg="after processing request finished event" modelPath=/usr/share/ollama/.ollama/models/blobs/sha256-970aa74c0a90ef7482477cf803618e776e173c007bf957f635f1015bfcfef0e6 refCount=0 Sep 25 15:20:50 biogpt01 ollama[82676]: time=2024-09-25T15:20:50.801-03:00 level=DEBUG source=gpu.go:358 msg="updating system memory data" before.total="88.3 GiB" before.free="86.0 GiB" before.free_swap="4.0 GiB" now.total="88.3 GiB" now.free="85.3 GiB" now.free_swap="4.0 GiB" Sep 25 15:20:50 biogpt01 ollama[82676]: time=2024-09-25T15:20:50.834-03:00 level=DEBUG source=sched.go:826 msg="evaluating if CPU model load will fit in available system memory" Sep 25 15:20:50 biogpt01 ollama[82676]: time=2024-09-25T15:20:50.834-03:00 level=DEBUG source=memory.go:103 msg=evaluating library=cpu gpu_count=1 available="[85.3 GiB]" Sep 25 15:20:50 biogpt01 ollama[82676]: time=2024-09-25T15:20:50.835-03:00 level=DEBUG source=sched.go:829 msg="cpu inference mode, model fits in available system memory" model="40.7 GiB" available="85.3 GiB" Sep 25 15:20:50 biogpt01 ollama[82676]: time=2024-09-25T15:20:50.835-03:00 level=DEBUG source=sched.go:217 msg="cpu mode with 
available system memory or first model, loading" Sep 25 15:20:50 biogpt01 ollama[82676]: time=2024-09-25T15:20:50.835-03:00 level=INFO source=server.go:103 msg="system memory" total="88.3 GiB" free="85.3 GiB" free_swap="4.0 GiB" Sep 25 15:20:50 biogpt01 ollama[82676]: time=2024-09-25T15:20:50.835-03:00 level=DEBUG source=common.go:294 msg="availableServers : found" file=/tmp/ollama3859707938/runners/cpu/ollama_llama_server Sep 25 15:20:50 biogpt01 ollama[82676]: time=2024-09-25T15:20:50.835-03:00 level=DEBUG source=common.go:294 msg="availableServers : found" file=/tmp/ollama3859707938/runners/cpu_avx/ollama_llama_server Sep 25 15:20:50 biogpt01 ollama[82676]: time=2024-09-25T15:20:50.835-03:00 level=DEBUG source=common.go:294 msg="availableServers : found" file=/tmp/ollama3859707938/runners/cpu_avx2/ollama_llama_server Sep 25 15:20:50 biogpt01 ollama[82676]: time=2024-09-25T15:20:50.835-03:00 level=DEBUG source=common.go:294 msg="availableServers : found" file=/tmp/ollama3859707938/runners/cuda_v11/ollama_llama_server Sep 25 15:20:50 biogpt01 ollama[82676]: time=2024-09-25T15:20:50.835-03:00 level=DEBUG source=common.go:294 msg="availableServers : found" file=/tmp/ollama3859707938/runners/cuda_v12/ollama_llama_server Sep 25 15:20:50 biogpt01 ollama[82676]: time=2024-09-25T15:20:50.835-03:00 level=DEBUG source=common.go:294 msg="availableServers : found" file=/tmp/ollama3859707938/runners/rocm_v60102/ollama_llama_server Sep 25 15:20:50 biogpt01 ollama[82676]: time=2024-09-25T15:20:50.835-03:00 level=DEBUG source=memory.go:103 msg=evaluating library=cpu gpu_count=1 available="[85.3 GiB]" Sep 25 15:20:50 biogpt01 ollama[82676]: time=2024-09-25T15:20:50.836-03:00 level=INFO source=memory.go:326 msg="offload to cpu" layers.requested=-1 layers.model=81 layers.offload=0 layers.split="" memory.available="[85.3 GiB]" memory.gpu_overhead="0 B" memory.required.full="40.7 GiB" memory.required.partial="0 B" memory.required.kv="2.5 GiB" memory.required.allocations="[40.7 GiB]" memory.weights.total="38.4 GiB" memory.weights.repeating="37.6 GiB" memory.weights.nonrepeating="822.0 MiB" memory.graph.full="1.1 GiB" memory.graph.partial="1.1 GiB" Sep 25 15:20:50 biogpt01 ollama[82676]: time=2024-09-25T15:20:50.836-03:00 level=DEBUG source=common.go:168 msg=extracting runner=cpu payload=linux/amd64/cpu/libggml.so.gz Sep 25 15:20:50 biogpt01 ollama[82676]: time=2024-09-25T15:20:50.836-03:00 level=DEBUG source=common.go:168 msg=extracting runner=cpu payload=linux/amd64/cpu/libllama.so.gz Sep 25 15:20:50 biogpt01 ollama[82676]: time=2024-09-25T15:20:50.836-03:00 level=DEBUG source=common.go:168 msg=extracting runner=cpu payload=linux/amd64/cpu/ollama_llama_server.gz Sep 25 15:20:50 biogpt01 ollama[82676]: time=2024-09-25T15:20:50.836-03:00 level=DEBUG source=common.go:168 msg=extracting runner=cpu_avx payload=linux/amd64/cpu_avx/libggml.so.gz Sep 25 15:20:50 biogpt01 ollama[82676]: time=2024-09-25T15:20:50.836-03:00 level=DEBUG source=common.go:168 msg=extracting runner=cpu_avx payload=linux/amd64/cpu_avx/libllama.so.gz Sep 25 15:20:50 biogpt01 ollama[82676]: time=2024-09-25T15:20:50.836-03:00 level=DEBUG source=common.go:168 msg=extracting runner=cpu_avx payload=linux/amd64/cpu_avx/ollama_llama_server.gz Sep 25 15:20:50 biogpt01 ollama[82676]: time=2024-09-25T15:20:50.836-03:00 level=DEBUG source=common.go:168 msg=extracting runner=cpu_avx2 payload=linux/amd64/cpu_avx2/libggml.so.gz Sep 25 15:20:50 biogpt01 ollama[82676]: time=2024-09-25T15:20:50.836-03:00 level=DEBUG source=common.go:168 msg=extracting 
runner=cpu_avx2 payload=linux/amd64/cpu_avx2/libllama.so.gz Sep 25 15:20:50 biogpt01 ollama[82676]: time=2024-09-25T15:20:50.836-03:00 level=DEBUG source=common.go:168 msg=extracting runner=cpu_avx2 payload=linux/amd64/cpu_avx2/ollama_llama_server.gz Sep 25 15:20:50 biogpt01 ollama[82676]: time=2024-09-25T15:20:50.836-03:00 level=DEBUG source=common.go:168 msg=extracting runner=cuda_v11 payload=linux/amd64/cuda_v11/libggml.so.gz Sep 25 15:20:50 biogpt01 ollama[82676]: time=2024-09-25T15:20:50.836-03:00 level=DEBUG source=common.go:168 msg=extracting runner=cuda_v11 payload=linux/amd64/cuda_v11/libllama.so.gz Sep 25 15:20:50 biogpt01 ollama[82676]: time=2024-09-25T15:20:50.836-03:00 level=DEBUG source=common.go:168 msg=extracting runner=cuda_v11 payload=linux/amd64/cuda_v11/ollama_llama_server.gz Sep 25 15:20:50 biogpt01 ollama[82676]: time=2024-09-25T15:20:50.836-03:00 level=DEBUG source=common.go:168 msg=extracting runner=cuda_v12 payload=linux/amd64/cuda_v12/libggml.so.gz Sep 25 15:20:50 biogpt01 ollama[82676]: time=2024-09-25T15:20:50.836-03:00 level=DEBUG source=common.go:168 msg=extracting runner=cuda_v12 payload=linux/amd64/cuda_v12/libllama.so.gz Sep 25 15:20:50 biogpt01 ollama[82676]: time=2024-09-25T15:20:50.836-03:00 level=DEBUG source=common.go:168 msg=extracting runner=cuda_v12 payload=linux/amd64/cuda_v12/ollama_llama_server.gz Sep 25 15:20:50 biogpt01 ollama[82676]: time=2024-09-25T15:20:50.836-03:00 level=DEBUG source=common.go:168 msg=extracting runner=rocm_v60102 payload=linux/amd64/rocm_v60102/libggml.so.gz Sep 25 15:20:50 biogpt01 ollama[82676]: time=2024-09-25T15:20:50.836-03:00 level=DEBUG source=common.go:168 msg=extracting runner=rocm_v60102 payload=linux/amd64/rocm_v60102/libllama.so.gz Sep 25 15:20:50 biogpt01 ollama[82676]: time=2024-09-25T15:20:50.836-03:00 level=DEBUG source=common.go:168 msg=extracting runner=rocm_v60102 payload=linux/amd64/rocm_v60102/ollama_llama_server.gz Sep 25 15:20:50 biogpt01 ollama[82676]: time=2024-09-25T15:20:50.837-03:00 level=DEBUG source=common.go:294 msg="availableServers : found" file=/tmp/ollama3859707938/runners/cpu/ollama_llama_server Sep 25 15:20:50 biogpt01 ollama[82676]: time=2024-09-25T15:20:50.837-03:00 level=DEBUG source=common.go:294 msg="availableServers : found" file=/tmp/ollama3859707938/runners/cpu_avx/ollama_llama_server Sep 25 15:20:50 biogpt01 ollama[82676]: time=2024-09-25T15:20:50.837-03:00 level=DEBUG source=common.go:294 msg="availableServers : found" file=/tmp/ollama3859707938/runners/cpu_avx2/ollama_llama_server Sep 25 15:20:50 biogpt01 ollama[82676]: time=2024-09-25T15:20:50.837-03:00 level=DEBUG source=common.go:294 msg="availableServers : found" file=/tmp/ollama3859707938/runners/cuda_v11/ollama_llama_server Sep 25 15:20:50 biogpt01 ollama[82676]: time=2024-09-25T15:20:50.837-03:00 level=DEBUG source=common.go:294 msg="availableServers : found" file=/tmp/ollama3859707938/runners/cuda_v12/ollama_llama_server Sep 25 15:20:50 biogpt01 ollama[82676]: time=2024-09-25T15:20:50.837-03:00 level=DEBUG source=common.go:294 msg="availableServers : found" file=/tmp/ollama3859707938/runners/rocm_v60102/ollama_llama_server Sep 25 15:20:50 biogpt01 ollama[82676]: time=2024-09-25T15:20:50.837-03:00 level=DEBUG source=gpu.go:639 msg="no filter required for library cpu" Sep 25 15:20:50 biogpt01 ollama[82676]: time=2024-09-25T15:20:50.837-03:00 level=INFO source=server.go:388 msg="starting llama server" cmd="/tmp/ollama3859707938/runners/cpu_avx2/ollama_llama_server --model 
/usr/share/ollama/.ollama/models/blobs/sha256-a677b4a4b70c45e702b1d600f7905e367733c53898b8be60e3f29272cf334574 --ctx-size 8192 --batch-size 512 --embedding --log-disable --verbose --no-mmap --mlock --numa numactl --parallel 4 --port 33761" Sep 25 15:20:50 biogpt01 ollama[82676]: time=2024-09-25T15:20:50.837-03:00 level=DEBUG source=server.go:405 msg=subprocess environment="[PATH=/home/service_gpt/anaconda3/condabin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/local/games:/snap/bin LD_LIBRARY_PATH=/tmp/ollama3859707938/runners/cpu_avx2]" Sep 25 15:20:50 biogpt01 ollama[82676]: time=2024-09-25T15:20:50.838-03:00 level=INFO source=sched.go:449 msg="loaded runners" count=2 Sep 25 15:20:50 biogpt01 ollama[82676]: time=2024-09-25T15:20:50.838-03:00 level=INFO source=server.go:587 msg="waiting for llama runner to start responding" Sep 25 15:20:50 biogpt01 ollama[82676]: time=2024-09-25T15:20:50.838-03:00 level=INFO source=server.go:621 msg="waiting for server to become available" status="llm server error" Sep 25 15:20:50 biogpt01 ollama[132011]: INFO [main] build info | build=10 commit="d7b6049" tid="131633349513408" timestamp=1727288450 Sep 25 15:20:50 biogpt01 ollama[132011]: INFO [main] system info | n_threads=8 n_threads_batch=8 system_info="AVX = 1 | AVX_VNNI = 0 | AVX2 = 1 | AVX512 = 0 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | AVX512_BF16 = 0 | FMA = 1 | NEON = 0 | SVE = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 0 | SSE3 = 1 | SSSE3 = 1 | VSX = 0 | MATMUL_INT8 = 0 | LLAMAFILE = 1 | " tid="131633349513408" timestamp=1727288450 total_threads=8 Sep 25 15:20:50 biogpt01 ollama[132011]: INFO [main] HTTP server listening | hostname="127.0.0.1" n_threads_http="7" port="33761" tid="131633349513408" timestamp=1727288450 Sep 25 15:20:50 biogpt01 ollama[82676]: llama_model_loader: loaded meta data with 29 key-value pairs and 724 tensors from /usr/share/ollama/.ollama/models/blobs/sha256-a677b4a4b70c45e702b1d600f7905e367733c53898b8be60e3f29272cf334574 (version GGUF V3 (latest)) Sep 25 15:20:50 biogpt01 ollama[82676]: llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output. Sep 25 15:20:50 biogpt01 ollama[82676]: llama_model_loader: - kv 0: general.architecture str = llama Sep 25 15:20:50 biogpt01 ollama[82676]: llama_model_loader: - kv 1: general.type str = model Sep 25 15:20:50 biogpt01 ollama[82676]: llama_model_loader: - kv 2: general.name str = Meta Llama 3.1 70B Instruct Sep 25 15:20:50 biogpt01 ollama[82676]: llama_model_loader: - kv 3: general.finetune str = Instruct Sep 25 15:20:50 biogpt01 ollama[82676]: llama_model_loader: - kv 4: general.basename str = Meta-Llama-3.1 Sep 25 15:20:50 biogpt01 ollama[82676]: llama_model_loader: - kv 5: general.size_label str = 70B Sep 25 15:20:50 biogpt01 ollama[82676]: llama_model_loader: - kv 6: general.license str = llama3.1 Sep 25 15:20:50 biogpt01 ollama[82676]: llama_model_loader: - kv 7: general.tags arr[str,6] = ["facebook", "meta", "pytorch", "llam... Sep 25 15:20:50 biogpt01 ollama[82676]: llama_model_loader: - kv 8: general.languages arr[str,8] = ["en", "de", "fr", "it", "pt", "hi", ... 
Sep 25 15:20:50 biogpt01 ollama[82676]: llama_model_loader: - kv 9: llama.block_count u32 = 80 Sep 25 15:20:50 biogpt01 ollama[82676]: llama_model_loader: - kv 10: llama.context_length u32 = 131072 Sep 25 15:20:50 biogpt01 ollama[82676]: llama_model_loader: - kv 11: llama.embedding_length u32 = 8192 Sep 25 15:20:50 biogpt01 ollama[82676]: llama_model_loader: - kv 12: llama.feed_forward_length u32 = 28672 Sep 25 15:20:50 biogpt01 ollama[82676]: llama_model_loader: - kv 13: llama.attention.head_count u32 = 64 Sep 25 15:20:50 biogpt01 ollama[82676]: llama_model_loader: - kv 14: llama.attention.head_count_kv u32 = 8 Sep 25 15:20:50 biogpt01 ollama[82676]: llama_model_loader: - kv 15: llama.rope.freq_base f32 = 500000.000000 Sep 25 15:20:50 biogpt01 ollama[82676]: llama_model_loader: - kv 16: llama.attention.layer_norm_rms_epsilon f32 = 0.000010 Sep 25 15:20:50 biogpt01 ollama[82676]: llama_model_loader: - kv 17: general.file_type u32 = 2 Sep 25 15:20:50 biogpt01 ollama[82676]: llama_model_loader: - kv 18: llama.vocab_size u32 = 128256 Sep 25 15:20:50 biogpt01 ollama[82676]: llama_model_loader: - kv 19: llama.rope.dimension_count u32 = 128 Sep 25 15:20:50 biogpt01 ollama[82676]: llama_model_loader: - kv 20: tokenizer.ggml.model str = gpt2 Sep 25 15:20:50 biogpt01 ollama[82676]: llama_model_loader: - kv 21: tokenizer.ggml.pre str = llama-bpe Sep 25 15:20:50 biogpt01 ollama[82676]: llama_model_loader: - kv 22: tokenizer.ggml.tokens arr[str,128256] = ["!", "\"", "#", "$", "%", "&", "'", ... Sep 25 15:20:50 biogpt01 ollama[82676]: llama_model_loader: - kv 23: tokenizer.ggml.token_type arr[i32,128256] = [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, ... Sep 25 15:20:50 biogpt01 ollama[82676]: llama_model_loader: - kv 24: tokenizer.ggml.merges arr[str,280147] = ["Ġ Ġ", "Ġ ĠĠĠ", "ĠĠ ĠĠ", "... Sep 25 15:20:50 biogpt01 ollama[82676]: llama_model_loader: - kv 25: tokenizer.ggml.bos_token_id u32 = 128000 Sep 25 15:20:50 biogpt01 ollama[82676]: llama_model_loader: - kv 26: tokenizer.ggml.eos_token_id u32 = 128009 Sep 25 15:20:50 biogpt01 ollama[82676]: llama_model_loader: - kv 27: tokenizer.chat_template str = {{- bos_token }}\n{%- if custom_tools ... 
Sep 25 15:20:50 biogpt01 ollama[82676]: llama_model_loader: - kv 28: general.quantization_version u32 = 2 Sep 25 15:20:50 biogpt01 ollama[82676]: llama_model_loader: - type f32: 162 tensors Sep 25 15:20:50 biogpt01 ollama[82676]: llama_model_loader: - type q4_0: 561 tensors Sep 25 15:20:50 biogpt01 ollama[82676]: llama_model_loader: - type q6_K: 1 tensors Sep 25 15:20:51 biogpt01 ollama[82676]: time=2024-09-25T15:20:51.089-03:00 level=INFO source=server.go:621 msg="waiting for server to become available" status="llm server loading model" Sep 25 15:20:51 biogpt01 ollama[82676]: llm_load_vocab: special tokens cache size = 256 Sep 25 15:20:51 biogpt01 ollama[82676]: llm_load_vocab: token to piece cache size = 0.7999 MB Sep 25 15:20:51 biogpt01 ollama[82676]: llm_load_print_meta: format = GGUF V3 (latest) Sep 25 15:20:51 biogpt01 ollama[82676]: llm_load_print_meta: arch = llama Sep 25 15:20:51 biogpt01 ollama[82676]: llm_load_print_meta: vocab type = BPE Sep 25 15:20:51 biogpt01 ollama[82676]: llm_load_print_meta: n_vocab = 128256 Sep 25 15:20:51 biogpt01 ollama[82676]: llm_load_print_meta: n_merges = 280147 Sep 25 15:20:51 biogpt01 ollama[82676]: llm_load_print_meta: vocab_only = 0 Sep 25 15:20:51 biogpt01 ollama[82676]: llm_load_print_meta: n_ctx_train = 131072 Sep 25 15:20:51 biogpt01 ollama[82676]: llm_load_print_meta: n_embd = 8192 Sep 25 15:20:51 biogpt01 ollama[82676]: llm_load_print_meta: n_layer = 80 Sep 25 15:20:51 biogpt01 ollama[82676]: llm_load_print_meta: n_head = 64 Sep 25 15:20:51 biogpt01 ollama[82676]: llm_load_print_meta: n_head_kv = 8 Sep 25 15:20:51 biogpt01 ollama[82676]: llm_load_print_meta: n_rot = 128 Sep 25 15:20:51 biogpt01 ollama[82676]: llm_load_print_meta: n_swa = 0 Sep 25 15:20:51 biogpt01 ollama[82676]: llm_load_print_meta: n_embd_head_k = 128 Sep 25 15:20:51 biogpt01 ollama[82676]: llm_load_print_meta: n_embd_head_v = 128 Sep 25 15:20:51 biogpt01 ollama[82676]: llm_load_print_meta: n_gqa = 8 Sep 25 15:20:51 biogpt01 ollama[82676]: llm_load_print_meta: n_embd_k_gqa = 1024 Sep 25 15:20:51 biogpt01 ollama[82676]: llm_load_print_meta: n_embd_v_gqa = 1024 Sep 25 15:20:51 biogpt01 ollama[82676]: llm_load_print_meta: f_norm_eps = 0.0e+00 Sep 25 15:20:51 biogpt01 ollama[82676]: llm_load_print_meta: f_norm_rms_eps = 1.0e-05 Sep 25 15:20:51 biogpt01 ollama[82676]: llm_load_print_meta: f_clamp_kqv = 0.0e+00 Sep 25 15:20:51 biogpt01 ollama[82676]: llm_load_print_meta: f_max_alibi_bias = 0.0e+00 Sep 25 15:20:51 biogpt01 ollama[82676]: llm_load_print_meta: f_logit_scale = 0.0e+00 Sep 25 15:20:51 biogpt01 ollama[82676]: llm_load_print_meta: n_ff = 28672 Sep 25 15:20:51 biogpt01 ollama[82676]: llm_load_print_meta: n_expert = 0 Sep 25 15:20:51 biogpt01 ollama[82676]: llm_load_print_meta: n_expert_used = 0 Sep 25 15:20:51 biogpt01 ollama[82676]: llm_load_print_meta: causal attn = 1 Sep 25 15:20:51 biogpt01 ollama[82676]: llm_load_print_meta: pooling type = 0 Sep 25 15:20:51 biogpt01 ollama[82676]: llm_load_print_meta: rope type = 0 Sep 25 15:20:51 biogpt01 ollama[82676]: llm_load_print_meta: rope scaling = linear Sep 25 15:20:51 biogpt01 ollama[82676]: llm_load_print_meta: freq_base_train = 500000.0 Sep 25 15:20:51 biogpt01 ollama[82676]: llm_load_print_meta: freq_scale_train = 1 Sep 25 15:20:51 biogpt01 ollama[82676]: llm_load_print_meta: n_ctx_orig_yarn = 131072 Sep 25 15:20:51 biogpt01 ollama[82676]: llm_load_print_meta: rope_finetuned = unknown Sep 25 15:20:51 biogpt01 ollama[82676]: llm_load_print_meta: ssm_d_conv = 0 Sep 25 15:20:51 biogpt01 ollama[82676]: 
llm_load_print_meta: ssm_d_inner = 0 Sep 25 15:20:51 biogpt01 ollama[82676]: llm_load_print_meta: ssm_d_state = 0 Sep 25 15:20:51 biogpt01 ollama[82676]: llm_load_print_meta: ssm_dt_rank = 0 Sep 25 15:20:51 biogpt01 ollama[82676]: llm_load_print_meta: ssm_dt_b_c_rms = 0 Sep 25 15:20:51 biogpt01 ollama[82676]: llm_load_print_meta: model type = 70B Sep 25 15:20:51 biogpt01 ollama[82676]: llm_load_print_meta: model ftype = Q4_0 Sep 25 15:20:51 biogpt01 ollama[82676]: llm_load_print_meta: model params = 70.55 B Sep 25 15:20:51 biogpt01 ollama[82676]: llm_load_print_meta: model size = 37.22 GiB (4.53 BPW) Sep 25 15:20:51 biogpt01 ollama[82676]: llm_load_print_meta: general.name = Meta Llama 3.1 70B Instruct Sep 25 15:20:51 biogpt01 ollama[82676]: llm_load_print_meta: BOS token = 128000 '<|begin_of_text|>' Sep 25 15:20:51 biogpt01 ollama[82676]: llm_load_print_meta: EOS token = 128009 '<|eot_id|>' Sep 25 15:20:51 biogpt01 ollama[82676]: llm_load_print_meta: LF token = 128 'Ä' Sep 25 15:20:51 biogpt01 ollama[82676]: llm_load_print_meta: EOT token = 128009 '<|eot_id|>' Sep 25 15:20:51 biogpt01 ollama[82676]: llm_load_print_meta: max token length = 256 Sep 25 15:20:51 biogpt01 ollama[82676]: llm_load_tensors: ggml ctx size = 0.34 MiB Sep 25 15:20:51 biogpt01 ollama[82676]: warning: failed to mlock 39961894912-byte buffer (after previously locking 0 bytes): Cannot allocate memory Sep 25 15:20:51 biogpt01 ollama[82676]: Try increasing RLIMIT_MEMLOCK ('ulimit -l' as root). Sep 25 15:20:51 biogpt01 ollama[82676]: llm_load_tensors: CPU buffer size = 38110.63 MiB Sep 25 15:20:51 biogpt01 ollama[82676]: time=2024-09-25T15:20:51.842-03:00 level=DEBUG source=server.go:632 msg="model load progress 0.01" Sep 25 15:20:52 biogpt01 ollama[82676]: time=2024-09-25T15:20:52.093-03:00 level=DEBUG source=server.go:632 msg="model load progress 0.04" Sep 25 15:20:52 biogpt01 ollama[82676]: time=2024-09-25T15:20:52.344-03:00 level=DEBUG source=server.go:632 msg="model load progress 0.05" Sep 25 15:20:52 biogpt01 ollama[82676]: time=2024-09-25T15:20:52.595-03:00 level=DEBUG source=server.go:632 msg="model load progress 0.07" Sep 25 15:20:52 biogpt01 ollama[82676]: time=2024-09-25T15:20:52.846-03:00 level=DEBUG source=server.go:632 msg="model load progress 0.08" Sep 25 15:20:53 biogpt01 ollama[82676]: time=2024-09-25T15:20:53.096-03:00 level=DEBUG source=server.go:632 msg="model load progress 0.09" Sep 25 15:20:53 biogpt01 ollama[82676]: time=2024-09-25T15:20:53.347-03:00 level=DEBUG source=server.go:632 msg="model load progress 0.11" Sep 25 15:20:53 biogpt01 ollama[82676]: time=2024-09-25T15:20:53.598-03:00 level=DEBUG source=server.go:632 msg="model load progress 0.12" Sep 25 15:20:53 biogpt01 ollama[82676]: time=2024-09-25T15:20:53.849-03:00 level=DEBUG source=server.go:632 msg="model load progress 0.13" Sep 25 15:20:54 biogpt01 ollama[82676]: time=2024-09-25T15:20:54.100-03:00 level=DEBUG source=server.go:632 msg="model load progress 0.15" Sep 25 15:20:54 biogpt01 ollama[82676]: time=2024-09-25T15:20:54.351-03:00 level=DEBUG source=server.go:632 msg="model load progress 0.17" Sep 25 15:20:54 biogpt01 ollama[82676]: time=2024-09-25T15:20:54.602-03:00 level=DEBUG source=server.go:632 msg="model load progress 0.18" Sep 25 15:20:54 biogpt01 ollama[82676]: time=2024-09-25T15:20:54.852-03:00 level=DEBUG source=server.go:632 msg="model load progress 0.19" Sep 25 15:20:55 biogpt01 ollama[82676]: time=2024-09-25T15:20:55.103-03:00 level=DEBUG source=server.go:632 msg="model load progress 0.21" Sep 25 15:20:55 biogpt01 
ollama[82676]: time=2024-09-25T15:20:55.354-03:00 level=DEBUG source=server.go:632 msg="model load progress 0.22" Sep 25 15:20:55 biogpt01 ollama[82676]: time=2024-09-25T15:20:55.605-03:00 level=DEBUG source=server.go:632 msg="model load progress 0.23" Sep 25 15:20:55 biogpt01 ollama[82676]: time=2024-09-25T15:20:55.856-03:00 level=DEBUG source=server.go:632 msg="model load progress 0.25" Sep 25 15:20:56 biogpt01 ollama[82676]: time=2024-09-25T15:20:56.107-03:00 level=DEBUG source=server.go:632 msg="model load progress 0.26" Sep 25 15:20:56 biogpt01 ollama[82676]: time=2024-09-25T15:20:56.357-03:00 level=DEBUG source=server.go:632 msg="model load progress 0.28" Sep 25 15:20:56 biogpt01 ollama[82676]: time=2024-09-25T15:20:56.608-03:00 level=DEBUG source=server.go:632 msg="model load progress 0.29" Sep 25 15:20:56 biogpt01 ollama[82676]: time=2024-09-25T15:20:56.859-03:00 level=DEBUG source=server.go:632 msg="model load progress 0.30" Sep 25 15:20:57 biogpt01 ollama[82676]: time=2024-09-25T15:20:57.110-03:00 level=DEBUG source=server.go:632 msg="model load progress 0.32" Sep 25 15:20:57 biogpt01 ollama[82676]: time=2024-09-25T15:20:57.361-03:00 level=DEBUG source=server.go:632 msg="model load progress 0.33" Sep 25 15:20:57 biogpt01 ollama[82676]: time=2024-09-25T15:20:57.611-03:00 level=DEBUG source=server.go:632 msg="model load progress 0.34" Sep 25 15:20:57 biogpt01 ollama[82676]: time=2024-09-25T15:20:57.862-03:00 level=DEBUG source=server.go:632 msg="model load progress 0.36" Sep 25 15:20:58 biogpt01 ollama[82676]: time=2024-09-25T15:20:58.113-03:00 level=DEBUG source=server.go:632 msg="model load progress 0.37" Sep 25 15:20:58 biogpt01 ollama[82676]: time=2024-09-25T15:20:58.364-03:00 level=DEBUG source=server.go:632 msg="model load progress 0.39" Sep 25 15:20:58 biogpt01 ollama[82676]: time=2024-09-25T15:20:58.615-03:00 level=DEBUG source=server.go:632 msg="model load progress 0.40" Sep 25 15:20:58 biogpt01 ollama[82676]: time=2024-09-25T15:20:58.865-03:00 level=DEBUG source=server.go:632 msg="model load progress 0.41" Sep 25 15:20:59 biogpt01 ollama[82676]: time=2024-09-25T15:20:59.116-03:00 level=DEBUG source=server.go:632 msg="model load progress 0.43" Sep 25 15:20:59 biogpt01 ollama[82676]: time=2024-09-25T15:20:59.367-03:00 level=DEBUG source=server.go:632 msg="model load progress 0.44" Sep 25 15:20:59 biogpt01 ollama[82676]: time=2024-09-25T15:20:59.618-03:00 level=DEBUG source=server.go:632 msg="model load progress 0.45" Sep 25 15:20:59 biogpt01 ollama[82676]: time=2024-09-25T15:20:59.868-03:00 level=DEBUG source=server.go:632 msg="model load progress 0.47" Sep 25 15:21:00 biogpt01 ollama[82676]: time=2024-09-25T15:21:00.119-03:00 level=DEBUG source=server.go:632 msg="model load progress 0.48" Sep 25 15:21:00 biogpt01 ollama[82676]: time=2024-09-25T15:21:00.370-03:00 level=DEBUG source=server.go:632 msg="model load progress 0.50" Sep 25 15:21:00 biogpt01 ollama[82676]: time=2024-09-25T15:21:00.621-03:00 level=DEBUG source=server.go:632 msg="model load progress 0.51" Sep 25 15:21:00 biogpt01 ollama[82676]: time=2024-09-25T15:21:00.872-03:00 level=DEBUG source=server.go:632 msg="model load progress 0.52" Sep 25 15:21:01 biogpt01 ollama[82676]: time=2024-09-25T15:21:01.122-03:00 level=DEBUG source=server.go:632 msg="model load progress 0.54" Sep 25 15:21:01 biogpt01 ollama[82676]: time=2024-09-25T15:21:01.373-03:00 level=DEBUG source=server.go:632 msg="model load progress 0.55" Sep 25 15:21:01 biogpt01 ollama[82676]: time=2024-09-25T15:21:01.624-03:00 level=DEBUG 
source=server.go:632 msg="model load progress 0.57" Sep 25 15:21:01 biogpt01 ollama[82676]: time=2024-09-25T15:21:01.875-03:00 level=DEBUG source=server.go:632 msg="model load progress 0.58" Sep 25 15:21:02 biogpt01 ollama[82676]: time=2024-09-25T15:21:02.126-03:00 level=DEBUG source=server.go:632 msg="model load progress 0.59" Sep 25 15:21:02 biogpt01 ollama[82676]: time=2024-09-25T15:21:02.377-03:00 level=DEBUG source=server.go:632 msg="model load progress 0.61" Sep 25 15:21:02 biogpt01 ollama[82676]: time=2024-09-25T15:21:02.627-03:00 level=DEBUG source=server.go:632 msg="model load progress 0.62" Sep 25 15:21:02 biogpt01 ollama[82676]: time=2024-09-25T15:21:02.878-03:00 level=DEBUG source=server.go:632 msg="model load progress 0.64" Sep 25 15:21:03 biogpt01 ollama[82676]: time=2024-09-25T15:21:03.129-03:00 level=DEBUG source=server.go:632 msg="model load progress 0.65" Sep 25 15:21:03 biogpt01 ollama[82676]: time=2024-09-25T15:21:03.380-03:00 level=DEBUG source=server.go:632 msg="model load progress 0.66" Sep 25 15:21:03 biogpt01 ollama[82676]: time=2024-09-25T15:21:03.631-03:00 level=DEBUG source=server.go:632 msg="model load progress 0.68" Sep 25 15:21:03 biogpt01 ollama[82676]: time=2024-09-25T15:21:03.882-03:00 level=DEBUG source=server.go:632 msg="model load progress 0.69" Sep 25 15:21:04 biogpt01 ollama[82676]: time=2024-09-25T15:21:04.132-03:00 level=DEBUG source=server.go:632 msg="model load progress 0.70" Sep 25 15:21:04 biogpt01 ollama[82676]: time=2024-09-25T15:21:04.383-03:00 level=DEBUG source=server.go:632 msg="model load progress 0.72" Sep 25 15:21:04 biogpt01 ollama[82676]: time=2024-09-25T15:21:04.634-03:00 level=DEBUG source=server.go:632 msg="model load progress 0.73" Sep 25 15:21:04 biogpt01 ollama[82676]: time=2024-09-25T15:21:04.885-03:00 level=DEBUG source=server.go:632 msg="model load progress 0.75" Sep 25 15:21:05 biogpt01 ollama[82676]: time=2024-09-25T15:21:05.136-03:00 level=DEBUG source=server.go:632 msg="model load progress 0.76" Sep 25 15:21:05 biogpt01 ollama[82676]: time=2024-09-25T15:21:05.386-03:00 level=DEBUG source=server.go:632 msg="model load progress 0.78" Sep 25 15:21:05 biogpt01 ollama[82676]: time=2024-09-25T15:21:05.637-03:00 level=DEBUG source=server.go:632 msg="model load progress 0.79" Sep 25 15:21:05 biogpt01 ollama[82676]: time=2024-09-25T15:21:05.888-03:00 level=DEBUG source=server.go:632 msg="model load progress 0.80" Sep 25 15:21:06 biogpt01 ollama[82676]: time=2024-09-25T15:21:06.139-03:00 level=DEBUG source=server.go:632 msg="model load progress 0.82" Sep 25 15:21:06 biogpt01 ollama[82676]: time=2024-09-25T15:21:06.389-03:00 level=DEBUG source=server.go:632 msg="model load progress 0.83" Sep 25 15:21:06 biogpt01 ollama[82676]: time=2024-09-25T15:21:06.640-03:00 level=DEBUG source=server.go:632 msg="model load progress 0.85" Sep 25 15:21:06 biogpt01 ollama[82676]: time=2024-09-25T15:21:06.891-03:00 level=DEBUG source=server.go:632 msg="model load progress 0.86" Sep 25 15:21:07 biogpt01 ollama[82676]: time=2024-09-25T15:21:07.142-03:00 level=DEBUG source=server.go:632 msg="model load progress 0.87" Sep 25 15:21:07 biogpt01 ollama[82676]: time=2024-09-25T15:21:07.393-03:00 level=DEBUG source=server.go:632 msg="model load progress 0.88" Sep 25 15:21:07 biogpt01 ollama[82676]: time=2024-09-25T15:21:07.643-03:00 level=DEBUG source=server.go:632 msg="model load progress 0.90" Sep 25 15:21:07 biogpt01 ollama[82676]: time=2024-09-25T15:21:07.894-03:00 level=DEBUG source=server.go:632 msg="model load progress 0.92" Sep 25 15:21:08 biogpt01 
ollama[82676]: time=2024-09-25T15:21:08.145-03:00 level=DEBUG source=server.go:632 msg="model load progress 0.93" Sep 25 15:21:08 biogpt01 ollama[82676]: time=2024-09-25T15:21:08.396-03:00 level=DEBUG source=server.go:632 msg="model load progress 0.94" Sep 25 15:21:08 biogpt01 ollama[82676]: time=2024-09-25T15:21:08.647-03:00 level=DEBUG source=server.go:632 msg="model load progress 0.96" Sep 25 15:21:08 biogpt01 ollama[82676]: time=2024-09-25T15:21:08.898-03:00 level=DEBUG source=server.go:632 msg="model load progress 0.97" Sep 25 15:21:09 biogpt01 ollama[82676]: time=2024-09-25T15:21:09.148-03:00 level=DEBUG source=server.go:632 msg="model load progress 0.98" Sep 25 15:21:09 biogpt01 ollama[82676]: time=2024-09-25T15:21:09.399-03:00 level=DEBUG source=server.go:632 msg="model load progress 0.99" Sep 25 15:21:09 biogpt01 ollama[82676]: llama_new_context_with_model: n_ctx = 8192 Sep 25 15:21:09 biogpt01 ollama[82676]: llama_new_context_with_model: n_batch = 512 Sep 25 15:21:09 biogpt01 ollama[82676]: llama_new_context_with_model: n_ubatch = 512 Sep 25 15:21:09 biogpt01 ollama[82676]: llama_new_context_with_model: flash_attn = 0 Sep 25 15:21:09 biogpt01 ollama[82676]: llama_new_context_with_model: freq_base = 500000.0 Sep 25 15:21:09 biogpt01 ollama[82676]: llama_new_context_with_model: freq_scale = 1 Sep 25 15:21:09 biogpt01 ollama[82676]: time=2024-09-25T15:21:09.649-03:00 level=DEBUG source=server.go:632 msg="model load progress 1.00" Sep 25 15:21:09 biogpt01 ollama[82676]: time=2024-09-25T15:21:09.901-03:00 level=DEBUG source=server.go:635 msg="model load completed, waiting for server to become available" status="llm server loading model" Sep 25 15:21:10 biogpt01 ollama[82676]: llama_kv_cache_init: CPU KV buffer size = 2560.00 MiB Sep 25 15:21:10 biogpt01 ollama[82676]: llama_new_context_with_model: KV self size = 2560.00 MiB, K (f16): 1280.00 MiB, V (f16): 1280.00 MiB Sep 25 15:21:10 biogpt01 ollama[82676]: llama_new_context_with_model: CPU output buffer size = 2.08 MiB Sep 25 15:21:10 biogpt01 ollama[82676]: llama_new_context_with_model: CPU compute buffer size = 1104.01 MiB Sep 25 15:21:10 biogpt01 ollama[82676]: llama_new_context_with_model: graph nodes = 2566 Sep 25 15:21:10 biogpt01 ollama[82676]: llama_new_context_with_model: graph splits = 1 Sep 25 15:21:11 biogpt01 ollama[132011]: DEBUG [initialize] initializing slots | n_slots=4 tid="131633349513408" timestamp=1727288471 Sep 25 15:21:11 biogpt01 ollama[132011]: DEBUG [initialize] new slot | n_ctx_slot=2048 slot_id=0 tid="131633349513408" timestamp=1727288471 Sep 25 15:21:11 biogpt01 ollama[132011]: DEBUG [initialize] new slot | n_ctx_slot=2048 slot_id=1 tid="131633349513408" timestamp=1727288471 Sep 25 15:21:11 biogpt01 ollama[132011]: DEBUG [initialize] new slot | n_ctx_slot=2048 slot_id=2 tid="131633349513408" timestamp=1727288471 Sep 25 15:21:11 biogpt01 ollama[132011]: DEBUG [initialize] new slot | n_ctx_slot=2048 slot_id=3 tid="131633349513408" timestamp=1727288471 Sep 25 15:21:11 biogpt01 ollama[132011]: INFO [main] model loaded | tid="131633349513408" timestamp=1727288471 Sep 25 15:21:11 biogpt01 ollama[132011]: DEBUG [update_slots] all slots are idle and system prompt is empty, clear the KV cache | tid="131633349513408" timestamp=1727288471 Sep 25 15:21:12 biogpt01 ollama[132011]: DEBUG [process_single_task] slot data | n_idle_slots=4 n_processing_slots=0 task_id=0 tid="131633349513408" timestamp=1727288472 Sep 25 15:21:12 biogpt01 ollama[82676]: time=2024-09-25T15:21:12.043-03:00 level=INFO source=server.go:626 
msg="llama runner started in 21.20 seconds" Sep 25 15:21:12 biogpt01 ollama[82676]: time=2024-09-25T15:21:12.043-03:00 level=DEBUG source=sched.go:462 msg="finished setting up runner" model=/usr/share/ollama/.ollama/models/blobs/sha256-a677b4a4b70c45e702b1d600f7905e367733c53898b8be60e3f29272cf334574 Sep 25 15:21:12 biogpt01 ollama[132011]: DEBUG [process_single_task] slot data | n_idle_slots=4 n_processing_slots=0 task_id=1 tid="131633349513408" timestamp=1727288472 Sep 25 15:21:12 biogpt01 ollama[132011]: DEBUG [log_server_request] request | method="POST" params={} path="/tokenize" remote_addr="127.0.0.1" remote_port=37294 status=200 tid="131633229334208" timestamp=1727288472 Sep 25 15:21:12 biogpt01 ollama[82676]: time=2024-09-25T15:21:12.055-03:00 level=DEBUG source=prompt.go:51 msg="truncating input messages which exceed context length" truncated=2 Sep 25 15:21:12 biogpt01 ollama[82676]: time=2024-09-25T15:21:12.056-03:00 level=DEBUG source=routes.go:1417 msg="chat request" images=0 prompt="<|start_header_id|>system<|end_header_id|>\n\nDada a conversa a seguir, o contexto relevante e uma pergunta de acompanhamento, responda à pergunta atual que o usuário está fazendo. Retorne apenas sua resposta à pergunta com as informações acima, seguindo as instruções do usuário, conforme necessário.\nContext:\n[CONTEXT 0]:\n\nsourceDocument: Lei Nº 8.666, de 21 de junho de 1993.pdf\npublished: 9/25/2024, 1:39:33 PM\n\n\nIII - pode ser cumulada com o quantitativo de área decorrente da figura prevista na alínea g do inciso I do caput\ndeste artigo, até o limite previsto no inciso II deste parágrafo. (Incluído pela Lei nº 11.196, de 2005)\nIV – (VETADO) (Incluído pela Lei nº 11.763, de 2008)\n§ 3º Entende-se por investidura, para os fins desta lei, a alienação aos proprietários de imóveis lindeiros de área\nremanescente ou resultante de obra pública, área esta que se tornar inaproveitável isoladamente, por preço nunca\ninferior ao da avaliação e desde que esse não ultrapasse a 50% (cinqüenta por cento) do valor constante da alínea a do\ninciso II do art. 23 desta lei.\n§ 3\no\n Entende-se por investidura, para os fins desta lei: (Redação dada pela Lei nº 9.648, de 1998)\nI - a alienação aos proprietários de imóveis lindeiros de área remanescente ou resultante de obra pública, área\nesta que se tornar inaproveitável isoladamente, por preço nunca inferior ao da avaliação e desde que esse não\nultrapasse a 50% (cinqüenta por cento) do valor constante da alínea \"a\" do inciso II do art. 23 desta lei; \n(Incluído pela Lei nº 9.648, de 1998)\nII - a alienação, aos legítimos possuidores diretos ou, na falta destes, ao Poder Público, de imóveis para fins\nresidenciais construídos em núcleos urbanos anexos a usinas hidrelétricas, desde que considerados dispensáveis na\n[END CONTEXT 0]\n\n[CONTEXT 1]:\n\nsourceDocument: Lei Nº 8.666, de 21 de junho de 1993.pdf\npublished: 9/25/2024, 1:39:33 PM\n\n\n7.546, de 2011)\nI - à quantidade a ser adquirida ou contratada; ou (Incluído pela Lei nº 12.349, de 2010)\nII - ao quantitativo fixado com fundamento no § 7\no\n do art. 23 desta Lei, quando for o caso. (Incluído\npela Lei nº 12.349, de 2010)\n§ 10. 
A margem de preferência a que se refere o § 6\no\n será estendida aos bens e serviços originários dos\nEstados Partes do Mercado Comum do Sul - Mercosul, após a ratificação do Protocolo de Contratações Públicas do\nMercosul, celebrado em 20 de julho de 2006, e poderá ser estendida, total ou parcialmente, aos bens e serviços\noriginários de outros países, com os quais o Brasil venha assinar acordos sobre compras governamentais. \n(Incluído pela Medida Provisória nº 495, de 2010)\n§ 10. A margem de preferência a que se refere o § 5\no\n poderá ser estendida, total ou parcialmente, aos bens e\nserviços originários dos Estados Partes do Mercado Comum do Sul - Mercosul. (Incluído pela Lei nº 12.349,\nde 2010) (Vide Decreto nº 7.546, de 2011)\n§ 11. Os editais de licitação para a contratação de bens, serviços e obras poderão exigir que o contratado\npromova, em favor da administração pública ou daqueles por ela indicados, medidas de compensação comercial,\nindustrial, tecnológica ou acesso a condições vantajosas de financiamento, cumulativamente ou não, na forma\n[END CONTEXT 1]\n\n[CONTEXT 2]:\n\nsourceDocument: Lei Nº 8.666, de 21 de junho de 1993.pdf\npublished: 9/25/2024, 1:39:33 PM\n\n\nhabilitação. (Redação dada pela Lei nº 9.648, de 1998)\n§ 3\no\n A documentação referida neste artigo poderá ser substituída por registro cadastral emitido por órgão ou\nentidade pública, desde que previsto no edital e o registro tenha sido feito em obediência ao disposto nesta Lei.\n§ 4\no\n As empresas estrangeiras que não funcionem no País, tanto quanto possível, atenderão, nas licitações\ninternacionais, às exigências dos parágrafos anteriores mediante documentos equivalentes, autenticados pelos\nrespectivos consulados e traduzidos por tradutor juramentado, devendo ter representação legal no Brasil com poderes\nexpressos para receber citação e responder administrativa ou judicialmente.\n§ 5\no\n Não se exigirá, para a habilitação de que trata este artigo, prévio recolhimento de taxas ou emolumentos,\nsalvo os referentes a fornecimento do edital, quando solicitado, com os seus elementos constitutivos, limitados ao\nvalor do custo efetivo de reprodução gráfica da documentação fornecida.\n§ 6\no\n O disposto no § 4\no\n deste artigo, no § 1\no\n do art. 33 e no § 2\no\n do art. 55, não se aplica às licitações\ninternacionais para a aquisição de bens e serviços cujo pagamento seja feito com o produto de financiamento\nconcedido por organismo financeiro internacional de que o Brasil faça parte, ou por agência estrangeira de\ncooperação, nem nos casos de contratação com empresa estrangeira, para a compra de equipamentos fabricados e\n[END CONTEXT 2]\n\n[CONTEXT 3]:\n\nsourceDocument: Lei Nº 8.666, de 21 de junho de 1993.pdf\npublished: 9/25/2024, 1:39:33 PM\n\n\nquando estas forem aplicadas sobre produtos ou serviços estrangeiros. (Incluído pela Lei Complementar nº 147,\nde 2014)\nArt. 4\no\n Todos quantos participem de licitação promovida pelos órgãos ou entidades a que se refere o art. 1º têm\ndireito público subjetivo à fiel observância do pertinente procedimento estabelecido nesta lei, podendo qualquer\ncidadão acompanhar o seu desenvolvimento, desde que não interfira de modo a perturbar ou impedir a realização dos\ntrabalhos.\nParágrafo único. O procedimento licitatório previsto nesta lei caracteriza ato administrativo formal, seja ele\npraticado em qualquer esfera da Administração Pública.\nArt. 
5\no\n Todos os valores, preços e custos utilizados nas licitações terão como expressão monetária a moeda\ncorrente nacional, ressalvado o disposto no art. 42 desta Lei, devendo cada unidade da Administração, no pagamento\ndas obrigações relativas ao fornecimento de bens, locações, realização de obras e prestação de serviços, obedecer,\npara cada fonte diferenciada de recursos, a estrita ordem cronológica das datas de suas exigibilidades, salvo quando\npresentes relevantes razões de interesse público e mediante prévia justificativa da autoridade competente,\ndevidamente publicada.\n§ 1\no\n Os créditos a que se refere este artigo terão seus valores corrigidos por critérios previstos no ato\nconvocatório e que lhes preservem o valor.\n[END CONTEXT 3]\n\n[CONTEXT 4]:\n\nsourceDocument: Lei Nº 8.666, de 21 de junho de 1993.pdf\npublished: 9/25/2024, 1:39:33 PM\n\n\nparágrafo anterior será efetuado em moeda brasileira à taxa de câmbio vigente na data do efetivo pagamento.\n§ 2\no\n O pagamento feito ao licitante brasileiro eventualmente contratado em virtude da licitação de que trata o\nparágrafo anterior será efetuado em moeda brasileira, à taxa de câmbio vigente no dia útil imediatamente anterior à\ndata do efetivo pagamento. (Redação dada pela Lei nº 8.883, de 1994)\n§ 3\no\n As garantias de pagamento ao licitante brasileiro serão equivalentes àquelas oferecidas ao licitante\nestrangeiro.\n§ 4\no\n Para fins de julgamento da licitação, as propostas apresentadas por licitantes estrangeiros serão acrescidas\ndos gravames conseqüentes dos mesmos tributos que oneram exclusivamente os licitantes brasileiros quanto à\noperação final de venda.\n§ 5º Para a realização de obras, prestação de serviços ou aquisição de bens com recursos provenientes de\nfinanciamento ou doação oriundos de agência oficial de cooperação estrangeira ou organismo financeiro multilateral de\nque o Brasil seja parte, poderão ser admitidas na respectiva licitação, mantidos os princípios basilares desta lei, as\nnormas e procedimentos daquelas entidades e as condições decorrentes de acordos, protocolos, convenções ou\ntratados internacionais aprovados pelo Congresso Nacional.\n§ 5\no\n Para a realização de obras, prestação de serviços ou aquisição de bens com recursos provenientes de\n[END CONTEXT 4]\n\n<|eot_id|><|start_header_id|>user<|end_header_id|>\n\nMe diga qual a relação das leis 8666 e 14133? Detalhe com exatidão os pontos relacionados e as melhorias entre elas. Quero um conteúdo bem rico em informações objetivas e detalhadas. Quero que a análise seja tanto qualitativa, como quantitativa. 
Porém, pautadas apenas na documentação informada.<|eot_id|><|start_header_id|>assistant<|end_header_id|>\n\n" Sep 25 15:21:12 biogpt01 ollama[132011]: DEBUG [process_single_task] slot data | n_idle_slots=4 n_processing_slots=0 task_id=2 tid="131633349513408" timestamp=1727288472 Sep 25 15:21:12 biogpt01 ollama[132011]: DEBUG [launch_slot_with_data] slot is processing task | slot_id=0 task_id=3 tid="131633349513408" timestamp=1727288472 Sep 25 15:21:12 biogpt01 ollama[132011]: INFO [update_slots] input truncated | n_ctx=2048 n_erase=1634 n_keep=4 n_left=2044 n_shift=1022 tid="131633349513408" timestamp=1727288472 Sep 25 15:21:12 biogpt01 ollama[132011]: DEBUG [update_slots] slot progression | ga_i=0 n_past=0 n_past_se=0 n_prompt_tokens_processed=1026 slot_id=0 task_id=3 tid="131633349513408" timestamp=1727288472 Sep 25 15:21:12 biogpt01 ollama[132011]: DEBUG [update_slots] kv cache rm [p0, end) | p0=0 slot_id=0 task_id=3 tid="131633349513408" timestamp=1727288472 Sep 25 15:25:50 biogpt01 ollama[82676]: time=2024-09-25T15:25:50.463-03:00 level=DEBUG source=sched.go:341 msg="timer expired, expiring to unload" modelPath=/usr/share/ollama/.ollama/models/blobs/sha256-970aa74c0a90ef7482477cf803618e776e173c007bf957f635f1015bfcfef0e6 Sep 25 15:25:50 biogpt01 ollama[82676]: time=2024-09-25T15:25:50.463-03:00 level=DEBUG source=sched.go:360 msg="runner expired event received" modelPath=/usr/share/ollama/.ollama/models/blobs/sha256-970aa74c0a90ef7482477cf803618e776e173c007bf957f635f1015bfcfef0e6 Sep 25 15:25:50 biogpt01 ollama[82676]: time=2024-09-25T15:25:50.463-03:00 level=DEBUG source=sched.go:375 msg="got lock to unload" modelPath=/usr/share/ollama/.ollama/models/blobs/sha256-970aa74c0a90ef7482477cf803618e776e173c007bf957f635f1015bfcfef0e6 Sep 25 15:25:50 biogpt01 ollama[82676]: time=2024-09-25T15:25:50.463-03:00 level=DEBUG source=server.go:1044 msg="stopping llama server" Sep 25 15:25:50 biogpt01 ollama[82676]: time=2024-09-25T15:25:50.464-03:00 level=DEBUG source=server.go:1050 msg="waiting for llama server to exit" Sep 25 15:25:50 biogpt01 ollama[82676]: time=2024-09-25T15:25:50.510-03:00 level=DEBUG source=server.go:1054 msg="llama server stopped" Sep 25 15:25:50 biogpt01 ollama[82676]: time=2024-09-25T15:25:50.510-03:00 level=DEBUG source=sched.go:380 msg="runner released" modelPath=/usr/share/ollama/.ollama/models/blobs/sha256-970aa74c0a90ef7482477cf803618e776e173c007bf957f635f1015bfcfef0e6 Sep 25 15:25:50 biogpt01 ollama[82676]: time=2024-09-25T15:25:50.510-03:00 level=DEBUG source=sched.go:384 msg="sending an unloaded event" modelPath=/usr/share/ollama/.ollama/models/blobs/sha256-970aa74c0a90ef7482477cf803618e776e173c007bf957f635f1015bfcfef0e6 Sep 25 15:25:50 biogpt01 ollama[82676]: time=2024-09-25T15:25:50.510-03:00 level=DEBUG source=sched.go:308 msg="ignoring unload event with no pending requests" Sep 25 15:25:52 biogpt01 ollama[82676]: time=2024-09-25T15:25:52.082-03:00 level=DEBUG source=sched.go:466 msg="context for request finished" Sep 25 15:25:52 biogpt01 ollama[82676]: time=2024-09-25T15:25:52.082-03:00 level=DEBUG source=sched.go:339 msg="runner with non-zero duration has gone idle, adding timer" modelPath=/usr/share/ollama/.ollama/models/blobs/sha256-a677b4a4b70c45e702b1d600f7905e367733c53898b8be60e3f29272cf334574 duration=1h0m0s Sep 25 15:25:52 biogpt01 ollama[82676]: time=2024-09-25T15:25:52.082-03:00 level=DEBUG source=sched.go:357 msg="after processing request finished event" 
modelPath=/usr/share/ollama/.ollama/models/blobs/sha256-a677b4a4b70c45e702b1d600f7905e367733c53898b8be60e3f29272cf334574 refCount=0
Sep 25 15:25:52 biogpt01 ollama[82676]: [GIN] 2024/09/25 - 15:25:52 | 200 | 5m1s | 172.17.0.4 | POST "/api/chat"
Sep 25 15:27:16 biogpt01 ollama[132011]: DEBUG [log_server_request] request | method="POST" params={} path="/completion" remote_addr="127.0.0.1" remote_port=37294 status=200 tid="131633229334208" timestamp=1727288836
Sep 25 15:27:17 biogpt01 ollama[132011]: DEBUG [update_slots] slot released | n_cache_tokens=1028 n_ctx=8192 n_past=1027 n_system_tokens=0 slot_id=0 task_id=3 tid="131633349513408" timestamp=1727288837 truncated=true
```

timothycarambat commented 1 month ago

What are the specs of the machine running Ollama? It also looks like the content being passed is more than what the model allows, which will cause issues in Performance Mode because it will do maximum VRAM allocation for the model - which for 70B is a lot.
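
The Ollama log above supports the "content exceeds what the model allows" part: the server was started with `--ctx-size 8192 --parallel 4`, so each request slot only gets 8192 / 4 = 2048 tokens (`n_ctx_slot=2048`), and the `input truncated | n_ctx=2048` entry shows the prompt overflowing that budget. A back-of-envelope sketch of that arithmetic (the chunk size and overhead below are hypothetical averages, and chars/4 is only a crude token estimate, not the model's tokenizer):

```ts
// Per-request context from the Ollama log: 8192-token context split 4 ways.
const nCtx = 8192;
const parallelSlots = 4;                 // from the "--parallel 4" server flag
const ctxPerSlot = nCtx / parallelSlots; // 2048 tokens per request slot

const estimateTokens = (chars: number): number => Math.ceil(chars / 4);

const avgSnippetChars = 1500; // hypothetical average chunk size
const overheadTokens = 200;   // system prompt + question, rough guess

for (const snippets of [4, 5, 6, 8]) {
  const promptTokens = estimateTokens(snippets * avgSnippetChars) + overheadTokens;
  const verdict = promptTokens > ctxPerSlot ? "overflows (gets truncated)" : "fits";
  console.log(`${snippets} snippets ≈ ${promptTokens} tokens -> ${verdict} in a ${ctxPerSlot}-token slot`);
}
```

Under these assumed sizes, 4 snippets just fit in a 2048-token slot and anything above 4 overflows, which matches the behavior reported at the top of the issue.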

lramos7 commented 1 month ago

> What are the specs of the machine running Ollama? It also looks like the content being passed is more than what the model allows, which will cause issues in Performance Mode because it will do maximum VRAM allocation for the model - which for 70B is a lot.

A Nutanix node with an Intel(R) Xeon(R) Gold 6326 CPU @ 2.90GHz, 8 cores, and 90 GB of RAM.

Ollama and AnythingLLM are on the same host. I installed Ollama using the standard installation, and AnythingLLM runs in a Docker container.

timothycarambat commented 1 month ago

You should swap to a 13B model. Trying to run the 70B model on a machine with no GPU is 100% the main issue here - even though your RAM is sufficient to load the model, the problem is amplified even more by "Performance mode", since that takes your long prompt and then tries to fit all of it into RAM.

If you are doing CPU inference you should typically use more modest models like 13B or 8B. Does this issue persist when you use a smaller model? It should function fine with smaller model sizes.
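
For reference, the sizes Ollama reported line up with the usual back-of-envelope formulas. A small sanity-check sketch (the constants are copied from the `llm_load_print_meta` lines in the log above; this is an estimate, not an exact accounting):

```ts
// Sanity-checking the memory figures from the Ollama log.
const GiB = 1024 ** 3;

// Weights: params * bits-per-weight / 8.
const params = 70.55e9;     // "model params = 70.55 B"
const bitsPerWeight = 4.53; // "model size = 37.22 GiB (4.53 BPW)" for Q4_0
const weightsGiB = (params * bitsPerWeight) / 8 / GiB; // ≈ 37.2 GiB

// KV cache: 2 (K and V) * n_layer * n_ctx * n_embd_k_gqa * 2 bytes (f16).
const kvBytes = 2 * 80 * 8192 * 1024 * 2;
const kvGiB = kvBytes / GiB; // ≈ 2.5 GiB, matching "KV self size = 2560.00 MiB"

console.log(`weights ≈ ${weightsGiB.toFixed(1)} GiB, KV cache ≈ ${kvGiB.toFixed(1)} GiB`);
// On CPU, every generated token streams roughly all ~37 GiB of weights through
// RAM, so even though the model fits in 90 GB, generation speed is typically
// in the low single digits of tokens per second -- slow enough to trip HTTP
// timeouts, which is consistent with the 5m1s /api/chat request in the log.
```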