hubblebubblepig opened 8 months ago
Please paste the full error message.
(D:\LLM\venv311) D:\LLM\ollama\llm\llama.cpp>ollama list
NAME                ID              SIZE    MODIFIED
atom7bchat:latest   d44f818c3956    4.0 GB  6 days ago
(D:\LLM\venv311) D:\LLM\ollama\llm\llama.cpp>ollama run atom7bchat
Error: error loading model C:\Users\ADMIN\.ollama\models\blobs\sha256-f1b0e1527c2e9d45672180d7a45c3fe3511d7a30ecae16db112f57cc3862691e
===========================================================
ollama server.log
time=2024-03-11T08:53:54.688+08:00 level=INFO source=images.go:710 msg="total blobs: 3"
time=2024-03-11T08:53:54.699+08:00 level=INFO source=images.go:717 msg="total unused blobs removed: 0"
time=2024-03-11T08:53:54.701+08:00 level=INFO source=routes.go:1019 msg="Listening on 127.0.0.1:11434 (version 0.1.27)"
time=2024-03-11T08:53:54.702+08:00 level=INFO source=payload_common.go:107 msg="Extracting dynamic libraries..."
time=2024-03-11T08:53:54.869+08:00 level=INFO source=payload_common.go:146 msg="Dynamic LLM libraries [cuda_v11.3 cpu cpu_avx cpu_avx2]"
time=2024-03-11T08:56:15.351+08:00 level=INFO source=images.go:710 msg="total blobs: 3"
time=2024-03-11T08:56:15.360+08:00 level=INFO source=images.go:717 msg="total unused blobs removed: 0"
time=2024-03-11T08:56:15.361+08:00 level=INFO source=routes.go:1019 msg="Listening on 127.0.0.1:11434 (version 0.1.27)"
time=2024-03-11T08:56:15.361+08:00 level=INFO source=payload_common.go:107 msg="Extracting dynamic libraries..."
time=2024-03-11T08:56:15.445+08:00 level=INFO source=payload_common.go:146 msg="Dynamic LLM libraries [cpu cpu_avx cpu_avx2 cuda_v11.3]"
[GIN] 2024/03/11 - 08:56:39 | 200 | 0s | 127.0.0.1 | HEAD "/"
[GIN] 2024/03/11 - 08:56:39 | 200 | 1.3001ms | 127.0.0.1 | GET "/api/tags"
[GIN] 2024/03/11 - 08:57:50 | 200 | 0s | 127.0.0.1 | HEAD "/"
[GIN] 2024/03/11 - 08:57:50 | 200 | 512.2µs | 127.0.0.1 | POST "/api/show"
[GIN] 2024/03/11 - 08:57:50 | 200 | 0s | 127.0.0.1 | POST "/api/show"
time=2024-03-11T08:57:51.006+08:00 level=INFO source=gpu.go:94 msg="Detecting GPU type"
time=2024-03-11T08:57:51.006+08:00 level=INFO source=gpu.go:265 msg="Searching for GPU management library nvml.dll"
time=2024-03-11T08:57:51.010+08:00 level=INFO source=gpu.go:311 msg="Discovered GPU libraries: [c:\Windows\System32\nvml.dll C:\Windows\system32\nvml.dll]"
time=2024-03-11T08:57:51.025+08:00 level=INFO source=gpu.go:99 msg="Nvidia GPU detected"
time=2024-03-11T08:57:51.026+08:00 level=INFO source=cpu_common.go:11 msg="CPU has AVX2"
time=2024-03-11T08:57:51.050+08:00 level=INFO source=gpu.go:146 msg="CUDA Compute Capability detected: 8.6"
time=2024-03-11T08:57:51.050+08:00 level=INFO source=cpu_common.go:11 msg="CPU has AVX2"
time=2024-03-11T08:57:51.050+08:00 level=INFO source=gpu.go:146 msg="CUDA Compute Capability detected: 8.6"
time=2024-03-11T08:57:51.050+08:00 level=INFO source=cpu_common.go:11 msg="CPU has AVX2"
time=2024-03-11T08:57:51.050+08:00 level=INFO source=dyn_ext_server.go:385 msg="Updating PATH to C:\Users\*"
/* note: PATH contents omitted */
time=2024-03-11T08:57:51.106+08:00 level=INFO source=dyn_ext_server.go:90 msg="Loading Dynamic llm server: C:\Users\ADMIN\AppData\Local\Temp\ollama1716865875\cuda_v11.3\ext_server.dll"
time=2024-03-11T08:57:51.106+08:00 level=INFO source=dyn_ext_server.go:150 msg="Initializing llama server"
ggml_init_cublas: GGML_CUDA_FORCE_MMQ: no
ggml_init_cublas: CUDA_USE_TENSOR_CORES: yes
ggml_init_cublas: found 1 CUDA devices:
Device 0: NVIDIA RTX A4000, compute capability 8.6, VMM: yes
llama_model_loader: loaded meta data with 19 key-value pairs and 291 tensors from C:\Users\ADMIN\.ollama\models\blobs\sha256-f1b0e1527c2e9d45672180d7a45c3fe3511d7a30ecae16db112f57cc3862691e (version GGUF V3 (latest))
llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output.
llama_model_loader: - kv 0: general.architecture str = llama
llama_model_loader: - kv 1: general.name str = Atom-7B-Chat
llama_model_loader: - kv 2: llama.block_count u32 = 32
llama_model_loader: - kv 3: llama.context_length u32 = 4096
llama_model_loader: - kv 4: llama.embedding_length u32 = 4096
llama_model_loader: - kv 5: llama.feed_forward_length u32 = 11008
llama_model_loader: - kv 6: llama.attention.head_count u32 = 32
llama_model_loader: - kv 7: llama.attention.head_count_kv u32 = 32
llama_model_loader: - kv 8: llama.attention.layer_norm_rms_epsilon f32 = 0.000010
llama_model_loader: - kv 9: general.file_type u32 = 2
llama_model_loader: - kv 10: tokenizer.ggml.model str = gpt2
llama_model_loader: - kv 11: tokenizer.ggml.tokens arr[str,65000] = ["", "", "<0x00>", "<...
llama_model_loader: - kv 12: tokenizer.ggml.token_type arr[i32,65000] = [3, 3, 3, 1, 1, 1, 1, 1, 1, 1, 1, 1, ...
llama_model_loader: - kv 13: tokenizer.ggml.bos_token_id u32 = 1
llama_model_loader: - kv 14: tokenizer.ggml.eos_token_id u32 = 2
llama_model_loader: - kv 15: tokenizer.ggml.padding_token_id u32 = 2
llama_model_loader: - kv 16: tokenizer.ggml.add_bos_token bool = false
llama_model_loader: - kv 17: tokenizer.ggml.add_eos_token bool = false
llama_model_loader: - kv 18: general.quantization_version u32 = 2
llama_model_loader: - type f32: 65 tensors
llama_model_loader: - type q4_0: 225 tensors
llama_model_loader: - type q6_K: 1 tensors
llama_model_load: error loading model: error loading model vocabulary: cannot find tokenizer merges in model file
llama_load_model_from_file: failed to load model
llama_init_from_gpt_params: error: failed to load model 'C:\Users\ADMIN\.ollama\models\blobs\sha256-f1b0e1527c2e9d45672180d7a45c3fe3511d7a30ecae16db112f57cc3862691e'
{"timestamp":1710118672,"level":"ERROR","function":"load_model","line":388,"message":"unable to load model","model":"C:\Users\ADMIN\.ollama\models\blobs\sha256-f1b0e1527c2e9d45672180d7a45c3fe3511d7a30ecae16db112f57cc3862691e"}
time=2024-03-11T08:57:52.167+08:00 level=WARN source=llm.go:162 msg="Failed to load dynamic library C:\Users\ADMIN\AppData\Local\Temp\ollama1716865875\cuda_v11.3\ext_server.dll error loading model C:\Users\ADMIN\.ollama\models\blobs\sha256-f1b0e1527c2e9d45672180d7a45c3fe3511d7a30ecae16db112f57cc3862691e"
time=2024-03-11T08:57:52.167+08:00 level=INFO source=dyn_ext_server.go:385 msg="Updating PATH to C:\Users\*"
/* note: PATH contents omitted */
time=2024-03-11T08:57:52.168+08:00 level=INFO source=dyn_ext_server.go:90 msg="Loading Dynamic llm server: C:\Users\ADMIN\AppData\Local\Temp\ollama1716865875\cpu_avx2\ext_server.dll"
time=2024-03-11T08:57:52.168+08:00 level=INFO source=dyn_ext_server.go:150 msg="Initializing llama server"
llama_model_loader: loaded meta data with 19 key-value pairs and 291 tensors from C:\Users\ADMIN\.ollama\models\blobs\sha256-f1b0e1527c2e9d45672180d7a45c3fe3511d7a30ecae16db112f57cc3862691e (version GGUF V3 (latest))
llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output.
llama_model_loader: - kv 0: general.architecture str = llama
llama_model_loader: - kv 1: general.name str = Atom-7B-Chat
llama_model_loader: - kv 2: llama.block_count u32 = 32
llama_model_loader: - kv 3: llama.context_length u32 = 4096
llama_model_loader: - kv 4: llama.embedding_length u32 = 4096
llama_model_loader: - kv 5: llama.feed_forward_length u32 = 11008
llama_model_loader: - kv 6: llama.attention.head_count u32 = 32
llama_model_loader: - kv 7: llama.attention.head_count_kv u32 = 32
llama_model_loader: - kv 8: llama.attention.layer_norm_rms_epsilon f32 = 0.000010
llama_model_loader: - kv 9: general.file_type u32 = 2
llama_model_loader: - kv 10: tokenizer.ggml.model str = gpt2
llama_model_loader: - kv 11: tokenizer.ggml.tokens arr[str,65000] = ["", "", "<0x00>", "<...
llama_model_loader: - kv 12: tokenizer.ggml.token_type arr[i32,65000] = [3, 3, 3, 1, 1, 1, 1, 1, 1, 1, 1, 1, ...
llama_model_loader: - kv 13: tokenizer.ggml.bos_token_id u32 = 1
llama_model_loader: - kv 14: tokenizer.ggml.eos_token_id u32 = 2
llama_model_loader: - kv 15: tokenizer.ggml.padding_token_id u32 = 2
llama_model_loader: - kv 16: tokenizer.ggml.add_bos_token bool = false
llama_model_loader: - kv 17: tokenizer.ggml.add_eos_token bool = false
llama_model_loader: - kv 18: general.quantization_version u32 = 2
llama_model_loader: - type f32: 65 tensors
llama_model_loader: - type q4_0: 225 tensors
llama_model_loader: - type q6_K: 1 tensors
llama_model_load: error loading model: error loading model vocabulary: cannot find tokenizer merges in model file
llama_load_model_from_file: failed to load model
llama_init_from_gpt_params: error: failed to load model 'C:\Users\ADMIN\.ollama\models\blobs\sha256-f1b0e1527c2e9d45672180d7a45c3fe3511d7a30ecae16db112f57cc3862691e'
{"timestamp":1710118672,"level":"ERROR","function":"load_model","line":388,"message":"unable to load model","model":"C:\Users\ADMIN\.ollama\models\blobs\sha256-f1b0e1527c2e9d45672180d7a45c3fe3511d7a30ecae16db112f57cc3862691e"}
time=2024-03-11T08:57:52.184+08:00 level=WARN source=llm.go:162 msg="Failed to load dynamic library C:\Users\ADMIN\AppData\Local\Temp\ollama1716865875\cpu_avx2\ext_server.dll error loading model C:\Users\ADMIN\.ollama\models\blobs\sha256-f1b0e1527c2e9d45672180d7a45c3fe3511d7a30ecae16db112f57cc3862691e"
[GIN] 2024/03/11 - 08:57:52 | 500 | 1.3745928s | 127.0.0.1 | POST "/api/chat"
"llama_model_load: error loading model: error loading model vocabulary: cannot find tokenizer merges in model file" "Failed to load dynamic library"