ggerganov / llama.cpp

LLM inference in C/C++
MIT License

Gpt neox model that was converted and quantized to gguf refused to run #2706

Closed JohnClaw closed 4 months ago

JohnClaw commented 12 months ago

I converted Astrid 1B CPU (https://huggingface.co/PAIXAI/Astrid-1B-CPU) to gguf and quantized it. Then I tried to run it using "main -m 1B/ggml-model-q4_1.gguf -n 128" and got this error:

error loading model: key not found in model: llama.context_length
llama_load_model_from_file: failed to load model
llama_init_from_gpt_params: error: failed to load model '1B/ggml-model-q4_1.gguf'
main: error: unable to load model

klosax commented 12 months ago

GPT-NeoX is not supported in main yet. The example here should work: https://github.com/ggerganov/llama.cpp/blob/master/examples/gptneox-wip/gptneox-main.cpp

JohnClaw commented 12 months ago

https://github.com/ggerganov/llama.cpp/blob/master/examples/gptneox-wip/gptneox-main.cpp

Thank you. Is there a win64 executable for this?

klosax commented 12 months ago

No, this is currently work in progress until it is implemented in main.

JohnClaw commented 12 months ago

https://github.com/ggerganov/llama.cpp/blob/master/examples/gptneox-wip/gptneox-main.cpp

I tried to compile it using Embarcadero Dev-C++ 6.3 TDM-GCC 9.2 and got these errors:

C:\Users\D683~1\AppData\Local\Temp\ccvrHHud.o:1.cpp:(.text+0x2b15): undefined reference to `ggml_get_tensor'
C:\Users\D683~1\AppData\Local\Temp\ccvrHHud.o:1.cpp:(.text+0x2c27): undefined reference to `gguf_init_from_file'
C:\Users\D683~1\AppData\Local\Temp\ccvrHHud.o:1.cpp:(.text+0x2c80): undefined reference to `gguf_get_version'
C:\Users\D683~1\AppData\Local\Temp\ccvrHHud.o:1.cpp:(.text+0x2cbb): undefined reference to `gguf_get_alignment'
C:\Users\D683~1\AppData\Local\Temp\ccvrHHud.o:1.cpp:(.text+0x2cf7): undefined reference to `gguf_get_data_offset'
C:\Users\D683~1\AppData\Local\Temp\ccvrHHud.o:1.cpp:(.text+0x2d3a): undefined reference to `gguf_find_key'
C:\Users\D683~1\AppData\Local\Temp\ccvrHHud.o:1.cpp:(.text+0x2d61): undefined reference to `gguf_get_val_str'
klosax commented 12 months ago

I think you need to include ggml.c in the compilation.

klosax commented 12 months ago

Adding a build target like the quantize example in the Makefile should also work.
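
Such a target might look roughly like this, modeled on the existing `quantize` rule. This is a sketch, not a rule from the repo: the target name `gptneox-main` is hypothetical, and the exact prerequisite objects and variables (`ggml.o`, `$(OBJS)`, `$(CXXFLAGS)`, `$(LDFLAGS)`) should be checked against the Makefile in your checkout:

```makefile
# Hypothetical build target for the WIP GPT-NeoX example.
# Compiles the example and links it against the ggml object file,
# which resolves the undefined references to ggml_* / gguf_* symbols.
gptneox-main: examples/gptneox-wip/gptneox-main.cpp ggml.o $(OBJS)
	$(CXX) $(CXXFLAGS) $^ -o $@ $(LDFLAGS)
```

Then `make gptneox-main` from the repository root would build the example alongside the other tools.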

Jacoby1218 commented 11 months ago

Now that Falcon is fixed for CUDA, and since the two architectures are supposedly similar, can we get NeoX support added?

ggerganov commented 11 months ago

Sure, PRs welcome

JohnClaw commented 11 months ago

Hope that someday you will make llama.cpp support gpt-j, gpt2, gpt3, bloomz, t5, bert, rwkv, x-gen, btlm, mpt and starcoder.

ggerganov commented 11 months ago

probably just gpt4

github-actions[bot] commented 4 months ago

This issue was closed because it has been inactive for 14 days since being marked as stale.