ggerganov / llama.cpp

LLM inference in C/C++
MIT License

GPT-NeoX has only minimal inference support #3293

Closed cebtenzzre closed 3 months ago

cebtenzzre commented 11 months ago

Steps to reproduce:

  1. Download https://huggingface.co/EleutherAI/gpt-neox-20b
  2. Convert the model and attempt to use it:
    
    $ TMPDIR=/var/tmp ./convert-gptneox-hf-to-gguf.py gpt-neox-20b 1 --outfile gpt-neox-20b.f16.gguf
    $ ./main -m gpt-neox-20b.f16.gguf
    <snip>
    llama_model_loader: - type  f32:  354 tensors
    llama_model_loader: - type  f16:  178 tensors
    error loading model: cannot find tokenizer scores in model file

    llama_load_model_from_file: failed to load model
    llama_init_from_gpt_params: error: failed to load model 'gpt-neox-20b.f16.gguf'
    main: error: unable to load model
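For reference, the error refers to the tokenizer.ggml.scores metadata array, which the converter never writes for this model. A quick, untested sketch to inspect what the converter actually wrote, assuming a gguf package version that includes GGUFReader (newer gguf-py releases do):

    import gguf  # the gguf Python package from this repo's gguf-py directory

    reader = gguf.GGUFReader("gpt-neox-20b.f16.gguf")

    # Print every tokenizer-related metadata key the converter wrote.
    for key in reader.fields:
        if key.startswith("tokenizer."):
            print(key)

    # The key the loader wants is missing:
    print("tokenizer.ggml.scores" in reader.fields)  # -> False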

cebtenzzre commented 11 months ago

Even if you add dummy scores and token types in the conversion script, it fails here: https://github.com/ggerganov/llama.cpp/blob/bc9d3e3971e5607a10ff4c24e39568ce1ac87271/llama.cpp#L2288

Was GPT-NeoX ever even implemented in GGUF?
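For anyone who wants to try that dummy-scores workaround, here is a minimal, untested sketch of the idea using gguf-py's writer. hf_vocab is a hypothetical stand-in for the real token list the script builds from the HF tokenizer; the actual script wires this into its existing GGUFWriter:

    import gguf

    # Hypothetical stand-in: the real script builds this from the HF tokenizer.
    hf_vocab = [b"hello", b"world"]

    gguf_writer = gguf.GGUFWriter("dummy-vocab.gguf", "gptneox")

    tokens, scores, toktypes = [], [], []
    for text in hf_vocab:
        tokens.append(text)
        scores.append(0.0)                      # dummy score; BPE vocabs carry none
        toktypes.append(gguf.TokenType.NORMAL)  # mark every entry as a normal token

    gguf_writer.add_tokenizer_model("gpt2")     # NeoX uses a GPT-2-style BPE tokenizer
    gguf_writer.add_token_list(tokens)
    gguf_writer.add_token_scores(scores)
    gguf_writer.add_token_types(toktypes)

    gguf_writer.write_header_to_file()
    gguf_writer.write_kv_data_to_file()
    gguf_writer.close()

That only gets past the tokenizer check; as noted above, the loader then fails later because the architecture itself is not implemented.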

Jacoby1218 commented 11 months ago

> Was GPT-NeoX ever even implemented in GGUF?

Yes, example inference code exists here: https://github.com/ggerganov/llama.cpp/blob/master/examples/gptneox-wip/gptneox-main.cpp

cebtenzzre commented 11 months ago

Oh, it has a separate implementation. So I can't currently use it with any third-party software that uses the llama.cpp API.

edit: This file is not listed in either of the build scripts, and it doesn't seem to have GPU acceleration. Both of those seem worth improving.

ggerganov commented 10 months ago

Yeah, there's just a proof-of-concept implementation. We should add it to llama.cpp eventually

maddes8cht commented 9 months ago

The situation now is that we have code in this repository to "successfully" convert and quantize a gpt-neo-x model, but no way to run the result. https://github.com/ggerganov/ggml/tree/master/examples/gpt-neox has its own convert script. The conversion here in convert-hf-to-gguf.py does not seem to serve any purpose at all.

maddes8cht commented 9 months ago

I would still like to bring this forward again: there has been code to somehow "successfully" convert gpt-neo-x models into gguf models inside convert-hf-to-gguf.py since its first release in #3838, and before that in the separate convert-gptneox script. But there is no code whatsoever to run inference on a model that the convert script labels as a gptneox model.
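To make the mismatch concrete: the converter stamps the file with the gptneox architecture, which nothing on the llama.cpp side knows how to load. An untested sketch of how to check the label, assuming a gguf package version with GGUFReader (the field-decoding incantation follows gguf-py's ReaderField layout and may differ across versions):

    import gguf

    reader = gguf.GGUFReader("gpt-neox-20b.f16.gguf")

    # Decode the general.architecture string field.
    field = reader.fields["general.architecture"]
    arch = bytes(field.parts[field.data[0]]).decode("utf-8")
    print(arch)  # -> "gptneox", an arch the llama.cpp loader has no graph for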

Galunid commented 9 months ago

I believe you can run this using https://github.com/ggerganov/ggml/blob/master/examples/gpt-neox/main.cpp; it's just not part of llama.cpp itself. It'll be supported eventually.

maddes8cht commented 9 months ago

Oh, I already compiled that example for testing. It expects the old ggml .bin files, which can be created with the convert program in the same example directory; it doesn't run the gguf files built by the convert-hf-to-gguf.py script. Right now there is no code that can run these gguf files converted from gpt-neo-x models.
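The two container formats are easy to tell apart by their first four bytes: GGUF files start with the ASCII bytes "GGUF", while the old ggml .bin files start with other four-byte magics. A rough, untested sketch (the legacy magic list below is illustrative, not exhaustive):

    def model_format(path: str) -> str:
        """Guess the container format from the file's first four bytes."""
        with open(path, "rb") as f:
            magic = f.read(4)
        if magic == b"GGUF":
            return "gguf"
        if magic in (b"lmgg", b"fmgg", b"tjgg"):  # ggml / ggmf / ggjt, little-endian
            return "legacy ggml"
        return f"unknown (magic {magic!r})"

    print(model_format("gpt-neox-20b.f16.gguf"))  # -> gguf, which the ggml example rejects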

ggerganov commented 9 months ago

It's much easier to add new arches to llama.cpp now (I hope) - PRs welcome

github-actions[bot] commented 4 months ago

This issue was closed because it has been inactive for 14 days since being marked as stale.

cebtenzzre commented 4 months ago

I'm still interested in this.

github-actions[bot] commented 3 months ago

This issue was closed because it has been inactive for 14 days since being marked as stale.