ggerganov / llama.cpp

LLM inference in C/C++
MIT License

Unable to convert bloom models #4768

Closed · JerryKwan closed this 10 months ago

JerryKwan commented 11 months ago

When trying to convert a bloom model downloaded from Hugging Face (https://huggingface.co/bigscience/bloomz-1b7) using the following command

python3.10 convert.py /root/bloomz-1b7/

it fails with the following traceback:

Loading model file /root/bloomz-1b7/model.safetensors
Traceback (most recent call last):
  File "/root/workspace/llama.cpp/convert.py", line 1295, in <module>
    main()
  File "/root/workspace/llama.cpp/convert.py", line 1234, in main
    params = Params.load(model_plus)
  File "/root/workspace/llama.cpp/convert.py", line 318, in load
    params = Params.loadHFTransformerJson(model_plus.model, hf_config_path)
  File "/root/workspace/llama.cpp/convert.py", line 237, in loadHFTransformerJson
    raise Exception("failed to guess 'n_ctx'. This model is unknown or unsupported.\n"
Exception: failed to guess 'n_ctx'. This model is unknown or unsupported.
Suggestion: provide 'config.json' of the model in the same directory containing model files.

And config.json is in the same directory as the model file. Does anyone know what caused the problem and how to solve it?

JerryKwan commented 11 months ago

Converting the model with bloomz.cpp (https://github.com/NouamaneTazi/bloomz.cpp) using the following command succeeds:

python3.10 convert-hf-to-ggml.py /root/bloomz-1b7/ ./models

ggerganov commented 11 months ago

Looks related to https://github.com/ggerganov/llama.cpp/issues/4493

JerryKwan commented 11 months ago

@ggerganov Thanks for looking into this issue. Is there an ETA for when the problem will be solved? A large number of users are using bloom-like models. And is there anything I can do to help solve the problem?

ggerganov commented 11 months ago

@teleprint-me mentioned that they will take a look, but I am not sure if they have had the chance yet. I prefer to rely on the community's help for the Python issues, as it is not my field of expertise, but I will take a look if it does not get resolved soon.

JerryKwan commented 11 months ago

@ggerganov It seems I can convert bloomz-1b7 successfully (commit f3f62f0d835d559e80714bbeb05d03125574e3dd) with the following command:

python3.10 ./convert-hf-to-gguf.py /root/bloomz-1b7/ 

And the following command loads the model successfully:

 ./main -m /root/bloomz-1b7/ggml-model-f16.gguf -n 128

So there must be something wrong with convert.py, and it should not take too much time to solve. I will dig deeper later.

player1537 commented 11 months ago

I can confirm that, several weeks ago, I was able to use the convert-hf-to-gguf.py script to convert Bloom-560M to GGUF format. If it helps, I have the converted files for Bloom-560M available on the Hugging Face Hub.

Galunid commented 10 months ago

Was bloom ever supported by convert.py in the first place? I believe that one was meant for llama models (and some derivatives). Bloom used to have a separate script convert-bloom-hf-to-gguf.py (or something similar) that was then refactored into convert-hf-to-gguf.py.

The error `Exception: failed to guess 'n_ctx'. This model is unknown or unsupported.` suggests it wasn't.

JerryKwan commented 10 months ago

@player1537 Thanks for lending help. I can convert the model successfully, thank you

@Galunid I am not sure whether bloom was supported by convert.py in the first place, but I think it would be better to use convert.py as the main conversion tool and have it call the functions defined in the other modules.

Galunid commented 10 months ago

That's not possible for now, perhaps in the future the scripts will be unified.

teleprint-me commented 10 months ago

Sorry I'm late to the party. I've been sick. Just starting to feel functional compared to the last few days.

@Galunid is right for the most part. My 2 cents, for clarity: convert.py only handles the llama and gpt architectures. convert-hf-to-gguf.py superseded the separate scripts that previously existed.

The main difference between them is that convert.py uses a custom shim to save memory, loading tensors lazily as they're needed; convert-hf-to-gguf.py consumes more memory as a result.

There are other contributors who are more knowledgeable about its functionality than I am. I've been slowly picking it apart, though.

My advice would be to use the intended script, which is the now-unified convert-hf-to-gguf.py. Merging the scripts is not so simple, as I discovered when I began to dig deeper into the torch code base.