EleutherAI / math-lm

MIT License

cannot convert raw llama weights to NeoX #96

Open scikkk opened 9 months ago

scikkk commented 9 months ago

Hello! Thanks for your great work. I ran into some problems when trying to replicate the results.

Specifically, I cannot find convert_raw_llama_weights_to_hf.py, which is referenced in README.md.

However, I found convert_raw_llama_weights_to_neox.py, which appears to convert from the Meta format to the NeoX format.

That script doesn't support --config_file, so I used --model_size=7B instead. Unfortunately, I got an error:

(gptneox2) root@ebf9662e:/math-lm# bash convert.sh 
sequential
dict_keys(['dim', 'n_layers', 'n_heads', 'multiple_of', 'ffn_dim_multiplier', 'norm_eps', 'rope_theta'])
Traceback (most recent call last):
  File "gpt-neox/tools/convert_raw_codellama_weights_to_neox.py", line 650, in <module>
    main()
  File "gpt-neox/tools/convert_raw_codellama_weights_to_neox.py", line 641, in main
    convert_model_sequential(
  File "gpt-neox/tools/convert_raw_codellama_weights_to_neox.py", line 308, in convert_model_sequential
    num_kv_heads = params["n_kv_heads"]
KeyError: 'n_kv_heads'

Here is my convert.sh:

python convert_raw_codellama_weights_to_neox.py \
 --input_dir /math-lm/codellama \
 --model_size 7B \
 --output_dir /math-lm/codellama/7B-NeoX \
 --num_output_shards 2

Here is my raw codellama7b:

(gptneox2) root@ebf9662e:/math-lm/codellama/7B# tree .
.
├── checklist.chk
├── consolidated.00.pth
├── params.json
└── tokenizer.model

Looking forward to your reply, any help will be appreciated!

scikkk commented 9 months ago

I made two changes and the conversion succeeded:

# LINE 308: num_kv_heads = params["n_kv_heads"]
num_kv_heads = params["n_kv_heads"] if "n_kv_heads" in params else params["n_heads"]
# LINE 383: if model_size == "7B":
if model_size == "7B" and "layers.0.attention.inner_attention.rope.freqs" in loaded[0]:
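
The first change can be illustrated in isolation. This is only a minimal sketch of the fallback idea, assuming that a params.json without "n_kv_heads" describes a model with one key/value head per attention head; the helper name `resolve_num_kv_heads` and the example dictionaries are hypothetical and not part of the conversion script:

```python
def resolve_num_kv_heads(params: dict) -> int:
    """Return the number of key/value heads for a checkpoint.

    Assumption: when "n_kv_heads" is missing from params.json, the model
    uses as many key/value heads as attention heads, so we fall back to
    "n_heads".
    """
    if "n_kv_heads" in params:
        return params["n_kv_heads"]
    return params["n_heads"]


# Illustrative values only; not copied from a real params.json.
params_without_kv = {"dim": 4096, "n_layers": 32, "n_heads": 32}
params_with_kv = {"dim": 8192, "n_layers": 48, "n_heads": 64, "n_kv_heads": 8}

print(resolve_num_kv_heads(params_without_kv))
print(resolve_num_kv_heads(params_with_kv))
```

With this fallback, the lookup no longer raises KeyError when the key is absent, which is what the one-line change at line 308 achieves.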

andrewarrow commented 8 months ago

I'm stuck because the codellama/7B/params.json file is not JSON. It seems to be an HTML file?

    raise JSONDecodeError("Expecting value", s, err.value) from None
json.decoder.JSONDecodeError: Expecting value: line 1 column 1 (char 0)

When I view the file in a browser, it's a "Sign in to continue to Gmail" login page.
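
A quick check like the following might help catch this kind of bad download before running the converter. It is only a rough sketch: the function name `classify_params_file` is hypothetical, and the HTML detection is a simple heuristic on the first characters of the file:

```python
import json
from pathlib import Path


def classify_params_file(path) -> str:
    """Classify a file as "json", "html", or "invalid".

    Heuristic: a file that starts with "<!doctype html" or "<html" is
    treated as an HTML page (for example, a login page saved in place of
    the real download); anything else is parsed as JSON.
    """
    text = Path(path).read_text(encoding="utf-8", errors="replace")
    head = text.lstrip()[:64].lower()
    if head.startswith("<!doctype html") or head.startswith("<html"):
        return "html"
    try:
        json.loads(text)
    except json.JSONDecodeError:
        return "invalid"
    return "json"
```

For example, `classify_params_file("codellama/7B/params.json")` returning "html" would suggest that the download was redirected to a web page and the file needs to be fetched again.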