eole-nlp / eole

Open language modeling toolkit based on PyTorch
https://eole-nlp.github.io/eole
MIT License
53 stars 11 forks source link

`eole model lora` fails to save the config #11

Closed l-k-11235 closed 3 months ago

l-k-11235 commented 4 months ago

I got that error when I ran the command:

Traceback (most recent call last):
  File "/usr/local/bin/eole", line 33, in <module>
    sys.exit(load_entry_point('EOLE', 'console_scripts', 'eole')())
  File "workdir/eole/eole/bin/main.py", line 39, in main
    bin_cls.run(args)
  File "workdir/eole/eole/bin/model/lora_weights.py", line 105, in run
    json.dump(new_config, f, indent=2, ensure_ascii=False)
  File "/usr/lib/python3.10/json/__init__.py", line 179, in dump
    for chunk in iterable:
  File "/usr/lib/python3.10/json/encoder.py", line 438, in _iterencode
    o = _default(o)
  File "/usr/lib/python3.10/json/encoder.py", line 179, in default
    raise TypeError(f'Object of type {o.__class__.__name__} '
TypeError: Object of type TrainConfig is not JSON serializable

eole model lora --action merge --base_model ${EOLE_MODEL_DIR}/llama3-8b --lora_weights ./finetune/llama3-8b-finetune/step_10500 --output ./finetune/merged

(I just commented the related lines at the moment)

francoishernandez commented 4 months ago

Good catch. Forgot to fix this one post checkpoint format restructuring. Can you PR a patch similar to this? https://github.com/eole-nlp/eole/blob/60fbbe4ee06d1be93173e8f4c6f06b97266ec46b/eole/bin/convert/convert_HF.py#L885-L889

NB we shall add some tests for such features/tools at some point.

l-k-11235 commented 4 months ago

Sorry it does not seem to work, I had the same error with the patch.

francoishernandez commented 4 months ago

Provide trace and diff please. Probably not the exact same error, as the object is supposed to have changed type. The recursive_model_fields_setfunction takes a pydantic config object and returns a nested dict with all the explicitly set fields from said config. So it's supposed to be json serializable. What you might have is a similar error, but with another object.

francoishernandez commented 3 months ago

Just tested the suggested patch (#28) and it seems to work fine. Please re-open with more details if other issues arise.