datawhalechina / self-llm

《开源大模型食用指南》(A Practical Guide to Open-Source LLMs): quick deployment of open-source large models in a Linux environment, a deployment tutorial tailored to users in China
Apache License 2.0
6.51k stars 798 forks

After LoRA fine-tuning a ChatGLM model, how do I load the new model? #72

Closed waynetest2024 closed 1 month ago

waynetest2024 commented 3 months ago

After LoRA fine-tuning a ChatGLM model, how do I load the new model? The approach in the example's "模型推理" (model inference) subsection does generate results, but I would like to run inference on the new model directly via curl or some other method. I tried the example given in the "重新加载" (reload) subsection, but I can't find a checkpoint-1000 directory locally. It would be great if the LoRA fine-tuning .py file could be followed by an added explanation. Thanks!
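(For context, a minimal sketch of what that "重新加载" step usually looks like with peft, assuming a local ChatGLM3-6B base model and a saved adapter checkpoint; all paths, including checkpoint-1000, are placeholders that depend on the training output_dir and save settings discussed below.)

```python
# Sketch: load the frozen base model, then attach the saved LoRA adapter.
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_path = "./model/chatglm3-6b"          # placeholder: local base model dir
adapter_path = "./output/checkpoint-1000"  # placeholder: LoRA checkpoint dir

tokenizer = AutoTokenizer.from_pretrained(base_path, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(base_path, trust_remote_code=True).half().cuda()

# Attach the saved LoRA adapter on top of the frozen base weights.
model = PeftModel.from_pretrained(model, adapter_path)
model.eval()

inputs = tokenizer("你好", return_tensors="pt").to("cuda")
output = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```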

Hongru0306 commented 3 months ago

You can first set save_strategy=5 and check where the output path ends up. As for curl, do you mean serving the model with your own LoRA loaded? For that you need to merge the model first and then push the relevant parts to ModelScope or HF.
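(What "merge the model" could look like, sketched with peft's merge_and_unload; the paths and the hub repo id are placeholders, and pushing to ModelScope would use its own SDK rather than push_to_hub.)

```python
# Sketch: merge the LoRA adapter into the base weights so the result can be
# served like an ordinary checkpoint. Paths and repo names are placeholders.
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base = AutoModelForCausalLM.from_pretrained("./model/chatglm3-6b", trust_remote_code=True)
tokenizer = AutoTokenizer.from_pretrained("./model/chatglm3-6b", trust_remote_code=True)

merged = PeftModel.from_pretrained(base, "./output/checkpoint-1000").merge_and_unload()

merged.save_pretrained("./chatglm3-6b-lora-merged")     # full merged weights
tokenizer.save_pretrained("./chatglm3-6b-lora-merged")
# Optionally publish, e.g. to the Hugging Face Hub (requires login):
# merged.push_to_hub("your-name/chatglm3-6b-lora-merged")
```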

waynetest2024 commented 3 months ago

OK, I'll try again. Earlier I tried calling save_pretrained() and it kept failing with an error saying something was not in JSON format. By curl I meant the method introduced in the "ChatGLM3-6B FastApi 部署调用" chapter, but that shouldn't be a big problem; the main issue is that the previous step isn't solved yet. Thanks for the reply!
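(A hypothetical client call against the FastAPI demo from that chapter, written in Python rather than curl; the port and JSON fields below are assumptions, so check the server code in that chapter for the real schema.)

```python
# Hypothetical call to the FastAPI deployment; host, port and payload fields
# are placeholders, not confirmed against the chapter's code.
import requests

resp = requests.post(
    "http://127.0.0.1:6006",                 # placeholder host/port
    json={"prompt": "你好", "history": []},  # placeholder payload shape
    timeout=60,
)
print(resp.json())
```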

waynetest2024 commented 3 months ago

You can first set save_strategy=5 and check where the output path ends up. As for curl, do you mean serving the model with your own LoRA loaded? For that you need to merge the model first and then push the relevant parts to ModelScope or HF.

I've now changed the parameter to save_strategy='epoch', but it still errors out after adding it; the error message is below. It feels like a version problem, but I installed the Python packages at exactly the versions given in the project's example.

Traceback (most recent call last):
  File "train.py", line 79, in <module>
    trainer.train()
  File "/root/miniconda3/lib/python3.8/site-packages/transformers/trainer.py", line 1539, in train
    return inner_training_loop(
  File "/root/miniconda3/lib/python3.8/site-packages/transformers/trainer.py", line 1944, in _inner_training_loop
    self._maybe_log_save_evaluate(tr_loss, model, trial, epoch, ignore_keys_for_eval)
  File "/root/miniconda3/lib/python3.8/site-packages/transformers/trainer.py", line 2302, in _maybe_log_save_evaluate
    self._save_checkpoint(model, trial, metrics=metrics)
  File "/root/miniconda3/lib/python3.8/site-packages/transformers/trainer.py", line 2378, in _save_checkpoint
    self.save_model(staging_output_dir, _internal_call=True)
  File "/root/miniconda3/lib/python3.8/site-packages/transformers/trainer.py", line 2886, in save_model
    self._save(output_dir)
  File "/root/miniconda3/lib/python3.8/site-packages/transformers/trainer.py", line 2958, in _save
    self.model.save_pretrained(
  File "/root/miniconda3/lib/python3.8/site-packages/peft/peft_model.py", line 201, in save_pretrained
    peft_config.save_pretrained(output_dir, auto_mapping_dict=auto_mapping_dict)
  File "/root/miniconda3/lib/python3.8/site-packages/peft/utils/config.py", line 92, in save_pretrained
    writer.write(json.dumps(output_dict, indent=2, sort_keys=True))
  File "/root/miniconda3/lib/python3.8/json/__init__.py", line 234, in dumps
    return cls(
  File "/root/miniconda3/lib/python3.8/json/encoder.py", line 201, in encode
    chunks = list(chunks)
  File "/root/miniconda3/lib/python3.8/json/encoder.py", line 431, in _iterencode
    yield from _iterencode_dict(o, _current_indent_level)
  File "/root/miniconda3/lib/python3.8/json/encoder.py", line 405, in _iterencode_dict
    yield from chunks
  File "/root/miniconda3/lib/python3.8/json/encoder.py", line 438, in _iterencode
    o = _default(o)
  File "/root/miniconda3/lib/python3.8/json/encoder.py", line 179, in default
    raise TypeError(f'Object of type {o.__class__.__name__} '
TypeError: Object of type set is not JSON serializable
100%|██████████| 466/466 [07:32<00:00, 1.03it/s]
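(For what it's worth, the TypeError means json.dumps() hit a Python set inside the PEFT config being written at checkpoint time; whether that set is target_modules or some other field isn't confirmed in this thread. A sketch with list-valued fields, using illustrative values rather than the tutorial's exact config, serializes without this error.)

```python
# Sketch: keep every LoraConfig field JSON-serializable (lists, not sets),
# so peft's config.save_pretrained() can json.dumps() it during checkpointing.
# "query_key_value" is the usual ChatGLM attention projection, shown here
# only as an illustrative value.
from peft import LoraConfig, TaskType

lora_config = LoraConfig(
    task_type=TaskType.CAUSAL_LM,
    target_modules=["query_key_value"],   # a list, not {"query_key_value"}
    r=8,
    lora_alpha=32,
    lora_dropout=0.1,
)
```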

liyunhan commented 3 months ago

@waynetest2024 I'd like to know how large a model you fine-tuned, with how much data, and roughly how long it took. With 10k training samples, LoRA fine-tuning qwen1.5-32b-chat on an A6000 is painfully slow for me... I set the batch size to 16, and a single batch takes close to a minute.

waynetest2024 commented 3 months ago

@waynetest2024 I'd like to know how large a model you fine-tuned, with how much data, and roughly how long it took. With 10k training samples, LoRA fine-tuning qwen1.5-32b-chat on an A6000 is painfully slow for me... I set the batch size to 16, and a single batch takes close to a minute.

Just the model and data from the demo: chatglm3-6b with huanhuan.json. A run takes a few minutes on a 4090. I'm only getting familiar with the basic workflow, so my requirements are low.

Hongru0306 commented 3 months ago

I've now changed the parameter to save_strategy='epoch', but it still errors out after adding it; the error message is below. It feels like a version problem, but I installed the Python packages at exactly the versions given in the project's example.

Hi, for testing you don't need to set it to epoch; use step-based saving instead and save every 5 iterations, then see whether that works. If it really can't be resolved, I'll later build a known-good environment, push it to AutoDL, and attach it to the relevant links in the repo after the update.
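(In TrainingArguments terms, "save every 5 iterations" would be roughly the following; output_dir and the other values are placeholders, not the tutorial's exact settings.)

```python
# Sketch of the suggested setting: checkpoint every 5 optimizer steps instead
# of once per epoch, so a checkpoint-* directory appears quickly under output_dir.
from transformers import TrainingArguments

args = TrainingArguments(
    output_dir="./output/ChatGLM",   # checkpoints land in ./output/ChatGLM/checkpoint-<step>
    per_device_train_batch_size=4,
    num_train_epochs=1,
    save_strategy="steps",           # save by step count, not "epoch"
    save_steps=5,                    # every 5 iterations, as suggested above
    logging_steps=5,
)
```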

Hongru0306 commented 3 months ago

You can first set save_strategy=5 and check where the output path ends up. As for curl, do you mean serving the model with your own LoRA loaded? For that you need to merge the model first and then push the relevant parts to ModelScope or HF.

I've now changed the parameter to save_strategy='epoch', but it still errors out after adding it; the error message is below. It feels like a version problem, but I installed the Python packages at exactly the versions given in the project's example. Traceback (most recent call last): ... TypeError: Object of type set is not JSON serializable

Another possible approach: train with the provided ipynb, then manually save the model weights after training finishes and see where the save path ends up.
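(Sketched below, assuming the trainer and tokenizer objects from the notebook are still in scope; the directory is a placeholder.)

```python
# Sketch: after trainer.train() finishes in the notebook, write the LoRA
# adapter to a known directory so the reload/merge steps have a fixed path.
save_dir = "./output/chatglm3-lora-manual"   # placeholder path
trainer.model.save_pretrained(save_dir)      # writes adapter weights + adapter_config.json
tokenizer.save_pretrained(save_dir)
print("adapter saved to", save_dir)
```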

waynetest2024 commented 3 months ago

I've now changed the parameter to save_strategy='epoch', but it still errors out after adding it; the error message is below. It feels like a version problem, but I installed the Python packages at exactly the versions given in the project's example.

Hi, for testing you don't need to set it to epoch; use step-based saving instead and save every 5 iterations, then see whether that works. If it really can't be resolved, I'll later build a known-good environment, push it to AutoDL, and attach it to the relevant links in the repo after the update.

I see, but setting save_strategy=5 raises an error right away.