Results are not very good - Githubissues

Facico / Chinese-Vicuna

Chinese-Vicuna: A Chinese Instruction-following LLaMA-based Model (a low-resource Chinese llama+lora approach, structured after alpaca)

https://github.com/Facico/Chinese-Vicuna

Apache License 2.0

4.14k stars 421 forks

Results are not very good #198

Open lucasjinreal opened 1 year ago

lucasjinreal commented 1 year ago

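I trained for 3 epochs with the provided script and data and ran into some fairly serious problems:

First, with the default parameters, every generation keeps revising its output while beam search runs again and again. The experience is very poor, or the results are simply bad.

Second, the generated answers are of poor quality.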

lucasjinreal commented 1 year ago

niuhuluzhihao commented 1 year ago

Why does my generated chat UI look different from yours? How did you configure it? I would appreciate some guidance.

lucasjinreal commented 1 year ago

I'm using the default one. If your question isn't about this repository, please don't post it here.

LZY-the-boys commented 1 year ago

@lucasjinreal Did you train with finetune_chat? And is the data our instruction_chat_50k.jsonl?

lucasjinreal commented 1 year ago

@LZY-the-boys Yes, but I used the pretrained llama version with the fixed eos token, and trained for 1200 steps (2 epochs), because the last epoch failed to save.

The data is the same as well.

lucasjinreal commented 1 year ago

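@LZY-the-boys There may be another issue here that I'd like to ask about. When I restart training, there seems to be some chance that the adapter_model.bin in the save path is an empty dict. It only has a real size once a checkpoint has been saved after training starts. That seems odd.

If this file is saved but the dict is empty, does that mean the lora weights were never loaded at all? Also, when PeftModel.from_pretrained is called, does it look up the latest checkpoint and load the pytorch_model.bin inside it, or does it load the adapter_model.bin outside?

Are the two identical?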

lucasjinreal commented 1 year ago


model.print_trainable_parameters()
print(f"peft config of model: {model.peft_config}")
logger.info(f"model.modules_to_save: {model.modules_to_save}")
old_state_dict = model.state_dict
# model.state_dict = (
#     lambda self, *_, **__: get_peft_model_state_dict(self, old_state_dict())
# ).__get__(model, type(model))

if torch.__version__ >= "2" and sys.platform != "win32":
    # model = torch.compile(model)
    pass

model.save_pretrained(args.output_path)

print(f"now FUCK model s: {list(model.state_dict().keys())[:70]}")
# print(f"{list(torch.load(os.path.join(args.resume_from_checkpoint, 'pytorch_model.bin')).keys())[:70]}")

trainer = transformers.Trainer(

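Also, if I comment out the get_peft_model line, does that affect the checkpoints saved during training? I see that your code adds it at the very end, after training has finished, which would mean the state_dict in the model is not the peft state_dict at all.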

Facico commented 1 year ago

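During training, checkpoints are saved as pytorch_model.bin; the final save produces adapter_model.bin (peft loads adapter_model.bin directly, so the inference code checks for this and renames the file).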
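
For readers following along, the check-and-rename step described above could look roughly like the sketch below. This is not the repository's actual inference code: the function name `ensure_adapter_file` is hypothetical, and the sketch assumes the pytorch_model.bin in a checkpoint directory holds only the LoRA weights (which depends on how the training script overrides state_dict). It also copies the file instead of renaming it, so the original checkpoint stays untouched.

```python
import os
import shutil


def ensure_adapter_file(checkpoint_dir: str) -> str:
    """Return the path to adapter_model.bin inside checkpoint_dir.

    Hypothetical helper: if only pytorch_model.bin exists (an intermediate
    checkpoint), copy it to adapter_model.bin so peft can find it.
    Assumes pytorch_model.bin contains LoRA weights only.
    """
    adapter_path = os.path.join(checkpoint_dir, "adapter_model.bin")
    trainer_path = os.path.join(checkpoint_dir, "pytorch_model.bin")

    # Final save already produced the file peft expects.
    if os.path.exists(adapter_path):
        return adapter_path

    # Intermediate checkpoint: make the weights visible under the peft name.
    if os.path.exists(trainer_path):
        shutil.copyfile(trainer_path, adapter_path)
        return adapter_path

    raise FileNotFoundError(
        f"neither adapter_model.bin nor pytorch_model.bin found in {checkpoint_dir}"
    )
```

After this call, the directory can be passed to PeftModel.from_pretrained as usual. Since an empty dict was reported earlier in the thread, it may also be worth loading the resulting file and checking that it actually contains keys before running inference.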