baichuan-inc / Baichuan2

A series of large language models developed by Baichuan Intelligent Technology
https://huggingface.co/baichuan-inc
Apache License 2.0

Max-z loss #325

Closed bpwl0121 closed 6 months ago

bpwl0121 commented 6 months ago

Hi,

where do you actually use the max-z loss? There is no max-z loss in the base model, only in the chat version. So do you apply the max-z loss during SFT?

best,

GradientGuru commented 6 months ago

Good day. We do not use the z loss during fine-tuning; the default self.config.z_loss_weight is 0. However, thanks for pointing out that we had missed the z_loss code in the 7B-Base repo; it is fixed now.
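For reference, the max-z term described in the Baichuan 2 technical report penalizes the squared maximum logit. A minimal sketch of how such a term might be gated by a `z_loss_weight` field is shown below; the names follow this thread and the actual modeling code in the repo may differ.

```python
import torch
import torch.nn.functional as F

def loss_with_max_z(logits, labels, z_loss_weight=0.0):
    """Cross-entropy plus an optional max-z penalty on the largest logit.

    logits: (batch, seq_len, vocab_size), labels: (batch, seq_len).
    With z_loss_weight == 0 (the default mentioned above), this reduces
    to plain cross-entropy.
    """
    ce = F.cross_entropy(
        logits.view(-1, logits.size(-1)), labels.view(-1), ignore_index=-100
    )
    if z_loss_weight > 0:
        # Penalize the squared maximum logit per position to keep logits bounded.
        z = logits.max(dim=-1).values
        ce = ce + z_loss_weight * (z ** 2).mean()
    return ce
```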

bpwl0121 commented 6 months ago

I think you need to update the config file accordingly; there is no z_loss_weight in it.
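Until the published config.json carries the field, one workaround could be to set it at load time. This is only a sketch under the assumption that the remote modeling code reads `config.z_loss_weight`, as suggested above.

```python
from transformers import AutoConfig, AutoModelForCausalLM

# Hypothetical: add z_loss_weight explicitly since it is absent from config.json.
config = AutoConfig.from_pretrained(
    "baichuan-inc/Baichuan2-7B-Base", trust_remote_code=True
)
config.z_loss_weight = 0.0  # default claimed above; set > 0 to enable the max-z term
model = AutoModelForCausalLM.from_pretrained(
    "baichuan-inc/Baichuan2-7B-Base", config=config, trust_remote_code=True
)
```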