Is it normal for the loss to rise during training?

DRSY / EMO

[ICLR 2024] EMO: Earth Mover Distance Optimization for Auto-Regressive Language Modeling (https://arxiv.org/abs/2310.04691)

Closed lichen914 closed 1 year ago

lichen914 commented 1 year ago

DRSY commented 1 year ago

Normally the loss should decrease. Could you share the specific model, training scenario, and training settings?

DRSY commented 1 year ago

The figure above shows the initial loss curve when fine-tuning llama2-7b on alpaca-gpt4 with FSDP (full_shard, auto_wrap) and fp16 mixed precision.

For gpt-2 fine-tuned on wikitext-2 in fp32, the loss starts at around 3.9 and decreases steadily from there.

lichen914 commented 1 year ago

I'm using a codellama model on A100 GPUs, with deepspeed and bf16.

lichen914 commented 1 year ago

In my run the loss basically keeps rising the whole time, up to around 5.
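To make "keeps rising" easier to compare across runs, here is a minimal sketch that compares the average of the first few logged losses with the average of the last few. The function name `loss_trend`, the window size, and the sample values are all hypothetical; none of the numbers come from the runs discussed in this thread.

```python
from statistics import mean


def loss_trend(losses, window=3):
    """Return mean(last `window` losses) minus mean(first `window` losses).

    A positive value means the loss ended higher than it started.
    """
    if len(losses) < 2 * window:
        raise ValueError("need at least 2 * window logged losses")
    return mean(losses[-window:]) - mean(losses[:window])


# Hypothetical logged values, for illustration only.
rising = [2.0, 2.5, 3.1, 3.8, 4.4, 5.0]
falling = [3.9, 3.7, 3.5, 3.4, 3.3, 3.2]

print(loss_trend(rising) > 0)   # True: the loss went up overall
print(loss_trend(falling) > 0)  # False: the loss went down overall
```

Averaging over a small window instead of comparing single steps keeps one noisy batch from flipping the result.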

DRSY commented 1 year ago

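This problem does seem to occur with deepspeed, and the cause is not clear yet. For now I suggest training with FSDP; on my side the loss decreases normally with both fp16 and bf16. The figure above shows the initial loss curve when fine-tuning codellama-13b on an SFT dataset with FSDP and fp16.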

lichen914 commented 1 year ago

DRSY commented 1 year ago

The problem was most likely in the line of code that weights emo_loss and mle_loss. After the fix, I can also observe the loss decreasing normally with deepspeed (zero2, bf16).
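For anyone who wants to retry the deepspeed (zero2, bf16) setup, a minimal config sketch might look like the following. Only the ZeRO stage and bf16 come from this thread; the batch size, accumulation steps, gradient clipping value, and file name are placeholders assumed for illustration, and the repository's own config may differ.

```python
import json

# Sketch of a DeepSpeed config for ZeRO stage 2 with bf16.
# Only "stage": 2 and bf16 reflect the setup mentioned in this thread;
# every numeric value below is a hypothetical placeholder.
ds_config = {
    "bf16": {"enabled": True},
    "zero_optimization": {"stage": 2},
    "train_micro_batch_size_per_gpu": 4,  # hypothetical value
    "gradient_accumulation_steps": 8,     # hypothetical value
    "gradient_clipping": 1.0,             # hypothetical value
}


def write_config(path="ds_config_zero2_bf16.json"):
    """Write the config to a JSON file (hypothetical file name) and return the path."""
    with open(path, "w") as f:
        json.dump(ds_config, f, indent=2)
    return path
```

The resulting JSON file can then be passed to whatever launcher or trainer option your training script uses to load a DeepSpeed config.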