hangzeli08 closed 1 year ago
This usually happens because your training set contains samples that have no valid ground truth. For example, if the maximum length of the model is set to 256 and your instruction is longer than 256 tokens, the response is truncated away. Such a sample cannot produce a valid loss, because the loss is not computed on the instruction tokens. To fix this, either increase the maximum length or shorten your instructions.
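To check whether this applies to your data, here is a minimal sketch in plain Python. It assumes the common convention that instruction positions get the label `-100` and are excluded from the loss; the names `build_labels`, `count_supervised`, and `masked_mean` are hypothetical and not taken from this repository.

```python
import math

IGNORE_INDEX = -100  # assumed label value meaning "no loss at this position"


def build_labels(prompt_ids, response_ids, max_len):
    """Concatenate prompt and response, mask the prompt, truncate to max_len."""
    input_ids = (prompt_ids + response_ids)[:max_len]
    labels = ([IGNORE_INDEX] * len(prompt_ids) + response_ids)[:max_len]
    return input_ids, labels


def count_supervised(labels):
    """Number of positions that actually contribute to the loss."""
    return sum(1 for t in labels if t != IGNORE_INDEX)


def masked_mean(per_token_loss, labels):
    """Mean loss over supervised positions; NaN when there are none (0 / 0)."""
    kept = [l for l, t in zip(per_token_loss, labels) if t != IGNORE_INDEX]
    return sum(kept) / len(kept) if kept else math.nan


# Toy sample: a 300-token instruction followed by a 20-token response.
prompt = list(range(300))
response = list(range(1000, 1020))

# With max_len=256 the whole response is cut off, so nothing is supervised.
_, short_labels = build_labels(prompt, response, max_len=256)
# With max_len=512 the response survives truncation.
_, long_labels = build_labels(prompt, response, max_len=512)

print(count_supervised(short_labels), count_supervised(long_labels))
```

Running `count_supervised` over your own tokenized training set and listing the samples where it returns 0 would show whether truncation is really the cause.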
It is not related to length; the length is sufficient. What else could be the cause?
I mean that some samples would produce NaN. The printed loss is just an accumulated statistic. It does not affect model training; the model skips those samples automatically.
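As an illustration of why one NaN sample is enough to make an accumulated loss print as NaN, here is a small sketch. `NanSafeMeter` is a hypothetical helper, not part of this repository, and it only changes what gets logged; whether the training step itself skips such samples depends on the training code, which is not shown here.

```python
import math


class NanSafeMeter:
    """Running average that counts non-finite values instead of adding them."""

    def __init__(self):
        self.total = 0.0
        self.count = 0
        self.skipped = 0

    def update(self, value):
        if math.isfinite(value):
            self.total += value
            self.count += 1
        else:
            self.skipped += 1  # NaN or inf: record it, keep the sum clean

    @property
    def average(self):
        return self.total / self.count if self.count else math.nan


losses = [2.0, math.nan, 1.0]

# Naive accumulation: NaN propagates through the sum.
naive_avg = sum(losses) / len(losses)

meter = NanSafeMeter()
for loss in losses:
    meter.update(loss)

print(naive_avg, meter.average, meter.skipped)
```

Logging `skipped` alongside the average also shows how many samples are affected.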
Has anyone else run into this? How did you solve it?