The current instruction tuning implementation masks the loss of all "user" tokens to zero. This PR adds a `--scalar_loss_mask` argument that instead scales the loss of those tokens by a configurable value (e.g. 0.1), so the model also learns to produce "user" prompts.
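As a minimal sketch of the idea (the function and variable names here are hypothetical, not the PR's actual code), the change amounts to replacing a 0/1 loss mask with a scalar-weighted one and averaging with those weights:

```python
def build_loss_mask(roles, scalar_loss_mask=0.1):
    # Instead of zeroing the loss on "user" tokens, down-weight it.
    # roles: per-token role labels, e.g. ["user", "assistant", ...]
    return [scalar_loss_mask if role == "user" else 1.0 for role in roles]

def masked_mean_loss(per_token_losses, weights):
    # Weighted mean so down-weighted tokens still contribute a little
    # gradient signal rather than none at all.
    total = sum(loss * w for loss, w in zip(per_token_losses, weights))
    return total / sum(weights)
```

With `scalar_loss_mask=0` this reduces to the current behavior (user tokens contribute nothing); with a small positive value such as 0.1, user tokens contribute a fraction of the training signal.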