Hello, this is great work. However, I keep running into gradient explosion when plugging it into llama_factory training; do you know why?
I have modified the code as follows:
These places may need to be modified; otherwise training fails because the length of attention_mask does not match.
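For reference, this is not the poster's actual modification (that code is not shown here), but a minimal, hypothetical sketch of the usual fix for this kind of mismatch: right-padding or truncating attention_mask so its length equals the input_ids sequence length the model expects. The function name and signature are illustrative only.

```python
def pad_attention_mask(attention_mask, target_len, pad_value=0):
    # Hypothetical helper: right-pad the mask with pad_value (0 = ignored
    # position) until it reaches target_len, then truncate in case the
    # mask was longer than the input_ids sequence.
    if len(attention_mask) < target_len:
        attention_mask = attention_mask + [pad_value] * (target_len - len(attention_mask))
    return attention_mask[:target_len]
```

The same idea applies whether the mask is a Python list or a tensor; with tensors one would use the framework's padding utilities instead.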
I also added this to the main training code:
Have you trained successfully? What configuration did you use for training?
The terminal output is: