Closed yangtcai closed 2 years ago
I trained and tested on our new filter datasets for 140 epochs, and neither the train nor the test loss converges. After debugging the code for a whole day, I realized there is a bug in my previous code. 😨
Ohh, sorry to hear that, but bugs like that happen, nobody is safe from them :P
Would it make sense to use a smaller transformer with fewer layers during development, so training runs faster and problems show up sooner?
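To give a feel for the savings, here is a rough back-of-the-envelope sketch (plain Python, numbers purely illustrative and not from this repo) of how much a debug-sized encoder shrinks the parameter count, and hence the per-step compute, compared to a full-sized one:

```python
def encoder_params(d_model, d_ff, num_layers):
    """Rough parameter count for a transformer encoder stack.

    Counts only the big weight matrices (attention projections + FFN),
    ignoring biases, embeddings, and layer norms -- enough to see how
    width and depth drive cost.
    """
    attn = 4 * d_model * d_model   # Q, K, V, and output projections
    ffn = 2 * d_model * d_ff       # the two feed-forward matrices
    return num_layers * (attn + ffn)

# Hypothetical "full" config vs. a tiny debug config.
full = encoder_params(d_model=512, d_ff=2048, num_layers=12)
debug = encoder_params(d_model=128, d_ff=512, num_layers=2)
print(full, debug, full // debug)  # the debug model is ~96x smaller
```

Even a crude cut like this usually makes each epoch fast enough that a divergent loss curve (or a data bug) shows up in minutes rather than after 140 epochs.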