TXH-mercury closed this issue 1 year ago
Hello, has the NaN problem you mentioned been solved? I ran into it too. Modifying the data preprocessing makes the NaN go away, but the accuracy then falls short of what the paper reports.
I encountered the same problem as you. How did you resolve it?
Running with --do_eval gives the correct zero-shot performance, but running with --do_train hits NaN right at the start of training, in both the 1-GPU and 4-GPU settings. I am using the default parameters.