oujieww opened 1 year ago
I retrained with per_device_train_batch_size=32 and gradient_accumulation_steps=8, since I only use one GPU.
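For reference, a minimal sketch of the batch settings I used (assuming the standard Hugging Face TrainingArguments interface; output_dir is just a placeholder). On a single GPU the effective batch size works out to 32 × 8 = 256 per optimizer step:

```python
from transformers import TrainingArguments

# Minimal sketch of my single-GPU batch settings (output_dir is a placeholder)
training_args = TrainingArguments(
    output_dir="./output",
    per_device_train_batch_size=32,   # batch per GPU
    gradient_accumulation_steps=8,    # accumulate gradients to emulate a larger batch
)

# effective batch size per optimizer step on 1 GPU
effective_batch = (
    training_args.per_device_train_batch_size
    * training_args.gradient_accumulation_steps
    * 1  # number of GPUs
)
print(effective_batch)  # 256
```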
eval: {'accuracy': 0.93204}, skim_loss: 0.5163723826408386, tokens_remained: 0.9725673198699951
layers:
0: 1.067900538444519
1: 1.0005346536636353
2: 0.9959414005279541
3: 0.9953616857528687
4: 0.9922887086868286
5: 0.9482402801513672
6: 0.9456591010093689
7: 0.9453616738319397
8: 0.9452711343765259
9: 0.9450154304504395
10: 0.9446678161621094
11: 0.9445654153823853
Is the tokens_remained value right? It looks like almost no tokens are being skimmed, and the layer-0 ratio is even greater than 1.
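For what it's worth, this is how I am sanity-checking the numbers. If tokens_remained is simply the mean of per-layer binary skim masks (my assumption, not necessarily how the repo computes it; the function and variable names below are hypothetical), then each per-layer ratio should be at most 1, which is why the layer-0 value above 1 confuses me:

```python
import torch

# Hypothetical sanity check (not the repo's actual code): if each layer's
# skim mask is binary (1 = token kept, 0 = token skimmed), the per-layer
# retention ratio is just the mask mean, so it can never exceed 1.
def retention_ratios(layer_masks):
    per_layer = [m.float().mean().item() for m in layer_masks]
    overall = sum(per_layer) / len(per_layer)
    return per_layer, overall

# toy example: 12 layers, batch of 4, sequence length 128, all tokens kept
masks = [torch.ones(4, 128, dtype=torch.bool) for _ in range(12)]
per_layer, overall = retention_ratios(masks)
print(per_layer[0], overall)  # 1.0 1.0
```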
I also ran the code on IMDB and got 91.6% accuracy, but the paper reports 93.7%. Is there any detail I'm missing?