google-research / long-range-arena

Long Range Arena for Benchmarking Efficient Transformers
Apache License 2.0

Cannot reproduce the results for cifar10 #33

Closed La-SilverLand closed 3 years ago

La-SilverLand commented 3 years ago

the same question is issued in https://github.com/google-research/long-range-arena/issues/16

but for the vanilla transformer, using the hyperparameters in that reply I cannot reproduce the results in the LRA paper (2 V100/32GB cards), nor can I with the default settings in the code. Could you kindly share a working set of hyperparameters?

my hyperparameters from issue #16:

```
batch_size: 256
checkpoint_freq: 17500
eval_frequency: 175
factors: constant linear_warmup cosine_decay
grad_clip_norm: null
learning_rate: 0.0005
model:
  attention_dropout_rate: 0.2
  classifier_pool: CLS
  dropout_rate: 0.3
  emb_dim: 128
  learn_pos_emb: true
  mlp_dim: 128
  num_heads: 1
  num_layers: 1
  qkv_dim: 64
model_type: transformer
num_eval_steps: 39
num_train_steps: 35000
random_seed: 0
restore_checkpoints: true
save_checkpoints: true
steps_per_cycle: 35000
trial: 0
warmup: 175
weight_decay: 0.0
```

La-SilverLand commented 3 years ago

`num_heads: 1` was set incorrectly; after changing it to 8, the result can be reproduced.
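For context, multi-head attention splits `qkv_dim` evenly across heads, so `num_heads` changes how the same 64-dim projection is factored. The helper below is a hypothetical illustration (not part of the LRA codebase) of the per-head dimension implied by each setting:

```python
# Hypothetical helper (not from the LRA repo) showing how qkv_dim is
# divided across attention heads in standard multi-head attention.
def head_dim(qkv_dim: int, num_heads: int) -> int:
    """Return the per-head feature dimension; qkv_dim must divide evenly."""
    if qkv_dim % num_heads != 0:
        raise ValueError(
            f"qkv_dim={qkv_dim} is not divisible by num_heads={num_heads}"
        )
    return qkv_dim // num_heads

print(head_dim(64, 1))  # original config: one 64-dim head
print(head_dim(64, 8))  # corrected config: eight 8-dim heads
```

Both settings are valid shapes; the difference is purely in how attention capacity is partitioned, and for this task the 8-head variant is what matches the paper's numbers.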