google-research / long-range-arena

Long Range Arena for Benchmarking Efficient Transformers
Apache License 2.0

Cannot reproduce the results for cifar10 #33

Closed La-SilverLand closed 3 years ago

La-SilverLand commented 3 years ago

the same question is issued in https://github.com/google-research/long-range-arena/issues/16

but for the vanilla transformer, using the hyperparameters in that reply I cannot reproduce the results in the LRA paper (2 V100/32GB cards), nor can I with the default settings in the code. Could you kindly share a working set of hyperparameters?

my hyperparameters from issue #16:

```
batch_size: 256
checkpoint_freq: 17500
eval_frequency: 175
factors: constant linear_warmup cosine_decay
grad_clip_norm: null
learning_rate: 0.0005
model:
  attention_dropout_rate: 0.2
  classifier_pool: CLS
  dropout_rate: 0.3
  emb_dim: 128
  learn_pos_emb: true
  mlp_dim: 128
  num_heads: 1
  num_layers: 1
  qkv_dim: 64
model_type: transformer
num_eval_steps: 39
num_train_steps: 35000
random_seed: 0
restore_checkpoints: true
save_checkpoints: true
steps_per_cycle: 35000
trial: 0
warmup: 175
weight_decay: 0.0
```

La-SilverLand commented 3 years ago

`num_heads: 1` was set incorrectly; after changing it to 8, the result can be reproduced.
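For context, multi-head attention splits `qkv_dim` evenly across heads, so `num_heads` changes how the same 64-dim projection is factored. The helper below is a hypothetical illustration (not part of the LRA codebase) of the per-head dimension implied by each setting:

```python
# Hypothetical helper (not from the LRA repo) showing how qkv_dim is
# divided across attention heads in standard multi-head attention.
def head_dim(qkv_dim: int, num_heads: int) -> int:
    """Return the per-head feature dimension; qkv_dim must divide evenly."""
    if qkv_dim % num_heads != 0:
        raise ValueError(
            f"qkv_dim={qkv_dim} is not divisible by num_heads={num_heads}"
        )
    return qkv_dim // num_heads

print(head_dim(64, 1))  # original config: one 64-dim head
print(head_dim(64, 8))  # corrected config: eight 8-dim heads
```

Both settings are valid shapes; the difference is purely in how attention capacity is partitioned, and for this task the 8-head variant is what matches the paper's numbers.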