LINs-lab / DynMoE

[Preprint] Dynamic Mixture of Experts: An Auto-Tuning Approach for Efficient Transformer Models
https://arxiv.org/abs/2405.14297
Apache License 2.0

EMoE Language Evaluation #3

Open caichaoxiang opened 3 days ago

caichaoxiang commented 3 days ago

Hello, during the language training and testing process of EMoE, when I run the test after training, the following is displayed:


['cola'] Namespace(adaptive_experts=False, add_expert_size=0, aux_loss_weight=0.01, cache_dir='./.cache', capacity_factor=1.5, checkpointing_steps=None, disable_peft=False, expert_repeat=1, gate_noise=1.0, gate_type='top', gradient_accumulation_steps=1, hub_model_id=None, hub_token=None, ignore_mismatched_sizes=False, include_training=False, is_gshard_loss=False, key_gate=False, learning_rates=[2e-05, 3e-05, 5e-05], load_model=None, lr_scheduler_type=<SchedulerType.LINEAR: 'linear'>, max_expert_num=8, max_length=128, max_train_steps=None, model_name_or_path='/MyData/bert-large-cased', moe_drop=0.1, moe_layers=[10, 11], normalize_one_score_gate=False, num_experts=16, num_train_epochs=10, num_warmup_steps=0, one_score=False, one_score_gate_update_momentum=0.0, output_dir='test', pad_to_max_length=False, per_device_eval_batch_size=32, per_device_train_batch_size=64, push_tohub=False, random_cluster=False, random_init_gate=False, report_to='tensorboard', resume_from_checkpoint=None, save_model=False, seeds=[0, 1, 2], source_dir='/MyData/bert-large-cased_save/cola', task_name='cola', to_MoE=False, top_k=4, train_file=None, use_fp16=True, use_slow_tokenizer=False, validation_file=None, weight_decay=0.0, with_tracking=True) learn_gate_random_False_repeat16 test No best results found

What is the problem?

As far as I can remember, I only changed the following at line 544 of search_glue_no_trainer.py:

 data_collator = DataCollatorWithPadding(tokenizer, pad_to_multiple_of=(8 if accelerator.use_fp16 else None))
 ------------------------------------------------

There was an error (*** AttributeError: 'Accelerator' object has no attribute 'use_fp16'), so I changed it to:

  try:
      # Older accelerate versions expose an `use_fp16` flag on the Accelerator
      pad_to_multiple_of = 8 if accelerator.use_fp16 else None
  except AttributeError:
      # Newer accelerate versions no longer have `use_fp16`, so skip the padding multiple
      pad_to_multiple_of = None
  data_collator = DataCollatorWithPadding(tokenizer, pad_to_multiple_of=pad_to_multiple_of)
  ------------------------------------------------
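For what it's worth, recent accelerate releases expose the precision setting as `mixed_precision` rather than `use_fp16`, so a version-tolerant variant could look like the sketch below (the `fp16_enabled` variable and the fallback logic are my own illustration, not code from the repo):

    # Minimal sketch, assuming an accelerate version that exposes
    # `accelerator.mixed_precision` ("no", "fp16", or "bf16").
    from accelerate import Accelerator
    from transformers import AutoTokenizer, DataCollatorWithPadding

    accelerator = Accelerator()
    tokenizer = AutoTokenizer.from_pretrained("bert-large-cased")

    # Prefer the newer attribute and fall back to the legacy `use_fp16` flag if it exists.
    fp16_enabled = (
        getattr(accelerator, "mixed_precision", None) == "fp16"
        or bool(getattr(accelerator, "use_fp16", False))
    )
    data_collator = DataCollatorWithPadding(
        tokenizer, pad_to_multiple_of=(8 if fp16_enabled else None)
    )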
QAQdev commented 2 days ago

During training, EMoE runs a grid search over the seeds (0, 1, 2) and learning rates (2e-5, 3e-5, 5e-5), and each combination produces a result. When the grid search ends, a txt file is saved to the output dir. You should see something like this:

(screenshot: the results txt file written to the output directory after the grid search)
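If it helps to reason about where that file should come from, the training-side logic is roughly the following (a minimal sketch; `train_and_eval`, the metric, and the `best_lr_<value>.txt` filename pattern are assumptions for illustration, not the repo's actual code):

    # Minimal sketch of how the grid search might save its results file
    # (illustrative only; names and the filename format are assumptions).
    import os

    def train_and_eval(seed, lr):
        # Placeholder for one training + evaluation run; returns a validation metric.
        return 0.0

    seeds = [0, 1, 2]
    learning_rates = [2e-5, 3e-5, 5e-5]
    output_dir = "test"

    results = {(s, lr): train_and_eval(s, lr) for s in seeds for lr in learning_rates}
    best_seed, best_lr = max(results, key=results.get)

    # The best learning rate is encoded in the filename of the saved txt file.
    os.makedirs(output_dir, exist_ok=True)
    with open(os.path.join(output_dir, f"best_lr_{best_lr}.txt"), "w") as f:
        f.write(f"seed={best_seed} lr={best_lr} metric={results[(best_seed, best_lr)]}\n")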

The filename of this txt file contains the best learning rate found during training. In test_glue_no_trainer.py, you should see the following lines of code, which extract the best lr from the filename of the txt file. So to find the bug, I think you need to check whether the txt file is saved successfully during training.

(screenshot: the code in test_glue_no_trainer.py that extracts the best lr from the filename)
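So the test script essentially looks for that txt file and recovers the learning rate from its name; a rough sketch of that kind of logic (with an assumed `best_lr_<value>.txt` pattern, not necessarily the one used in test_glue_no_trainer.py) is:

    # Sketch of extracting the best lr from the results filename
    # (assumed pattern "best_lr_<value>.txt"; the actual pattern may differ).
    import glob
    import os
    import re

    source_dir = "/MyData/bert-large-cased_save/cola"  # --source_dir from the command line

    matches = glob.glob(os.path.join(source_dir, "best_lr_*.txt"))
    if not matches:
        # This is the situation behind "No best results found":
        # training never wrote the results file.
        raise FileNotFoundError(f"No results txt file found in {source_dir}")

    best_lr = float(re.search(r"best_lr_(.+)\.txt", os.path.basename(matches[0])).group(1))
    print(f"Best learning rate recovered from filename: {best_lr}")

If the file is missing, the test run has nothing to parse, which would explain the "No best results found" message in your log.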