JonasGeiping / cramming

Cramming the training of a (BERT-type) language model into limited compute.

TypeError: _load_optimizer() missing 1 required positional argument: 'initial_time' #38

Closed: vincent-163 closed this issue 9 months ago

vincent-163 commented 9 months ago

While evaluating UltraFastBERT (a downstream project built on this repository; most of the code under the `training` folder of https://github.com/pbelcak/UltraFastBERT is identical), I encountered the following error when running `python eval.py eval=GLUE name=UltraFastBERT-1x11-long eval.checkpoint=hf://pbelcak/UltraFastBERT-1x11-long impl.microbatch_size=4`:

```
 loaded with 164,460,531 parameters.
Some weights of ScriptableLMForSequenceClassification were not initialized from the model checkpoint at pbelcak/UltraFastBERT-1x11-long and are newly initialized: ['pooler.dense.weight', 'head.weight', 'head.bias', 'pooler.dense.bias']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
Error executing job with overrides: ['eval=GLUE', 'name=UltraFastBERT-1x11-long', 'eval.checkpoint=hf://pbelcak/UltraFastBERT-1x11-long', 'impl.microbatch_size=4']
Traceback (most recent call last):
  File "/root/autodl-tmp/UltraFastBERT/training/eval.py", line 147, in launch
    cramming.utils.main_launcher(cfg, main_downstream_process, job_name="downstream finetuning")
  File "/root/autodl-tmp/UltraFastBERT/training/cramming/utils.py", line 54, in main_launcher
    metrics = main_fn(cfg, setup)
  File "/root/autodl-tmp/UltraFastBERT/training/eval.py", line 37, in main_downstream_process
    model_engine.load_checkpoint(cfg_arch, model_file)
  File "/root/autodl-tmp/UltraFastBERT/training/cramming/backend/torch_default.py", line 237, in load_checkpoint
    self.optimizer, self.scheduler = _load_optimizer(self.model, self.cfg_train, self.cfg_impl)
TypeError: _load_optimizer() missing 1 required positional argument: 'initial_time'

Set the environment variable HYDRA_FULL_ERROR=1 for a complete stack trace.
```

And indeed, line 237 of that file calls `_load_optimizer` with only three arguments instead of the required four: https://github.com/JonasGeiping/cramming/blob/f6ba4cb76ff7847ecc64067b3e7eaa1eed9625a5/cramming/backend/torch_default.py#L237

Maybe add `self.initial_time` as the fourth argument?
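
For reference, a minimal sketch of that one-line change (the call site is copied from the traceback; `self.initial_time` is the assumed attribute name, matching the parameter named in the TypeError):

```python
# cramming/backend/torch_default.py, line 237 -- sketch of the proposed fix.
# Assumes the engine records its start timestamp as `self.initial_time`,
# which would satisfy _load_optimizer's required `initial_time` parameter.
self.optimizer, self.scheduler = _load_optimizer(self.model, self.cfg_train, self.cfg_impl, self.initial_time)
```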

JonasGeiping commented 9 months ago

Yeah, it should be. Feel free to make a PR.