The implementation was tested by fine-tuning mBART on paraphrasing datasets. The loss decreased to a value comparable to that of T5 models, showing a reasonable curve.
Issue #34 has been opened for the implementation of custom optimizers, schedulers, and parameters.
This pull request introduces changes to the optimizer and linear scheduler configurations in `trainer.py` and `finetune.py`.
The current implementation supports the Adafactor optimizer with an optional scheduler, as well as the default transformers Trainer optimizer: AdamW with a learning rate of 5e-5 and a linear scheduler.
Usage instructions:
To employ Adafactor with a scheduler, configure as follows:
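The exact configuration keys used by this repository are not shown here, so the snippet below is only a hedged sketch of the equivalent setup with the transformers API: Adafactor in relative-step mode paired with `AdafactorSchedule`, passed to the Trainer through its `optimizers` argument.

```python
from transformers import Trainer
from transformers.optimization import Adafactor, AdafactorSchedule

# Adafactor computes its own time-dependent learning rate;
# lr must be None when relative_step=True.
optimizer = Adafactor(
    model.parameters(),          # `model` assumed to be defined elsewhere
    scale_parameter=True,
    relative_step=True,
    warmup_init=True,
    lr=None,
)
# AdafactorSchedule exposes the internally computed rate so the Trainer can log it.
lr_scheduler = AdafactorSchedule(optimizer)

trainer = Trainer(
    model=model,
    args=training_args,          # assumed to be defined elsewhere
    train_dataset=train_dataset, # assumed to be defined elsewhere
    optimizers=(optimizer, lr_scheduler),
)
```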
To use Adafactor without a scheduler, configure as follows:
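Again a sketch of the underlying transformers calls, not the repository's own config format: without a scheduler, Adafactor is typically run with a fixed learning rate and `relative_step` disabled, and the Trainer receives `None` in place of a scheduler.

```python
from transformers import Trainer
from transformers.optimization import Adafactor

# Fixed learning rate, no internal schedule: relative_step and
# warmup_init must be off when an explicit lr is given.
optimizer = Adafactor(
    model.parameters(),          # `model` assumed to be defined elsewhere
    scale_parameter=False,
    relative_step=False,
    warmup_init=False,
    lr=1e-3,                     # example value, not taken from this PR
)

trainer = Trainer(
    model=model,
    args=training_args,          # assumed to be defined elsewhere
    train_dataset=train_dataset, # assumed to be defined elsewhere
    optimizers=(optimizer, None),  # no scheduler
)
```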
For the default parameters (AdamW with a linear scheduler), set the optimizer option to any value other than `adafactor`:
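For reference, a sketch of the default path: when no custom optimizer is passed, the transformers Trainer builds AdamW with the `learning_rate` from `TrainingArguments` (5e-5 by default) and a linear decay schedule.

```python
from transformers import Trainer, TrainingArguments

training_args = TrainingArguments(
    output_dir="outputs",
    learning_rate=5e-5,          # Trainer default
    lr_scheduler_type="linear",  # Trainer default
)

# With no `optimizers` argument, the Trainer creates AdamW and a
# linear scheduler on its own.
trainer = Trainer(
    model=model,                 # assumed to be defined elsewhere
    args=training_args,
    train_dataset=train_dataset, # assumed to be defined elsewhere
)
```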