Closed: Saf9933 closed this pull request 3 weeks ago.
Can we have a design doc for this PR? In the meantime, why do we need to change so much configuration in the training model?
I've added a design doc to the PR description to outline the changes. The configuration changes were necessary to enable hyperparameter tuning with Optuna: key training parameters such as learning rate, rank, alpha, and dropout are now optimized dynamically rather than fixed. The goal is to improve model performance by letting the tuning process find the optimal configuration for each training trial automatically, which adds flexibility to the training pipeline.
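To make that shift concrete, here is a minimal sketch of moving from fixed values to per-trial values; the config keys and search ranges are illustrative assumptions, not the repository's actual schema:

```python
import optuna

# Before: LoRA hyperparameters hard-coded in the training configuration (illustrative keys).
FIXED_CONFIG = {"rank": 8, "alpha": 16, "dropout": 0.05, "learning_rate": 3e-4}


def config_for_trial(trial: optuna.Trial) -> dict:
    # After: Optuna proposes the values for each trial instead of using constants.
    return {
        "rank": trial.suggest_categorical("rank", [4, 8, 16, 32]),
        "alpha": trial.suggest_categorical("alpha", [8, 16, 32, 64]),
        "dropout": trial.suggest_float("dropout", 0.0, 0.3),
        "learning_rate": trial.suggest_float("learning_rate", 1e-5, 1e-3, log=True),
    }
```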
Can you just add a single new file, `mlora_train_optuna.py`, instead?
Hyperparameter Tuning for LoRA Training Model

Overview

This document describes the implementation of the `mlora_train_optuna.py` script for automated hyperparameter tuning using Optuna, which was previously missing. The goal is to optimize key hyperparameters such as rank, learning rate, alpha, and dropout for the LoRA model to improve performance.

Key Changes
- Implementation of `mlora_train_optuna.py`
- Training Configuration Updates
- Training Pipeline Adaptation
- Loss Tracking Implementation
- Flake8 Compliance
- Task Configuration Changes: added the `type` field in `TrainTaskConfig` for compatibility with existing assertions (see the sketch after this list).
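As a rough illustration of the task-configuration change, here is a minimal sketch; `TrainTaskConfig`'s real fields live in the mLoRA codebase, so everything below other than the `type` field mentioned above is an assumption:

```python
from dataclasses import dataclass


# Hypothetical stand-in for the project's TrainTaskConfig; only the `type`
# field reflects the change described above, the other fields are illustrative.
@dataclass
class TrainTaskConfig:
    type: str  # added so existing assertions on the task type still pass
    name: str
    rank: int
    alpha: int
    dropout: float
    learning_rate: float


def task_config_for_trial(trial_id: int, params: dict) -> TrainTaskConfig:
    # Build a per-trial training task whose hyperparameters come from Optuna.
    return TrainTaskConfig(
        type="train",
        name=f"lora_trial_{trial_id}",
        rank=params["rank"],
        alpha=params["alpha"],
        dropout=params["dropout"],
        learning_rate=params["learning_rate"],
    )
```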
Optuna Integration

The `objective()` function defines the hyperparameter space for rank, learning rate, alpha, and dropout, as sketched below.
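A minimal sketch of what `objective()` could look like, assuming the training entry point returns a final loss to minimize; `run_lora_training` is a placeholder rather than the actual function in `mlora_train_optuna.py`, and the search ranges are illustrative:

```python
import optuna


def run_lora_training(rank: int, alpha: int, dropout: float, learning_rate: float) -> float:
    """Placeholder for the real mLoRA training run.

    In the actual script this would launch training with the suggested
    hyperparameters and return the tracked final (or best) loss. Here it
    returns a dummy value so the sketch runs end to end.
    """
    return dropout + abs(learning_rate - 3e-4) + 1.0 / rank + 1.0 / alpha


def objective(trial: optuna.Trial) -> float:
    # Hyperparameter space for the four parameters named above.
    rank = trial.suggest_categorical("rank", [4, 8, 16, 32])
    alpha = trial.suggest_categorical("alpha", [8, 16, 32, 64])
    dropout = trial.suggest_float("dropout", 0.0, 0.3)
    learning_rate = trial.suggest_float("learning_rate", 1e-5, 1e-3, log=True)

    # Optuna minimizes the value returned here, i.e. the training loss.
    return run_lora_training(rank, alpha, dropout, learning_rate)


if __name__ == "__main__":
    study = optuna.create_study(direction="minimize")
    study.optimize(objective, n_trials=20)
    print("Best hyperparameters:", study.best_params)
```

In the full script, the loss tracking listed under Key Changes would supply the value that each trial returns to Optuna here.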
Potential Issues

Conclusion
Integrating Optuna for hyperparameter tuning aims to automate the optimization process, improve training efficiency, and enhance model performance, reducing manual tuning efforts.