This pull request adds a new example that runs the GPT-2-based distilgpt2 transformer on the wikitext-2-raw-v1 dataset using AIHWKit. The example demonstrates how to convert the model to analog, run training and inference, and visualize the performance metrics with TensorBoard.
Details
Key Changes and Additions
Model and Dataset:
Implemented an example using the smallest GPT-2 variant (distilgpt2).
Used the wikitext-2-raw-v1 dataset for training and validation, which is smaller and faster to process than openwebtext.
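For reference, a minimal sketch of this loading step, assuming the standard Hugging Face transformers and datasets APIs that the example builds on:

```python
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer

# distilgpt2 is the smallest model in the GPT-2 family on the Hugging Face hub.
tokenizer = AutoTokenizer.from_pretrained("distilgpt2")
tokenizer.pad_token = tokenizer.eos_token  # GPT-2 tokenizers define no pad token
model = AutoModelForCausalLM.from_pretrained("distilgpt2")

# wikitext-2-raw-v1 is much smaller than openwebtext, so it processes quickly.
raw_datasets = load_dataset("wikitext", "wikitext-2-raw-v1")
```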
Training and Inference Setup:
Configured the model to run analog inference at a specified weight noise level.
Added support for digital inference as an option.
Implemented preprocessing functions to handle dataset tokenization.
Provided functionality to train the model and save/load checkpoints.
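A condensed sketch of this setup follows. convert_to_analog(), InferenceRPUConfig, and PCMLikeNoiseModel are AIHWKit APIs, though exact import paths can differ across AIHWKit versions; the tokenize helper and the noise values shown are illustrative:

```python
from aihwkit.inference import PCMLikeNoiseModel
from aihwkit.nn.conversion import convert_to_analog
from aihwkit.simulator.configs import InferenceRPUConfig, WeightNoiseType

def tokenize(examples):
    # Hypothetical helper: tokenize the raw wikitext lines in batches.
    return tokenizer(examples["text"])

tokenized_datasets = raw_datasets.map(tokenize, batched=True, remove_columns=["text"])

# Inference-oriented RPU configuration: PCM-like programming/drift noise plus
# additive weight noise in the forward pass (the level the -n flag controls).
rpu_config = InferenceRPUConfig()
rpu_config.noise_model = PCMLikeNoiseModel(g_max=25.0)
rpu_config.forward.w_noise_type = WeightNoiseType.ADDITIVE_CONSTANT
rpu_config.forward.w_noise = 0.1

# Replace the model's digital layers with simulated analog tiles.
model = convert_to_analog(model, rpu_config)
```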
Logging and Monitoring:
Integrated TensorBoard for logging training and validation metrics.
Added TensorBoardCallback to the Trainer for seamless logging.
Configured the script to save logs in a specific directory and visualize them using TensorBoard.
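A sketch of this logging wiring, using the standard transformers TensorBoardCallback; the directory names are illustrative, and checkpoints are saved under the Trainer's output_dir:

```python
from transformers import DataCollatorForLanguageModeling, Trainer, TrainingArguments
from transformers.integrations import TensorBoardCallback

training_args = TrainingArguments(
    output_dir="./checkpoints",    # checkpoints are saved/loaded from here
    evaluation_strategy="epoch",
    logging_dir="./logs/run1",     # TensorBoard event files land here
    report_to="none",              # the explicit callback below does the logging
)

trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=tokenized_datasets["train"],
    eval_dataset=tokenized_datasets["validation"],
    data_collator=DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm=False),
    callbacks=[TensorBoardCallback()],
)
trainer.train()
```

The resulting logs can then be inspected with `tensorboard --logdir ./logs`.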
Performance Metrics:
Calculated validation loss and perplexity as the primary performance metrics.
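Perplexity follows directly from the validation loss, since it is the exponential of the mean cross-entropy; a minimal computation using the trainer from the sketch above:

```python
import math

eval_results = trainer.evaluate()
# Perplexity is the exponential of the mean cross-entropy loss on the
# validation split.
perplexity = math.exp(eval_results["eval_loss"])
print(f"validation loss: {eval_results['eval_loss']:.4f}  perplexity: {perplexity:.2f}")
```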
The example loads a pre-trained GPT-2 model trained on the wikitext dataset. It then applies convert_to_analog() to examine the effect of drift_analog_weights() on inference performance at different weight noise levels. TensorBoard is used to display the perplexity metrics evaluated at various times after training has completed.
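A sketch of how such a drift sweep might look, reusing the trainer from the sketches above; drift_analog_weights() is the AIHWKit call named in this PR, while the specific time points and TensorBoard tags are illustrative:

```python
import math
from torch.utils.tensorboard import SummaryWriter

writer = SummaryWriter(log_dir="./logs/run1")
model.eval()

# Evaluate at increasing times after programming: 1 s, 1 h, 1 day, 30 days.
for t_inference in (1, 3600, 86400, 2592000):
    model.drift_analog_weights(t_inference)  # apply conductance drift up to t
    eval_results = trainer.evaluate()
    writer.add_scalar("val/perplexity", math.exp(eval_results["eval_loss"]), t_inference)
writer.close()
```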
Command-line arguments can be used to control certain options. For example:
python /path/to/aihwkit/examples/31_gpt2_on_wikitext.py -n 0.1 -r "run 1" -l 0.0005 -t
sets the weight noise to 0.1, names the run "run 1" in TensorBoard, sets the learning rate to 0.0005, and enables hardware-aware training.
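A sketch of the corresponding argument parsing; the short flags match the example invocation above, but the long option names and defaults here are assumptions:

```python
import argparse

parser = argparse.ArgumentParser(
    description="distilgpt2 on wikitext-2-raw-v1 with AIHWKit")
parser.add_argument("-n", "--noise", type=float, default=0.0,
                    help="weight noise level for analog inference")
parser.add_argument("-r", "--run_name", type=str, default="run",
                    help="name of the run shown in TensorBoard")
parser.add_argument("-l", "--learning_rate", type=float, default=0.0005,
                    help="learning rate used for training")
parser.add_argument("-t", "--training", action="store_true",
                    help="perform hardware-aware training")
args = parser.parse_args()
```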
Related issues
New example added for the GPT-2 demonstration
README
Example 31: 31_gpt2_on_wikitext.py. This example is adapted from https://github.com/huggingface/notebooks/blob/main/examples/language_modeling.ipynb
The example loads a pre-trained GPT-2 model trained on the wikitext dataset. It then applies convert_to_analog() to examine the effect of drift_analog_weights() on inference performance at different weight noise levels. TensorBoard is used to display the perplexity metrics evaluated at various times after training has completed.

Command-line arguments can be used to control certain options. For example:
python /path/to/aihwkit/examples/31_gpt2_on_wikitext.py -n 0.1 -r "run 1" -l 0.0005 -t
to set the weight noise to 0.1, name the run "run 1" in TensorBoard, set the learning rate to 0.0005, and enable hardware-aware training.