

Accelerator - Fine-Tuning Language Models in AML with Custom Datasets

Repository Overview

This repository contains three Jupyter notebooks that demonstrate an end-to-end NLP pipeline using AzureML v1, AzureML v2, and AzureML v1 with AutoML and Transformers, respectively. All notebooks use HyperDrive for hyperparameter tuning, a Managed Online Endpoint for model deployment, and Amazon product reviews in the Automotive category as the dataset.

The aim is to provide an example of fine-tuning a custom NLP model using native AML capabilities, including HyperDrive to tune the hyperparameters, an AutoML job to compare results, and Managed Endpoints to deploy the model. This version is developed using AML SDK v1, although the deployment of the model to Managed Endpoints uses SDK v2.

This work leverages AML Pipelines v1, AML Pipelines v2, AutoML, and Managed Endpoints. You can bring in several model types, but for this example we used BERT Base Cased (bert-base-cased).

Table of Contents

Notebook 1: AzureML v1 and Transformers

This notebook demonstrates an end-to-end text classification (sentiment analysis) process using AzureML v1 and Transformers, following the pipeline steps described in the AML Pipeline section below.

Pipeline

Notebook 2: AzureML v2 and Transformers

This notebook represents the end-to-end process of training, tuning, registering, and deploying a sentiment analysis model using AzureML SDK v2 and Transformers. The steps executed in this pipeline are similar to those in Notebook 1 but make use of the newer AzureML SDK v2.

Todo: Add the v2 pipeline image

Notebook 3: AzureML v1, AutoML, and Transformers

This notebook demonstrates an end-to-end machine learning process using AzureML v1, AutoML, and Transformers, covering all five pipeline steps described in the AML Pipeline section below.

This notebook provides a comprehensive example of using AzureML to automate the machine learning process, from data preparation to model deployment. It showcases the powerful capabilities of the platform, such as HyperDrive for hyperparameter tuning and AutoML for model selection, and enables seamless integration of various pipeline steps.

Pipeline

Requirements

Dataset

The dataset is the Amazon Review dataset from the UCSD archive; the task is to predict the sentiment of reviews from their ratings. For this sample, the Automotive subset is chosen, which has around 19K samples. The DefinePipeline.ipynb notebook contains the steps to download and prepare the dataset.
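As a rough sketch of that preparation (the actual logic lives in DefinePipeline.ipynb; the file name, column names, and rating thresholds below are assumptions), the star ratings can be mapped to sentiment labels like this:

import pandas as pd

# Load the 5-core Automotive reviews (hypothetical local copy of the UCSD file).
df = pd.read_json("Automotive_5.json.gz", lines=True, compression="gzip")

# Derive a sentiment label from the star rating (assumed thresholds).
def to_sentiment(rating):
    if rating >= 4.0:
        return "positive"
    if rating <= 2.0:
        return "negative"
    return "neutral"

df["sentiment"] = df["overall"].map(to_sentiment)
df = df[["reviewText", "sentiment"]].dropna()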

AML Pipeline

This work uses AML Pipelines for easier operationalization of the workload. The pipeline consists of the five steps described below; the sketch that follows shows how they fit together.
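As a rough illustration (assuming the per-step objects are defined as described in the sections below and ws is an existing Workspace handle; the variable and experiment names are hypothetical), the steps are assembled and submitted like this:

from azureml.core import Experiment
from azureml.pipeline.core import Pipeline

# Chain the five steps (hypothetical variable names) into one pipeline.
pipeline = Pipeline(
    workspace=ws,
    steps=[hyperdrive_step, automl_step, test_automl_step, register_step, deploy_step],
)

# Submit under an experiment; AzureML resolves step ordering from their inputs/outputs.
run = Experiment(ws, "nlp-finetune-pipeline").submit(pipeline)
run.wait_for_completion(show_output=True)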

Step 1 - HyperDrive_Step

This step is a HyperDrive step that fine-tunes a Hugging Face Transformers BERT Base Cased model. The step's parameters can be used to increase the number of trials and test different combinations of hyperparameters to find the best model.

The most important parameters are learning-rate and epochs. During our tests, we found that a learning rate of 5.5e-5 works well on this dataset. The epoch value is recommended to be set to 3 or 4, following the BERT paper.

Hyper Drive Trials
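For concreteness, here is a minimal sketch of a HyperDrive configuration for this step (the script name, compute target, environment handle env, and the logged metric name are assumptions, not the repository's exact values):

from azureml.core import ScriptRunConfig
from azureml.train.hyperdrive import (
    HyperDriveConfig, PrimaryMetricGoal, RandomParameterSampling, choice, uniform,
)

# Search space over the two most important parameters discussed above.
param_sampling = RandomParameterSampling({
    "--learning-rate": uniform(1e-5, 6e-5),  # 5.5e-5 worked well on this dataset
    "--epochs": choice(3, 4),                # 3 or 4, per the BERT paper
})

src = ScriptRunConfig(
    source_directory="./scripts",   # hypothetical folder containing the training script
    script="train.py",              # hypothetical fine-tuning script
    compute_target="gpu-cluster",   # hypothetical GPU compute target
    environment=env,                # assumes a registered AzureML Environment
)

hd_config = HyperDriveConfig(
    run_config=src,
    hyperparameter_sampling=param_sampling,
    primary_metric_name="f1_weighted",  # assumed name of the metric logged by the script
    primary_metric_goal=PrimaryMetricGoal.MAXIMIZE,
    max_total_runs=8,
    max_concurrent_runs=2,
)

In the pipeline itself, such a configuration would be wrapped in a HyperDriveStep (azureml.pipeline.steps).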

GPU Resource Utilization Tracking with Nvitop

In this step, we utilized the nvitop Python package to monitor and collect GPU resource utilization during fine-tuning of the transformer model. nvitop is a powerful tool designed for NVIDIA GPUs that provides real-time visualizations of GPU usage and statistics, similar to what htop provides for CPUs. Using nvitop, we were able to track metrics such as GPU temperature, GPU utilization, memory utilization, and more. This allowed us to ensure that the training process was using the available resources efficiently and to identify potential bottlenecks or issues in real time. Understanding these metrics is key to optimizing the fine-tuning process and ensuring that model training is both effective and efficient.

Here's an example of how to use the nvitop package to monitor your GPU utilization:

from azureml.core import Run
from nvitop import Device, ResourceMetricCollector

run = Run.get_context()                   # current AzureML run, used for metric logging
gpu_name = Device(0).name()               # assumes a single-GPU node

collect_resource_utilization = 1          # would normally come from a script argument
resource_utilization_interval = 5.0       # sampling interval in seconds (assumed)

def on_collect(metrics):
    """Callback invoked by the collector daemon; return True to keep collecting."""
    if run:
        # Split the flat metrics dict into utilization percentages and memory values
        # before logging them as rows against the run (illustrative key parsing).
        perc_vals = {k: v for k, v in metrics.items() if "(%)" in k}
        mem_vals = {k: v for k, v in metrics.items() if "MiB" in k}
        run.log_row(f"{gpu_name} utilization", **perc_vals)
        run.log_row(f"{gpu_name} memory", **mem_vals)
    return True

if collect_resource_utilization == 1:
    collector = ResourceMetricCollector(interval=resource_utilization_interval)
    daemon = collector.daemonize(on_collect, interval=None, tag="")

Step 2 - AutoML Step (Only in Notebook 3: AzureML v1, AutoML, and Transformers)

This step runs an AutoML job on the training and validation datasets. It is based on the example in automl-nlp-text.ipynb.

Based on the evaluation metric set (in our case AUC_weighted, since f1_weighted is not supported for NLP tasks), the best model found is the output of the job: AutoML Best Model
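A minimal sketch of the AutoML configuration for this step, following that example (the dataset handles, label column, and compute name are assumptions):

from azureml.train.automl import AutoMLConfig

automl_config = AutoMLConfig(
    task="text-classification",
    primary_metric="AUC_weighted",     # f1_weighted is not supported for NLP tasks
    training_data=train_dataset,       # assumed registered training TabularDataset
    validation_data=valid_dataset,     # assumed registered validation TabularDataset
    label_column_name="sentiment",     # hypothetical label column
    compute_target="gpu-cluster",      # hypothetical GPU compute target
)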

Step 3 - Test_AutoML_Model (Only in Notebook 3: AzureML v1, AutoML, and Transformers)

In this step, the best model found by AutoML is tested by calculating the AUC_weighted metric on the test dataset and comparing it to the HyperDrive run.
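For reference, the weighted AUC can be computed on the test set with scikit-learn (the variable names are illustrative; the repository's test script may differ):

from sklearn.metrics import roc_auc_score

# y_test: true labels; y_proba: predicted per-class probabilities, shape (n_samples, n_classes)
auc_weighted = roc_auc_score(y_test, y_proba, multi_class="ovr", average="weighted")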

Step 4 - Register_Best_Model

In this step, the aim is to find the best model to register based on the previous HyperDrive and AutoML runs. The step looks at the historical runs and picks the highest-performing one based on the metric-name parameter.
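A simplified sketch of that selection logic (the experiment name, metric name, model name, and model path are assumptions, and ws is an existing Workspace handle):

from azureml.core import Experiment

# Scan historical runs and keep the one with the best value of the chosen metric.
best_run, best_score = None, float("-inf")
for past_run in Experiment(ws, "nlp-finetune").get_runs():
    score = past_run.get_metrics().get("AUC_weighted")  # the metric-name parameter
    if score is not None and score > best_score:
        best_run, best_score = past_run, score

# Register the winning run's model output under a single model name.
model = best_run.register_model(model_name="bert-sentiment", model_path="outputs/model")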

Step 5 - Deploy_Latest_Model

In this step, the most recently registered model is deployed to an AML Managed Online Endpoint. The logic creates a new deployment with the newly registered model; if a simple test passes, the traffic is shifted to 100% and the other deployments are deleted.
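A hedged sketch of that flow with the SDK v2 (assuming an existing MLClient handle ml_client; the endpoint and deployment names, instance SKU, and model version are illustrative):

from azure.ai.ml.entities import ManagedOnlineDeployment

# Create a new deployment on the existing endpoint (names are hypothetical).
deployment = ManagedOnlineDeployment(
    name="blue",
    endpoint_name="sentiment-endpoint",
    model="azureml:bert-sentiment:1",   # assumed name/version registered in Step 4
    instance_type="Standard_DS3_v2",    # illustrative SKU
    instance_count=1,
)
# A non-MLflow model would additionally need environment= and code_configuration=.
ml_client.online_deployments.begin_create_or_update(deployment).result()

# After a simple scoring test passes, shift all traffic to the new deployment.
endpoint = ml_client.online_endpoints.get("sentiment-endpoint")
endpoint.traffic = {"blue": 100}
ml_client.online_endpoints.begin_create_or_update(endpoint).result()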

Todo: Add a snapshot of the Managed Endpoint