This repository contains three Jupyter notebooks that demonstrate an end-to-end NLP pipeline using AzureML v1, AzureML v2, and AzureML v1 with AutoML and Transformers, respectively. All notebooks use HyperDrive for hyperparameter tuning, a Managed Online Endpoint for model deployment, and Amazon product reviews in the Automotive category as the dataset.
The aim is to provide an example of fine-tuning a custom NLP model using native AML capabilities, including HyperDrive to tune the hyperparameters, an AutoML job to compare results, and a Managed Endpoint to deploy the model. This version is developed using AML SDK v1, although the deployment of the model to Managed Endpoints uses SDK v2.
This work leverages AML Pipelines v1, AML Pipelines v2, AutoML, and Managed Endpoints. You can bring in several model types, but for this example we used BERT Base Cased.
This notebook demonstrates an end-to-end text classification (sentiment analysis) process using AzureML v1 and Transformers. The process includes:
This notebook represents the end-to-end process of training, tuning, registering, and deploying a sentiment analysis model using AzureML SDK v2 and Transformers. The steps executed in this pipeline are similar to those in Notebook 1 but make use of the newer AzureML SDK v2.
Todo: Pipeline image v2
This notebook demonstrates an end-to-end machine learning process using AzureML v1, AutoML, and Transformers. The process includes:
This notebook provides a comprehensive example of using AzureML to automate the machine learning process, from data preparation to model deployment. It showcases the powerful capabilities of the platform, such as HyperDrive for hyperparameter tuning and AutoML for model selection, and enables seamless integration of various pipeline steps.
If the default GPU SKU is not available, you can use the STANDARD_NC24ADS_A100_V4 series instead.

The following role assignments are required:

- Storage Blob Owner: workspace storage account scope
- Azure ML Datascientist: workspace scope (for the Pipeline creating Managed Endpoints)

The dataset is the Amazon Review dataset from the UCSD archive, used to predict the sentiment of reviews from their ratings. For this sample, the Automotive subset is chosen, which has around 19K samples. The DefinePipeline.ipynb notebook contains the steps to download and prepare the dataset.
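For orientation, here is a minimal sketch of such a preparation step, assuming the UCSD download format. The file name, column names, and rating-to-sentiment mapping are illustrative; the notebook's actual logic may differ:

```python
import pandas as pd

# The UCSD Amazon Review data ships as gzipped JSON lines;
# 'overall' holds the 1-5 star rating (file name is assumed).
df = pd.read_json("Automotive_5.json.gz", lines=True, compression="gzip")

# Map star ratings to sentiment labels (assumed scheme:
# 1-2 negative, 3 neutral, 4-5 positive).
df["sentiment"] = pd.cut(
    df["overall"], bins=[0, 2, 3, 5],
    labels=["negative", "neutral", "positive"],
)
df = df[["reviewText", "sentiment"]].dropna()
```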
This work uses AML Pipelines for easier operationalization of the workload. The pipeline consists of five steps:
This step is a HyperDrive step that tunes a Hugging Face Transformers BERT Base Cased model. The parameters of the step can be used to increase the number of trials and test different combinations of hyperparameters to arrive at the best model.

The most important parameters are `learning-rate` and `epochs`. During tests, we learned that a `learning-rate` of 5.5e-5 performs well on this dataset. The `epochs` value is recommended to be set to 3 or 4, based on the BERT paper.
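As an illustration, here is a minimal HyperDrive configuration sketch in SDK v1. The script name, compute target, search ranges, and primary metric name are assumptions, not the notebook's exact values:

```python
from azureml.core import ScriptRunConfig
from azureml.train.hyperdrive import (
    HyperDriveConfig,
    PrimaryMetricGoal,
    RandomParameterSampling,
    choice,
    uniform,
)

# Hypothetical training script and compute target names.
src = ScriptRunConfig(
    source_directory="src",
    script="train.py",
    compute_target="gpu-cluster",
)

# Sample the two parameters called out above; ranges are illustrative.
param_sampling = RandomParameterSampling({
    "--learning-rate": uniform(2e-5, 6e-5),  # 5.5e-5 worked well here
    "--epochs": choice(3, 4),                # per the BERT paper
})

hd_config = HyperDriveConfig(
    run_config=src,
    hyperparameter_sampling=param_sampling,
    primary_metric_name="f1_weighted",       # assumed metric name
    primary_metric_goal=PrimaryMetricGoal.MAXIMIZE,
    max_total_runs=8,
    max_concurrent_runs=2,
)
```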
In this step, we utilized the nvitop Python package to monitor and collect GPU resource utilization during the fine-tuning of the Transformers. `nvitop` is a powerful tool designed for NVIDIA GPUs that provides real-time visualizations of GPU usage and statistics, similar to what `htop` provides for CPUs. By using `nvitop`, we were able to track GPU metrics such as GPU temperature, GPU utilization, memory utilization, and more. This allowed us to ensure that the training process was efficiently using the available resources, and to identify potential bottlenecks or issues in real time. Understanding these metrics is key to optimizing the fine-tuning process and ensuring that the model training is both effective and efficient.

Here's an example of how to use the `nvitop` package to monitor your GPU utilization:
```python
from azureml.core import Run
from nvitop import Device, ResourceMetricCollector

run = Run.get_context()
gpu_name = Device(0).name()  # e.g. "NVIDIA A100 80GB PCIe"

def on_collect(metrics):
    # Illustrative filtering: split the collected metrics into utilization
    # percentages and memory values before logging them to the run.
    perc_vals = {k: v for k, v in metrics.items() if "(%)" in k}
    mem_vals = {k: v for k, v in metrics.items() if "(MiB)" in k}
    if run:
        run.log_row(f"{gpu_name} utilization", **perc_vals)
        run.log_row(f"{gpu_name} memory", **mem_vals)
    return True  # returning False would stop the daemon

# collect_resource_utilization and resource_utilization_interval are
# script arguments controlling whether and how often to collect.
if collect_resource_utilization == 1:
    collector = ResourceMetricCollector(interval=resource_utilization_interval)
    daemon = collector.daemonize(on_collect, interval=None, tag="")
```
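With `interval=None`, the daemon falls back to the collector's own interval, and collection continues for as long as the callback returns `True`, so GPU metrics are logged for the full duration of the fine-tuning run.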
This step runs an AutoML job on the training and validation datasets, based on the example in automl-nlp-text.ipynb.
Based on the evaluation metric set (in our case `AUC_weighted`, as `f1_weighted` is not supported for NLP tasks), the best model found is the output of the job.
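A minimal sketch of such an AutoML NLP configuration in SDK v1, following the pattern of the referenced sample (the compute target and label column name are assumptions):

```python
import logging

from azureml.train.automl import AutoMLConfig

# train_dataset / validation_dataset are the registered datasets produced
# by the data preparation step; the names here are illustrative.
automl_config = AutoMLConfig(
    task="text-classification",
    debug_log="automl_errors.log",
    compute_target="gpu-cluster",     # assumed compute name
    training_data=train_dataset,
    validation_data=validation_dataset,
    label_column_name="sentiment",    # assumed label column
    primary_metric="AUC_weighted",    # f1_weighted is not supported for NLP
    verbosity=logging.INFO,
)
```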
In this step, the best model found by AutoML is tested to calculate the `AUC_weighted` metric on the test dataset and compare it with the HyperDrive run.
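For reference, the weighted AUC can be computed with scikit-learn along these lines; the arrays below are illustrative stand-ins for the test labels and the model's predicted probabilities:

```python
import numpy as np
from sklearn.metrics import roc_auc_score

# Stand-in data: three sentiment classes, five test samples.
y_test = np.array([0, 1, 2, 1, 0])
y_proba = np.array([
    [0.8, 0.1, 0.1],
    [0.2, 0.6, 0.2],
    [0.1, 0.2, 0.7],
    [0.3, 0.5, 0.2],
    [0.6, 0.3, 0.1],
])

# Weighted one-vs-rest AUC, matching the AUC_weighted metric used above.
auc_weighted = roc_auc_score(y_test, y_proba, multi_class="ovr", average="weighted")
print(f"AUC_weighted: {auc_weighted:.4f}")
```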
In this step, the aim is to find the best model to register based on the previous HyperDrive and AutoML runs. This step looks at the historical runs and picks the highest-performing run based on the `metric-name` parameter.
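A sketch of that comparison in SDK v1 (the run variables and the model name/path are hypothetical):

```python
# hyperdrive_best_run and automl_best_run are the best child runs retrieved
# from the two completed parent runs; metric_name mirrors the step parameter.
metric_name = "AUC_weighted"

candidates = [hyperdrive_best_run, automl_best_run]
scored = [(r, r.get_metrics().get(metric_name)) for r in candidates]
scored = [(r, m) for r, m in scored if m is not None]
best_run, best_value = max(scored, key=lambda pair: pair[1])

# Register the winning model from the run's outputs (path is assumed).
model = best_run.register_model(
    model_name="sentiment-bert",
    model_path="outputs/model",
)
```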
In this step, the recently registered model is deployed to an AML `Managed Endpoint`. The logic deploys a new `deployment` with the newly registered model and, if a simple test passes, traffic is shifted to 100% and the other deployments are deleted.
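A minimal SDK v2 sketch of that blue/green switch (the endpoint, deployment, environment, and model names, the scoring script, and the instance size are all assumptions):

```python
from azure.ai.ml import MLClient
from azure.ai.ml.entities import CodeConfiguration, ManagedOnlineDeployment
from azure.identity import DefaultAzureCredential

ml_client = MLClient.from_config(credential=DefaultAzureCredential())

# Create a new deployment carrying the newly registered model.
deployment = ManagedOnlineDeployment(
    name="green",
    endpoint_name="sentiment-endpoint",
    model="azureml:sentiment-bert:1",
    environment="azureml:sentiment-env:1",
    code_configuration=CodeConfiguration(code="src", scoring_script="score.py"),
    instance_type="Standard_DS3_v2",  # illustrative instance size
    instance_count=1,
)
ml_client.online_deployments.begin_create_or_update(deployment).result()

# After a smoke test against the new deployment passes, route all traffic
# to it and remove the old deployment.
endpoint = ml_client.online_endpoints.get("sentiment-endpoint")
endpoint.traffic = {"green": 100}
ml_client.online_endpoints.begin_create_or_update(endpoint).result()
ml_client.online_deployments.begin_delete(
    name="blue", endpoint_name="sentiment-endpoint"
).result()
```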
Todo: Add a snapshot of the Managed Endpoint