This pull request introduces the training and inference scripts essential for model development. Alongside these scripts, it includes a requirements.txt file detailing all necessary dependencies. Additionally, a supporting Dockerfile is provided for running the batch-size optimization on NVIDIA GPU instances, ensuring efficient utilization of GPU resources.
The following is a sample execution of the training script:
docker run --gpus '"device=0"' --rm public.ecr.aws/h2x4e7f7/batch-optimization-training:latest
/usr/local/lib/python3.10/dist-packages/huggingface_hub/file_download.py:1142: FutureWarning: `resume_download` is deprecated and will be removed in version 1.0.0. Downloads always resume when possible. If you want to force a new download, use `force_download=True`.
warnings.warn(
Using device: cuda
Successfully downloaded model and tokenizer.
Batch Size 1024: Out of Memory. Trying smaller batch size.
Batch Size 512: Out of Memory. Trying smaller batch size.
Batch Size 256: Out of Memory. Trying smaller batch size.
Batch Size: 128
Training time: 7.17 seconds
Throughput: 139.54 samples/second
Average GPU Utilization: 100.00%
Optimal Batch Size Found:
Batch Size: 128, Throughput: 139.54 samples/sec, GPU Utilization: 100.00%
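The halving search visible in the log above can be sketched roughly as follows. This is a minimal illustration only: `try_training_run`, the simulated out-of-memory condition, and the numeric constants are assumptions for demonstration, not the actual script's code.

```python
class SimulatedOOMError(RuntimeError):
    """Stands in for a CUDA out-of-memory error in this sketch."""


def try_training_run(batch_size, memory_limit=128):
    """Hypothetical timed training pass.

    Assumption: batch sizes above the device limit raise OOM,
    mirroring the 1024/512/256 failures in the log above.
    """
    if batch_size > memory_limit:
        raise SimulatedOOMError(f"Batch Size {batch_size}: Out of Memory.")
    samples, seconds = batch_size * 8, 7.0  # illustrative numbers only
    return samples / seconds  # throughput in samples/second


def find_optimal_batch_size(start=1024, minimum=1):
    """Start large, halve on OOM, keep the first batch size that fits."""
    batch_size = start
    while batch_size >= minimum:
        try:
            throughput = try_training_run(batch_size)
        except SimulatedOOMError:
            print(f"Batch Size {batch_size}: Out of Memory. "
                  f"Trying smaller batch size.")
            batch_size //= 2
        else:
            print(f"Batch Size: {batch_size}, "
                  f"Throughput: {throughput:.2f} samples/sec")
            return batch_size, throughput
    raise RuntimeError("No batch size fits in GPU memory.")
```

With the assumed limit of 128, the search fails at 1024, 512, and 256 and settles on 128, matching the shape of the log above; the real script additionally measures GPU utilization during the successful run.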
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.