This pull request introduces the training and inference scripts essential for model development. Alongside these scripts, it includes a requirements.txt file detailing all necessary dependencies. Additionally, a supporting Dockerfile is provided for running the batch-size optimization on NVIDIA GPU instances, ensuring efficient utilization of GPU resources.
The following is a sample execution of the training script:
docker run --gpus '"device=0"' --rm public.ecr.aws/h2x4e7f7/batch-optimization-training:latest
/usr/local/lib/python3.10/dist-packages/huggingface_hub/file_download.py:1142: FutureWarning: `resume_download` is deprecated and will be removed in version 1.0.0. Downloads always resume when possible. If you want to force a new download, use `force_download=True`.
warnings.warn(
Using device: cuda
Successfully downloaded model and tokenizer.
Batch Size 1024: Out of Memory. Trying smaller batch size.
Batch Size 512: Out of Memory. Trying smaller batch size.
Batch Size 256: Out of Memory. Trying smaller batch size.
Batch Size: 128
Training time: 7.17 seconds
Throughput: 139.54 samples/second
Average GPU Utilization: 100.00%
Optimal Batch Size Found:
Batch Size: 128, Throughput: 139.54 samples/sec, GPU Utilization: 100.00%
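The halving search visible in the log above can be sketched roughly as follows. This is a minimal illustration only: `try_training_run`, the simulated out-of-memory condition, and the numeric constants are assumptions for demonstration, not the actual script's code.

```python
class SimulatedOOMError(RuntimeError):
    """Stands in for a CUDA out-of-memory error in this sketch."""


def try_training_run(batch_size, memory_limit=128):
    """Hypothetical timed training pass.

    Assumption: batch sizes above the device limit raise OOM,
    mirroring the 1024/512/256 failures in the log above.
    """
    if batch_size > memory_limit:
        raise SimulatedOOMError(f"Batch Size {batch_size}: Out of Memory.")
    samples, seconds = batch_size * 8, 7.0  # illustrative numbers only
    return samples / seconds  # throughput in samples/second


def find_optimal_batch_size(start=1024, minimum=1):
    """Start large, halve on OOM, keep the first batch size that fits."""
    batch_size = start
    while batch_size >= minimum:
        try:
            throughput = try_training_run(batch_size)
        except SimulatedOOMError:
            print(f"Batch Size {batch_size}: Out of Memory. "
                  f"Trying smaller batch size.")
            batch_size //= 2
        else:
            print(f"Batch Size: {batch_size}, "
                  f"Throughput: {throughput:.2f} samples/sec")
            return batch_size, throughput
    raise RuntimeError("No batch size fits in GPU memory.")
```

With the assumed limit of 128, the search fails at 1024, 512, and 256 and settles on 128, matching the shape of the log above; the real script additionally measures GPU utilization during the successful run.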
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.