Code to reproduce:
import json
import sagemaker
from sagemaker.huggingface import HuggingFaceModel, get_huggingface_llm_image_uri

# sagemaker config
role = sagemaker.get_execution_role()  # IAM role with SageMaker and S3 permissions
# TGI DLC image URI (version chosen to match the DLC listed under "DLC image/dockerfile" below)
llm_image = get_huggingface_llm_image_uri("huggingface", version="0.9.3")
instance_type = "ml.g5.12xlarge"
number_of_gpu = 4
health_check_timeout = 300

# Define Model and Endpoint configuration parameters
config = {
    'HF_MODEL_ID': 'OpenAssistant/oasst-sft-4-pythia-12b-epoch-3.5',
    'SM_NUM_GPUS': json.dumps(number_of_gpu),  # number of GPUs used per replica
    'MAX_INPUT_LENGTH': json.dumps(2000),      # max length of the input prompt
    'MAX_TOTAL_TOKENS': json.dumps(2048),      # max length of input + generated tokens
}

# create HuggingFaceModel with the image uri and the model archive on S3
llm_model = HuggingFaceModel(
    model_data="s3://S3_PATH/oasst-sft-4-pythia-12b-epoch-3.5.tar.gz",
    role=role,
    image_uri=llm_image,
    env=config,
)

llm = llm_model.deploy(
    endpoint_name="oasst-sft-4-pythia-12b-epoch-35-12x",
    initial_instance_count=1,
    instance_type=instance_type,
    container_startup_health_check_timeout=health_check_timeout,  # seconds allowed for the container to load the model
)
Concise Description:
How can I deploy a model from a model archive in AWS S3, without downloading it from the Hugging Face Hub, using the TGI image on SageMaker?
DLC image/dockerfile:
763104351884.dkr.ecr.us-east-1.amazonaws.com/huggingface-pytorch-tgi-inference:2.0.1-tgi0.9.3-gpu-py39-cu118-ubuntu20.04
Current behavior:
HF_MODEL_ID is required, and I have set the S3 path for model_data, but the container always downloads the model files from the Hugging Face Hub when I deploy the SageMaker endpoint on AWS.
Expected behavior:
I can use the model files stored in AWS S3 without contacting the Hugging Face Hub.
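What I am hoping for is something like the sketch below. This is only my guess at a workaround: it assumes that SageMaker extracts model_data to /opt/ml/model inside the container and that TGI accepts a local directory as HF_MODEL_ID; I have not verified this with this DLC version.

# guessed workaround (unverified): point HF_MODEL_ID at the local path where
# SageMaker extracts model_data, so TGI should not need to contact the Hub
config = {
    'HF_MODEL_ID': '/opt/ml/model',            # local model directory instead of a Hub repo id
    'SM_NUM_GPUS': json.dumps(number_of_gpu),
    'MAX_INPUT_LENGTH': json.dumps(2000),
    'MAX_TOTAL_TOKENS': json.dumps(2048),
}

llm_model = HuggingFaceModel(
    model_data="s3://S3_PATH/oasst-sft-4-pythia-12b-epoch-3.5.tar.gz",  # weights come only from S3
    role=role,
    image_uri=llm_image,
    env=config,
)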