aws / sagemaker-huggingface-inference-toolkit

Apache License 2.0

How can I deploy a model from AWS S3, without downloading the model from Hugging Face, via the TGI image on SageMaker? #97

Closed · weiZhenkun closed this 1 year ago

weiZhenkun commented 1 year ago

Concise Description:

How can I deploy a model from AWS S3, without downloading the model from Hugging Face, via the TGI image on SageMaker?

DLC image/dockerfile:

763104351884.dkr.ecr.us-east-1.amazonaws.com/huggingface-pytorch-tgi-inference:2.0.1-tgi0.9.3-gpu-py39-cu118-ubuntu20.04

Current behavior:

HF_MODEL_ID is required, and I have set the S3 path via model_data, but the container still downloads the model files from the Hugging Face Hub whenever I deploy the SageMaker endpoint in AWS.

import json
import sagemaker
from sagemaker.huggingface import HuggingFaceModel, get_huggingface_llm_image_uri

# sagemaker config
role = sagemaker.get_execution_role()  # IAM role with SageMaker permissions
llm_image = get_huggingface_llm_image_uri("huggingface", version="0.9.3")  # TGI DLC
instance_type = "ml.g5.12xlarge"
number_of_gpu = 4
health_check_timeout = 300

# Define Model and Endpoint configuration parameter
config = {
  'HF_MODEL_ID':'OpenAssistant/oasst-sft-4-pythia-12b-epoch-3.5',
  'SM_NUM_GPUS': json.dumps(number_of_gpu), # number of GPUs used per replica
  'MAX_INPUT_LENGTH': json.dumps(2000),
  'MAX_TOTAL_TOKENS': json.dumps(2048),
}

# create HuggingFaceModel with the image uri
llm_model = HuggingFaceModel(
  model_data="s3://S3_PATH/oasst-sft-4-pythia-12b-epoch-3.5.tar.gz",
  role=role,
  image_uri=llm_image,
  env=config
)

llm = llm_model.deploy(
  endpoint_name="oasst-sft-4-pythia-12b-epoch-35-12x",
  initial_instance_count=1,
  instance_type=instance_type,
  container_startup_health_check_timeout=health_check_timeout, # 5 minutes to load the model
)

Expected behavior:

The endpoint should load the model files from AWS S3 without contacting the Hugging Face Hub.

philschmid commented 1 year ago

You can take a look at this example: https://www.philschmid.de/sagemaker-llm-vpc
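For readers who don't want to click through: the key idea in that example is to point HF_MODEL_ID at /opt/ml/model, the directory where SageMaker extracts model_data inside the container, so TGI loads the weights from local disk and never calls the Hub. A minimal sketch, assuming the weights are packaged as a model.tar.gz on S3 with the model files (config.json, tokenizer files, weight files) at its root; the S3 path is a placeholder:

import json
import sagemaker
from sagemaker.huggingface import HuggingFaceModel, get_huggingface_llm_image_uri

role = sagemaker.get_execution_role()
llm_image = get_huggingface_llm_image_uri("huggingface", version="0.9.3")

# '/opt/ml/model' is the local path where SageMaker unpacks model_data,
# so TGI reads the weights from disk instead of downloading from the Hub.
config = {
  'HF_MODEL_ID': '/opt/ml/model',
  'SM_NUM_GPUS': json.dumps(4),
  'MAX_INPUT_LENGTH': json.dumps(2000),
  'MAX_TOTAL_TOKENS': json.dumps(2048),
}

llm_model = HuggingFaceModel(
  model_data="s3://S3_PATH/oasst-sft-4-pythia-12b-epoch-3.5.tar.gz",  # placeholder: your S3 artifact
  role=role,
  image_uri=llm_image,
  env=config,
)

llm = llm_model.deploy(
  endpoint_name="oasst-sft-4-pythia-12b-epoch-35-12x",
  initial_instance_count=1,
  instance_type="ml.g5.12xlarge",
  container_startup_health_check_timeout=600,  # give the weights time to load
)

Because the model identifier is a local path rather than a Hub repo id, this setup also works for endpoints deployed in a VPC with no internet access.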

weiZhenkun commented 1 year ago

> You can take a look at this example: https://www.philschmid.de/sagemaker-llm-vpc

OK, thanks