Open · ZayTheory opened this issue 3 years ago
Hey @ZayTheory,
Can you share the structure of your model.tar.gz? And could you try again, creating the archive with the following steps?
1. Download the model

```
git lfs install
git clone https://huggingface.co/sshleifer/distilbart-cnn-12-6
```

2. Create a tar file

```
cd distilbart-cnn-12-6
tar zcvf model.tar.gz *
```

3. Upload model.tar.gz to S3

```
aws s3 cp model.tar.gz <s3://mymodel>
```
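As a sanity check (an extra step, not part of the instructions above): the archive should contain the model files at its top level rather than nested inside a folder, which is what the steps above produce by running `tar` from inside the cloned directory.

```shell
# From the directory containing model.tar.gz: list the archive contents.
# config.json and the weight files should sit at the top level, with no
# leading distilbart-cnn-12-6/ directory prefix.
if [ -f model.tar.gz ]; then tar -tzf model.tar.gz | head; fi
```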
You could also deploy the model without creating an archive and uploading it to S3, with the following snippet:
```python
from sagemaker.huggingface import HuggingFaceModel
import boto3

iam_client = boto3.client('iam')
role = iam_client.get_role(RoleName='{IAM_ROLE_WITH_SAGEMAKER_PERMISSIONS}')['Role']['Arn']

# Hub Model configuration. https://huggingface.co/models
hub = {
    'HF_MODEL_ID': 'sshleifer/distilbart-cnn-12-6',
    'HF_TASK': 'summarization'
}

# create Hugging Face Model Class
huggingface_model = HuggingFaceModel(
    transformers_version='4.6.1',
    pytorch_version='1.7.1',
    py_version='py36',
    env=hub,
    role=role,
)

# deploy model to SageMaker Inference
predictor = huggingface_model.deploy(
    initial_instance_count=1,    # number of instances
    instance_type='ml.m5.xlarge' # ec2 instance type
)

predictor.predict({
    'inputs': "The tower is 324 metres (1,063 ft) tall, about the same height as an 81-storey building, and the tallest structure in Paris. Its base is square, measuring 125 metres (410 ft) on each side. During its construction, the Eiffel Tower surpassed the Washington Monument to become the tallest man-made structure in the world, a title it held for 41 years until the Chrysler Building in New York City was finished in 1930. It was the first structure to reach a height of 300 metres. Due to the addition of a broadcasting aerial at the top of the tower in 1957, it is now taller than the Chrysler Building by 5.2 metres (17 ft). Excluding transmitters, the Eiffel Tower is the second tallest free-standing structure in France after the Millau Viaduct."
})
```
Thank you for your tips! I followed your steps and the upload worked, but now I have run into a new issue. I think the earlier problem was that I was compressing the folder rather than each individual file, which was causing directory problems. Now my model works on AWS, but it is not utilizing the GPU. The CPU takes a long time to generate summaries, so GPU-based inference is necessary (it's the difference between 1 second and 16 seconds per inference). My goal is to create a custom inference.py file to truncate the inputs at the endpoint, so that too large an input doesn't trigger a CUDA assert. Here is my inference.py file:
What edits should I make to this code to detect and utilize the GPU on my AWS instance?
I also thought that I should just not write a custom load function, given that the default load function already seems to take care of hooking up the GPU:
But the problem I'm running into is figuring out how to access the tokenizer in either a custom predict_fn() or preprocess_fn().
Thanks so much for your help! Without you I would have been stuck for another week!
Hey @ZayTheory,
I tried to extract your questions and answer them independently:

Putting a model on the GPU:
After `from_pretrained`, you need to put the model on the GPU as you normally would; you can look at this forum post: https://discuss.huggingface.co/t/is-transformers-using-gpu-by-default/8500
`transformers_utils.py` contains a function called `_is_gpu_available`, which you could use to detect whether the code is running on a GPU: https://github.com/aws/sagemaker-huggingface-inference-toolkit/blob/ca724995f11b58713efdc76b25b873ab1bff0ea8/src/sagemaker_huggingface_inference_toolkit/transformers_utils.py#L93
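For example, a minimal sketch of a custom `model_fn` that selects the device the way `pipeline()` expects (device index 0 for the first GPU, -1 for CPU). The `summarization` task comes from this thread; the function names follow the inference toolkit's handler convention, and everything else is an assumption:

```python
import torch

def pick_device():
    # pipeline() takes a device index: 0 = first GPU, -1 = CPU
    return 0 if torch.cuda.is_available() else -1

def model_fn(model_dir):
    # the inference toolkit calls model_fn with the unpacked model directory;
    # importing here keeps the device logic above testable on its own
    from transformers import pipeline
    return pipeline("summarization", model=model_dir, device=pick_device())
```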
Truncating the input:
Without a custom `inference.py`, to truncate you could add a `parameters` key to your request JSON, like below. All of the `parameters` are passed into the `pipeline` running in the inference toolkit.

```python
predictor.predict({
    "inputs": long_input,
    "parameters": {
        "truncation": True
    }
})
```
With a custom `inference.py`, I would recommend only overriding the `predict` function and truncating the input there. The `load()` function returns a `transformers.pipeline`, and you can access its tokenizer via `model.tokenizer`. P.S. You can also use the `parameters` to control, e.g., `max_length`.
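Putting that together, a sketch of an `inference.py` that only overrides `predict_fn`. Here `model` is assumed to be the `transformers.pipeline` returned by the default load step, and the request fields follow the examples above:

```python
def predict_fn(data, model):
    # `model` is the transformers.pipeline built by the default model load;
    # its tokenizer is reachable as model.tokenizer if you need it directly
    inputs = data.pop("inputs")
    parameters = data.pop("parameters", {})
    # force truncation unless the request says otherwise, so over-long
    # inputs are cut to the model's maximum input length
    parameters.setdefault("truncation", True)
    return model(inputs, **parameters)
```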
My case is a bit different: I have a boto3 client to invoke the endpoint. I tried to add the truncation parameter in the request, but it didn't work.

Request:
```python
client = boto3.client('sagemaker-runtime')
s = {"inputs": long_sentences, "parameters": {"truncation": True}}
b = json.dumps(s).encode('utf-8')
payload = b
content_type = "application/json"
response = client.invoke_endpoint(
    EndpointName=endpoint_name,
    ContentType=content_type,
    Accept=accept,
    Body=payload,
)
```

Response:
```
ModelError: An error occurred (ModelError) when calling the InvokeEndpoint operation: Received client error (400) from model with message "{ "code": 400, "type": "InternalServerException", "message": "The size of tensor a (577) must match the size of tensor b (512) at non-singleton dimension 1" }
```
Hey @miniweeds,
Could you please share more information, such as which model you have deployed and how you created your endpoint?
This should work; you can also check out this forum thread where a different user had the same issue: https://discuss.huggingface.co/t/how-are-the-inputs-tokenized-when-model-deployment/9692/5
Feel free to open your own thread in the forum with more information: https://discuss.huggingface.co/c/sagemaker/17
Ok, so I am trying to deploy this model: https://huggingface.co/sshleifer/distilbart-cnn-12-6 with a custom inference.py to an endpoint on Amazon Web Services. First, however, I am trying to deploy it as is, before I add the custom inference.py file.
I will step through what I have done so far, in hopes that you can tell me what I am doing wrong.
1) Download the model files using `git clone https://huggingface.co/sshleifer/distilbart-cnn-12-6`
2) Compress the 5.2 GB model into a model.tar.gz using the command `tar -czf model.tar.gz distilbart-cnn-12-6`
3) Upload the model.tar.gz to my S3 bucket
4) Deploy my model using this script
Whenever I run this, the endpoint successfully deploys. However, when I try to run a prediction using

```python
predictor.predict({
    'inputs': "The tower is 324 metres (1,063 ft) tall, about the same height as an 81-storey building, and the tallest structure in Paris. Its base is square, measuring 125 metres (410 ft) on each side. During its construction, the Eiffel Tower surpassed the Washington Monument to become the tallest man-made structure in the world, a title it held for 41 years until the Chrysler Building in New York City was finished in 1930. It was the first structure to reach a height of 300 metres. Due to the addition of a broadcasting aerial at the top of the tower in 1957, it is now taller than the Chrysler Building by 5.2 metres (17 ft). Excluding transmitters, the Eiffel Tower is the second tallest free-standing structure in France after the Millau Viaduct."
})
```

I get the following error in my AWS logs:

```
2021-07-22 16:02:37,654 [INFO ] W-model-1-stdout com.amazonaws.ml.mms.wlm.WorkerLifeCycle - ValueError: ("You need to define one of the following ['feature-extraction', 'text-classification', 'token-classification', 'question-answering', 'table-question-answering', 'fill-mask', 'summarization', 'translation', 'text2text-generation', 'text-generation', 'zero-shot-classification', 'conversational', 'image-classification'] as env 'TASK'.", 403)
```
So I went into the source code here and found this section:
So for some reason, my config.json file is not loading. I think it has something to do with the model directory not being in the right place, and I am kind of lost. Any help would be much appreciated!