Closed: jsleight closed this issue 1 year ago
Found this was fixed in newer image versions. E.g., doing
huggingface_model = HuggingFaceModel(
    model_data=uri_path,
    role=execution_role,
    env=env,
    sagemaker_session=sagemaker_session,
    image_uri="763104351884.dkr.ecr.us-west-2.amazonaws.com/huggingface-pytorch-inference:1.13.1-transformers4.26.0-cpu-py39-ubuntu20.04-v1.0",
)
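For reference, deploying from that model object then works in the usual way; this is only a sketch, and the instance settings are illustrative:

predictor = huggingface_model.deploy(
    initial_instance_count=1,
    instance_type="ml.m5d.4xlarge",  # illustrative; pick a type with NVMe-backed local storage
)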
Describe the bug
It seems like the model serving endpoints don't utilize the NVMe drives effectively. When I try to serve a 13B-parameter LLM (my model.tar.gz is ~42 GB on S3), I get errors that the disk is out of space. The endpoint fails to create.
I think the root of the issue is that the endpoint is trying to put too much stuff into the / disk volume instead of using the NVMe, which is located at /tmp.
Screenshots or logs
Here are all my log events for the endpoint startup failure.

I also injected a df -kh call to see what the disk utilization was and got:

So storing things at /.sagemaker/... or at /opt/ml/... is going to fail either way; it needs to be on the NVMe at /tmp.
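A minimal sketch of how such a check can be injected, assuming the standard inference-toolkit model_fn hook in inference.py (the print output shows up in the endpoint's CloudWatch logs):

# inference.py (excerpt)
import subprocess

def model_fn(model_dir):
    # Log disk utilization at model-load time so the volume sizes appear in CloudWatch
    print(subprocess.run(["df", "-kh"], capture_output=True, text=True).stdout)
    ...  # load and return the model from model_dir as usual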
System information
Specifics of my requirements.txt, inference.py, and invocation code are in the details.
Additional details
I've also tried altering the SAGEMAKER_BASE_DIR env variable to point into /tmp, but it just gives an error about a model-dir directory.
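A sketch of one way to attempt that override, via the env dict already passed to HuggingFaceModel; the exact path here is illustrative:

env = {
    "SAGEMAKER_BASE_DIR": "/tmp/sagemaker",  # illustrative target on the NVMe-backed /tmp volume
}
# passed as env=env to HuggingFaceModel, as in the snippet at the top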