aws / sagemaker-python-sdk

A library for training and deploying machine learning models on Amazon SageMaker
https://sagemaker.readthedocs.io/
Apache License 2.0

Serverless Endpoint Can't Run Due to Insufficient Space #4665

Open JamesBowerXanda opened 6 months ago

JamesBowerXanda commented 6 months ago

Describe the bug

I am trying to run a serverless endpoint, but endpoint creation always fails while installing dependencies. I understand that serverless endpoints do not have much space, but I provisioned the full 6GB and the deployment hasn't even reached the point of downloading the model.

To reproduce

Create a SageMaker serverless endpoint with the following configuration:

IMAGE:

763104351884.dkr.ecr.eu-west-2.amazonaws.com/pytorch-inference:2.2.0-cpu-py310-ubuntu20.04-sagemaker

REQUIREMENTS:

torchaudio==2.2.2
sox==1.5.0
huggingface_hub>=0.8.0
hyperpyyaml>=0.0.1
joblib>=0.14.1
numpy>=1.17.0
packaging
pandas>=1.0.1
pre-commit>=2.3.0
pygtrie>=2.1,<3.0
scipy>=1.4.1,<1.13.0
sentencepiece>=0.1.91
SoundFile; sys_platform == 'win32'
torch>=1.9.0,<=2.2.2
tqdm>=4.42.0
transformers>=4.30.0
speechbrain==1.0.0

Alternatively, you could reduce this to the following, but the others will be installed as dependencies anyway:

torchaudio==2.2.2
sox==1.5.0
speechbrain==1.0.0

MEMORY:

6GB
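For reference, the configuration above can be expressed as the request body that a boto3 SageMaker client's `create_endpoint_config` call accepts. This is only a minimal sketch: the config name, model name and variant name are hypothetical placeholders, and the `MaxConcurrency` value of 5 is an assumption, since the issue does not state it.

```python
# Memory sizes accepted for serverless endpoints, in MB (1 GB steps up to 6 GB).
ALLOWED_MEMORY_MB = (1024, 2048, 3072, 4096, 5120, 6144)


def build_serverless_endpoint_config(config_name, model_name,
                                     memory_gb=6, max_concurrency=5):
    """Build keyword arguments for create_endpoint_config (serverless variant)."""
    memory_mb = memory_gb * 1024
    if memory_mb not in ALLOWED_MEMORY_MB:
        raise ValueError(f"memory must be one of {ALLOWED_MEMORY_MB} MB, got {memory_mb}")
    return {
        "EndpointConfigName": config_name,
        "ProductionVariants": [
            {
                "VariantName": "AllTraffic",  # hypothetical variant name
                "ModelName": model_name,
                "ServerlessConfig": {
                    "MemorySizeInMB": memory_mb,
                    "MaxConcurrency": max_concurrency,  # assumed, not from the issue
                },
            }
        ],
    }


# Hypothetical names, used purely for illustration.
request = build_serverless_endpoint_config("speechbrain-serverless-config",
                                           "speechbrain-model")

# With AWS credentials configured, the request could then be sent as:
#   boto3.client("sagemaker").create_endpoint_config(**request)
```

Note that `MemorySizeInMB` is the memory setting described above; whether it has any effect on the disk space available for installing dependencies is not established in this thread.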

Expected behavior

Serverless endpoint is created

Screenshots or logs

[image]

System information

A description of your system. Please provide:

Additional context

On my local machine, a virtual environment with the packages outlined in the requirements.txt file takes 842MB.
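For anyone wanting to reproduce a size figure like this, one way to measure a virtual environment's on-disk footprint is to sum the sizes of its files. This is a rough sketch, not necessarily how the number above was obtained; the `.venv` path in the usage comment is a hypothetical example.

```python
import os


def directory_size_mb(path):
    """Return the total size of all regular files under `path`, in MiB."""
    total_bytes = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            file_path = os.path.join(root, name)
            # Skip symlinks so linked files are not counted at their target's size.
            if not os.path.islink(file_path):
                total_bytes += os.path.getsize(file_path)
    return total_bytes / (1024 * 1024)


# Example usage (hypothetical path):
#   print(f"{directory_size_mb('.venv'):.0f} MB")
```

The result can differ from what ends up inside a container image, since pip caches, compiled bytecode and platform-specific wheels vary between machines.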

monali-2210 commented 4 months ago

@JamesBowerXanda have you resolved this issue?

JamesBowerXanda commented 4 months ago

Hi, I can't remember whether I solved it, but it is no longer relevant: we moved to a real-time endpoint anyway for latency reasons.