bentoml / BentoML

The easiest way to serve AI apps and models - Build Model Inference APIs, Job queues, LLM apps, Multi-model pipelines, and much more!
https://bentoml.com
Apache License 2.0

TransformersModelArtifact fails to be loaded on to GPU #1855

Closed caffeinetoomuch closed 2 years ago

caffeinetoomuch commented 3 years ago

I am trying to serve a BentoService with GPU by dockerizing it. However, my docker container fails to load the model onto the GPU. I am using TransformersModelArtifact to save and load the model. The docker container runs fine and even handles incoming requests, but the model still does not run on the GPU. I was able to access the GPU inside the docker container, though, so it is definitely not a docker issue. There were no error messages in the docker logs.

Service definition:

@bentoml.env(
    conda_dependencies=["pytorch", "cudatoolkit=11.1"],
    conda_channels=["pytorch", "nvidia"],
    pip_packages=["transformers", "sentencepiece"],
    docker_base_image="bentoml/model-server:0.13.1-py38",
)
@bentoml.artifacts([TransformersModelArtifact("t5model")])
class TestService(bentoml.BentoService):
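One detail worth noting here: in BentoML 0.13 the packed model is serialized to disk and re-loaded inside the container, so a `.to("cuda")` done at packing time is not preserved. The service itself has to move the model to the GPU when handling requests. A hypothetical handler sketch (the `predict` method and JSON payload shape are assumptions, not part of the original report):

```python
import torch
import bentoml
from bentoml.adapters import JsonInput
from bentoml.frameworks.transformers import TransformersModelArtifact


@bentoml.artifacts([TransformersModelArtifact("t5model")])
class TestService(bentoml.BentoService):
    @bentoml.api(input=JsonInput(), batch=False)
    def predict(self, parsed_json):
        # Re-resolve the device at inference time; packing-time placement is lost.
        device = "cuda" if torch.cuda.is_available() else "cpu"
        model = self.artifacts.t5model.get("model").to(device)
        tokenizer = self.artifacts.t5model.get("tokenizer")
        inputs = tokenizer(parsed_json["text"], return_tensors="pt").to(device)
        output_ids = model.generate(**inputs)
        return tokenizer.decode(output_ids[0], skip_special_tokens=True)
```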

Packing script:

import torch
from transformers import T5ForConditionalGeneration, T5Tokenizer

device = "cuda" if torch.cuda.is_available() else "cpu"
model = T5ForConditionalGeneration.from_pretrained(model_name_or_path)
tokenizer = T5Tokenizer.from_pretrained(model_name_or_path)

model.to(device)  # was hard-coded as model.to("cuda"); use the computed device
model.eval()

bento_svc = TestService()
bento_svc.pack("t5model", {"model": model, "tokenizer": tokenizer})
bento_svc.save_to_dir("/workspace/bentoml/t5model")

Docker run command:

docker run -d \
    --name models-staging \
    --restart=unless-stopped \
    --gpus '"device=2"' --device /dev/nvidia2 \
    --device /dev/nvidia-uvm --device /dev/nvidia-uvm-tools \
    --device /dev/nvidia-modeset --device /dev/nvidiactl \
    -p 5002:5000 \
    -v /home/myhome/bentoml_configuration.yml:/home/bentoml/configuration.yml \
    -e BENTOML_CONFIG=/home/bentoml/configuration.yml \
    test-models:staging \
    --workers=1
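To narrow down where the failure happens, it can help to confirm that the GPU is visible from inside the running container and that the container's Python environment can actually reach it. A diagnostic sketch, reusing the container name from the command above (the exact Python/PyTorch availability inside the image is an assumption):

```shell
# Is the device visible to the container at all?
docker exec models-staging nvidia-smi

# Can the container's PyTorch build see CUDA?
docker exec models-staging python -c "import torch; print(torch.__version__, torch.cuda.is_available())"
```

If `nvidia-smi` works but `torch.cuda.is_available()` prints `False`, the problem is the Python environment in the image (e.g. a CPU-only PyTorch build or missing CUDA libraries), not the docker GPU passthrough.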

Environment:

parano commented 2 years ago

Hi @ice-americano - it looks like you are not using a base image with GPU support; the image bentoml/model-server:0.13.1-py38 does not contain the CUDA and cuDNN dependencies that are required. Could you try the 0.13.1-py38-gpu image instead?
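Following that suggestion, the only change to the original service definition would be the base image tag (a sketch, keeping the rest of the reporter's `@bentoml.env` settings unchanged):

```python
@bentoml.env(
    conda_dependencies=["pytorch", "cudatoolkit=11.1"],
    conda_channels=["pytorch", "nvidia"],
    pip_packages=["transformers", "sentencepiece"],
    # GPU variant of the model-server image, which bundles the CUDA/cuDNN runtime:
    docker_base_image="bentoml/model-server:0.13.1-py38-gpu",
)
```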

aarnphm commented 2 years ago

Hi @ice-americano, can you try this again on release 1.0.0a2?

aarnphm commented 2 years ago

Hi @ice-americano, feel free to try out our rc releases with pip install -U --pre bentoml. This issue should be addressed in the recent releases of BentoML.

If you need to stay on 0.13, we will come back to this after the 1.0 release is out.

ssheng commented 2 years ago

BentoML has released the official 1.0.0 with Hugging Face Transformers support. Could you please give it a try? Let us know if this problem persists.
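For reference, the 1.0-style workflow replaces BentoService/artifacts with a model store and Runners, which manage device placement for you. A rough sketch under the 1.0 API (model name and pipeline task are illustrative assumptions, not the exact fix for this issue):

```python
import bentoml
from transformers import pipeline

# Run once: save the pipeline to the local model store.
bentoml.transformers.save_model(
    "t5model", pipeline("translation_en_to_de", model="t5-small")
)

# service.py: serve it through a Runner.
runner = bentoml.transformers.get("t5model:latest").to_runner()
svc = bentoml.Service("t5_service", runners=[runner])
```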