bentoml / BentoML

The easiest way to serve AI apps and models - Build Model Inference APIs, Job queues, LLM apps, Multi-model pipelines, and much more!
https://bentoml.com
Apache License 2.0

TransformersModelArtifact fails to be loaded on to GPU #1855

Closed caffeinetoomuch closed 2 years ago

caffeinetoomuch commented 3 years ago

I am trying to serve a BentoService with GPU by dockerizing it. However, my docker container fails to load the model onto the GPU. I am using TransformersModelArtifact to save and load the model. The docker container runs fine and even handles incoming requests, but the model still does not run on the GPU. I was able to access the GPU inside the docker container, though, so it is definitely not a docker issue. There were no error messages in the docker logs.

Service definition:

@bentoml.env(
    conda_dependencies=["pytorch", "cudatoolkit=11.1"],
    conda_channels=["pytorch", "nvidia"],
    pip_packages=["transformers", "sentencepiece"],
    docker_base_image="bentoml/model-server:0.13.1-py38",
)
@bentoml.artifacts([TransformersModelArtifact("t5model")])
class TestService(bentoml.BentoService):
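One detail worth noting here: in BentoML 0.13 the packed model is serialized to disk and re-loaded inside the container, so a `.to("cuda")` done at packing time is not preserved. The service itself has to move the model to the GPU when handling requests. A hypothetical handler sketch (the `predict` method and JSON payload shape are assumptions, not part of the original report):

```python
import torch
import bentoml
from bentoml.adapters import JsonInput
from bentoml.frameworks.transformers import TransformersModelArtifact


@bentoml.artifacts([TransformersModelArtifact("t5model")])
class TestService(bentoml.BentoService):
    @bentoml.api(input=JsonInput(), batch=False)
    def predict(self, parsed_json):
        # Re-resolve the device at inference time; packing-time placement is lost.
        device = "cuda" if torch.cuda.is_available() else "cpu"
        model = self.artifacts.t5model.get("model").to(device)
        tokenizer = self.artifacts.t5model.get("tokenizer")
        inputs = tokenizer(parsed_json["text"], return_tensors="pt").to(device)
        output_ids = model.generate(**inputs)
        return tokenizer.decode(output_ids[0], skip_special_tokens=True)
```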

Packing script:

import torch
from transformers import T5ForConditionalGeneration, T5Tokenizer

device = "cuda" if torch.cuda.is_available() else "cpu"
model = T5ForConditionalGeneration.from_pretrained(model_name_or_path)
tokenizer = T5Tokenizer.from_pretrained(model_name_or_path)

model.to(device)  # was hard-coded as model.to("cuda"); use the computed device
model.eval()

bento_svc = TestService()
bento_svc.pack("t5model", {"model": model, "tokenizer": tokenizer})
bento_svc.save_to_dir("/workspace/bentoml/t5model")

Docker run command:

docker run -d \
    --name models-staging \
    --restart=unless-stopped \
    --gpus '"device=2"' --device /dev/nvidia2 \
    --device /dev/nvidia-uvm --device /dev/nvidia-uvm-tools \
    --device /dev/nvidia-modeset --device /dev/nvidiactl \
    -p 5002:5000 \
    -v /home/myhome/bentoml_configuration.yml:/home/bentoml/configuration.yml \
    -e BENTOML_CONFIG=/home/bentoml/configuration.yml \
    test-models:staging \
    --workers=1
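To narrow down where the failure happens, it can help to confirm that the GPU is visible from inside the running container and that the container's Python environment can actually reach it. A diagnostic sketch, reusing the container name from the command above (the exact Python/PyTorch availability inside the image is an assumption):

```shell
# Is the device visible to the container at all?
docker exec models-staging nvidia-smi

# Can the container's PyTorch build see CUDA?
docker exec models-staging python -c "import torch; print(torch.__version__, torch.cuda.is_available())"
```

If `nvidia-smi` works but `torch.cuda.is_available()` prints `False`, the problem is the Python environment in the image (e.g. a CPU-only PyTorch build or missing CUDA libraries), not the docker GPU passthrough.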

Environment:

parano commented 2 years ago

Hi @ice-americano - it looks like you are not using a base image with GPU support; the image bentoml/model-server:0.13.1-py38 does not contain the CUDA and cuDNN dependencies that are required. Could you try the 0.13.1-py38-gpu image instead?
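Following that suggestion, the only change to the original service definition would be the base image tag (a sketch, keeping the rest of the reporter's `@bentoml.env` settings unchanged):

```python
@bentoml.env(
    conda_dependencies=["pytorch", "cudatoolkit=11.1"],
    conda_channels=["pytorch", "nvidia"],
    pip_packages=["transformers", "sentencepiece"],
    # GPU variant of the model-server image, which bundles the CUDA/cuDNN runtime:
    docker_base_image="bentoml/model-server:0.13.1-py38-gpu",
)
```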

aarnphm commented 2 years ago

Hi @ice-americano, can you try this again on release 1.0.0a2?

aarnphm commented 2 years ago

Hi @ice-americano, feel free to try out our rc releases with pip install -U --pre bentoml. This issue should be addressed in the recent releases of BentoML.

If you need to stay on 0.13, we will come back to this after the 1.0 release is out.

ssheng commented 2 years ago

BentoML has released the official 1.0.0 with Hugging Face Transformers support. Could you please give it a try? Let us know if this problem persists.
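For reference, the 1.0-style workflow replaces BentoService/artifacts with a model store and Runners, which manage device placement for you. A rough sketch under the 1.0 API (model name and pipeline task are illustrative assumptions, not the exact fix for this issue):

```python
import bentoml
from transformers import pipeline

# Run once: save the pipeline to the local model store.
bentoml.transformers.save_model(
    "t5model", pipeline("translation_en_to_de", model="t5-small")
)

# service.py: serve it through a Runner.
runner = bentoml.transformers.get("t5model:latest").to_runner()
svc = bentoml.Service("t5_service", runners=[runner])
```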