bentoml / diffusers-examples

API serving for your diffusers models
https://bentoml.com

enable_sequential_cpu_offload HuggingFace Diffusers error with sd2 example on T4 GPU #2

Open · BEpresent opened this issue 1 year ago

BEpresent commented 1 year ago

Hi, I was following this example: https://modelserving.com/blog/creating-stable-diffusion-20-service-with-bentoml-and-diffusers

or, equivalently, a git clone of this example repo: https://github.com/bentoml/diffusers-examples/tree/main/sd2

Both result in a simple service.py file like this:

import torch
from diffusers import StableDiffusionPipeline

import bentoml
from bentoml.io import Image, JSON, Multipart

# Load the previously saved model and wrap it in a runner
bento_model = bentoml.diffusers.get("sd2:latest")
stable_diffusion_runner = bento_model.to_runner()

svc = bentoml.Service("stable_diffusion_v2", runners=[stable_diffusion_runner])

# Keys in the JSON request body are forwarded as pipeline kwargs
@svc.api(input=JSON(), output=Image())
def txt2img(input_data):
    images, _ = stable_diffusion_runner.run(**input_data)
    return images[0]
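
For reference, the endpoint takes a JSON body whose keys are forwarded as pipeline kwargs. A hypothetical client call (assuming BentoML's default port 3000 and the standard StableDiffusionPipeline prompt argument) would look like this:

import requests

# Sketch of a client call; the JSON keys become pipeline kwargs in txt2img
resp = requests.post(
    "http://localhost:3000/txt2img",
    json={"prompt": "an astronaut riding a horse"},
)
resp.raise_for_status()
with open("output.png", "wb") as f:
    f.write(resp.content)  # the Image() output arrives as encoded image bytes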

After running `bentoml serve service:svc --production`, I get the following error (it also happens with another custom model I tried). It seems to be related to HuggingFace Diffusers' `enable_sequential_cpu_offload`.

[ERROR] [runner:sd2:1] Traceback (most recent call last):
  File "/home/be/miniconda3/envs/diffusers310/lib/python3.10/site-packages/starlette/routing.py", line 671, in lifespan
    async with self.lifespan_context(app):
  File "/home/be/miniconda3/envs/diffusers310/lib/python3.10/site-packages/starlette/routing.py", line 566, in __aenter__
    await self._router.startup()
  File "/home/be/miniconda3/envs/diffusers310/lib/python3.10/site-packages/starlette/routing.py", line 650, in startup
    handler()
  File "/home/be/miniconda3/envs/diffusers310/lib/python3.10/site-packages/bentoml/_internal/runner/runner.py", line 303, in init_local
    raise e
  File "/home/be/miniconda3/envs/diffusers310/lib/python3.10/site-packages/bentoml/_internal/runner/runner.py", line 293, in init_local
    self._set_handle(LocalRunnerRef)
  File "/home/be/miniconda3/envs/diffusers310/lib/python3.10/site-packages/bentoml/_internal/runner/runner.py", line 139, in _set_handle
    runner_handle = handle_class(self, *args, **kwargs)
  File "/home/be/miniconda3/envs/diffusers310/lib/python3.10/site-packages/bentoml/_internal/runner/runner_handle/local.py", line 24, in __init__
    self._runnable = runner.runnable_class(**runner.runnable_init_params)  # type: ignore
  File "/home/be/miniconda3/envs/diffusers310/lib/python3.10/site-packages/bentoml/_internal/frameworks/diffusers.py", line 443, in __init__
    self.pipeline: diffusers.DiffusionPipeline = load_model(
  File "/home/be/miniconda3/envs/diffusers310/lib/python3.10/site-packages/bentoml/_internal/frameworks/diffusers.py", line 182, in load_model
    pipeline = pipeline.to(device_id)
  File "/home/be/miniconda3/envs/diffusers310/lib/python3.10/site-packages/diffusers/pipelines/pipeline_utils.py", line 639, in to
    raise ValueError(
ValueError: It seems like you have activated sequential model offloading by calling `enable_sequential_cpu_offload`, but are now attempting to move the pipeline to GPU. This is not compatible with offloading. Please, move your pipeline `.to('cpu')` or consider removing the move altogether if you use sequential offloading.
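
For reference, I believe the incompatibility can be reproduced in plain diffusers without BentoML. A minimal sketch (untested, assuming the stabilityai/stable-diffusion-2 weights and accelerate installed):

import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "stabilityai/stable-diffusion-2", torch_dtype=torch.float16
)
# Sequential offloading installs hooks that move each submodule to the GPU
# only while it runs, so the pipeline itself must stay on the CPU.
pipe.enable_sequential_cpu_offload()
# Moving the whole pipeline to the GPU afterwards raises the same ValueError.
pipe.to("cuda")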

As general info, it runs on a GCP VM instance with a T4 GPU - could this be the issue?

BEpresent commented 1 year ago

Update: this also happens on a 3090 GPU.

2023-04-13T20:05:33+0000 [ERROR] [runner:sd2:1] Application startup failed. Exiting.
/usr/local/lib/python3.10/dist-packages/transformers/models/clip/feature_extraction_clip.py:28: FutureWarning: The class CLIPFeatureExtractor is deprecated and will be removed in version 5 of Transformers. Please use CLIPImageProcessor instead.
  warnings.warn(
2023-04-13T20:05:41+0000 [ERROR] [runner:sd2:1] An exception occurred while instantiating runner 'sd2', see details below:
2023-04-13T20:05:41+0000 [ERROR] [runner:sd2:1] Traceback (most recent call last):
  File "/usr/local/lib/python3.10/dist-packages/bentoml/_internal/runner/runner.py", line 293, in init_local
    self._set_handle(LocalRunnerRef)
  File "/usr/local/lib/python3.10/dist-packages/bentoml/_internal/runner/runner.py", line 139, in _set_handle
    runner_handle = handle_class(self, *args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/bentoml/_internal/runner/runner_handle/local.py", line 24, in __init__
    self._runnable = runner.runnable_class(**runner.runnable_init_params)  # type: ignore
  File "/usr/local/lib/python3.10/dist-packages/bentoml/_internal/frameworks/diffusers.py", line 443, in __init__
    self.pipeline: diffusers.DiffusionPipeline = load_model(
  File "/usr/local/lib/python3.10/dist-packages/bentoml/_internal/frameworks/diffusers.py", line 182, in load_model
    pipeline = pipeline.to(device_id)
  File "/usr/local/lib/python3.10/dist-packages/diffusers/pipelines/pipeline_utils.py", line 626, in to
    raise ValueError(
ValueError: It seems like you have activated sequential model offloading by calling `enable_sequential_cpu_offload`, but are now attempting to move the pipeline to GPU. This is not compatible with offloading. Please, move your pipeline `.to('cpu')` or consider removing the move altogether if you use sequential offloading.
larme commented 1 year ago

Hi @BEpresent, I think a diffusers update broke bentoml.diffusers. We are going to fix this one. You can pin diffusers==0.13.1 as a temporary fix.
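
For example:

pip install "diffusers==0.13.1"

or, if you are building a bento, add the same pin under the python.packages section of your bentofile.yaml.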