bentoml / BentoML

The easiest way to serve AI apps and models - Build Model Inference APIs, Job queues, LLM apps, Multi-model pipelines, and more!
https://bentoml.com
Apache License 2.0

bug: asyncio.exceptions.CancelledError #4318

Closed: cacbondioxit closed this issue 10 months ago

cacbondioxit commented 11 months ago

Describe the bug

I was trying to serve the MusicGen Hugging Face model with BentoML. The API input was JSON and the output was Multipart with an np.array and text. However, it raises something mysterious: asyncio.exceptions.CancelledError. It's difficult for me to figure out what's going on inside, so I'm reporting this issue.

To reproduce

Let me show you my code:

# model_save.py
import bentoml
from transformers import AutoProcessor, MusicgenForConditionalGeneration

processor = AutoProcessor.from_pretrained("facebook/musicgen-large")
model = MusicgenForConditionalGeneration.from_pretrained("facebook/musicgen-large")

bentoml.transformers.save_model("beatsomeone-autoprocessor", processor)
bentoml.transformers.save_model("beatsomeone-musicgen", model)
# service.py
import sys
sys.path.append("..")

import bentoml
import torch
from bentoml.io import JSON, NumpyNdarray, Text, Multipart
from prompt_generator import PromptGenerator
from cuda_memory import get_sorted  # Custom Module

processor = bentoml.transformers.get("beatsomeone-autoprocessor")
model = bentoml.transformers.get("beatsomeone-musicgen")

class MusicGenerationRunnable(bentoml.Runnable):
    SUPPORTED_RESOURCES = ("cuda:1", "cuda:2", "cuda:3", "cpu", )
    SUPPORTS_CPU_MULTI_THREADING = True

    def __init__(self):
        # pick a non-zero CUDA device reported by the custom get_sorted() helper, else fall back to CPU
        self.device = f"cuda:{[num for num in get_sorted() if int(num)!=0][0]}" if torch.cuda.is_available() else "cpu"
        self.music_generator = bentoml.transformers.load_model(model).to(self.device)
        self.processor = bentoml.transformers.load_model(processor)

    def refresh_device(self):
        self.device = f"cuda:{[num for num in get_sorted() if int(num)!=0][0]}" if torch.cuda.is_available() else "cpu"
        self.music_generator = self.music_generator.to(self.device)

    @bentoml.Runnable.method(batchable=False)
    def generate(self, prompt: str, length: int=10):
        self.refresh_device() 
        inputs = self.processor(text=[prompt],
                                padding=True,
                                return_tensors="pt").to(self.device)

        audio_values = self.music_generator.generate(**inputs, max_new_tokens=256*(length//5))
        sampling_rate = self.music_generator.config.audio_encoder.sampling_rate

        return (audio_values.cpu(), sampling_rate)

musicgen_runner = bentoml.Runner(MusicGenerationRunnable, models=[processor, model])
svc = bentoml.Service("beatsomeone", runners=[musicgen_runner])
output_spec = Multipart(audio=NumpyNdarray(), sample_rate=NumpyNdarray(), dtype=Text())

@svc.api(input=JSON(), output=output_spec) 
def generate_music(api_input):
    # api_input = {'length': length, 'emotion': emotion, 'genre': genre, 'image': str_image, 'randomness': randomness}
    length = api_input['length']
    emotion = api_input['emotion']
    genre = api_input['genre']
    image = api_input['image']
    randomness = api_input['randomness']

    pg = PromptGenerator(emotion=emotion, 
                         genre=genre,
                         str_image=image)
    try:
        prompt = pg.generate_prompt(randomness=randomness)
        audio_values, sample_rate = musicgen_runner.generate.run(prompt=prompt, length=length) # CancelledError occurs here!

        audio = audio_values[0].numpy()
        dtype = str(audio.dtype)
        return {'audio': audio, 'sample_rate': sample_rate, 'dtype': dtype} 

    except Exception:
        # re-raise so BentoML reports the underlying error
        raise
# request.py

import requests
import base64
import json
import time
import cv2

img_path = "***.jpg"
img = cv2.imread(img_path)
_, img_encoded = cv2.imencode('.jpg', img)
jpg_as_text = base64.b64encode(img_encoded).decode()

payload = {'length': 10, 'emotion': "Victory", 'genre': "Rock", 'image': jpg_as_text, 'randomness': True}

start = time.time()
resp = requests.post("http://218.38.14.20:3000/generate_music/", data=json.dumps(payload))
end = time.time()

print(f'{end-start}s')
print(resp.content)
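
For completeness, since the service's output spec is Multipart, the response body is multipart/form-data and has to be split back into its fields on the client side. Below is a minimal, illustrative sketch (not part of the original report) using requests-toolbelt, assuming BentoML's default JSON serialization for the NumpyNdarray parts; resp is the response object from request.py above.

# decode_response.py (illustrative sketch only)
import json

import numpy as np
from requests_toolbelt.multipart.decoder import MultipartDecoder

# split the multipart/form-data body returned by the /generate_music endpoint
decoder = MultipartDecoder(resp.content, resp.headers["Content-Type"])

parts = {}
for part in decoder.parts:
    # Content-Disposition looks like: form-data; name="audio"
    disposition = part.headers[b"Content-Disposition"].decode()
    name = disposition.split('name="')[1].split('"')[0]
    parts[name] = part.content

dtype = parts["dtype"].decode()
audio = np.asarray(json.loads(parts["audio"]), dtype=dtype)
sample_rate = int(np.asarray(json.loads(parts["sample_rate"])))
print(audio.shape, sample_rate, dtype)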

And let me show you the traceback I got. I intentionally hid my directory names with **.

Traceback (most recent call last):
  File "/home/ubuntu/workspace/hyungyu/******/******/service.py", line 57, in generate_music
    audio_values, sample_rate = musicgen_runner.generate.run(prompt=prompt, length=length)
  File "/home/ubuntu/anaconda3/envs/baihat/lib/python3.8/site-packages/bentoml/_internal/runner/runner.py", line 52, in run
    return self.runner._runner_handle.run_method(self, *args, **kwargs)
  File "/home/ubuntu/anaconda3/envs/baihat/lib/python3.8/site-packages/bentoml/_internal/runner/runner_handle/remote.py", line 355, in run_method
    anyio.from_thread.run(
  File "/home/ubuntu/anaconda3/envs/baihat/lib/python3.8/site-packages/anyio/from_thread.py", line 48, in run
    return async_backend.run_async_from_thread(func, args, token=token)
  File "/home/ubuntu/anaconda3/envs/baihat/lib/python3.8/site-packages/anyio/_backends/_asyncio.py", line 2143, in run_async_from_thread
    return f.result()
  File "/home/ubuntu/anaconda3/envs/baihat/lib/python3.8/concurrent/futures/_base.py", line 444, in result
    return self.__get_result()
  File "/home/ubuntu/anaconda3/envs/baihat/lib/python3.8/concurrent/futures/_base.py", line 389, in __get_result
    raise self._exception
asyncio.exceptions.CancelledError

Expected behavior

My code isn't that complicated, which makes it hard to see why this error happens. In any case, I only need to get an np.array and some text back from the MusicGen model. Thank you.
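
For what it's worth, one plausible (unconfirmed) cause: generation with musicgen-large can easily run longer than BentoML's default traffic timeouts, and a timed-out or aborted runner call surfaces in the API worker as asyncio.exceptions.CancelledError. A hedged sketch of raising the timeouts via a bentoml_configuration.yaml (passed with the BENTOML_CONFIG environment variable), assuming the timeout really is the culprit:

# bentoml_configuration.yaml (illustrative values)
# start the server with: BENTOML_CONFIG=./bentoml_configuration.yaml bentoml serve service:svc
api_server:
  traffic:
    timeout: 600   # seconds the API server allows per request
runners:
  traffic:
    timeout: 600   # seconds an API worker waits on a runner call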

Environment

bentoml: 1.1.10
python: 3.8.18
transformers: 4.35.2

aarnphm commented 11 months ago

Can you show the full trace here? It is very hard to debug with just these logs.
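
(For a fuller trace, debug-level logging can usually be enabled with the BENTOML_DEBUG environment variable, e.g. BENTOML_DEBUG=True bentoml serve service:svc; the service path here is assumed.)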

cacbondioxit commented 10 months ago

I'm sorry, I can't reproduce the exact same error now, for reasons I can't explain. I will reopen the issue later if I run into something similar again. Sorry for bothering you.

aarnphm commented 10 months ago

No worries. Please do let me know if there is anything I can help you with.

cacbondioxit commented 10 months ago

Thank you so much!