bentoml / BentoML

The easiest way to serve AI apps and models - Build reliable Inference APIs, LLM apps, Multi-model chains, RAG service, and much more!
https://bentoml.com
Apache License 2.0

bug: serving model requires attribute "contrastive_search" which does not exist #4063

Closed dShvetsov closed 1 year ago

dShvetsov commented 1 year ago

Describe the bug

Trying to serve a Transformers model produces the following error.

2023-07-19T17:47:46+0200 [ERROR] [runner:nnew_lite_toxicity_model:1] Traceback (most recent call last):
  File "/Users/dshvetsov/opt/miniconda3/envs/pytorch/lib/python3.9/site-packages/starlette/routing.py", line 635, in lifespan
    async with self.lifespan_context(app):
  File "/Users/dshvetsov/opt/miniconda3/envs/pytorch/lib/python3.9/contextlib.py", line 175, in __aenter__
    return await self.gen.__anext__()
  File "/Users/dshvetsov/opt/miniconda3/envs/pytorch/lib/python3.9/site-packages/bentoml/_internal/server/base_app.py", line 75, in lifespan
    on_startup()
  File "/Users/dshvetsov/opt/miniconda3/envs/pytorch/lib/python3.9/site-packages/bentoml/_internal/runner/runner.py", line 312, in init_local
    raise e
  File "/Users/dshvetsov/opt/miniconda3/envs/pytorch/lib/python3.9/site-packages/bentoml/_internal/runner/runner.py", line 302, in init_local
    self._set_handle(LocalRunnerRef)
  File "/Users/dshvetsov/opt/miniconda3/envs/pytorch/lib/python3.9/site-packages/bentoml/_internal/runner/runner.py", line 145, in _set_handle
    runner_handle = handle_class(self, *args, **kwargs)
  File "/Users/dshvetsov/opt/miniconda3/envs/pytorch/lib/python3.9/site-packages/bentoml/_internal/runner/runner_handle/local.py", line 24, in __init__
    self._runnable = runner.runnable_class(**runner.runnable_init_params)  # type: ignore
  File "/Users/dshvetsov/opt/miniconda3/envs/pytorch/lib/python3.9/site-packages/bentoml/_internal/frameworks/transformers.py", line 924, in __init__
    self.predict_fns[method_name] = getattr(self.model, method_name)
  File "/Users/dshvetsov/opt/miniconda3/envs/pytorch/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1265, in __getattr__
    raise AttributeError("'{}' object has no attribute '{}'".format(
AttributeError: 'AlbertForSequenceClassification' object has no attribute 'contrastive_search'
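For reference, the failing frame in `transformers.py` resolves every expected signature name on the model with a plain `getattr`. The pattern can be reproduced without BentoML or Transformers installed (the stub class and the name list below are illustrative, not BentoML's actual internals):

```python
# Stand-in for a classification model: it is callable but has none of the
# generation methods (contrastive_search, generate, ...).
class ClassifierStub:
    def __call__(self, **inputs):
        return {"logits": [0.1, 0.9]}

# Illustrative list of signature names a runner might try to bind.
method_names = ["__call__", "contrastive_search"]

model = ClassifierStub()
predict_fns = {}
errors = []
for name in method_names:
    try:
        # Same lookup as the transformers.py frame in the traceback above.
        predict_fns[name] = getattr(model, name)
    except AttributeError as exc:
        errors.append(str(exc))

print(sorted(predict_fns))  # ['__call__']
print(errors)               # one AttributeError message for contrastive_search
```

A classification head simply never gained the generation API, so binding `contrastive_search` on it raises exactly the `AttributeError` shown in the traceback.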

There is also a warning at startup:

2023-07-19T17:47:35+0200 [WARNING] [runner:nnew_lite_toxicity_tokenizer:1] Current Model(tag="nnew_lite_toxicity_tokenizer:rrertibgjoyaxaw5") is saved with an older version of BentoML. Setting GPUs on this won't work as expected. Make sure to save it with a newer version of BentoML.

But I saved the model right before serving, so the BentoML version should be the same. I don't know whether this is related.

To reproduce

Saving model:

import bentoml
from transformers import AutoTokenizer, AutoModelForSequenceClassification

checkpoint = 'pykeio/lite-toxic-comment-classification'

tokenizer = AutoTokenizer.from_pretrained(checkpoint)
tokenizer.model_max_length = 512  # For some reason truncation didn't work for this checkpoint
model = AutoModelForSequenceClassification.from_pretrained(checkpoint)
bentoml.transformers.save_model('nnew_lite_toxicity_tokenizer', tokenizer)
bentoml.transformers.save_model('nnew_lite_toxicity_model', model)

Serving

import bentoml
from bentoml.io import Text, NumpyNdarray
tokenizer_runner = bentoml.transformers.get('nnew_lite_toxicity_tokenizer').to_runner()
model_runner = bentoml.transformers.get('nnew_lite_toxicity_model').to_runner()

svc = bentoml.Service('lite_toxicity', runners=[tokenizer_runner, model_runner])

@svc.api(input=Text(), output=NumpyNdarray())
def toxicity(inp: str):
    inputs = tokenizer_runner.run(text=inp, truncation=True, padding=True, return_tensors='pt')
    result = model_runner.run(**inputs).logits.sigmoid()
    return result.detach().cpu().numpy()

Expected behavior

No error; the model serves requests successfully.

Environment

bentoml, version 1.0.24
Python 3.9.5
Platform: macOS 13.4.1 (22F82)

trongnghia05 commented 1 year ago

Did you solve it? I have the same error.

dShvetsov commented 1 year ago

I found a workaround by defining a custom Runner

from typing import List

import bentoml

tokenizer_ref = bentoml.transformers.get('lite_toxicity_tokenizer')
model_ref = bentoml.transformers.get('lite_toxicity_model')

class ToxicityRunnable(bentoml.Runnable):

    SUPPORTED_RESOURCES = ('nvidia.com/gpu', 'cpu')
    SUPPORTS_CPU_MULTI_THREADING = True

    def __init__(self):
        self.tokenizer = bentoml.transformers.load_model(tokenizer_ref)
        self.model = bentoml.transformers.load_model(model_ref)

    @bentoml.Runnable.method(batchable=False)
    def toxicity(self, inp: List[str]):
        inputs = self.tokenizer(inp, truncation=True, padding=True, return_tensors='pt')
        result = self.model(**inputs).logits.sigmoid()  # noqa
        return result.detach().cpu().numpy()

and then connected it to service

from bentoml.io import JSON, NumpyNdarray

toxicity_runner = bentoml.Runner(ToxicityRunnable, name='toxicity', models=[tokenizer_ref, model_ref])

svc = bentoml.Service('server', runners=[toxicity_runner])

@svc.api(input=JSON(), output=NumpyNdarray())
async def toxicity(inp: dict):
    return await toxicity_runner.toxicity.async_run(inp['texts'])

aarnphm commented 1 year ago

What version of transformers is this?

aarnphm commented 1 year ago

contrastive_search should already be a method provided by GenerationMixin for the model. Unless this is a custom model that doesn't inherit from GenerationMixin, in which case you might need to use a custom runner for now.
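A quick way to tell which case you are in is to check whether the loaded model class exposes the generation methods at all. A minimal sketch (a stub stands in for a real Transformers model, since a classification head without the generation API is exactly the situation in question):

```python
# Stub standing in for a model like AlbertForSequenceClassification:
# a plain classification head does not inherit the generation API.
class SequenceClassifierStub:
    pass

model = SequenceClassifierStub()

# If this is False, the default runner signatures that include generation
# methods cannot bind on this model, and a custom Runnable (as in the
# workaround earlier in this thread) is the way to go.
supports_generation = hasattr(model, "contrastive_search")
print(supports_generation)  # False for a bare classification head
```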

dShvetsov commented 1 year ago

Checked now with versions

transformers==4.31.0
bentoml==1.0.24

And it works

Probably the transformers package was outdated last time. Thank you!