Closed — dShvetsov closed this issue 1 year ago
did you solve it, i have the same error
I found a workaround by defining a custom Runner
from typing import List

import bentoml

tokenizer_ref = bentoml.transformers.get('lite_toxicity_tokenizer')
model_ref = bentoml.transformers.get('lite_toxicity_model')

class ToxicityRunnable(bentoml.Runnable):
    SUPPORTED_RESOURCES = ('nvidia.com/gpu', 'cpu')
    SUPPORTS_CPU_MULTI_THREADING = True

    def __init__(self):
        self.tokenizer = bentoml.transformers.load_model(tokenizer_ref)
        self.model = bentoml.transformers.load_model(model_ref)

    @bentoml.Runnable.method(batchable=False)
    def toxicity(self, inp: List[str]):
        inputs = self.tokenizer(inp, truncation=True, padding=True, return_tensors='pt')
        result = self.model(**inputs).logits.sigmoid()  # noqa
        return result.detach().cpu().numpy()
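As an aside on why the runnable applies `.sigmoid()` rather than a softmax: toxicity models are typically multi-label, so each label's logit is squashed into an independent probability. A minimal sketch of that mapping in plain Python (the logit values here are made up for illustration):

```python
import math

def sigmoid(x: float) -> float:
    """Map a raw logit to an independent probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))

# Each label's logit is squashed independently, so the per-text
# probabilities need not sum to 1 (multi-label classification),
# unlike softmax, which normalizes across labels.
logits = [2.0, -1.0, 0.5]          # hypothetical per-label logits
probs = [sigmoid(v) for v in logits]
```
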
and then connected it to the service:
from bentoml.io import JSON, NumpyNdarray

toxicity_runner = bentoml.Runner(ToxicityRunnable, name='toxicity', models=[tokenizer_ref, model_ref])
svc = bentoml.Service('server', runners=[toxicity_runner])

@svc.api(input=JSON(), output=NumpyNdarray())
async def toxicity(inp: dict):
    return await toxicity_runner.toxicity.async_run(inp['texts'])
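Since the API reads `inp['texts']`, a client must send a JSON body containing a `"texts"` list. A minimal sketch of the expected payload (the endpoint path and port are BentoML defaults, and the example texts are placeholders):

```python
import json

# The @svc.api handler above indexes inp['texts'], so the request
# body must be a JSON object with a "texts" key holding a list.
payload = json.dumps({"texts": ["first text", "second text"]})

# A matching request would look something like:
#   curl -X POST -H 'Content-Type: application/json' \
#        -d '{"texts": ["first text", "second text"]}' \
#        http://127.0.0.1:3000/toxicity
parsed = json.loads(payload)
```
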
What version of transformers is this? `contrastive_search` should already be a method of `GenerationMixin` for the model. Unless this is a custom model that doesn't support `GenerationMixin`, in which case you might need to use a custom Runner for now.
Checked now with versions
transformers==4.31.0 bentoml==1.0.24
and it works. The transformers package was probably outdated last time. Thank you.
Describe the bug
Trying to use a transformer model produces the following error.
There is also a warning on startup.
But I saved the model right before serving, so the BentoML version must be the same. I don't know if that is related.
To reproduce
Saving the model:
Serving:
Expected behavior
No error appears and the model serves requests.
Environment
bentoml, version 1.0.24; Python 3.9.5; Platform: macOS 13.4.1 (22F82)