bentoml / BentoML

The easiest way to serve AI apps and models - Build Model Inference APIs, Job queues, LLM apps, Multi-model pipelines, and much more!
https://bentoml.com
Apache License 2.0

Pypi and Transformers artifacts-framework #1092

Closed: rockdrigoma closed this issue 4 years ago

rockdrigoma commented 4 years ago

Is your feature request related to a problem? Please describe.
Is the new version stable enough to update the PyPI package and use transformers? I want to deploy a Transformer model from HuggingFace, but there is no transformers artifact in the released version; transformers is only implemented as a framework in newer versions, and there is no documentation on it either.

parano commented 4 years ago

Hi @rockdrigoma - the transformer implementation was just merged today: https://github.com/bentoml/BentoML/pull/1090

There is an issue displaying the documentation for it, but you can find the example in the code: https://github.com/bentoml/BentoML/blob/master/bentoml/frameworks/transformers.py#L33

I will do some more testing on that, and a new release should come out in the next few days.

rockdrigoma commented 4 years ago

I already checked the code, but it's still a little hard for me to understand. I tried to use this:

Example usage:

import bentoml
from bentoml.adapters import JsonInput
from bentoml.frameworks.transformers import TransformersModelArtifact

# Example: text generation using GPT2
# Explicitly add either torch or tensorflow dependency,
# which will be used by transformers
@bentoml.env(pip_packages=["torch==1.6", "transformers==3.1"])
@bentoml.artifacts([TransformersModelArtifact('gptModel')])
class TransformersService(bentoml.BentoService):
    @bentoml.api(input=JsonInput())
    def predict(self, parsed_json):
        src_text = parsed_json[0].get("text")
        model = self.artifacts.gptModel.get("model")
        tokenizer = self.artifacts.gptModel.get("tokenizer")
        input_ids = tokenizer.encode(src_text, return_tensors="pt")
        output = model.generate(input_ids, max_length=50)
        output = [tokenizer.decode(output[0], skip_special_tokens=True)]
        return output

svc = TransformersService()
ts = TransformersGPT2TextGenerator()
tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelWithLMHead.from_pretrained("gpt2", pad_token_id=tokenizer.eos_token_id)
ts = TransformersGPT2TextGenerator()
ts.pack("gptModel", {"model": model, "tokenizer": tokenizer})

# OR - directly pack the model by providing its name
ts = ts.pack('gptModel', 'gpt2')
# Note that while packing using the name of the model,
# ensure that the model can be loaded using
# transformers.AutoModelWithLMHead (eg GPT, Bert, Roberta etc.)
# If this is not the case (eg AutoModelForQuestionAnswering, BartModel etc)
# then pack the model by passing a dictionary
# with the model and tokenizer declared explicitly

saved_path = ts.save()

But it seems a little bit redundant, and my model is throwing an error: "dict has no predict attribute".

parano commented 4 years ago

@rockdrigoma could you share your code for me to reproduce the issue?

rockdrigoma commented 4 years ago

In the first cell I have:

%%writefile transformers_service.py
import pandas as pd

import bentoml
from bentoml.adapters import JsonInput
from bentoml import env, artifacts, api, BentoService
from bentoml.frameworks.transformers import TransformersModelArtifact

@env(infer_pip_packages=True)
@bentoml.artifacts([TransformersModelArtifact('marian')])
class TransformersService(bentoml.BentoService):
  @bentoml.api(input=JsonInput())
  def predict(self, parsed_json):
    src_text = parsed_json[0].get("text")
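    # The '>>es<<' prefix is the target-language token telling the multilingual Marian model to translate into Spanish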
    src_text = [f'>>es<< {src_text}']
    model = self.artifacts.marian.get('model')
    tokenizer = self.artifacts.marian.get('tokenizer')
    output = model.generate(**tokenizer.prepare_seq2seq_batch(src_text))
    tgt_text = [tokenizer.decode(t, skip_special_tokens=True) for t in output]
    return tgt_text

In the next one:

from transformers_service import TransformersService

# Create a classifier service instance
ts = TransformersService()

from transformers import MarianMTModel, MarianTokenizer
model_name = 'Helsinki-NLP/opus-mt-en-ROMANCE'
tokenizer = MarianTokenizer.from_pretrained(model_name)
model = MarianMTModel.from_pretrained(model_name)

# Pack the newly trained model artifact
ts.pack('marian', {'model': model, "tokenizer": tokenizer})

# Save the prediction service to disk for model serving
saved_path = ts.save()

And finally to deploy the service:

!bentoml serve TransformersService:latest --run-with-ngrok

And then doing a test request on:

http://127.0.0.1:5000/predict
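
For reference, a test request matching the predict signature above (it reads parsed_json[0].get("text"), i.e. a JSON list containing one object) could look roughly like this; this is a sketch assuming the server is running locally on port 5000, and the payload shape is inferred from that code rather than stated anywhere in the thread:

import requests

# POST a JSON list with one {"text": ...} object to the predict endpoint
response = requests.post(
    "http://127.0.0.1:5000/predict",
    json=[{"text": "How are you today?"}],
)
print(response.json())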

parano commented 4 years ago

@rockdrigoma did you find out why it didn't work before? Sorry, I haven't had a chance to look into the issue yet.

rockdrigoma commented 4 years ago

Not really. I had to delete the transformers_service.py file and restart the environment on Google Colab for it to work. It is working now.
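
For anyone hitting the same thing: a plausible explanation (an assumption, not confirmed in this thread) is that Colab kept the previously imported transformers_service module in memory, so edits to the file were not picked up until the runtime was restarted. A lighter-weight alternative is to reload the module explicitly, roughly like this:

import importlib
import transformers_service

# Re-execute transformers_service.py so the current class definition is used
importlib.reload(transformers_service)
from transformers_service import TransformersService

ts = TransformersService()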