bentoml / BentoML

The easiest way to serve AI apps and models - Build Model Inference APIs, Job queues, LLM apps, Multi-model pipelines, and much more!
https://bentoml.com
Apache License 2.0

bug: bentoml 1.3.10 containerize run failed but bentoml serve succeed #5051

Closed · cceasy closed this issue 5 days ago

cceasy commented 6 days ago

### Describe the bug

Hi there, I have encountered a problem running the containerized bento with `docker run`, but it runs successfully with `bentoml serve`.

### To reproduce

I can run the model below with `bentoml serve service:ModelService`, and I can build a docker image with `bentoml containerize --opt platform=linux/amd64 xxx`, but running it with `docker run --rm --platform linux/amd64 xxx` fails. Here is the error message from `docker run`:

% docker run --rm --platform linux/amd64 -v ~/.config/gcloud:/home/bentoml/.config/gcloud -p 8080:8080 -e BENTOML_PORT='8080' -e MODEL_NAME='my-keras-bert-model' -e MODEL_VERSION='v0.0.1' my-keras-bert-model:stzwfhev2o2jphsf
Traceback (most recent call last):
  File "/usr/local/bin/bentoml", line 8, in <module>
    sys.exit(cli())
  File "/usr/local/lib/python3.10/site-packages/click/core.py", line 1157, in __call__
    return self.main(*args, **kwargs)
  File "/usr/local/lib/python3.10/site-packages/click/core.py", line 1078, in main
    rv = self.invoke(ctx)
  File "/usr/local/lib/python3.10/site-packages/click/core.py", line 1688, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/usr/local/lib/python3.10/site-packages/click/core.py", line 1434, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/usr/local/lib/python3.10/site-packages/click/core.py", line 783, in invoke
    return __callback(*args, **kwargs)
  File "/usr/local/lib/python3.10/site-packages/bentoml_cli/utils.py", line 361, in wrapper
    return func(*args, **kwargs)
  File "/usr/local/lib/python3.10/site-packages/bentoml_cli/utils.py", line 332, in wrapper
    return_value = func(*args, **kwargs)
  File "/usr/local/lib/python3.10/site-packages/bentoml_cli/env_manager.py", line 126, in wrapper
    return func(*args, **kwargs)
  File "/usr/local/lib/python3.10/site-packages/bentoml_cli/serve.py", line 261, in serve
    svc = load(bento_identifier=bento, working_dir=working_dir)
  File "/usr/local/lib/python3.10/site-packages/bentoml/_internal/service/loader.py", line 369, in load
    _svc = import_1_2_service(_bento_identifier, _working_dir)
  File "/usr/local/lib/python3.10/site-packages/_bentoml_impl/loader.py", line 198, in import_service
    svc.on_load_bento(bento)
  File "/usr/local/lib/python3.10/site-packages/_bentoml_sdk/service/factory.py", line 402, in on_load_bento
    service_info = next(svc for svc in bento.info.services if svc.name == self.name)
StopIteration
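
For reference, the last frame is a plain `next(...)` over a generator expression with no default, so the `StopIteration` means that none of the entries in `bento.info.services` has a name matching the running service. A minimal, self-contained sketch with made-up data (the structures here are illustrative, not BentoML's actual objects):

```python
# Illustrative stand-ins for bento.info.services and the service name.
# next() without a default raises StopIteration when nothing matches,
# which is what on_load_bento hits when the name computed at import time
# differs from the name recorded in the bento at build time.
frozen_services = [{"name": "my-keras-bert-model"}]  # names recorded when the bento was built
runtime_name = "some-other-name"                     # name computed when service.py is imported

match = next((svc for svc in frozen_services if svc["name"] == runtime_name), None)
print(match)  # None; the real call passes no default, so it raises StopIteration instead
```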

The files are as below:

```yaml
description: "A Bento Service for my keras bert model."

python:
  requirements_txt: "requirements.txt"
  lock_packages: false

include:
  - service.py
```

- service.py

```python
import bentoml
from starlette.responses import JSONResponse
from starlette.status import HTTP_500_INTERNAL_SERVER_ERROR
from pydantic import RootModel
from pathlib import Path
import os

model_name = Path(__file__).parent.parent.parent.stem


class Input(RootModel[list]):
    pass


@bentoml.service(
    name=model_name,
    traffic={"timeout": 10},
)
class ModelService:
    def __init__(self):
        super().__init__()
        model_tag = "{}:{}".format(
            os.environ.get("MODEL_NAME"), os.environ.get("MODEL_VERSION")
        )
        self.model = bentoml.keras.load_model(model_tag)

    @bentoml.on_deployment
    def load_model_artifact():
        # import bentoml model from GCS to local
        ...

    @bentoml.api(input_spec=Input)
    def predict(self, root: list) -> list:
        # You need to modify the input pre-processing and output post-processing according to your requirements.
        try:
            input = self.preprocess(root)
            output = self.model.predict(input)
            res = self.postprocess(output)
            return res
        except Exception as e:
            print("failed to predict", str(e))
            return JSONResponse(
                content={"message": f"An error occurred during prediction: {str(e)}"},
                status_code=HTTP_500_INTERNAL_SERVER_ERROR,
            )

    def preprocess(self, input):
        input_feed = [x["review"] for x in input]
        print("debug model input:", input_feed)
        return input_feed

    def postprocess(self, output):
        print("debug model output:", type(output), output)
        res = [
            {
                "raw_score": x[1],
                "score": x[1] * 100
            } for x in output.tolist()
        ]
        return res
```

- how to generate the model

```python
import os
import bentoml

os.environ["KERAS_BACKEND"] = "jax"  # Or "tensorflow" or "torch"

import tensorflow_datasets as tfds
import keras_nlp

imdb_train, imdb_test = tfds.load(
    "imdb_reviews",
    split=["train", "test"],
    as_supervised=True,
    batch_size=16,
)

# Load a model.
classifier = keras_nlp.models.BertClassifier.from_preset(
    "bert_tiny_en_uncased",
    num_classes=2,
    activation="softmax",
)
classifier.fit(imdb_train.take(250), validation_data=imdb_test.take(250))

# Predict new examples.
classifier.predict(["What an amazing movie!", "A total waste of my time."])

bentoml.keras.save_model("my-test-model:v0.0.1", classifier)
```



### Expected behavior

_No response_

### Environment

bentoml==1.3.10
keras-nlp~=0.17.0
keras-hub-nightly==0.16.1.*
tensorflow~=2.18.0
frostming commented 6 days ago

`model_name = Path(__file__).parent.parent.parent.stem`

I think the problem is caused by this line: you derive the service name dynamically from the parent directory name, but that directory layout is not preserved inside the docker image. When containerized, the computed service name no longer matches the name frozen into the bento, which explains the `StopIteration` error.
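
As a sketch of one possible workaround (assuming the service name can simply be fixed as a constant, so the value baked into the bento at build time and the value computed inside the container are guaranteed to be identical):

```python
# Hypothetical sketch, not the original code: use a fixed service name instead of
# deriving it from the parent directory, so the name always matches the one
# frozen into the bento when it was built.
import bentoml

SERVICE_NAME = "my-keras-bert-model"  # assumed constant; must match the built bento

@bentoml.service(name=SERVICE_NAME, traffic={"timeout": 10})
class ModelService:
    ...
```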

You can close this issue if you have no other concerns.

cceasy commented 5 days ago

Thanks for the quick response @frostming