bentoml / BentoML

The easiest way to serve AI apps and models - Build Model Inference APIs, Job queues, LLM apps, Multi-model pipelines, and more!
https://bentoml.com
Apache License 2.0

BentoService does not package local dependencies beyond the top level directory #1410

Closed Toukenize closed 2 years ago

Toukenize commented 3 years ago

Describe the bug

BentoService does not package local dependencies nested more than one directory level deep unless an __init__.py is added to every sub-directory.

This is my project structure

root
├── src
│   ├── utils
│   │   ├── preprocess.py
│   │   └── constant.py
│   ├── deploy
│   │   └── classifier.py
│   └── __init__.py
└── serve.py

BentoML only bundled these (and classifier.py was moved out to the top level):

├── src
│   └── __init__.py
├── classifier.py
└ ...

My classifier.py and serve.py are shown below.

I define the BentoService in classifier.py:

import bentoml
from bentoml.frameworks.transformers import TransformersModelArtifact
from bentoml.adapters import JsonInput

import sys
from os import path

sys.path.append(path.dirname(path.abspath(__file__)))

from src.utils.constant import stopword_list
from src.utils.preprocess import preprocess_text

@bentoml.env(...)
@bentoml.artifacts(...)
class BentoClassifier(bentoml.BentoService):

    def preprocess(self, text):
        text = preprocess_text(text, stopword_list)
        return text

    @bentoml.api(...)
    def predict(self, parsed_data):
        ...
        return pred

Then I bundle it in serve.py:

from src.deploy.classifier import BentoClassifier
import ...

def load_model():
    ...
    return model

def main():
    model = load_model()
    bento_svc = BentoClassifier()
    bento_svc.pack('model', model)
    bento_svc.save()

if __name__ == "__main__":
    main()

To Reproduce

Set up the directory structure described above, then run python serve.py to bundle the BentoService.

Expected behavior

The complete src folder should be bundled together as local dependencies, like this:

├── src
│   ├── utils
│   │   ├── preprocess.py
│   │   └── constant.py
│   ├── deploy
│   │   └── classifier.py   # wouldn't it be more intuitive if the original structure is preserved?
│   └── __init__.py
└ ...

Screenshots/Logs

Environment:

Additional context

The only workaround I have is to add an __init__.py to every sub-folder of dependencies that BentoML needs to bundle, but this is quite cumbersome, especially when there are many sub-folders of local dependencies.
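The workaround above can be automated. The following is a minimal sketch, not part of BentoML: the helper name add_init_files and the demo tree are made up for illustration, and it assumes that an empty __init__.py is acceptable in every directory that contains Python files.

```python
import tempfile
from pathlib import Path


def add_init_files(package_root):
    """Create an empty __init__.py in package_root and in every sub-directory
    that has at least one .py file somewhere beneath it.

    Returns the list of __init__.py paths that were created.
    """
    root = Path(package_root)
    created = []
    directories = [root] + sorted(p for p in root.rglob("*") if p.is_dir())
    for directory in directories:
        has_python = any(directory.rglob("*.py"))
        init_file = directory / "__init__.py"
        if has_python and not init_file.exists():
            init_file.touch()
            created.append(init_file)
    return created


def demo():
    """Rebuild a layout like the one in this issue inside a temp directory."""
    with tempfile.TemporaryDirectory() as tmp:
        src = Path(tmp) / "src"
        (src / "utils").mkdir(parents=True)
        (src / "deploy").mkdir()
        (src / "__init__.py").touch()
        (src / "utils" / "preprocess.py").touch()
        (src / "deploy" / "classifier.py").touch()
        created = add_init_files(src)
        return sorted(p.relative_to(src).as_posix() for p in created)
```

Calling demo() returns the relative paths of the files the helper had to add. Running add_init_files a second time on the same tree creates nothing, so it is safe to call before every save.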

parano commented 3 years ago

Thanks for reporting this, @Toukenize. This happens because BentoML currently uses Python's own modulefinder module to find local dependencies, and it only recognizes a folder as an importable module if the folder contains an __init__.py file. I think it makes sense to support bundling the import even if the folder does not contain an __init__.py file, so I will look into it.
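To see the distinction between a folder with and without __init__.py, here is a small sketch that uses importlib (deliberately not modulefinder, and not BentoML's own code) to classify a name as a regular package, a namespace package, or a plain module. The package name demo_pkg_1410 and the helper package_kind are hypothetical.

```python
import importlib
import importlib.util
import sys
import tempfile
from pathlib import Path


def package_kind(name):
    """Classify an importable name as 'regular', 'namespace', 'module' or 'missing'."""
    importlib.invalidate_caches()
    try:
        spec = importlib.util.find_spec(name)
    except ModuleNotFoundError:
        return "missing"
    if spec is None:
        return "missing"
    if spec.submodule_search_locations is None:
        return "module"  # a plain .py file, not a package
    origin = spec.origin or ""
    # A regular package's origin is its __init__.py; a namespace package has none.
    return "regular" if origin.endswith("__init__.py") else "namespace"


def demo():
    """Build demo_pkg_1410/ (with __init__.py) and demo_pkg_1410/utils/ (without)."""
    prefix = "demo_pkg_1410"  # hypothetical package name
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp) / prefix
        (root / "utils").mkdir(parents=True)
        (root / "__init__.py").touch()
        (root / "utils" / "preprocess.py").touch()
        sys.path.insert(0, tmp)
        try:
            names = (prefix, prefix + ".utils", prefix + ".utils.preprocess")
            return {name: package_kind(name) for name in names}
        finally:
            sys.path.remove(tmp)
            for loaded in [m for m in sys.modules if m.startswith(prefix)]:
                del sys.modules[loaded]
```

In this sketch the interpreter itself can still import from the utils folder as a namespace package, which is why code like the reporter's runs locally even though a tool that insists on __init__.py skips the folder.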

guy4261 commented 3 years ago

Hello, I think this is related. I'm using bentoml==0.11.0 and running bentoml serve[-gunicorn] directly (not the Docker wrapper).

I have my package (let's call it foo) pip-installed locally (pip install -e .). One class in this package is the service:

# foo/foo_service.py
from bentoml import BentoService

class FooService(BentoService):
    ...  # class body omitted

This class uses lots of imports from this package.

When I call service.save_to_dir(path), I get the following under path:

bentoml.yml
FooService/bentoml.yml

These two .yml files are identical and contain this reference to the service module:

metadata:
  module_name: foo.foo_service
  module_file: foo/foo_service.py

But the only part of foo copied is foo_service.py (__init__.py files are added along the path).

So I think I'm seeing what @Toukenize sees: that only the file containing the service definition is copied, and not the entire package.

This happened to me on bentoml==0.9.1 as well, and both when foo was and wasn't pip installed (I uninstalled and set my PYTHONPATH to make it work without being installed).

Two workarounds:

  1. Edit the bentoml.yml files so that module_file points to the exact path of the service module;
  2. Change saved_bundle/loader.py so that it does not give precedence to the packed module:
    # sys.path.insert(0, bundle_path)
    # sys.path.insert(0, os.path.join(bundle_path, metadata["service_name"]))
    sys.path.append(bundle_path)
    sys.path.append(os.path.join(bundle_path, metadata["service_name"]))

and export PYTHONPATH=/path/to/my/package. Then the package that the service is a part of loads properly.
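The second workaround relies on sys.path ordering: entries earlier in the list win when the same module exists in more than one place. The sketch below illustrates only that ordering rule. It does not touch BentoML's loader, and the module name precedence_demo_1410 and the "installed"/"bundle" directories are stand-ins invented for the example.

```python
import importlib
import sys
import tempfile
from pathlib import Path


def which_copy_wins(prepend_bundle):
    """Import a module that exists in both a fake 'installed' directory and a
    fake 'bundle' directory, and report which copy Python picked."""
    module_name = "precedence_demo_1410"  # hypothetical module name
    with tempfile.TemporaryDirectory() as tmp:
        installed = Path(tmp) / "installed"
        bundle = Path(tmp) / "bundle"
        for directory in (installed, bundle):
            directory.mkdir()
            source = "ORIGIN = {!r}\n".format(directory.name)
            (directory / (module_name + ".py")).write_text(source)
        saved_path = list(sys.path)
        try:
            # Stand-in for the environment where the package is already importable.
            sys.path.insert(0, str(installed))
            if prepend_bundle:
                sys.path.insert(0, str(bundle))  # bundle searched first
            else:
                sys.path.append(str(bundle))  # bundle searched last
            importlib.invalidate_caches()
            sys.modules.pop(module_name, None)
            return importlib.import_module(module_name).ORIGIN
        finally:
            sys.path[:] = saved_path
            sys.modules.pop(module_name, None)
```

With prepend_bundle=True the bundled copy shadows the installed one. With prepend_bundle=False the installed copy is found first, which is the effect the edited loader lines aim for.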

parano commented 3 years ago

@guy4261 two identical bentoml.yml files are expected behavior - this is necessary to make the BentoService bundle "pip installable".

Did you add an __init__.py file to the directory containing your Python code? If so, the modules in that directory should be copied to the bundle when calling save or save_to_dir. BentoML does not copy the "entire package"; it only bundles the Python modules that are imported and used in your BentoService class. Without the __init__.py file, the directory is not recognized as a module.

guy4261 commented 3 years ago

(note: I edited my previous reply so filenames will match this comment.)

two identical bentoml.yml files are expected behavior

I assumed that's OK, just wanted to note (in case others will take a look).

Did you add an __init__.py file to the directory containing your python code?

So I have my git repo for this package, with a setup.py at the repo root. The package name is foo, so there's a directory foo. The directory structure is that of a Python package:

(git_root)
  .git/
  setup.py
  foo/
    __init__.py
    service/
        __init__.py
        foo_service.py
    pack/
        __init__.py
        pack_script.py

As I said, only foo/service/foo_service.py is copied. It has imports such as from foo import ... (there are other subpackages there), but none of them are copied.

That's sad because eventually I run from an environment where foo is installed; the service could've loaded it using the metadata.module_name from the bentoml.yml (if it didn't try to find the Python module using its path).

parano commented 3 years ago

Does the foo_service.py file contain your BentoService class definition? And does the foo_service.py file import from pack_script.py? If that's the case, pack_script.py is expected to be copied to the saved bundle, with your folder structure maintained. All imports should work when loading the saved bundle from another environment, and we have tests covering this behavior.

And BentoML does load foo_service.py based on the metadata.module_name, here's related code: https://github.com/bentoml/BentoML/blob/v0.11.0/bentoml/saved_bundle/loader.py#L204

guy4261 commented 3 years ago

Does the foo_service.py file contain your BentoService class definition?

Yep:

# foo/foo_service.py
from bentoml import BentoService

class FooService(BentoService):
    ...  # class body omitted

Does the foo_service.py file import from the pack_script.py

No, vice versa: pack_script.py imports the FooService class and packs it. Only foo_service.py is copied, along with the directory structure (i.e. it is not placed in the root of the bento but under foo/foo_service.py). But the rest of the package is not copied :(

I will look into those links - thanks 🙏

parano commented 3 years ago

Np!

pack_script.py imports the FooService class and packs it.

If that's the case, FooService itself does not rely on pack_script.py, so why should it be copied? By default, BentoML only packs the files and modules that are necessary to run model inference with the BentoService class, so it only copies modules that are imported by the foo_service.py file.
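The direction of the import is what matters here. As a rough illustration, the sketch below collects imported module names with Python's ast module. This is a swapped-in technique for the example only (BentoML, as noted earlier in the thread, uses modulefinder), and the foo.utils and foo.constants names in the sample sources are hypothetical.

```python
import ast


def imported_modules(source):
    """Return the dotted names of modules imported by `source`.

    Relative imports (from . import x) are skipped in this sketch.
    """
    names = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            names.update(alias.name for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.module and node.level == 0:
            names.add(node.module)
    return names


# Hypothetical contents of foo/foo_service.py
SERVICE_SOURCE = """
import bentoml
from foo.utils import helper
from foo import constants
"""

# Hypothetical contents of the packing script
PACK_SCRIPT_SOURCE = """
from foo.foo_service import FooService
"""

service_imports = imported_modules(SERVICE_SOURCE)
pack_script_imports = imported_modules(PACK_SCRIPT_SOURCE)
```

Starting from the service source, nothing leads to the packing script, so a dependency walk that begins at the service never visits it. The reverse is not true: the packing script does reach the service module.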

guy4261 commented 3 years ago

Indeed, I don't need pack_script.py. But I do need foo. And out of the entire foo package, only foo/foo_service.py is copied :( even though it imports from the rest of the package.

parano commented 3 years ago

@guy4261 that sounds like unexpected behavior. Do you mean that only foo/foo_service.py is copied but foo/__init__.py is not? Does your foo service load properly after being saved? If not, what's the error message? And could you share a bit more about your project structure and source code, if possible?

parano commented 2 years ago

Thanks again for all the discussion. It really inspired us to redesign the Bento packaging API in BentoML version 1.0, which makes bundling local dependencies a lot easier.

In BentoML 1.0, a project root must be explicitly defined by placing a bentofile.yaml file in that directory. This file specifies all the Bento build configs, including which files to include in the built Bento. The project root should also be treated as the CWD in your service's Python environment, and it is part of the import path in sys.path. A more detailed explanation can be found here: https://github.com/bentoml/BentoML/tree/main/bentoml/_internal/bento