bentoml / BentoML

The easiest way to serve AI apps and models - Build Model Inference APIs, Job queues, LLM apps, Multi-model pipelines, and more!
https://bentoml.com
Apache License 2.0

bug: BentoML unexpected behavior when interacting with Editable Pip Modules #4217

Open tokotchd opened 1 year ago

tokotchd commented 1 year ago

Describe the bug

If you use bentoml on a build server where modular code is installed via pip install -e <path_to_folder>, you will run into strange and incorrect behavior:

  1. Incorrect models, services, and runners being loaded
  2. Inconsistent behavior between bentoml build and bentoml build --containerize
  3. Possible dependency import issues etc.

To reproduce

Simplest possible reproduction of the bug:

Start with the following folder structure (a bento project layered on top of an installable Python module):

- module_1
-- module1
--- __init__.py
-- bento_packer.py
-- bento_service.py
-- bentofile.yaml
-- setup.py
- module_2
-- module2
--- __init__.py
-- bento_packer.py
-- bento_service.py
-- bentofile.yaml
-- setup.py

Then, install both Python modules:

pip install -e module_1
pip install -e module_2

Finally, attempt to run bentoml build in module_2.

Error: [bentoml-cli] build failed: no Models with name 'module_1' exist in BentoML store <osfs '/home/user/bentoml/models'>

With the --verbose flag, we can see that the build process starts with the correct bentofile, but loses track of it during the import step and ends up loading another project's service file instead, apparently because of the symlinks created by pip install -e:

Importing service "bento_service:svc" from working dir: "/home/user/bentoml_bug/module_2"
...
 File "/home/user/bentoml_bug/module_1/bento_service.py", line 3, in <module>
    runner = bentoml.picklable_model.get("module_1:latest").to_runner()
...
bentoml.exceptions.NotFound: no Models with name 'module_1' exist in BentoML store <osfs '/home/user/bentoml/models'>
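To confirm which file a bare module name actually resolves to before running the build, a small diagnostic along these lines may help. It is only a sketch: `resolve_module_file` is a hypothetical helper, not part of BentoML, and it assumes the service module is importable by the name given in the bentofile (here `bento_service`).

```python
import importlib.util
from pathlib import Path


def resolve_module_file(module_name, working_dir):
    """Return (path, inside_working_dir) for the file `import module_name` would load.

    Hypothetical helper for debugging; it only inspects the import system
    and does not execute the module.
    """
    spec = importlib.util.find_spec(module_name)
    if spec is None or spec.origin is None:
        return None, False
    origin = Path(spec.origin).resolve()
    # True only if the resolved file lives somewhere under working_dir.
    inside = Path(working_dir).resolve() in origin.parents
    return origin, inside


if __name__ == "__main__":
    path, inside = resolve_module_file("bento_service", ".")
    print(f"bento_service resolves to: {path} (inside working dir: {inside})")
```

If the reported path points into a different project than the one you are building, the build is picking up the wrong service file.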

bentoml_bug.zip

Expected behavior

Ideally, having a bentofile.yaml elsewhere on the system, inside a Python module that happens to be installed with pip install -e, would not break the bentoml build CLI.

I'm guessing the behavior I'm seeing is due more to path/symlink changes caused by pip than to BentoML itself. If the fix is outside the scope of BentoML, then I'm happy to at least have documented the issue here, and I'd ask that the BentoML documentation note that the bentoml CLI does not play well with editable Python modules.

Environment

bentoml==1.1.6
Python 3.9.15
Ubuntu 20.04

tokotchd commented 1 year ago

This does indeed seem to be caused by the way BentoML searches for the bento_service.py file specified in the bentofile.yaml.

A dirty workaround is to give every module on the system a uniquely named bento_service.py file, so that when the module directories are added to sys.path by pip install -e, BentoML will not mistakenly pick up an identically named bento_service.py file from a different directory.

tokotchd commented 1 year ago

The issue comes from BentoML/src/bentoml/_internal/service/loader.py

When you pip install -e any module on the system, its directory is added to sys.path:

print(sys.path)
'... /home/user/bentoml_bug/module_1', '/home/user/bentoml_bug/module_2'

Thus, lines 84-86 do not trigger, because working_dir is already in sys.path:

if working_dir not in sys.path:
    sys.path.insert(0, working_dir)
    sys_path_modified = True

And when it comes time to import at line 137, the first bento_service.py found on sys.path is loaded by

module = importlib.import_module(module_name, package=working_dir)

regardless of what is passed in as working_dir.
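The shadowing can be reproduced without BentoML at all. The sketch below is a simplified imitation of the conditional sys.path handling quoted above, not the actual loader code: `make_project`, `load_like_loader`, and `demo` are hypothetical names, and appending both directories to sys.path only simulates what the editable installs did. Note that the `package` argument of `importlib.import_module` is only used to resolve relative names (those starting with a dot), so passing working_dir there has no effect on where an absolute name like `bento_service` is found.

```python
import importlib
import sys
import tempfile
from pathlib import Path


def make_project(root, name):
    """Create a project directory containing a bento_service.py that records its owner."""
    project = Path(root) / name
    project.mkdir()
    (project / "bento_service.py").write_text(f"OWNER = {name!r}\n")
    return str(project)


def load_like_loader(module_name, working_dir):
    """Simplified imitation of the loader: only touch sys.path if working_dir is absent."""
    sys_path_modified = False
    if working_dir not in sys.path:
        sys.path.insert(0, working_dir)
        sys_path_modified = True
    try:
        sys.modules.pop(module_name, None)  # force a fresh import
        importlib.invalidate_caches()
        # `package` is ignored here because module_name is absolute.
        return importlib.import_module(module_name, package=working_dir)
    finally:
        if sys_path_modified:
            sys.path.remove(working_dir)


def demo():
    with tempfile.TemporaryDirectory() as root:
        module_1 = make_project(root, "module_1")
        module_2 = make_project(root, "module_2")
        # Simulate two editable installs: both directories end up on sys.path.
        sys.path.extend([module_1, module_2])
        try:
            loaded = load_like_loader("bento_service", working_dir=module_2)
            return loaded.OWNER
        finally:
            sys.path.remove(module_1)
            sys.path.remove(module_2)
            sys.modules.pop("bento_service", None)


if __name__ == "__main__":
    # Building in module_2 loads module_1's service file instead.
    print(demo())
```

Because module_2 is already on sys.path, the conditional insert is skipped and module_1, which appears earlier in the list, wins the lookup.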

This issue is solvable by always prepending the current working dir at the start of the path and always removing the first instance of it at the end.

This is a bit of a brute-force solution, and there's probably a better way of doing it with fewer ramifications (that I'm unaware of), but it solves my use case.
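A minimal sketch of that approach, written as a context manager. This is an illustration of the idea rather than the code from the eventual PR: `prepend_working_dir` is a hypothetical name, and the sketch does not deal with a stale entry in sys.modules from an earlier import of the same module name.

```python
import sys
from contextlib import contextmanager


@contextmanager
def prepend_working_dir(working_dir):
    """Always put working_dir first on sys.path, then remove that entry afterwards."""
    sys.path.insert(0, working_dir)
    try:
        yield
    finally:
        try:
            # list.remove drops the first occurrence, which is the entry
            # inserted above; any later duplicate (for example one added
            # by an editable install) is left in place.
            sys.path.remove(working_dir)
        except ValueError:
            pass  # something else already removed it
```

Wrapping the import in `with prepend_working_dir(working_dir):` means the lookup sees working_dir before any other directory, whether or not an editable install has already added it further down the list.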

parano commented 1 year ago

Hi @tokotchd -

"This issue is solvable by always prepending the current working dir at the start of the path and always removing the first instance of it at the end. " - do you need to modify BentoML code in loader.py for this to work?

I think always prepending the working_dir to sys.path during loading makes sense. Would you be open to submitting a PR for it?

tokotchd commented 1 year ago

@parano I threw up a quick PR.