bentoml / BentoML

The easiest way to serve AI apps and models - Build Model Inference APIs, Job queues, LLM apps, Multi-model pipelines, and more!
https://bentoml.com
Apache License 2.0

bug: "bentoml build" command takes too long, and the "--verbose" flag prints no additional messages. #3235

Closed seyong92 closed 4 months ago

seyong92 commented 2 years ago

Describe the bug

In "bentofile.yaml", we include only "*.py" files and some text files, but building the bento takes far too long.

The build takes more than 1 hour, even though the resulting bento is only around 50 MB.

Passing the "--verbose" flag to "bentoml build" prints no additional output, so I cannot tell what is causing the delay.

To reproduce

Here is the "bentofile.yaml" file of my project.

service: "bentoml_service:svc"
labels:
  owner: our-team
  stage: dev
include:
  - "*.py"
  - "*.csv"
exclude:
  - "log_save/"
  - "model_save/"
python:
  requirements_txt: "./requirements.txt"
docker:
  distro: debian
  python_version: "3.9.13"
  cuda_version: "11.6.2"
  system_packages:
    - libsndfile-dev
    - ffmpeg

Expected behavior

No response

Environment

bentoml: 1.0.10
python: 3.9.13

knoll-fabio commented 1 year ago

I have the same problem: bentoml build takes too long and it is hard to debug. My build context is the project root, which contains many directories with a lot of files. I did exclude these in bentofile.yaml, but maybe this is the problem.

aarnphm commented 1 year ago

Hi there, is there a virtualenv folder under this build directory?

knoll-fabio commented 1 year ago

Hi, yes there is. I looked at the code and noticed that for each file to be included, the entire directory structure from the build context is traversed to look for and parse .bentoignore files, which is very time consuming if the build directory contains a lot of files and directories. Wouldn't it make sense here to only step through the included directories?

aarnphm commented 1 year ago

If you add the venv folder to the ignore file, it will fix the issue.

knoll-fabio commented 1 year ago

Unfortunately, it didn't.

If I understood correctly, it iterates unfiltered through the entire build directory. Each path is compared first with the included and excluded paths from bentofile.yaml and then with the excluded paths from the .bentoignore file, and it is copied to the target directory only if it passes both checks.

https://github.com/bentoml/BentoML/blob/23ba7cfed5ac06f46243119222266f1101f210b4/src/bentoml/_internal/bento/bento.py#L225-L236

Here, however, the entire build directory is traversed again on every iteration to find .bentoignore files and parse their contents, unless the path has already been excluded in bentofile.yaml.

https://github.com/bentoml/BentoML/blob/23ba7cfed5ac06f46243119222266f1101f210b4/src/bentoml/_internal/bento/build_config.py#L867-L874

https://github.com/bentoml/BentoML/blob/23ba7cfed5ac06f46243119222266f1101f210b4/src/bentoml/_internal/bento/build_config.py#L846-L865
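
To make the cost concrete, here is a toy model of the behavior described above. It is not BentoML's actual code: all function names are hypothetical, and it uses the standard library's os.walk only to count how many directories get visited when the tree is walked again for every file, compared with walking it once.

```python
import os
import tempfile


def make_tree(root: str, n_dirs: int, files_per_dir: int) -> None:
    """Create n_dirs subdirectories, each holding files_per_dir empty .py files."""
    for d in range(n_dirs):
        sub = os.path.join(root, f"dir{d}")
        os.makedirs(sub)
        for f in range(files_per_dir):
            open(os.path.join(sub, f"file{f}.py"), "w").close()


def count_dir_visits_rewalking(root: str) -> int:
    """Toy model: for every file found, walk the whole tree again."""
    visits = 0
    for _dirpath, _dirnames, filenames in os.walk(root):
        for _name in filenames:
            # Hypothetical per-file lookup that re-traverses the build context,
            # standing in for a repeated search for ignore files.
            visits += sum(1 for _ in os.walk(root))
    return visits


def count_dir_visits_single_pass(root: str) -> int:
    """Walk the tree once; the result could be reused for every file."""
    return sum(1 for _ in os.walk(root))


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        make_tree(root, n_dirs=5, files_per_dir=4)
        print("re-walking per file:", count_dir_visits_rewalking(root))
        print("single pass:", count_dir_visits_single_pass(root))
```

In this model the number of directory visits grows with (number of files) x (number of directories), which would explain why a build context containing a virtualenv becomes so slow even when the resulting bento is small.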

Wouldn't it make sense here to use the filter_dirs and exclude_dirs parameters of the fs.walk() function to pre-filter for the paths included and excluded in bentofile.yaml and do the same in the specs.from_path() method?
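
A minimal sketch of this pre-filtering idea, assuming simple name-based patterns. It swaps in the standard library's os.walk for fs.walk(), so the pruning is done by editing the directory list in place instead of passing exclude_dirs. The function walk_pruned and its parameters are hypothetical, and the matching is simpler than real .bentoignore semantics.

```python
import fnmatch
import os
import tempfile
from typing import Iterator, Sequence


def walk_pruned(
    root: str, include: Sequence[str], exclude_dirs: Sequence[str]
) -> Iterator[str]:
    """Yield paths (relative to root) of files matching any include pattern,
    never descending into directories whose name matches exclude_dirs."""
    for dirpath, dirnames, filenames in os.walk(root):
        # Editing dirnames in place tells os.walk to skip those subtrees,
        # so an excluded venv folder is never traversed at all.
        dirnames[:] = sorted(
            d
            for d in dirnames
            if not any(fnmatch.fnmatch(d, pat) for pat in exclude_dirs)
        )
        for name in sorted(filenames):
            if any(fnmatch.fnmatch(name, pat) for pat in include):
                yield os.path.relpath(os.path.join(dirpath, name), root)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        for rel in ["bentoml_service.py", "data.csv", "venv/lib/site.py"]:
            path = os.path.join(root, *rel.split("/"))
            os.makedirs(os.path.dirname(path), exist_ok=True)
            open(path, "w").close()
        for found in walk_pruned(root, ["*.py", "*.csv"], ["venv", "log_save", "model_save"]):
            print(found)
```

The point of the design is that excluded directories are rejected before the walk enters them, so the cost depends on the size of the included tree and not on the size of the whole build context.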

frostming commented 4 months ago

This has been improved in the latest versions of BentoML.