bentoml / OpenLLM

Run any open-source LLM, such as Llama or Mistral, as an OpenAI-compatible API endpoint in the cloud.
https://bentoml.com
Apache License 2.0

bug: `openllm download` dies with Signal.SIGKILL: 9 #38

Closed · EmilStenstrom closed this issue 1 year ago

EmilStenstrom commented 1 year ago

### Describe the bug

I'm running through the most basic install. I created an empty virtualenv with Python 3.11 and ran `pip install openllm`, and I get a crash when I run `openllm start dolly-v2`.

The error I get is:

```
subprocess.CalledProcessError: Command '['/home/emilstenstrom/.pyenv/versions/3.11.3/envs/openllm/bin/python3.11', '-m', 'openllm', 'download', 'dolly-v2', '--model-id', 'databricks/dolly-v2-3b', '--output', 'porcelain']' died with <Signals.SIGKILL: 9>.
```

### To reproduce

Just run the full install from scratch.
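Concretely, that's the following (the commands are taken from the description above; the pyenv-virtualenv setup is inferred from the interpreter paths in the traceback, and any virtualenv tool should behave the same):

```bash
# Fresh Python 3.11 environment (pyenv-virtualenv shown, matching the
# traceback paths; an assumption, not part of the original report)
pyenv virtualenv 3.11.3 openllm
pyenv activate openllm

# Install OpenLLM and start the Dolly v2 model server
pip install openllm
openllm start dolly-v2
```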

### Logs

Here's the full stack trace of the run:

```
(openllm) 2023-06-19 15:59:29 ~/Projects/openllm $ openllm start dolly-v2
Traceback (most recent call last):
  File "/home/username/.pyenv/versions/openllm/bin/openllm", line 8, in <module>
    sys.exit(cli())
             ^^^^^
  File "/home/username/.pyenv/versions/3.11.3/envs/openllm/lib/python3.11/site-packages/click/core.py", line 1130, in __call__
    return self.main(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/username/.pyenv/versions/3.11.3/envs/openllm/lib/python3.11/site-packages/click/core.py", line 1055, in main
    rv = self.invoke(ctx)
         ^^^^^^^^^^^^^^^^
  File "/home/username/.pyenv/versions/3.11.3/envs/openllm/lib/python3.11/site-packages/click/core.py", line 1657, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
                           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/username/.pyenv/versions/3.11.3/envs/openllm/lib/python3.11/site-packages/click/core.py", line 1657, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
                           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/username/.pyenv/versions/3.11.3/envs/openllm/lib/python3.11/site-packages/click/core.py", line 1404, in invoke
    return ctx.invoke(self.callback, **ctx.params)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/username/.pyenv/versions/3.11.3/envs/openllm/lib/python3.11/site-packages/click/core.py", line 760, in invoke
    return __callback(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/username/.pyenv/versions/3.11.3/envs/openllm/lib/python3.11/site-packages/openllm/cli.py", line 324, in wrapper
    return func(*args, **attrs)
           ^^^^^^^^^^^^^^^^^^^^
  File "/home/username/.pyenv/versions/3.11.3/envs/openllm/lib/python3.11/site-packages/openllm/cli.py", line 297, in wrapper
    return_value = func(*args, **attrs)
                   ^^^^^^^^^^^^^^^^^^^^
  File "/home/username/.pyenv/versions/3.11.3/envs/openllm/lib/python3.11/site-packages/openllm/cli.py", line 272, in wrapper
    return f(*args, **attrs)
           ^^^^^^^^^^^^^^^^^
  File "/home/username/.pyenv/versions/3.11.3/envs/openllm/lib/python3.11/site-packages/click/decorators.py", line 26, in new_func
    return f(get_current_context(), *args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/username/.pyenv/versions/3.11.3/envs/openllm/lib/python3.11/site-packages/openllm/cli.py", line 671, in model_start
    llm = t.cast(
          ^^^^^^^
  File "/home/username/.pyenv/versions/3.11.3/envs/openllm/lib/python3.11/site-packages/openllm/models/auto/factory.py", line 120, in for_model
    llm.ensure_model_id_exists()
  File "/home/username/.pyenv/versions/3.11.3/envs/openllm/lib/python3.11/site-packages/openllm/_llm.py", line 666, in ensure_model_id_exists
    output = subprocess.check_output(
             ^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/username/.pyenv/versions/3.11.3/lib/python3.11/subprocess.py", line 466, in check_output
    return run(*popenargs, stdout=PIPE, timeout=timeout, check=True,
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/username/.pyenv/versions/3.11.3/lib/python3.11/subprocess.py", line 571, in run
    raise CalledProcessError(retcode, process.args,
subprocess.CalledProcessError: Command '['/home/username/.pyenv/versions/3.11.3/envs/openllm/bin/python3.11', '-m', 'openllm', 'download', 'dolly-v2', '--model-id', 'databricks/dolly-v2-3b', '--output', 'porcelain']' died with <Signals.SIGKILL: 9>.
```
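Note that the parent process only sees that the child died with SIGKILL; the traceback doesn't say why. As a diagnostic sketch (not output from this report), you can run the inner download command from the traceback directly and check the kernel log to see whether the OOM killer was responsible:

```bash
# Run the download step directly, outside the `openllm start` wrapper
python -m openllm download dolly-v2 --model-id databricks/dolly-v2-3b --output porcelain

# An exit status of 137 (128 + 9, SIGKILL's signal number) confirms the
# process was killed rather than exiting on its own
echo "exit status: $?"

# If the kernel OOM killer fired, it logs the victim process here
sudo dmesg | grep -i -E 'oom|killed process'
```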

### Environment

#### Environment variables

```bash
BENTOML_DEBUG=''
BENTOML_QUIET=''
BENTOML_BUNDLE_LOCAL_BUILD=''
BENTOML_DO_NOT_TRACK=''
BENTOML_CONFIG=''
BENTOML_CONFIG_OPTIONS=''
BENTOML_PORT=''
BENTOML_HOST=''
BENTOML_API_WORKERS=''
```
#### System information

- bentoml: 1.0.22
- python: 3.11.3
- platform: Linux-5.15.90.1-microsoft-standard-WSL2-x86_64-with-glibc2.31
- uid_gid: 1000:1000

#### pip_packages

```
accelerate==0.20.3
aiohttp==3.8.4
aiosignal==1.3.1
anyio==3.7.0
appdirs==1.4.4
asgiref==3.7.2
async-timeout==4.0.2
attrs==23.1.0
bentoml==1.0.22
build==0.10.0
cattrs==23.1.2
certifi==2023.5.7
charset-normalizer==3.1.0
circus==0.18.0
click==8.1.3
click-option-group==0.5.6
cloudpickle==2.2.1
cmake==3.26.4
coloredlogs==15.0.1
contextlib2==21.6.0
datasets==2.13.0
deepmerge==1.1.0
Deprecated==1.2.14
dill==0.3.6
filelock==3.12.2
filetype==1.2.0
frozenlist==1.3.3
fs==2.4.16
fsspec==2023.6.0
grpcio==1.54.2
grpcio-health-checking==1.48.2
h11==0.14.0
httpcore==0.17.2
httpx==0.24.1
huggingface-hub==0.15.1
humanfriendly==10.0
idna==3.4
importlib-metadata==6.0.1
inflection==0.5.1
Jinja2==3.1.2
lit==16.0.6
markdown-it-py==3.0.0
MarkupSafe==2.1.3
mdurl==0.1.2
mpmath==1.3.0
multidict==6.0.4
multiprocess==0.70.14
networkx==3.1
numpy==1.25.0
nvidia-cublas-cu11==11.10.3.66
nvidia-cuda-cupti-cu11==11.7.101
nvidia-cuda-nvrtc-cu11==11.7.99
nvidia-cuda-runtime-cu11==11.7.99
nvidia-cudnn-cu11==8.5.0.96
nvidia-cufft-cu11==10.9.0.58
nvidia-curand-cu11==10.2.10.91
nvidia-cusolver-cu11==11.4.0.1
nvidia-cusparse-cu11==11.7.4.91
nvidia-nccl-cu11==2.14.3
nvidia-nvtx-cu11==11.7.91
openllm==0.1.6
opentelemetry-api==1.17.0
opentelemetry-instrumentation==0.38b0
opentelemetry-instrumentation-aiohttp-client==0.38b0
opentelemetry-instrumentation-asgi==0.38b0
opentelemetry-instrumentation-grpc==0.38b0
opentelemetry-sdk==1.17.0
opentelemetry-semantic-conventions==0.38b0
opentelemetry-util-http==0.38b0
optimum==1.8.8
orjson==3.9.1
packaging==23.1
pandas==2.0.2
pathspec==0.11.1
Pillow==9.5.0
pip-requirements-parser==32.0.1
pip-tools==6.13.0
prometheus-client==0.17.0
protobuf==3.20.3
psutil==5.9.5
pyarrow==12.0.1
pydantic==1.10.9
Pygments==2.15.1
pynvml==11.5.0
pyparsing==3.1.0
pyproject_hooks==1.0.0
python-dateutil==2.8.2
python-json-logger==2.0.7
python-multipart==0.0.6
pytz==2023.3
PyYAML==6.0
pyzmq==25.1.0
regex==2023.6.3
requests==2.31.0
rich==13.4.2
safetensors==0.3.1
schema==0.7.5
sentencepiece==0.1.99
simple-di==0.1.5
six==1.16.0
sniffio==1.3.0
starlette==0.28.0
sympy==1.12
tabulate==0.9.0
tokenizers==0.13.3
torch==2.0.1
torchvision==0.15.2
tornado==6.3.2
tqdm==4.65.0
transformers==4.30.2
triton==2.0.0
typing_extensions==4.6.3
tzdata==2023.3
urllib3==2.0.3
uvicorn==0.22.0
watchfiles==0.19.0
wcwidth==0.2.6
wrapt==1.15.0
xxhash==3.2.0
yarl==1.9.2
zipp==3.15.0
```
aarnphm commented 1 year ago

Hey there, how much memory do you have available?

Often, this error is raised when there is an OOM (out-of-memory) issue.
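For context, the platform string in the report shows WSL2, which by default caps the VM at only a fraction of the host's RAM, and the dolly-v2-3b checkpoint is several gigabytes, so loading it can exhaust a default WSL2 VM. A quick check, plus a hedged `.wslconfig` example (the file format follows Microsoft's WSL documentation; the 12GB value is purely illustrative):

```bash
# How much RAM does the WSL2 VM actually see? This can be far less than
# the physical machine has
free -h

# To raise the cap, put something like this in %UserProfile%\.wslconfig on
# the Windows side, then run `wsl --shutdown` and restart the distro:
#   [wsl2]
#   memory=12GB
```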

aarnphm commented 1 year ago

Hey, can you try it out with 0.1.14?
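For anyone following along, the upgrade is a one-liner (the version pin is taken from the comment above):

```bash
# Upgrade OpenLLM to at least the suggested release
pip install --upgrade "openllm>=0.1.14"
```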

aarnphm commented 1 year ago

Please reopen if you still run into this issue.

EmilStenstrom commented 1 year ago

@aarnphm thanks, will try when I get home from vacation!