bigscience-workshop / petals

🌸 Run LLMs at home, BitTorrent-style. Fine-tuning and inference up to 10x faster than offloading
https://petals.dev

PyInstaller-packaged Petals binary fails to load/download on `AutoDistributedModelForCausalLM.from_pretrained` #528

Open biswaroop1547 opened 1 year ago

biswaroop1547 commented 1 year ago

I'm using PyInstaller to package a script that uses Petals; here's what my components look like:

main.py:

import torch, os

from petals import AutoDistributedModelForCausalLM
import multiprocessing
import multihash
import multiaddr
import multiaddr.codecs.fspath  # imported explicitly so PyInstaller bundles the dynamically loaded codec module
multiprocessing.freeze_support()  # required when running as a frozen (PyInstaller) executable

model = "petals-team/StableBeluga2"
kwargs = {}  # placeholder; the full script builds these from_pretrained options elsewhere
print("GOING TO START NOW")

def download_model() -> None:
    print(os)
    _ = AutoDistributedModelForCausalLM.from_pretrained(model, **kwargs)

download_model()

requirements.txt:

fastapi==0.95.0
uvicorn==0.21.1
pytest==7.2.2
requests==2.28.2
tqdm==4.65.0
httpx==0.23.3
python-dotenv==1.0.0
tenacity==8.2.2
petals==2.2.0

For installing deps & env:

#!/bin/bash

PWD_PATH="$(pwd)"
REQUIREMENTS_FILE_PATH="${PWD_PATH}/petals/requirements.txt"

# Create and activate a Python virtual environment
virtualenv venv -p=3.11
source ./venv/bin/activate

# Install the project requirements plus PyInstaller
pip install -r "$REQUIREMENTS_FILE_PATH" pyinstaller

Note: there seems to be an issue when using TorchScript with PyInstaller, and hivemind uses it for a GELU function, so above that function definition I've put torch.jit.script = torch.jit.script_if_tracing, which makes it work with PyInstaller (see the sketch below).
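The same workaround can also be applied as a monkey-patch from the entry script instead of editing hivemind's source, as long as it runs before hivemind/petals are imported. A minimal sketch (assuming hivemind only invokes torch.jit.script at import time):

import torch

# Swap torch.jit.script for the tracing-only variant so the frozen binary
# never tries to compile TorchScript at import time.
torch.jit.script = torch.jit.script_if_tracing

import hivemind  # must be imported after the patch
from petals import AutoDistributedModelForCausalLM  # must be imported after the patch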

And for packaging with PyInstaller I have this script (package_petals.sh):

#!/bin/bash

# Set the paths to the entry script, this packaging script, and the output directory
PWD_PATH="$(pwd)"
PYTHON_SCRIPT_PATH="${PWD_PATH}/petals/main.py"
BUILD_SIDE_SCRIPT_PATH="${PWD_PATH}/petals/package_petals.sh"
DIST_PATH="${PWD_PATH}/bin/python"

pyinstaller --onefile --distpath "$DIST_PATH" --clean \
    --hidden-import=torch --collect-data torch --hidden-import=transformers --collect-data=transformers \
    --copy-metadata=transformers --copy-metadata torch --copy-metadata tqdm --copy-metadata regex --copy-metadata requests --copy-metadata packaging --copy-metadata filelock --copy-metadata numpy --copy-metadata tokenizers --copy-metadata importlib_metadata --copy-metadata huggingface-hub --copy-metadata safetensors --copy-metadata pyyaml --copy-metadata petals --copy-metadata hivemind \
    --hidden-import multiprocessing.BufferTooShort --hidden-import multiprocessing.AuthenticationError --hidden-import multiprocessing.get_context --hidden-import multiprocessing.TimeoutError --hidden-import multiprocessing.set_start_method --hidden-import multiprocessing.get_start_method --hidden-import multiprocessing.Queue --hidden-import multiprocessing.Process --hidden-import multiprocessing.Pipe --hidden-import multiprocessing.cpu_count --hidden-import multiprocessing.RLock --hidden-import multiprocessing.Pool \
    --hidden-import torch.multiprocessing._prctl_pr_set_pdeathsig --hidden-import torch.distributed._tensor._collective_utils \
    --hidden-import hivemind --hidden-import hivemind.dht.schema --hidden-import multiprocessing \
    --hidden-import scipy.linalg._basic --hidden-import filelock._windows --hidden-import scipy.sparse._dok --hidden-import scipy.linalg._special_matrices \
    --name=cht-petals-aarch64-apple-darwin \
    --paths ./venv/lib/python3.11/site-packages --paths "${PWD_PATH}/petals" \
    "$PYTHON_SCRIPT_PATH"

Now when I run ./bin/python/cht-petals-aarch64-apple-darwin, it throws a FileNotFoundError (it also doesn't download shards from hf_hub unless force_download=True is passed explicitly, and even after downloading, the model doesn't load and fails with the same error):

Oct 18 23:18:54.719 [INFO] Make sure you follow the LLaMA's terms of use: https://bit.ly/llama2-license for LLaMA 2, https://bit.ly/llama-license for LLaMA 1
Oct 18 23:18:54.719 [INFO] Using DHT prefix: StableBeluga2-hf
Traceback (most recent call last):
  File "download.py", line 39, in <module>
  File "download.py", line 34, in download_model
  File "petals/utils/auto_config.py", line 78, in from_pretrained
  File "petals/utils/auto_config.py", line 51, in from_pretrained
  File "petals/client/from_pretrained.py", line 37, in from_pretrained
  File "transformers/modeling_utils.py", line 3085, in from_pretrained
  File "petals/models/llama/model.py", line 132, in __init__
  File "petals/models/llama/model.py", line 34, in __init__
  File "petals/client/remote_sequential.py", line 47, in __init__
  File "petals/client/routing/sequence_manager.py", line 89, in __init__
  File "hivemind/dht/dht.py", line 88, in __init__
  File "hivemind/dht/dht.py", line 148, in run_in_background
  File "hivemind/dht/dht.py", line 151, in wait_until_ready
  File "hivemind/utils/mpfuture.py", line 262, in result
  File "concurrent/futures/_base.py", line 445, in result
  File "concurrent/futures/_base.py", line 390, in __get_result
FileNotFoundError: [Errno 2] No such file or directory
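
For completeness, the force_download attempt mentioned above was passed through the same kwargs, roughly like this (force_download is the standard transformers from_pretrained flag):

_ = AutoDistributedModelForCausalLM.from_pretrained(
    model,
    force_download=True,  # shards do get re-downloaded from hf_hub with this, but loading still fails
    **kwargs,
)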

I also came across #468, which has the exact same stack trace as mine (though I haven't validated #468, since my use case specifically requires a single binary file, so it isn't directly relevant; hence PyInstaller).

Other things I tried

When I do something like this:

try:
    _ = AutoDistributedModelForCausalLM.from_pretrained(args.model, **kwargs)
except FileNotFoundError as e:
    # fall back to an interactive Python shell inside the frozen process
    os.system("python")

so that it pops up a new Python shell, but now inside the binary and using its bundled Python. If I import all the modules and run the same .from_pretrained call manually inside that shell, it works, but only inside this new shell that the binary spawns.
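
What I run manually inside that shell is roughly the same imports and call as in main.py:

import torch
from petals import AutoDistributedModelForCausalLM

# same call as download_model(); this succeeds, but only inside the shell the binary spawns
model = AutoDistributedModelForCausalLM.from_pretrained("petals-team/StableBeluga2")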

Also, dropping Petals entirely and using the equivalent transformers modules caused no issues: the model downloads and loads correctly when I follow the same steps above.
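
For comparison, the plain-transformers equivalent that works when packaged the same way is roughly this (a sketch using the standard Auto classes; exact kwargs omitted):

from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "petals-team/StableBeluga2"
tokenizer = AutoTokenizer.from_pretrained(model_name)
# downloads and loads without errors inside the frozen binary
model = AutoModelForCausalLM.from_pretrained(model_name)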

My hunch is that something specific is happening because of a few dynamic (rather than static) elements at startup in petals x hivemind, i.e. in these frames:

File "petals/client/remote_sequential.py", line 47, in __init__
  File "petals/client/routing/sequence_manager.py", line 89, in __init__
  File "hivemind/dht/dht.py", line 88, in __init__
  File "hivemind/dht/dht.py", line 148, in run_in_background
  File "hivemind/dht/dht.py", line 151, in wait_until_ready
  File "hivemind/utils/mpfuture.py", line 262, in result

My machine specifications:

Apple M2, 16 GB RAM, macOS 13.5.1