Lightning-Universe / stable-diffusion-deploy

Learn to serve Stable Diffusion models on cloud infrastructure at scale. This Lightning App shows load balancing, orchestration, pre-provisioning, dynamic batching, GPU inference, and microservices working together via the Lightning Apps framework.
https://lightning.ai/muse
Apache License 2.0

RuntimeError: PytorchStreamReader failed reading zip archive: failed finding central directory #263

slavakurilyak opened this issue 1 year ago (status: open)

slavakurilyak commented 1 year ago

🐛 Bug

To Reproduce

Steps to reproduce the behavior:

  1. conda create --name muse_app python=3.9 --yes
  2. conda activate muse_app
  3. git clone https://github.com/Lightning-AI/stable-diffusion-deploy.git
  4. cd stable-diffusion-deploy
  5. bash dev_install.sh
  6. python -m lightning run app app.py
  7. See the error:
Your Lightning App is starting. This won't take long.
INFO: Your app has started. View it in your browser: http://127.0.0.1:7501/view
INFO: Received SIGTERM signal. Gracefully terminating safety_checker_embedding_work...
loading model...
GPU available: False, used: False
TPU available: False, using: 0 TPU cores
IPU available: False, using: 0 IPUs
HPU available: False, using: 0 HPUs
/Users/skurilyak/miniconda3/envs/muse_app/lib/python3.9/site-packages/lightning/pytorch/trainer/connectors/logger_connector/logger_connector.py:67: UserWarning: Starting from v1.9.0, `tensorboardX` has been removed as a dependency of the `lightning.pytorch` package, due to potential conflicts with other packages in the ML ecosystem. For this reason, `logger=True` will use `CSVLogger` as the default logger, unless the `tensorboard` or `tensorboardX` packages are found. Please `pip install lightning[extra]` or one of them to enable TensorBoard support by default
  warning_cache.warn(
Loading model from v1-5-pruned-emaonly.ckpt
Process SpawnProcess-7:
Traceback (most recent call last):
  File "/Users/skurilyak/miniconda3/envs/muse_app/lib/python3.9/multiprocessing/process.py", line 315, in _bootstrap
    self.run()
  File "/Users/skurilyak/miniconda3/envs/muse_app/lib/python3.9/multiprocessing/process.py", line 108, in run
    self._target(*self._args, **self._kwargs)
  File "/Users/skurilyak/miniconda3/envs/muse_app/lib/python3.9/site-packages/lightning/app/utilities/proxies.py", line 437, in __call__
    raise e
  File "/Users/skurilyak/miniconda3/envs/muse_app/lib/python3.9/site-packages/lightning/app/utilities/proxies.py", line 418, in __call__
    self.run_once()
  File "/Users/skurilyak/miniconda3/envs/muse_app/lib/python3.9/site-packages/lightning/app/utilities/proxies.py", line 569, in run_once
    self.work.on_exception(e)
  File "/Users/skurilyak/miniconda3/envs/muse_app/lib/python3.9/site-packages/lightning/app/core/work.py", line 625, in on_exception
    raise exception
  File "/Users/skurilyak/miniconda3/envs/muse_app/lib/python3.9/site-packages/lightning/app/utilities/proxies.py", line 534, in run_once
    ret = self.run_executor_cls(self.work, work_run, self.delta_queue)(*args, **kwargs)
  File "/Users/skurilyak/miniconda3/envs/muse_app/lib/python3.9/site-packages/lightning/app/utilities/proxies.py", line 367, in __call__
    return self.work_run(*args, **kwargs)
  File "/Users/skurilyak/dev/testing/stable-diffusion-deploy/muse/components/stable_diffusion_serve.py", line 129, in run
    self.build_pipeline()
  File "/Users/skurilyak/dev/testing/stable-diffusion-deploy/muse/components/stable_diffusion_serve.py", line 83, in build_pipeline
    self._model = create_text2image(sd_variant=os.environ.get("SD_VARIANT", "sd1"))
  File "/Users/skurilyak/miniconda3/envs/muse_app/lib/python3.9/site-packages/stable_diffusion_inference/factory.py", line 33, in create_text2image
    model = SDInference(
  File "/Users/skurilyak/miniconda3/envs/muse_app/lib/python3.9/site-packages/stable_diffusion_inference/model.py", line 89, in __init__
    self.model = StableDiffusionModule(
  File "/Users/skurilyak/miniconda3/envs/muse_app/lib/python3.9/site-packages/stable_diffusion_inference/lit_model.py", line 75, in __init__
    self.model = load_model_from_config(
  File "/Users/skurilyak/miniconda3/envs/muse_app/lib/python3.9/site-packages/stable_diffusion_inference/lit_model.py", line 39, in load_model_from_config
    pl_sd = torch.load(ckpt, map_location="cpu")
  File "/Users/skurilyak/miniconda3/envs/muse_app/lib/python3.9/site-packages/torch/serialization.py", line 777, in load
    with _open_zipfile_reader(opened_file) as opened_zipfile:
  File "/Users/skurilyak/miniconda3/envs/muse_app/lib/python3.9/site-packages/torch/serialization.py", line 282, in __init__
    super(_open_zipfile_reader, self).__init__(torch._C.PyTorchFileReader(name_or_buffer))
RuntimeError: PytorchStreamReader failed reading zip archive: failed finding central directory
INFO: Your Lightning App is being stopped. This won't take long.
INFO: Your Lightning App has been stopped successfully!
INFO: Received SIGTERM signal. Gracefully terminating load_balancer...
INFO: Received SIGTERM signal. Gracefully terminating slack_bot...
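The traceback ends in `torch.load(ckpt, map_location="cpu")`. Files written by `torch.save` in PyTorch 1.6 and later are zip archives, and `failed finding central directory` means the reader could not locate the end-of-central-directory record that a zip file stores at its very end. A file that starts with a valid zip header but lacks that trailer is most likely truncated. The sketch below checks for exactly that using only the standard library. It assumes the checkpoint uses the zip-based format and sits at `v1-5-pruned-emaonly.ckpt` in the working directory; the log shows only the file name, not where `stable_diffusion_inference` stores it. `diagnose_checkpoint` is a hypothetical helper, not part of either project.

```python
import zipfile
from pathlib import Path

# Every zip archive begins with a local file header carrying this signature.
ZIP_LOCAL_HEADER = b"PK\x03\x04"


def diagnose_checkpoint(path):
    """Classify a checkpoint file before handing it to torch.load.

    Assumes the checkpoint was written with PyTorch's zip-based format.
    """
    p = Path(path)
    if not p.is_file():
        return "missing"
    if p.stat().st_size == 0:
        return "empty"
    with p.open("rb") as f:
        header = f.read(4)
    if header != ZIP_LOCAL_HEADER:
        # Not a zip archive: possibly a legacy (pre-1.6) pickle checkpoint,
        # or something else (e.g. an HTML error page) saved under this name.
        return "not-zip"
    if not zipfile.is_zipfile(p):
        # Zip header present but no end-of-central-directory record:
        # the usual signature of a truncated download.
        return "truncated"
    return "ok"


if __name__ == "__main__":
    # Hypothetical location: point this at wherever the app saved the file.
    print(diagnose_checkpoint("v1-5-pruned-emaonly.ckpt"))
```

A result of `truncated` points to an interrupted download rather than a code or environment problem.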

Expected behavior

I expect the Lightning App to run on localhost.

aniketmaurya commented 1 year ago

Hi @slavakurilyak, this error often occurs when the model checkpoint is corrupted, for example when its download fails partway through and leaves an incomplete file. You can find more information in this thread on the PyTorch forum.
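When the checkpoint is truncated, it has to be replaced with a complete copy. A common way to keep an interrupted transfer from leaving a partial file under the final name is to download into a temporary `.part` file and rename it only after the byte count matches the server's `Content-Length`. Below is a minimal standard-library sketch of that pattern; `download_atomically` and the `.part` suffix are hypothetical, and this is not how `stable_diffusion_inference` actually fetches its weights.

```python
import os
import shutil
import urllib.request


def download_atomically(url, dest, chunk_size=1 << 20):
    """Download url to dest without ever leaving a partial file at dest.

    Hypothetical helper: writes to dest + ".part", then renames on success.
    """
    tmp = dest + ".part"
    try:
        with urllib.request.urlopen(url) as resp, open(tmp, "wb") as out:
            expected = resp.headers.get("Content-Length")
            shutil.copyfileobj(resp, out, chunk_size)
        written = os.path.getsize(tmp)
        if expected is not None and written != int(expected):
            raise OSError(f"incomplete download: got {written} of {expected} bytes")
        # os.replace swaps the file in one step on the same filesystem, so
        # dest is either absent, the old file, or the complete new file.
        os.replace(tmp, dest)
    finally:
        # On any failure, discard the partial temporary file.
        if os.path.exists(tmp):
            os.remove(tmp)
    return dest
```

With this pattern, a failed transfer leaves at most a stray `.part` file, never a half-written `.ckpt` that `torch.load` would later reject.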