[Bug]: Exception training model: 'Cannot copy out of meta tensor; no data!'.

joneschunghk commented 8 months ago

Is there an existing issue for this?

[X] I have searched the existing issues and checked the recent builds/commits of both this extension and the webui

What happened?

An error occurs while saving the interval model.

Steps to reproduce the problem

I downgraded diffusers to 0.25.0 because LoRA doesn't support diffusers >= 0.26.0.
I upgraded torch to 2.2.0+cu118 and xformers to 0.0.24+cu118 because xformers 0.0.20 is outdated.
I trained a checkpoint with LoRA enabled.
The error occurred while saving the interval model.
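
To confirm that the versions named in the steps above are the ones actually installed, a small check like the following can be run with the webui's virtual-environment Python. This is only a sketch: the helper name `installed_versions` and the package list are illustrative, not part of the extension.

```python
from importlib.metadata import PackageNotFoundError, version

# Packages mentioned in this report; adjust the tuple as needed.
PACKAGES = ("torch", "xformers", "diffusers", "accelerate")


def installed_versions(packages=PACKAGES):
    """Map each package name to its installed version, or None if absent."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


if __name__ == "__main__":
    for name, ver in installed_versions().items():
        print(f"{name}: {ver or 'not installed'}")
```

The output depends on the environment, so it should be compared by hand against the versions listed under "Commit and libraries".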

Commit and libraries

Initializing Dreambooth
Dreambooth revision: 71c3465b6c866050b147c58e2caf41984df2cf45
Checking xformers...
Checking bitsandbytes...
Checking bitsandbytes (Windows)
Virtual environment path: D:\AI\Stable Diffusion\stable-diffusion-webui\venv
Checking for D:\AI\Stable Diffusion\stable-diffusion-webui\venv\lib\site-packages\bitsandbytes\libbitsandbytes_cuda111.dll
Found windows BNB DLL D:\AI\Stable Diffusion\stable-diffusion-webui\venv\lib\site-packages\bitsandbytes\libbitsandbytes_cuda111.dll
Checking Dreambooth requirements...
Installed version of accelerate: 0.21.0
[Dreambooth] accelerate v0.21.0 is already installed.
Installed version of dadaptation: 3.2
[Dreambooth] dadaptation v3.2 is already installed.
Installed version of diffusers: 0.25.0
[Dreambooth] diffusers v0.25.0 is already installed.
Installed version of discord-webhook: 1.3.0
[Dreambooth] discord-webhook v1.3.0 is already installed.
Installed version of fastapi: 0.94.0
[Dreambooth] fastapi is already installed.
Installed version of gitpython: 3.1.32
[Dreambooth] gitpython v3.1.40 is not installed.
Successfully installed gitpython-3.1.41

Installed version of pytorch_optimizer: 2.12.0
[Dreambooth] pytorch_optimizer v2.12.0 is already installed.
Installed version of Pillow: 9.5.0
[Dreambooth] Pillow is already installed.
Installed version of tqdm: 4.66.1
[Dreambooth] tqdm is already installed.
Installed version of tomesd: 0.1.3
[Dreambooth] tomesd v0.1.2 is already installed.
Installed version of tensorboard: 2.13.0
[Dreambooth] tensorboard v2.13.0 is already installed.
[+] torch version 2.2.0+cu118 installed.
[+] torchvision version 0.17.0+cu118 installed.
[+] accelerate version 0.21.0 installed.
[+] diffusers version 0.25.0 installed.
[+] bitsandbytes version 0.41.2.post2 installed.
[+] xformers version 0.0.24+cu118 installed.

Command Line Arguments

--xformers --medvram-sdxl --no-half-vae --autolaunch

Console logs

Total images / batch: 40, total examples: 40███████████████████████████████████████████| 40/40 [00:24<00:00,  1.99it/s]
                  Initializing bucket counter!
Loading pipeline components...: 100%|████████████████████████████████████████████████████| 7/7 [00:00<00:00, 51.46it/s]
Loading pipeline components...: 100%|████████████████████████████████████████████████████| 7/7 [00:24<00:00,  3.50s/it]
Saving Lora Weights...:   0%|                                                                    | 0/1 [00:00<?, ?it/s]Model name: Turbo_v1.050%|████████████████████████████████                                | 2/4 [02:46<02:52, 86.21s/it]
Saving D:\AI\Stable Diffusion\stable-diffusion-webui\models\dreambooth\Turbo_v1.0\logging\loss_plot_18.png
Saving D:\AI\Stable Diffusion\stable-diffusion-webui\models\dreambooth\Turbo_v1.0\logging\ram_plot_18.png
Cleanup log parse.
Steps:  10%|███▋                                 | 400/4000 [31:06<3:26:03,  3.43s/it, loss=0.309, lr=0.0001, vram=6.7]Traceback (most recent call last):                                                                | 0/4 [00:00<?, ?it/s]
  File "D:\AI\Stable Diffusion\stable-diffusion-webui\extensions\sd_dreambooth_extension\dreambooth\ui_functions.py", line 735, in start_training
    result = main(class_gen_method=class_gen_method)
  File "D:\AI\Stable Diffusion\stable-diffusion-webui\extensions\sd_dreambooth_extension\dreambooth\train_dreambooth.py", line 1976, in main
    return inner_loop()
  File "D:\AI\Stable Diffusion\stable-diffusion-webui\extensions\sd_dreambooth_extension\dreambooth\memory.py", line 126, in decorator
    return function(batch_size, grad_size, prof, *args, **kwargs)
  File "D:\AI\Stable Diffusion\stable-diffusion-webui\extensions\sd_dreambooth_extension\dreambooth\train_dreambooth.py", line 1933, in inner_loop
    check_save(True)
  File "D:\AI\Stable Diffusion\stable-diffusion-webui\extensions\sd_dreambooth_extension\dreambooth\train_dreambooth.py", line 1084, in check_save
    save_weights(
  File "D:\AI\Stable Diffusion\stable-diffusion-webui\extensions\sd_dreambooth_extension\dreambooth\train_dreambooth.py", line 1142, in save_weights
    vae=vae.to(accelerator.device),
  File "D:\AI\Stable Diffusion\stable-diffusion-webui\venv\lib\site-packages\torch\nn\modules\module.py", line 1152, in to
    return self._apply(convert)
  File "D:\AI\Stable Diffusion\stable-diffusion-webui\venv\lib\site-packages\torch\nn\modules\module.py", line 802, in _apply
    module._apply(fn)
  File "D:\AI\Stable Diffusion\stable-diffusion-webui\venv\lib\site-packages\torch\nn\modules\module.py", line 802, in _apply
    module._apply(fn)
  File "D:\AI\Stable Diffusion\stable-diffusion-webui\venv\lib\site-packages\torch\nn\modules\module.py", line 825, in _apply
    param_applied = fn(param)
  File "D:\AI\Stable Diffusion\stable-diffusion-webui\venv\lib\site-packages\torch\nn\modules\module.py", line 1150, in convert
    return t.to(device, dtype if t.is_floating_point() or t.is_complex() else None, non_blocking)
NotImplementedError: Cannot copy out of meta tensor; no data!
Steps:  10%|███▋                                 | 400/4000 [31:07<4:40:08,  4.67s/it, loss=0.309, lr=0.0001, vram=6.7]
Duration: 00:32:11
Saving weights/samples...:   0%|                                                                 | 0/4 [00:01<?, ?it/s]
Duration: 00:32:18
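
For context on the traceback above: a PyTorch tensor on the `meta` device has a shape and dtype but no storage, so calling `.to(device)` on it raises this `NotImplementedError`. The diagnostic sketch below lists which parameters or buffers of a module are still on the meta device. The helper name `meta_tensor_names` is hypothetical and not part of the extension. It relies only on `named_parameters()`, `named_buffers()`, and the tensor attribute `is_meta`, so it works on any `torch.nn.Module`.

```python
from itertools import chain


def meta_tensor_names(module):
    """Return names of parameters and buffers that live on the meta device.

    `module` is expected to behave like a torch.nn.Module, i.e. to provide
    named_parameters() and named_buffers() yielding (name, tensor) pairs.
    """
    named = chain(module.named_parameters(), module.named_buffers())
    # Tensors on the meta device report is_meta == True and hold no data.
    return [name for name, tensor in named if getattr(tensor, "is_meta", False)]
```

Calling `meta_tensor_names(vae)` just before the `vae.to(accelerator.device)` line shown in the traceback would indicate whether the VAE weights are unmaterialized at save time. A non-empty list would be consistent with the error, but it would not by itself explain why the weights ended up on the meta device.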

Additional information

No response

joneschunghk commented 8 months ago

After some testing:

Training epochs=20, Save model epochs=10, Save preview epochs=5: the preview was generated successfully in epoch 5, and the error occurred in epoch 10.
Training epochs=20, Save model epochs=10, Save preview epochs=0: the preview and model were generated successfully in epoch 10 without errors.

So I guess the error occurs when the model and the preview are saved in the same epoch. I'm now testing a larger number of training epochs without previews and waiting for the results.
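
To make the hypothesis concrete, here is a rough sketch of which epochs would trigger both a model save and a preview. It assumes that each schedule fires on multiples of its interval and that an interval of 0 means no separate preview schedule. The extension's actual scheduling logic is not shown in this issue, and the function names are illustrative.

```python
def fires(epoch, interval):
    """True when a positive interval divides the epoch; 0 never fires (assumed)."""
    return interval > 0 and epoch % interval == 0


def overlapping_epochs(total_epochs, save_every, preview_every):
    """Epochs (1-based) where both the save and preview schedules fire."""
    return [
        epoch
        for epoch in range(1, total_epochs + 1)
        if fires(epoch, save_every) and fires(epoch, preview_every)
    ]


print(overlapping_epochs(20, 10, 5))  # [10, 20]
print(overlapping_epochs(20, 10, 0))  # []
```

Under these assumptions the first configuration overlaps at epoch 10, which is where the error was reported, while the second never overlaps.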

github-actions[bot] commented 7 months ago

This issue is stale because it has been open for 14 days with no activity. Remove stale label or comment or this will be closed in 30 days

d8ahazard / sd_dreambooth_extension