d8ahazard / sd_dreambooth_extension


[Bug]: ValueError: std evaluated to zero after conversion to torch.float32 issue #1304

Closed Pychnight closed 1 year ago

Pychnight commented 1 year ago

What happened?

I started training and this error popped up. I have tried reinstalling, but the result is the same.

Steps to reproduce the problem

  1. Set the batch size to 10.
  2. Start training. The error below appears. Reducing the batch size back to 1 still produces the same error.

Commit and libraries

Dreambooth revision: c2a5617c587b812b5a408143ddfb18fc49234edf
Successfully installed accelerate-0.19.0 bitsandbytes-0.35.4 dadaptation-3.1 diffusers-0.16.1 discord-webhook-1.1.0 fastapi-0.94.1 gitpython-3.1.31 lion-pytorch-0.1.2 tensorboard-2.13.0 transformers-4.30.2

[+] xformers version 0.0.20 installed.
[+] torch version 2.0.1+cu118 installed.
[+] torchvision version 0.15.2+cu118 installed.
[+] accelerate version 0.19.0 installed.
[+] diffusers version 0.16.1 installed.
[+] transformers version 4.30.2 installed.
[+] bitsandbytes version 0.35.4 installed.

Command Line Arguments

--xformers

Console logs

Initializing bucket counter!
Steps:   0%| | 540/1384240 [01:29<41:06:56,  9.35it/s, inst_loss=0.258, loss=0.332, lr=1e-6, prior_loss=0.098, vram=18.
Traceback (most recent call last):
  File "D:\stable-diffusion-webui\extensions\sd_dreambooth_extension\dreambooth\ui_functions.py", line 729, in start_training
    result = main(class_gen_method=class_gen_method)
  File "D:\stable-diffusion-webui\extensions\sd_dreambooth_extension\dreambooth\train_dreambooth.py", line 1548, in main
    return inner_loop()
  File "D:\stable-diffusion-webui\extensions\sd_dreambooth_extension\dreambooth\memory.py", line 119, in decorator
    return function(batch_size, grad_size, prof, *args, **kwargs)
  File "D:\stable-diffusion-webui\extensions\sd_dreambooth_extension\dreambooth\train_dreambooth.py", line 1212, in inner_loop
    for step, batch in enumerate(train_dataloader):
  File "D:\stable-diffusion-webui\venv\lib\site-packages\accelerate\data_loader.py", line 388, in __iter__
    next_batch = next(dataloader_iter)
  File "D:\stable-diffusion-webui\venv\lib\site-packages\torch\utils\data\dataloader.py", line 633, in __next__
    data = self._next_data()
  File "D:\stable-diffusion-webui\venv\lib\site-packages\torch\utils\data\dataloader.py", line 677, in _next_data
    data = self._dataset_fetcher.fetch(index)  # may raise StopIteration
  File "D:\stable-diffusion-webui\venv\lib\site-packages\torch\utils\data\_utils\fetch.py", line 51, in fetch
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "D:\stable-diffusion-webui\venv\lib\site-packages\torch\utils\data\_utils\fetch.py", line 51, in <listcomp>
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "D:\stable-diffusion-webui\extensions\sd_dreambooth_extension\dreambooth\dataset\db_dataset.py", line 345, in __getitem__
    image_data, input_ids = self.load_image(image_path, caption, self.active_resolution)
  File "D:\stable-diffusion-webui\extensions\sd_dreambooth_extension\dreambooth\dataset\db_dataset.py", line 123, in load_image
    image = self.image_transform(img)
  File "D:\stable-diffusion-webui\extensions\sd_dreambooth_extension\dreambooth\dataset\db_dataset.py", line 112, in image_transform
    return norm(img)
  File "D:\stable-diffusion-webui\venv\lib\site-packages\torch\nn\modules\module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "D:\stable-diffusion-webui\venv\lib\site-packages\torchvision\transforms\transforms.py", line 277, in forward
    return F.normalize(tensor, self.mean, self.std, self.inplace)
  File "D:\stable-diffusion-webui\venv\lib\site-packages\torchvision\transforms\functional.py", line 363, in normalize
    return F_t.normalize(tensor, mean=mean, std=std, inplace=inplace)
  File "D:\stable-diffusion-webui\venv\lib\site-packages\torchvision\transforms\_functional_tensor.py", line 923, in normalize
    raise ValueError(f"std evaluated to zero after conversion to {dtype}, leading to division by zero.")
ValueError: std evaluated to zero after conversion to torch.float32, leading to division by zero.
Steps:   0%| | 540/1384240 [01:29<63:52:07,  6.02it/s, inst_loss=0.258, loss=0.332, lr=1e-6, prior_loss=0.098, vram=18.
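
The traceback ends in torchvision's `normalize`, which raises this error when any entry of its `std` argument is zero. One way that could happen mid-run (here at step 540) is if `std` is derived from each image's own pixels and one image has a channel where every pixel is identical, such as a solid-color image. That is an assumption: the log does not show how `db_dataset.py` builds `norm`. The sketch below uses only the standard library to compute per-channel statistics and flag such a channel. `channel_stats` and `has_zero_std` are hypothetical helper names, not part of the extension or torchvision.

```python
from statistics import fmean, pstdev


def channel_stats(pixels):
    """Return (means, stds) per channel for a list of (r, g, b) pixel tuples.

    Values are assumed to be floats in [0, 1], as they would be after
    torchvision's ToTensor. This is a hypothetical helper for illustration.
    """
    channels = list(zip(*pixels))  # transpose: one tuple of values per channel
    means = [fmean(c) for c in channels]
    stds = [pstdev(c) for c in channels]  # population standard deviation
    return means, stds


def has_zero_std(stds):
    """True if any channel is constant, so dividing by its std would fail."""
    return any(s == 0 for s in stds)


# A flat gray image: every pixel is identical, so each channel's std is 0.
flat = [(0.5, 0.5, 0.5)] * 16
# An image with some variation in every channel.
varied = [(0.0, 0.2, 0.4), (1.0, 0.8, 0.6)] * 8

flat_means, flat_stds = channel_stats(flat)
varied_means, varied_stds = channel_stats(varied)

print("flat image has a zero-std channel:", has_zero_std(flat_stds))
print("varied image has a zero-std channel:", has_zero_std(varied_stds))
```

If this assumption holds, running a check like this over the instance and class image folders before training may help locate the offending file. If `std` is instead a fixed constant in the dataset code, the cause lies elsewhere and this check will not find it.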

Additional information

No response

github-actions[bot] commented 1 year ago

This issue is stale because it has been open 5 days with no activity. Remove stale label or comment or this will be closed in 5 days