Training fails - Githubissues

Eriico01 commented 1 year ago

/usr/local/lib/python3.9/dist-packages/flax/core/frozen_dict.py:169: FutureWarning: jax.tree_util.register_keypaths is deprecated, and will be removed in a future release. Please use register_pytree_with_keys() instead.
  jax.tree_util.register_keypaths(
[ASCII-art banner: "TRAINING ..."]

0% 0/1500 [00:00<?, ?it/s] Musti Musti
Traceback (most recent call last):
  File "/content/diffusers/examples/dreambooth/train_dreambooth.py", line 803, in <module>
    main()
  File "/content/diffusers/examples/dreambooth/train_dreambooth.py", line 668, in main
    latents = vae.encode(batch["pixel_values"].to(dtype=weight_dtype)).latent_dist.sample()
  File "/usr/local/lib/python3.9/dist-packages/diffusers/models/autoencoder_kl.py", line 158, in encode
    h = self.encoder(x)
  File "/usr/local/lib/python3.9/dist-packages/torch/nn/modules/module.py", line 1194, in _call_impl
    return forward_call(*input, **kwargs)
  File "/usr/local/lib/python3.9/dist-packages/diffusers/models/vae.py", line 105, in forward
    sample = down_block(sample)
  File "/usr/local/lib/python3.9/dist-packages/torch/nn/modules/module.py", line 1194, in _call_impl
    return forward_call(*input, **kwargs)
  File "/usr/local/lib/python3.9/dist-packages/diffusers/models/unet_2d_blocks.py", line 984, in forward
    hidden_states = resnet(hidden_states, temb=None)
  File "/usr/local/lib/python3.9/dist-packages/torch/nn/modules/module.py", line 1194, in _call_impl
    return forward_call(*input, **kwargs)
  File "/usr/local/lib/python3.9/dist-packages/diffusers/models/resnet.py", line 540, in forward
    hidden_states = self.norm1(hidden_states)
  File "/usr/local/lib/python3.9/dist-packages/torch/nn/modules/module.py", line 1194, in _call_impl
    return forward_call(*input, **kwargs)
  File "/usr/local/lib/python3.9/dist-packages/torch/nn/modules/normalization.py", line 273, in forward
    return F.group_norm(
  File "/usr/local/lib/python3.9/dist-packages/torch/nn/functional.py", line 2528, in group_norm
    return torch.group_norm(input, num_groups, weight, bias, eps, torch.backends.cudnn.enabled)
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 4.83 GiB (GPU 0; 14.75 GiB total capacity; 9.21 GiB already allocated; 4.47 GiB free; 9.25 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
0% 0/1500 [00:04<?, ?it/s]
Traceback (most recent call last):
  File "/usr/local/bin/accelerate", line 8, in <module>
    sys.exit(main())
  File "/usr/local/lib/python3.9/dist-packages/accelerate/commands/accelerate_cli.py", line 43, in main
    args.func(args)
  File "/usr/local/lib/python3.9/dist-packages/accelerate/commands/launch.py", line 837, in launch_command
    simple_launcher(args)
  File "/usr/local/lib/python3.9/dist-packages/accelerate/commands/launch.py", line 354, in simple_launcher
    raise subprocess.CalledProcessError(returncode=process.returncode, cmd=cmd)
subprocess.CalledProcessError: Command '['/usr/bin/python3', '/content/diffusers/examples/dreambooth/train_dreambooth.py', '--image_captions_filename', '--train_only_unet', '--save_starting_step=500', '--save_n_steps=0', '--Session_dir=/content/gdrive/MyDrive/Fast-Dreambooth/Sessions/Musti', '--pretrained_model_name_or_path=/content/stable-diffusion-v2-768', '--instance_data_dir=/content/gdrive/MyDrive/Fast-Dreambooth/Sessions/Musti/instance_images', '--output_dir=/content/models/Musti', '--captions_dir=/content/gdrive/MyDrive/Fast-Dreambooth/Sessions/Musti/captions', '--instance_prompt=', '--seed=2339', '--resolution=512', '--mixed_precision=fp16', '--train_batch_size=1', '--gradient_accumulation_steps=1', '--use_8bit_adam', '--learning_rate=2e-06', '--lr_scheduler=linear', '--lr_warmup_steps=0', '--max_train_steps=1500']' returned non-zero exit status 1.
Something went wrong
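
For anyone reading the numbers in the out-of-memory message above, here is a minimal sketch that pulls the figures out of the text and does the arithmetic. The helper name `parse_oom` is made up for illustration and is not part of PyTorch; the regular expressions assume the message keeps this exact wording.

```python
import re

# Message text copied from the traceback in this thread.
OOM_MESSAGE = (
    "CUDA out of memory. Tried to allocate 4.83 GiB (GPU 0; 14.75 GiB total "
    "capacity; 9.21 GiB already allocated; 4.47 GiB free; 9.25 GiB reserved "
    "in total by PyTorch)"
)


def parse_oom(message):
    """Extract the GiB figures from an OOM message worded like the one above.

    Hypothetical helper: it only understands this specific message format.
    """
    patterns = {
        "requested": r"Tried to allocate ([\d.]+) GiB",
        "total": r"([\d.]+) GiB total capacity",
        "allocated": r"([\d.]+) GiB already allocated",
        "free": r"([\d.]+) GiB free",
        "reserved": r"([\d.]+) GiB reserved",
    }
    return {
        key: float(re.search(pattern, message).group(1))
        for key, pattern in patterns.items()
    }


stats = parse_oom(OOM_MESSAGE)

# How much more memory the single failing allocation needed than was free.
shortfall = stats["requested"] - stats["free"]

# Memory PyTorch holds but has not handed out; a large value would hint
# at fragmentation, which is what max_split_size_mb is meant to address.
slack = stats["reserved"] - stats["allocated"]

print(f"shortfall: {shortfall:.2f} GiB")
print(f"reserved but unallocated: {slack:.2f} GiB")
```

With these figures the reserved and allocated amounts are almost equal, so the fragmentation hint in the message probably does not apply here; the single 4.83 GiB request is simply larger than the 4.47 GiB that was free.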

TheLastBen commented 1 year ago

are you using the latest notebook ?

Eriico01 commented 1 year ago

are you using the latest notebook ?

yes i am

Eriico01 commented 1 year ago

tested with different google accounts but always same issue

TheLastBen commented 1 year ago

you're using the v-768 with 512 resolution ?

Eriico01 commented 1 year ago

you're using the v-768 with 512 resolution ?

v-768

TheLastBen commented 1 year ago

you're not using the latest notebook, use this https://colab.research.google.com/github/TheLastBen/fast-stable-diffusion/blob/main/fast-DreamBooth.ipynb

Eriico01 commented 1 year ago

you're not using the latest notebook, use this https://colab.research.google.com/github/TheLastBen/fast-stable-diffusion/blob/main/fast-DreamBooth.ipynb

okay i will try <3

Eriico01 commented 1 year ago

you're not using the latest notebook, use this https://colab.research.google.com/github/TheLastBen/fast-stable-diffusion/blob/main/fast-DreamBooth.ipynb

Training went fine this time, but when I try to test it, this comes up:

Traceback (most recent call last):
  File "/content/gdrive/MyDrive/sd/stable-diffusion-webui/webui.py", line 25, in <module>
    import gradio
  File "/usr/local/lib/python3.9/dist-packages/gradio/__init__.py", line 3, in <module>
    import gradio.components as components
  File "/usr/local/lib/python3.9/dist-packages/gradio/components.py", line 35, in <module>
    from gradio.blocks import Block, BlockContext
  File "/usr/local/lib/python3.9/dist-packages/gradio/blocks.py", line 27, in <module>
    from gradio.helpers import EventData, create_tracker, skip, special_args
ImportError: cannot import name 'EventData' from 'gradio.helpers' (/usr/local/lib/python3.9/dist-packages/gradio/helpers.py)
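
The ImportError above says that gradio's own blocks.py expects a name that the installed helpers.py does not define. One possible cause (an assumption, not confirmed anywhere in this thread) is a mix of files from different gradio versions. A minimal diagnostic sketch that reports which version pip thinks is installed; `installed_version` is a hypothetical helper name:

```python
from importlib import metadata


def installed_version(package):
    """Return the installed version string of a package, or None if absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


# Prints the version recorded in the package metadata, or None.
print("gradio:", installed_version("gradio"))
```

This only reads the package metadata; it does not check that the files on disk actually match that version.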

The-Great-Nothing commented 1 year ago

I'm getting the same CUDA error when training with the Dreamshaper model from Hugging Face (1.5, 512). It all worked fine last week. PS: I checked, and I am using the latest notebook.

UPDATE: One picture in the dataset had not been resized, and at 4 Mpx it broke the training. Apologies.
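
A quick way to catch an image like that before training is to list every file whose dimensions exceed the intended size. This is a minimal sketch, not something from the notebook: the function names and the `MAX_SIDE` limit are hypothetical, and `scan_directory` assumes Pillow is installed.

```python
from pathlib import Path

# Hypothetical limit; set it to the resolution you resized the dataset to.
MAX_SIDE = 1024


def oversized(sizes, max_side=MAX_SIDE):
    """Return the names whose width or height exceeds max_side.

    sizes maps a file name to a (width, height) tuple.
    """
    return sorted(
        name for name, (width, height) in sizes.items()
        if max(width, height) > max_side
    )


def scan_directory(folder):
    """Read the dimensions of the images in a folder (assumes Pillow)."""
    from PIL import Image  # imported here so oversized() works without Pillow

    sizes = {}
    for path in Path(folder).iterdir():
        if path.suffix.lower() in {".png", ".jpg", ".jpeg", ".webp"}:
            with Image.open(path) as img:
                sizes[path.name] = img.size  # (width, height)
    return sizes


# Made-up example: one resized image and one roughly 4 Mpx original.
example = {"a.png": (512, 512), "b.jpg": (2448, 1632)}
print(oversized(example))  # ['b.jpg']
```

On a real dataset the call would be `oversized(scan_directory(folder))`, with `folder` pointing at the session's instance images directory.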

TheLastBen / fast-stable-diffusion

Training fails #1836