Training has broken - Githubissues

Daniel-Kelvich commented 2 years ago

Training the unet...
Traceback (most recent call last):
  File "/content/diffusers/examples/dreambooth/train_dreambooth.py", line 18, in <module>
    from diffusers import AutoencoderKL, DDPMScheduler, StableDiffusionPipeline, UNet2DConditionModel
  File "/usr/local/lib/python3.7/dist-packages/diffusers/__init__.py", line 21, in <module>
    from .models import AutoencoderKL, UNet2DConditionModel, UNet2DModel, VQModel
  File "/usr/local/lib/python3.7/dist-packages/diffusers/models/__init__.py", line 19, in <module>
    from .unet_2d import UNet2DModel
  File "/usr/local/lib/python3.7/dist-packages/diffusers/models/unet_2d.py", line 11, in <module>
    from .unet_blocks import UNetMidBlock2D, get_down_block, get_up_block
  File "/usr/local/lib/python3.7/dist-packages/diffusers/models/unet_blocks.py", line 20, in <module>
    from .attention import AttentionBlock, SpatialTransformer
  File "/usr/local/lib/python3.7/dist-packages/diffusers/models/attention.py", line 24, in <module>
    from ..models.embeddings import ImagePositionalEmbeddings
ImportError: cannot import name 'ImagePositionalEmbeddings' from 'diffusers.models.embeddings' (/usr/local/lib/python3.7/dist-packages/diffusers/models/embeddings.py)
Traceback (most recent call last):
  File "/usr/local/bin/accelerate", line 8, in <module>
    sys.exit(main())
  File "/usr/local/lib/python3.7/dist-packages/accelerate/commands/accelerate_cli.py", line 43, in main
    args.func(args)
  File "/usr/local/lib/python3.7/dist-packages/accelerate/commands/launch.py", line 837, in launch_command
    simple_launcher(args)
  File "/usr/local/lib/python3.7/dist-packages/accelerate/commands/launch.py", line 354, in simple_launcher
    raise subprocess.CalledProcessError(returncode=process.returncode, cmd=cmd)
subprocess.CalledProcessError: Command '['/usr/bin/python3', '/content/diffusers/examples/dreambooth/train_dreambooth.py', '--image_captions_filename', '--train_only_unet', '--Session_dir=/content/gdrive/MyDrive/Fast-Dreambooth/Sessions/AnnabBaseline3A100', '--save_starting_step=500', '--save_n_steps=0', '--pretrained_model_name_or_path=/content/stable-diffusion-v1-5', '--instance_data_dir=/content/gdrive/MyDrive/Fast-Dreambooth/Sessions/AnnabBaseline3A100/instance_images', '--output_dir=/content/models/AnnabBaseline3A100', '--instance_prompt=', '--seed=96576', '--resolution=512', '--mixed_precision=no', '--train_batch_size=1', '--gradient_accumulation_steps=1', '--use_8bit_adam', '--learning_rate=2e-6', '--lr_scheduler=polynomial', '--center_crop', '--lr_warmup_steps=0', '--max_train_steps=2000']' returned non-zero exit status 1.
Something went wrong

fredrickflower commented 2 years ago

I am also experiencing this issue.

askiiart commented 2 years ago

Just posting here since the log is a bit hard to read and the root error is hard to find: it's the ImportError, in the first 15 lines.
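
For anyone skimming long output like this, here is a minimal sketch (not part of the notebook; `first_exception` and the shortened `sample` log are made up for illustration) that pulls the first exception line out of a multi-line traceback. It assumes the log still has its line breaks, which the flattened pastes in this thread do not.

```python
import re
from typing import Optional

# Matches lines such as "ImportError: cannot import name ..." in a traceback.
_EXC_LINE = re.compile(r"^\s*([A-Za-z_][\w.]*(?:Error|Exception)): (.*)$")


def first_exception(log: str) -> Optional[str]:
    """Return the first 'SomeError: message' line in a traceback log, if any."""
    for line in log.splitlines():
        match = _EXC_LINE.match(line)
        if match:
            return f"{match.group(1)}: {match.group(2)}"
    return None


# Shortened, illustrative log in the same shape as the ones posted here.
sample = """Traceback (most recent call last):
  File "/content/diffusers/examples/dreambooth/train_dreambooth.py", line 18, in <module>
    from diffusers import AutoencoderKL
ImportError: cannot import name 'ImagePositionalEmbeddings' from 'diffusers.models.embeddings'
Traceback (most recent call last):
  File "/usr/local/bin/accelerate", line 8, in <module>
    sys.exit(main())
subprocess.CalledProcessError: Command '[...]' returned non-zero exit status 1.
"""

# The CalledProcessError from accelerate is only the wrapper; the first
# exception line is the one worth reading.
print(first_exception(sample))
```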

Jongulo commented 2 years ago

Also have this issue

corbettaluigi commented 2 years ago

same

TheLastBen commented 2 years ago

I'm running it right now, no issue

TheLastBen commented 2 years ago

update to the latest colab

fredrickflower commented 2 years ago

Just ran again same issue:

Training the unet...
Traceback (most recent call last):
  File "/content/diffusers/examples/dreambooth/train_dreambooth.py", line 18, in <module>
    from diffusers import AutoencoderKL, DDPMScheduler, StableDiffusionPipeline, UNet2DConditionModel
  File "/usr/local/lib/python3.7/dist-packages/diffusers/__init__.py", line 21, in <module>
    from .models import AutoencoderKL, UNet2DConditionModel, UNet2DModel, VQModel
  File "/usr/local/lib/python3.7/dist-packages/diffusers/models/__init__.py", line 19, in <module>
    from .unet_2d import UNet2DModel
  File "/usr/local/lib/python3.7/dist-packages/diffusers/models/unet_2d.py", line 11, in <module>
    from .unet_blocks import UNetMidBlock2D, get_down_block, get_up_block
  File "/usr/local/lib/python3.7/dist-packages/diffusers/models/unet_blocks.py", line 20, in <module>
    from .attention import AttentionBlock, SpatialTransformer
  File "/usr/local/lib/python3.7/dist-packages/diffusers/models/attention.py", line 24, in <module>
    from ..models.embeddings import ImagePositionalEmbeddings
ImportError: cannot import name 'ImagePositionalEmbeddings' from 'diffusers.models.embeddings' (/usr/local/lib/python3.7/dist-packages/diffusers/models/embeddings.py)
Traceback (most recent call last):
  File "/usr/local/bin/accelerate", line 8, in <module>
    sys.exit(main())
  File "/usr/local/lib/python3.7/dist-packages/accelerate/commands/accelerate_cli.py", line 43, in main
    args.func(args)
  File "/usr/local/lib/python3.7/dist-packages/accelerate/commands/launch.py", line 837, in launch_command
    simple_launcher(args)
  File "/usr/local/lib/python3.7/dist-packages/accelerate/commands/launch.py", line 354, in simple_launcher
    raise subprocess.CalledProcessError(returncode=process.returncode, cmd=cmd)
subprocess.CalledProcessError: Command '['/usr/bin/python3', '/content/diffusers/examples/dreambooth/train_dreambooth.py', '--image_captions_filename', '--train_only_unet', '--Session_dir=/content/gdrive/MyDrive/Fast-Dreambooth/Sessions/erindreambooth', '--save_starting_step=500', '--save_n_steps=0', '--pretrained_model_name_or_path=/content/stable-diffusion-v1-5', '--instance_data_dir=/content/gdrive/MyDrive/Fast-Dreambooth/Sessions/erindreambooth/instance_images', '--output_dir=/content/models/erindreambooth', '--instance_prompt=', '--seed=96576', '--resolution=512', '--mixed_precision=no', '--train_batch_size=1', '--gradient_accumulation_steps=1', '--use_8bit_adam', '--learning_rate=2e-6', '--lr_scheduler=polynomial', '--center_crop', '--lr_warmup_steps=0', '--max_train_steps=3000']' returned non-zero exit status 1.
Something went wrong

TheLastBen commented 2 years ago

what settings, is it the default model ?

Daniel-Kelvich commented 2 years ago

Still same error. Default model (1.5), new method.

rasamaya commented 2 years ago

Error here too, cannot use it. It was fine yesterday.

Did normal hugging face token, no special model

Traceback (most recent call last):
  File "/content/diffusers/examples/dreambooth/train_dreambooth.py", line 18, in <module>
    from diffusers import AutoencoderKL, DDPMScheduler, StableDiffusionPipeline, UNet2DConditionModel
  File "/usr/local/lib/python3.7/dist-packages/diffusers/__init__.py", line 21, in <module>
    from .models import AutoencoderKL, UNet2DConditionModel, UNet2DModel, VQModel
  File "/usr/local/lib/python3.7/dist-packages/diffusers/models/__init__.py", line 19, in <module>
    from .unet_2d import UNet2DModel
  File "/usr/local/lib/python3.7/dist-packages/diffusers/models/unet_2d.py", line 11, in <module>
    from .unet_blocks import UNetMidBlock2D, get_down_block, get_up_block
  File "/usr/local/lib/python3.7/dist-packages/diffusers/models/unet_blocks.py", line 20, in <module>
    from .attention import AttentionBlock, SpatialTransformer
  File "/usr/local/lib/python3.7/dist-packages/diffusers/models/attention.py", line 24, in <module>
    from ..models.embeddings import ImagePositionalEmbeddings
ImportError: cannot import name 'ImagePositionalEmbeddings' from 'diffusers.models.embeddings' (/usr/local/lib/python3.7/dist-packages/diffusers/models/embeddings.py)
Traceback (most recent call last):
  File "/usr/local/bin/accelerate", line 8, in <module>
    sys.exit(main())
  File "/usr/local/lib/python3.7/dist-packages/accelerate/commands/accelerate_cli.py", line 43, in main
    args.func(args)
  File "/usr/local/lib/python3.7/dist-packages/accelerate/commands/launch.py", line 837, in launch_command
    simple_launcher(args)
  File "/usr/local/lib/python3.7/dist-packages/accelerate/commands/launch.py", line 354, in simple_launcher
    raise subprocess.CalledProcessError(returncode=process.returncode, cmd=cmd)
subprocess.CalledProcessError: Command '['/usr/bin/python3', '/content/diffusers/examples/dreambooth/train_dreambooth.py', '--image_captions_filename', '--train_text_encoder', '--save_starting_step=3000', '--stop_text_encoder_training=3000', '--save_n_steps=6000', '--Session_dir=/content/gdrive/MyDrive/Fast-Dreambooth/Sessions/MJBEK', '--pretrained_model_name_or_path=/content/stable-diffusion-v1-5', '--instance_data_dir=/content/gdrive/MyDrive/Fast-Dreambooth/Sessions/MJBEK/instance_images', '--output_dir=/content/models/MJBEK', '--instance_prompt=', '--seed=96576', '--resolution=512', '--mixed_precision=no', '--train_batch_size=1', '--gradient_accumulation_steps=1', '--use_8bit_adam', '--learning_rate=2e-6', '--lr_scheduler=polynomial', '--center_crop', '--lr_warmup_steps=0', '--max_train_steps=30000']' returned non-zero exit status 1.
Something went wrong

Is it the universal pull request, "Make notebooks universal #150"? Maybe there's a gdrive project issue now because of that?

fredrickflower commented 2 years ago

> what settings, is it the default model ?

Default model, 3000 steps, 100% text encoding, images uploaded through g drive, no checkpoint saving, female face, google colab pro so faster gpu.

TheLastBen commented 2 years ago

Nope, not the pull request; it was closed. Maybe it's this morning's update. Use this old version (yesterday's) and see if it works: https://colab.research.google.com/github/TheLastBen/fast-stable-diffusion/blob/dd79c4f4fd89d55ffa405a372742338404b3cdcd/fast-DreamBooth.ipynb

in the new notebook try setting contains_faces to "No"

fredrickflower commented 2 years ago

> Nope, not the pull request; it was closed. Maybe it's this morning's update. Use this old version (yesterday's) and see if it works: https://colab.research.google.com/github/TheLastBen/fast-stable-diffusion/blob/dd79c4f4fd89d55ffa405a372742338404b3cdcd/fast-DreamBooth.ipynb
>
> in the new notebook try setting contains_faces to "No"

Tried the old dreambooth notebook and got error:

Traceback (most recent call last):
  File "/content/diffusers/examples/dreambooth/train_dreambooth.py", line 18, in <module>
    from diffusers import AutoencoderKL, DDPMScheduler, StableDiffusionPipeline, UNet2DConditionModel
  File "/usr/local/lib/python3.7/dist-packages/diffusers/__init__.py", line 21, in <module>
    from .models import AutoencoderKL, UNet2DConditionModel, UNet2DModel, VQModel
  File "/usr/local/lib/python3.7/dist-packages/diffusers/models/__init__.py", line 19, in <module>
    from .unet_2d import UNet2DModel
  File "/usr/local/lib/python3.7/dist-packages/diffusers/models/unet_2d.py", line 11, in <module>
    from .unet_blocks import UNetMidBlock2D, get_down_block, get_up_block
  File "/usr/local/lib/python3.7/dist-packages/diffusers/models/unet_blocks.py", line 20, in <module>
    from .attention import AttentionBlock, SpatialTransformer
  File "/usr/local/lib/python3.7/dist-packages/diffusers/models/attention.py", line 24, in <module>
    from ..models.embeddings import ImagePositionalEmbeddings
ImportError: cannot import name 'ImagePositionalEmbeddings' from 'diffusers.models.embeddings' (/usr/local/lib/python3.7/dist-packages/diffusers/models/embeddings.py)
Traceback (most recent call last):
  File "/usr/local/bin/accelerate", line 8, in <module>
    sys.exit(main())
  File "/usr/local/lib/python3.7/dist-packages/accelerate/commands/accelerate_cli.py", line 43, in main
    args.func(args)
  File "/usr/local/lib/python3.7/dist-packages/accelerate/commands/launch.py", line 837, in launch_command
    simple_launcher(args)
  File "/usr/local/lib/python3.7/dist-packages/accelerate/commands/launch.py", line 354, in simple_launcher
    raise subprocess.CalledProcessError(returncode=process.returncode, cmd=cmd)
subprocess.CalledProcessError: Command '['/usr/bin/python3', '/content/diffusers/examples/dreambooth/train_dreambooth.py', '--image_captions_filename', '--train_text_encoder', '--save_starting_step=500', '--stop_text_encoder_training=3010', '--save_n_steps=0', '--pretrained_model_name_or_path=/content/stable-diffusion-v1-5', '--instance_data_dir=/content/gdrive/MyDrive/Fast-Dreambooth/Sessions/erindreamboothmodel/instance_images', '--output_dir=/content/models/erindreamboothmodel', '--instance_prompt=', '--seed=96576', '--resolution=512', '--mixed_precision=no', '--train_batch_size=1', '--gradient_accumulation_steps=1', '--use_8bit_adam', '--learning_rate=2e-6', '--lr_scheduler=polynomial', '--center_crop', '--lr_warmup_steps=0', '--max_train_steps=3000']' returned non-zero exit status 1.
Something went wrong

jtoy commented 2 years ago

I'm getting this on training:

Traceback (most recent call last):
  File "/content/diffusers/examples/dreambooth/train_dreambooth.py", line 18, in <module>
    from diffusers import AutoencoderKL, DDPMScheduler, StableDiffusionPipeline, UNet2DConditionModel
  File "/usr/local/lib/python3.7/dist-packages/diffusers/__init__.py", line 21, in <module>
    from .models import AutoencoderKL, UNet2DConditionModel, UNet2DModel, VQModel
  File "/usr/local/lib/python3.7/dist-packages/diffusers/models/__init__.py", line 19, in <module>
    from .unet_2d import UNet2DModel
  File "/usr/local/lib/python3.7/dist-packages/diffusers/models/unet_2d.py", line 11, in <module>
    from .unet_blocks import UNetMidBlock2D, get_down_block, get_up_block
  File "/usr/local/lib/python3.7/dist-packages/diffusers/models/unet_blocks.py", line 20, in <module>
    from .attention import AttentionBlock, SpatialTransformer
  File "/usr/local/lib/python3.7/dist-packages/diffusers/models/attention.py", line 24, in <module>
    from ..models.embeddings import ImagePositionalEmbeddings
ImportError: cannot import name 'ImagePositionalEmbeddings' from 'diffusers.models.embeddings' (/usr/local/lib/python3.7/dist-packages/diffusers/models/embeddings.py)
Traceback (most recent call last):
  File "/usr/local/bin/accelerate", line 8, in <module>
    sys.exit(main())
  File "/usr/local/lib/python3.7/dist-packages/accelerate/commands/accelerate_cli.py", line 43, in main
    args.func(args)
  File "/usr/local/lib/python3.7/dist-packages/accelerate/commands/launch.py", line 837, in launch_command
    simple_launcher(args)
  File "/usr/local/lib/python3.7/dist-packages/accelerate/commands/launch.py", line 354, in simple_launcher
    raise subprocess.CalledProcessError(returncode=process.returncode, cmd=cmd)
subprocess.CalledProcessError: Command '['/usr/bin/python3', '/content/diffusers/examples/dreambooth/train_dreambooth.py', '--image_captions_filename', '--train_text_encoder', '--save_starting_step=500', '--stop_text_encoder_training=6160', '--save_n_steps=0', '--Session_dir=/content/gdrive/MyDrive/Fast-Dreambooth/Sessions/jtoytest3', '--pretrained_model_name_or_path=/content/stable-diffusion-v1-5', '--instance_data_dir=/content/gdrive/MyDrive/Fast-Dreambooth/Sessions/jtoytest3/instance_images', '--output_dir=/content/models/jtoytest3', '--instance_prompt=', '--seed=96576', '--resolution=512', '--mixed_precision=no', '--train_batch_size=1', '--gradient_accumulation_steps=1', '--use_8bit_adam', '--learning_rate=2e-6', '--lr_scheduler=polynomial', '--center_crop', '--lr_warmup_steps=0', '--max_train_steps=17600']' returned non-zero exit status 1.
Something went wrong

Daniel-Kelvich commented 2 years ago

@TheLastBen This is weird. I've checked the original diffusers and your fork and it seems that colab code does not resemble either of them.

jtoy commented 2 years ago

Also of note, I trained a model this AM and it worked, then an hour later, I tried 5 more models in a row and they all died.

TheLastBen commented 2 years ago

You didn't update the diffusers ? which GPU do you have now ?

fredrickflower commented 2 years ago

> You didn't update the diffusers ? which GPU do you have now ?

Ahhhh, I've only just got Colab Pro so I'm using a better GPU, how can I change the diffusers and how do I check which GPU I'm using?

TheLastBen commented 2 years ago

run !nvidia-smi in a new cell
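
The same check can be scripted from Python if you want the GPU name as a value rather than reading the table. This is only a sketch: `gpu_names` is a made-up helper, and it returns an empty list when `nvidia-smi` is not on the PATH (for example on a CPU-only runtime).

```python
import shutil
import subprocess
from typing import List


def gpu_names() -> List[str]:
    """Return GPU names reported by nvidia-smi, or [] if it is unavailable."""
    if shutil.which("nvidia-smi") is None:
        return []
    result = subprocess.run(
        ["nvidia-smi", "--query-gpu=name", "--format=csv,noheader"],
        capture_output=True,
        text=True,
        check=False,
    )
    if result.returncode != 0:
        return []
    # One GPU name per output line.
    return [line.strip() for line in result.stdout.splitlines() if line.strip()]


print(gpu_names() or "no NVIDIA GPU visible")
```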

fredrickflower commented 2 years ago

> run !nvidia-smi in a new cell

I'm running an A100

TheLastBen commented 2 years ago

OK, I'll try to fix that

fredrickflower commented 2 years ago

> OK, I'll try to fix that

Cheers man

Daniel-Kelvich commented 2 years ago

I tried A100 and T4 same error.

TheLastBen commented 2 years ago

@fredrickflower try now with the new fix : https://colab.research.google.com/github/TheLastBen/fast-stable-diffusion/blob/main/fast-DreamBooth.ipynb

TheLastBen commented 2 years ago

> I tried A100 and T4 same error.

I'm running it on a T4 at the moment, I can't reproduce the error

sphuff commented 2 years ago

@TheLastBen I'm still seeing the same error on https://colab.research.google.com/github/TheLastBen/fast-stable-diffusion/blob/main/fast-DreamBooth.ipynb (also running an A100)

fredrickflower commented 2 years ago

> @fredrickflower try now with the new fix : https://colab.research.google.com/github/TheLastBen/fast-stable-diffusion/blob/main/fast-DreamBooth.ipynb

Same error unfortunately:

Training the unet...
Traceback (most recent call last):
  File "/content/diffusers/examples/dreambooth/train_dreambooth.py", line 18, in <module>
    from diffusers import AutoencoderKL, DDPMScheduler, StableDiffusionPipeline, UNet2DConditionModel
  File "/usr/local/lib/python3.7/dist-packages/diffusers/__init__.py", line 21, in <module>
    from .models import AutoencoderKL, UNet2DConditionModel, UNet2DModel, VQModel
  File "/usr/local/lib/python3.7/dist-packages/diffusers/models/__init__.py", line 19, in <module>
    from .unet_2d import UNet2DModel
  File "/usr/local/lib/python3.7/dist-packages/diffusers/models/unet_2d.py", line 11, in <module>
    from .unet_blocks import UNetMidBlock2D, get_down_block, get_up_block
  File "/usr/local/lib/python3.7/dist-packages/diffusers/models/unet_blocks.py", line 20, in <module>
    from .attention import AttentionBlock, SpatialTransformer
  File "/usr/local/lib/python3.7/dist-packages/diffusers/models/attention.py", line 24, in <module>
    from ..models.embeddings import ImagePositionalEmbeddings
ImportError: cannot import name 'ImagePositionalEmbeddings' from 'diffusers.models.embeddings' (/usr/local/lib/python3.7/dist-packages/diffusers/models/embeddings.py)
Traceback (most recent call last):
  File "/usr/local/bin/accelerate", line 8, in <module>
    sys.exit(main())
  File "/usr/local/lib/python3.7/dist-packages/accelerate/commands/accelerate_cli.py", line 43, in main
    args.func(args)
  File "/usr/local/lib/python3.7/dist-packages/accelerate/commands/launch.py", line 837, in launch_command
    simple_launcher(args)
  File "/usr/local/lib/python3.7/dist-packages/accelerate/commands/launch.py", line 354, in simple_launcher
    raise subprocess.CalledProcessError(returncode=process.returncode, cmd=cmd)
subprocess.CalledProcessError: Command '['/usr/bin/python3', '/content/diffusers/examples/dreambooth/train_dreambooth.py', '--image_captions_filename', '--train_only_unet', '--Session_dir=/content/gdrive/MyDrive/Fast-Dreambooth/Sessions/errortest', '--save_starting_step=2000', '--save_n_steps=0', '--pretrained_model_name_or_path=/content/stable-diffusion-v1-5', '--instance_data_dir=/content/gdrive/MyDrive/Fast-Dreambooth/Sessions/errortest/instance_images', '--output_dir=/content/models/errortest', '--instance_prompt=', '--seed=96576', '--resolution=512', '--mixed_precision=no', '--train_batch_size=1', '--gradient_accumulation_steps=1', '--use_8bit_adam', '--learning_rate=2e-6', '--lr_scheduler=polynomial', '--center_crop', '--lr_warmup_steps=0', '--max_train_steps=3000']' returned non-zero exit status 1.
Something went wrong

TheLastBen commented 2 years ago

I think it's fixed now, disconnect from the colab and reconnect

Daniel-Kelvich commented 2 years ago

Error with T4.

UnboundLocalError: local variable 'save_dir' referenced before assignment
Progress:|████ | 15% 300/2000 [05:01<28:29, 1.01s/it, loss=0.0318, lr=1.72e-6]
Traceback (most recent call last):
  File "/usr/local/bin/accelerate", line 8, in <module>
    sys.exit(main())
  File "/usr/local/lib/python3.7/dist-packages/accelerate/commands/accelerate_cli.py", line 43, in main
    args.func(args)
  File "/usr/local/lib/python3.7/dist-packages/accelerate/commands/launch.py", line 837, in launch_command
    simple_launcher(args)
  File "/usr/local/lib/python3.7/dist-packages/accelerate/commands/launch.py", line 354, in simple_launcher
    raise subprocess.CalledProcessError(returncode=process.returncode, cmd=cmd)
subprocess.CalledProcessError: Command '['/usr/bin/python3', '/content/diffusers/examples/dreambooth/train_dreambooth.py', '--image_captions_filename', '--train_text_encoder', '--save_starting_step=500', '--stop_text_encoder_training=300', '--save_n_steps=0', '--Session_dir=/content/gdrive/MyDrive/Fast-Dreambooth/Sessions/Test', '--pretrained_model_name_or_path=/content/stable-diffusion-v1-5', '--instance_data_dir=/content/gdrive/MyDrive/Fast-Dreambooth/Sessions/Test/instance_images', '--output_dir=/content/models/Test', '--instance_prompt=', '--seed=96576', '--resolution=512', '--mixed_precision=fp16', '--train_batch_size=1', '--gradient_accumulation_steps=1', '--use_8bit_adam', '--learning_rate=2e-6', '--lr_scheduler=polynomial', '--center_crop', '--lr_warmup_steps=0', '--max_train_steps=2000']' returned non-zero exit status 1.
Something went wrong

TheLastBen commented 2 years ago

that's not related to the GPU, that's a different error EDIT : fixed, a mistake I made with earlier update
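
For readers unfamiliar with this class of error, here is a generic illustration of how an UnboundLocalError on a name like `save_dir` can arise. The thread does not show the failing code, so both functions below are hypothetical and are not taken from train_dreambooth.py; the sketch only shows the usual pattern of a local variable that is assigned inside one branch and read outside it.

```python
from typing import Optional


def checkpoint_dir_buggy(step: int, save_starting_step: int) -> str:
    # Hypothetical pattern, NOT the actual training script:
    # save_dir is bound only when the branch is taken.
    if step >= save_starting_step:
        save_dir = f"checkpoint-{step}"
    return save_dir  # raises UnboundLocalError if the branch was skipped


def checkpoint_dir_fixed(step: int, save_starting_step: int) -> Optional[str]:
    # Bind the name on every path before it is read.
    save_dir = None
    if step >= save_starting_step:
        save_dir = f"checkpoint-{step}"
    return save_dir


try:
    checkpoint_dir_buggy(step=300, save_starting_step=500)
except UnboundLocalError as exc:
    print(f"buggy version failed: {exc}")

print(checkpoint_dir_fixed(step=300, save_starting_step=500))
```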

Daniel-Kelvich commented 2 years ago

A100, another import error but similar. ImportError: cannot import name 'SpatialTransformer' from 'diffusers.models.attention'

rasamaya commented 2 years ago

main colab, refreshed, loaded up and TRAINING! Cheers, works for me now

Daniel-Kelvich commented 2 years ago

@rasamaya are you using a100?

sphuff commented 2 years ago

Just ran again with an a100 and still seeing the SpatialTransformer error

jtoy commented 2 years ago

I'm pretty sure the code doesn't work on A100. I can get it to run on the regular GPU instance though.

rasamaya commented 2 years ago

I'm on regular GPU mode, not using Pro this round. Not sure which GPU they gave me, and I don't think I can check while it's training.

fredrickflower commented 2 years ago

Any updates?

dmityul commented 2 years ago

Training the unet...
Traceback (most recent call last):
  File "/content/diffusers/examples/dreambooth/train_dreambooth.py", line 782, in <module>
    main()
  File "/content/diffusers/examples/dreambooth/train_dreambooth.py", line 475, in main
    text_encoder = CLIPTextModel.from_pretrained(args.output_dir, subfolder="text_encoder_trained")
  File "/usr/local/lib/python3.7/dist-packages/transformers/modeling_utils.py", line 1977, in from_pretrained
    **kwargs,
  File "/usr/local/lib/python3.7/dist-packages/transformers/models/clip/configuration_clip.py", line 133, in from_pretrained
    config_dict, kwargs = cls.get_config_dict(pretrained_model_name_or_path, **kwargs)
  File "/usr/local/lib/python3.7/dist-packages/transformers/configuration_utils.py", line 558, in get_config_dict
    config_dict, kwargs = cls._get_config_dict(pretrained_model_name_or_path, **kwargs)
  File "/usr/local/lib/python3.7/dist-packages/transformers/configuration_utils.py", line 625, in _get_config_dict
    _commit_hash=commit_hash,
  File "/usr/local/lib/python3.7/dist-packages/transformers/utils/hub.py", line 381, in cached_file
    f"{path_or_repo_id} does not appear to have a file named {full_filename}. Checkout "
OSError: /content/models/new1 does not appear to have a file named text_encoder_trained/config.json. Checkout 'https://huggingface.co//content/models/new1/None' for available files.
Traceback (most recent call last):
  File "/usr/local/bin/accelerate", line 8, in <module>
    sys.exit(main())
  File "/usr/local/lib/python3.7/dist-packages/accelerate/commands/accelerate_cli.py", line 43, in main
    args.func(args)
  File "/usr/local/lib/python3.7/dist-packages/accelerate/commands/launch.py", line 837, in launch_command
    simple_launcher(args)
  File "/usr/local/lib/python3.7/dist-packages/accelerate/commands/launch.py", line 354, in simple_launcher
    raise subprocess.CalledProcessError(returncode=process.returncode, cmd=cmd)
subprocess.CalledProcessError: Command '['/usr/bin/python3', '/content/diffusers/examples/dreambooth/train_dreambooth.py', '--image_captions_filename', '--train_only_unet', '--Session_dir=/content/gdrive/MyDrive/Fast-Dreambooth/Sessions/new1', '--save_starting_step=500', '--save_n_steps=0', '--pretrained_model_name_or_path=/content/stable-diffusion-v1-5', '--instance_data_dir=/content/gdrive/MyDrive/Fast-Dreambooth/Sessions/new1/instance_images', '--output_dir=/content/models/new1', '--instance_prompt=', '--seed=96576', '--resolution=512', '--mixed_precision=fp16', '--train_batch_size=1', '--gradient_accumulation_steps=1', '--use_8bit_adam', '--learning_rate=2e-6', '--lr_scheduler=polynomial', '--center_crop', '--lr_warmup_steps=0', '--max_train_steps=9445']' returned non-zero exit status 1.
Something went wrong

sphuff commented 2 years ago

@TheLastBen I think this might be related to this change in the diffusers library where SpatialTransformer was removed (check out src/diffusers/models/unet_2d_blocks.py). I tried modifying the notebook to point to the main fork of the diffusers library, which solved that error but created some others
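
When a single file from one library revision is dropped onto a different installed release, it helps to see which package versions the runtime actually has. A minimal sketch, assuming Python 3.8+ for `importlib.metadata` (the Colab in this thread runs Python 3.7, where the `importlib_metadata` backport would be needed); `installed_version` and `report` are made-up helper names. It reads package metadata only, so it works even when `import diffusers` itself fails.

```python
from importlib import metadata  # standard library on Python 3.8+
from typing import Dict, Iterable, Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of a pip package, or None if it is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def report(packages: Iterable[str]) -> Dict[str, Optional[str]]:
    """Map each package name to its installed version (None when missing)."""
    return {name: installed_version(name) for name in packages}


# Packages that appear in the tracebacks in this thread.
print(report(["diffusers", "transformers", "accelerate", "xformers"]))
```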

nathanielherman commented 2 years ago

I fixed the "SpatialTransformer" error by updating the wget for A100 from "https://raw.githubusercontent.com/huggingface/diffusers/main/src/diffusers/models/attention.py" to "https://raw.githubusercontent.com/huggingface/diffusers/269109dbfbbdbe2800535239b881e96e1828a0ef/src/diffusers/models/attention.py". I updated this in all the cells (i.e. both the initial setup cell and the training cell).

I think https://github.com/huggingface/diffusers/commit/ef2ea33c3bc061fffa8bc4ccd640306ca1a1847d this change renamed SpatialTransformer, so I just pointed it at an older commit before that
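
The idea of pinning to a commit instead of `main` can be sketched as a small URL builder. `raw_github_url` is a hypothetical helper, not something the notebook defines; the owner, repo, commit hash, and path are the ones quoted above.

```python
def raw_github_url(owner: str, repo: str, ref: str, path: str) -> str:
    """Build a raw.githubusercontent.com URL for a file at a given ref.

    `ref` may be a branch name such as "main" (moves over time) or a full
    commit hash (fixed content).
    """
    return f"https://raw.githubusercontent.com/{owner}/{repo}/{ref}/{path.lstrip('/')}"


# Commit hash quoted in the comment above.
PINNED_COMMIT = "269109dbfbbdbe2800535239b881e96e1828a0ef"
ATTENTION_PATH = "src/diffusers/models/attention.py"

floating_url = raw_github_url("huggingface", "diffusers", "main", ATTENTION_PATH)
pinned_url = raw_github_url("huggingface", "diffusers", PINNED_COMMIT, ATTENTION_PATH)

print(floating_url)
print(pinned_url)
```

A URL that tracks `main` picks up any upstream rename the next time the cell runs, while the pinned one keeps fetching the same file.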

sphuff commented 2 years ago

Can confirm @nathanielherman 's fix works 👍

TheLastBen commented 2 years ago

https://github.com/TheLastBen/fast-stable-diffusion/commit/33f64fcabcb17b767f57329a09c5cf31760fa855

corbettaluigi commented 2 years ago

A100 -- 33f64fc Error: No module named 'xformers'

Traceback (most recent call last):
  File "/content/diffusers/examples/dreambooth/train_dreambooth.py", line 18, in <module>
    from diffusers import AutoencoderKL, DDPMScheduler, StableDiffusionPipeline, UNet2DConditionModel
  File "/usr/local/lib/python3.7/dist-packages/diffusers/__init__.py", line 21, in <module>
    from .models import AutoencoderKL, UNet2DConditionModel, UNet2DModel, VQModel
  File "/usr/local/lib/python3.7/dist-packages/diffusers/models/__init__.py", line 19, in <module>
    from .unet_2d import UNet2DModel
  File "/usr/local/lib/python3.7/dist-packages/diffusers/models/unet_2d.py", line 11, in <module>
    from .unet_blocks import UNetMidBlock2D, get_down_block, get_up_block
  File "/usr/local/lib/python3.7/dist-packages/diffusers/models/unet_blocks.py", line 20, in <module>
    from .attention import AttentionBlock, SpatialTransformer
  File "/usr/local/lib/python3.7/dist-packages/diffusers/models/attention.py", line 10, in <module>
    import xformers
ModuleNotFoundError: No module named 'xformers'
Traceback (most recent call last):
  File "/usr/local/bin/accelerate", line 8, in <module>
    sys.exit(main())
  File "/usr/local/lib/python3.7/dist-packages/accelerate/commands/accelerate_cli.py", line 43, in main
    args.func(args)
  File "/usr/local/lib/python3.7/dist-packages/accelerate/commands/launch.py", line 837, in launch_command
    simple_launcher(args)
  File "/usr/local/lib/python3.7/dist-packages/accelerate/commands/launch.py", line 354, in simple_launcher
    raise subprocess.CalledProcessError(returncode=process.returncode, cmd=cmd)
subprocess.CalledProcessError: Command '['/usr/bin/python3', '/content/diffusers/examples/dreambooth/train_dreambooth.py', '--image_captions_filename', '--save_starting_step=500', '--stop_text_encoder_training=2200', '--save_n_steps=0', '--Session_dir=/content/gdrive/MyDrive/Fast-Dreambooth/Sessions/ARCIMOBOLDI', '--pretrained_model_name_or_path=/content/stable-diffusion-v1-5', '--instance_data_dir=/content/gdrive/MyDrive/Fast-Dreambooth/Sessions/ARCIMOBOLDI/instance_images', '--output_dir=/content/models/ARCIMOBOLDI', '--instance_prompt=', '--seed=96576', '--resolution=512', '--mixed_precision=no', '--train_batch_size=1', '--gradient_accumulation_steps=1', '--use_8bit_adam', '--learning_rate=2e-6', '--lr_scheduler=polynomial', '--center_crop', '--lr_warmup_steps=0', '--max_train_steps=2200']' returned non-zero exit status 1.
Something went wrong

TheLastBen commented 2 years ago

try this : %pip install -q https://github.com/TheLastBen/fast-stable-diffusion/raw/main/precompiled/A100/xformers-0.0.13.dev0-py3-none-any.whl

I'm still unsure if xformers works with A100 or not, try the above, if it works, I'll add back xformers for A100

jamais commented 2 years ago

I'm trying in Colab Pro and getting xformers error too

Training the unet...
Traceback (most recent call last):
  File "/content/diffusers/examples/dreambooth/train_dreambooth.py", line 18, in <module>
    from diffusers import AutoencoderKL, DDPMScheduler, StableDiffusionPipeline, UNet2DConditionModel
  File "/usr/local/lib/python3.7/dist-packages/diffusers/__init__.py", line 21, in <module>
    from .models import AutoencoderKL, UNet2DConditionModel, UNet2DModel, VQModel
  File "/usr/local/lib/python3.7/dist-packages/diffusers/models/__init__.py", line 19, in <module>
    from .unet_2d import UNet2DModel
  File "/usr/local/lib/python3.7/dist-packages/diffusers/models/unet_2d.py", line 11, in <module>
    from .unet_blocks import UNetMidBlock2D, get_down_block, get_up_block
  File "/usr/local/lib/python3.7/dist-packages/diffusers/models/unet_blocks.py", line 20, in <module>
    from .attention import AttentionBlock, SpatialTransformer
  File "/usr/local/lib/python3.7/dist-packages/diffusers/models/attention.py", line 10, in <module>
    import xformers
ModuleNotFoundError: No module named 'xformers'
Traceback (most recent call last):
  File "/usr/local/bin/accelerate", line 8, in <module>
    sys.exit(main())
  File "/usr/local/lib/python3.7/dist-packages/accelerate/commands/accelerate_cli.py", line 43, in main
    args.func(args)
  File "/usr/local/lib/python3.7/dist-packages/accelerate/commands/launch.py", line 837, in launch_command
    simple_launcher(args)
  File "/usr/local/lib/python3.7/dist-packages/accelerate/commands/launch.py", line 354, in simple_launcher
    raise subprocess.CalledProcessError(returncode=process.returncode, cmd=cmd)
subprocess.CalledProcessError: Command '['/usr/bin/python3', '/content/diffusers/examples/dreambooth/train_dreambooth.py', '--image_captions_filename', '--train_only_unet', '--Session_dir=/content/gdrive/MyDrive/Fast-Dreambooth/Sessions/eakbudak', '--save_starting_step=500', '--save_n_steps=0', '--pretrained_model_name_or_path=/content/stable-diffusion-v1-5', '--instance_data_dir=/content/gdrive/MyDrive/Fast-Dreambooth/Sessions/eakbudak/instance_images', '--output_dir=/content/models/eakbudak', '--instance_prompt=', '--seed=96576', '--resolution=512', '--mixed_precision=no', '--train_batch_size=1', '--gradient_accumulation_steps=1', '--use_8bit_adam', '--learning_rate=2e-6', '--lr_scheduler=polynomial', '--center_crop', '--lr_warmup_steps=0', '--max_train_steps=8900']' returned non-zero exit status 1.
Something went wrong

TheLastBen commented 2 years ago

create a new cell and try this:

%pip install -q https://github.com/TheLastBen/fast-stable-diffusion/raw/main/precompiled/A100/xformers-0.0.13.dev0-py3-none-any.whl

corbettaluigi commented 2 years ago

done, but at beginning of training gives: RuntimeError: cuDNN error: CUDNN_STATUS_MAPPING_ERROR

2022-11-04 08:21:51.006977: I tensorflow/core/util/util.cc:169] oneDNN custom operations are on. You may see slightly different numerical results due to floating-point round-off errors from different computation orders. To turn them off, set the environment variable TF_ENABLE_ONEDNN_OPTS=0.
Progress:| | 0% 6/2200 [00:14<52:57, 1.45s/it, loss=0.371, lr=1.99e-6] lcsarcimboldo
Traceback (most recent call last):
  File "/content/diffusers/examples/dreambooth/train_dreambooth.py", line 782, in <module>
    main()
  File "/content/diffusers/examples/dreambooth/train_dreambooth.py", line 669, in main
    accelerator.backward(loss)
  File "/usr/local/lib/python3.7/dist-packages/accelerate/accelerator.py", line 884, in backward
    loss.backward(**kwargs)
  File "/usr/local/lib/python3.7/dist-packages/torch/_tensor.py", line 396, in backward
    torch.autograd.backward(self, gradient, retain_graph, create_graph, inputs=inputs)
  File "/usr/local/lib/python3.7/dist-packages/torch/autograd/__init__.py", line 175, in backward
    allow_unreachable=True, accumulate_grad=True)  # Calls into the C++ engine to run the backward pass
RuntimeError: cuDNN error: CUDNN_STATUS_MAPPING_ERROR
Progress:| | 0% 6/2200 [00:15<1:33:20, 2.55s/it, loss=0.371, lr=1.99e-6]
Traceback (most recent call last):
  File "/usr/local/bin/accelerate", line 8, in <module>
    sys.exit(main())
  File "/usr/local/lib/python3.7/dist-packages/accelerate/commands/accelerate_cli.py", line 43, in main
    args.func(args)
  File "/usr/local/lib/python3.7/dist-packages/accelerate/commands/launch.py", line 837, in launch_command
    simple_launcher(args)
  File "/usr/local/lib/python3.7/dist-packages/accelerate/commands/launch.py", line 354, in simple_launcher
    raise subprocess.CalledProcessError(returncode=process.returncode, cmd=cmd)
subprocess.CalledProcessError: Command '['/usr/bin/python3', '/content/diffusers/examples/dreambooth/train_dreambooth.py', '--image_captions_filename', '--train_text_encoder', '--save_starting_step=500', '--stop_text_encoder_training=2200', '--save_n_steps=500', '--Session_dir=/content/gdrive/MyDrive/Fast-Dreambooth/Sessions/ARCIMBOLDI', '--pretrained_model_name_or_path=/content/stable-diffusion-v1-5', '--instance_data_dir=/content/gdrive/MyDrive/Fast-Dreambooth/Sessions/ARCIMBOLDI/instance_images', '--output_dir=/content/models/ARCIMBOLDI', '--instance_prompt=', '--seed=96576', '--resolution=512', '--mixed_precision=no', '--train_batch_size=1', '--gradient_accumulation_steps=1', '--use_8bit_adam', '--learning_rate=2e-6', '--lr_scheduler=polynomial', '--center_crop', '--lr_warmup_steps=0', '--max_train_steps=2200']' returned non-zero exit status 1.
Something went wrong

Daniel-Kelvich commented 2 years ago

> create a new cell and try this:
>
> %pip install -q https://github.com/TheLastBen/fast-stable-diffusion/raw/main/precompiled/A100/xformers-0.0.13.dev0-py3-none-any.whl

Thanks, this works. But it seems to run much slower now (~3 times slower).

TheLastBen commented 2 years ago

@corbettaluigi You're probably using an old notebook, use the latest one

TheLastBen commented 2 years ago

> > create a new cell and try this: %pip install -q https://github.com/TheLastBen/fast-stable-diffusion/raw/main/precompiled/A100/xformers-0.0.13.dev0-py3-none-any.whl
>
> Thanks, this works. But it seems to run much slower now (~3 times slower).

So you're saying that xformers worked before? On what date exactly, so I can bring back the changes?

TheLastBen / fast-stable-diffusion

Training has broken #364