Open Daniel-Kelvich opened 2 years ago
I am also experiencing this issue.
Just posting here since it's a bit hard to read and to find the root error: it's the ImportError, in the first 15 lines.
Also have this issue
same
I'm running it right now, no issue
update to the latest colab
Just ran again same issue:
Training the unet...
Traceback (most recent call last):
File "/content/diffusers/examples/dreambooth/train_dreambooth.py", line 18, in
what settings, is it the default model ?
Still same error. Default model (1.5), new method.
Error here too, cannot use it. It was fine yesterday.
Used a normal Hugging Face token, no special model.
Traceback (most recent call last):
File "/content/diffusers/examples/dreambooth/train_dreambooth.py", line 18, in
Is it the universal pull request, "Make notebooks universal #150"? Maybe it's a gdrive project issue now because of that?
Default model, 3000 steps, 100% text encoding, images uploaded through g drive, no checkpoint saving, female face, google colab pro so faster gpu.
Nope, not the pull request, it was closed. Maybe it's this morning's update. Use this old version (yesterday's) and see if it works: https://colab.research.google.com/github/TheLastBen/fast-stable-diffusion/blob/dd79c4f4fd89d55ffa405a372742338404b3cdcd/fast-DreamBooth.ipynb
in the new notebook try setting contains_faces to "No"
Tried the old dreambooth notebook and got error:
Traceback (most recent call last):
File "/content/diffusers/examples/dreambooth/train_dreambooth.py", line 18, in
I'm getting this on training:
Traceback (most recent call last):
File "/content/diffusers/examples/dreambooth/train_dreambooth.py", line 18, in <module>
from diffusers import AutoencoderKL, DDPMScheduler, StableDiffusionPipeline, UNet2DConditionModel
File "/usr/local/lib/python3.7/dist-packages/diffusers/__init__.py", line 21, in <module>
from .models import AutoencoderKL, UNet2DConditionModel, UNet2DModel, VQModel
File "/usr/local/lib/python3.7/dist-packages/diffusers/models/__init__.py", line 19, in <module>
from .unet_2d import UNet2DModel
File "/usr/local/lib/python3.7/dist-packages/diffusers/models/unet_2d.py", line 11, in <module>
from .unet_blocks import UNetMidBlock2D, get_down_block, get_up_block
File "/usr/local/lib/python3.7/dist-packages/diffusers/models/unet_blocks.py", line 20, in <module>
from .attention import AttentionBlock, SpatialTransformer
File "/usr/local/lib/python3.7/dist-packages/diffusers/models/attention.py", line 24, in <module>
from ..models.embeddings import ImagePositionalEmbeddings
ImportError: cannot import name 'ImagePositionalEmbeddings' from 'diffusers.models.embeddings' (/usr/local/lib/python3.7/dist-packages/diffusers/models/embeddings.py)
Traceback (most recent call last):
File "/usr/local/bin/accelerate", line 8, in <module>
sys.exit(main())
File "/usr/local/lib/python3.7/dist-packages/accelerate/commands/accelerate_cli.py", line 43, in main
args.func(args)
File "/usr/local/lib/python3.7/dist-packages/accelerate/commands/launch.py", line 837, in launch_command
simple_launcher(args)
File "/usr/local/lib/python3.7/dist-packages/accelerate/commands/launch.py", line 354, in simple_launcher
raise subprocess.CalledProcessError(returncode=process.returncode, cmd=cmd)
subprocess.CalledProcessError: Command '['/usr/bin/python3', '/content/diffusers/examples/dreambooth/train_dreambooth.py', '--image_captions_filename', '--train_text_encoder', '--save_starting_step=500', '--stop_text_encoder_training=6160', '--save_n_steps=0', '--Session_dir=/content/gdrive/MyDrive/Fast-Dreambooth/Sessions/jtoytest3', '--pretrained_model_name_or_path=/content/stable-diffusion-v1-5', '--instance_data_dir=/content/gdrive/MyDrive/Fast-Dreambooth/Sessions/jtoytest3/instance_images', '--output_dir=/content/models/jtoytest3', '--instance_prompt=', '--seed=96576', '--resolution=512', '--mixed_precision=no', '--train_batch_size=1', '--gradient_accumulation_steps=1', '--use_8bit_adam', '--learning_rate=2e-6', '--lr_scheduler=polynomial', '--center_crop', '--lr_warmup_steps=0', '--max_train_steps=17600']' returned non-zero exit status 1.
Something went wrong
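For anyone else debugging this: the ImportError above suggests that the installed embeddings.py does not define ImagePositionalEmbeddings, while the installed attention.py expects it, i.e. the two files come from different diffusers versions. Here is a minimal sketch (not part of the notebook) that checks the file without importing diffusers, since the import itself is what fails. The helper names are made up for illustration, and the path is simply the one shown in the traceback.

```python
import ast
from pathlib import Path


def top_level_names(source_path):
    """Return the names of classes and functions defined at the top level of a file."""
    tree = ast.parse(Path(source_path).read_text(encoding="utf-8"))
    return {
        node.name
        for node in tree.body
        if isinstance(node, (ast.ClassDef, ast.FunctionDef, ast.AsyncFunctionDef))
    }


def defines(source_path, name):
    """True if the file defines `name` as a top-level class or function."""
    return name in top_level_names(source_path)


if __name__ == "__main__":
    # Path copied from the traceback above; adjust it for your environment.
    embeddings = Path("/usr/local/lib/python3.7/dist-packages/diffusers/models/embeddings.py")
    if embeddings.exists():
        print(defines(embeddings, "ImagePositionalEmbeddings"))
    else:
        print(f"{embeddings} not found")
```

If this prints False, the embeddings.py on disk is older than the attention.py that imports from it.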
@TheLastBen This is weird. I've checked the original diffusers and your fork and it seems that colab code does not resemble either of them.
Also of note, I trained a model this AM and it worked, then an hour later I tried 5 more models in a row and they all died.
You didn't update diffusers? Which GPU do you have now?
Ahhhh, I've only just got Colab Pro so I'm using a better GPU, how can I change the diffusers and how do I check which GPU I'm using?
Run !nvidia-smi in a new cell.
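If you would rather get just the GPU name from Python, something like the sketch below should work. It assumes nvidia-smi is on the PATH (as it normally is on a Colab GPU runtime); gpu_names is a made-up helper, not something the notebook provides.

```python
import shutil
import subprocess


def gpu_names():
    """Return the GPU names reported by nvidia-smi, or [] if it is unavailable."""
    if shutil.which("nvidia-smi") is None:
        return []
    result = subprocess.run(
        ["nvidia-smi", "--query-gpu=name", "--format=csv,noheader"],
        capture_output=True,
        text=True,
        check=False,
    )
    if result.returncode != 0:
        return []
    # One GPU name per output line.
    return [line.strip() for line in result.stdout.splitlines() if line.strip()]


print(gpu_names())
```

An empty list means no NVIDIA GPU (or no driver) is visible to the runtime.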
I'm running an A100
OK, I'll try to fix that
Cheers man
I tried A100 and T4 same error.
@fredrickflower try now with the new fix : https://colab.research.google.com/github/TheLastBen/fast-stable-diffusion/blob/main/fast-DreamBooth.ipynb
I'm running it on a T4 at the moment, I can't reproduce the error
@TheLastBen I'm still seeing the same error on https://colab.research.google.com/github/TheLastBen/fast-stable-diffusion/blob/main/fast-DreamBooth.ipynb (also running an A100)
Same error unfortunately:
Training the unet...
Traceback (most recent call last):
File "/content/diffusers/examples/dreambooth/train_dreambooth.py", line 18, in
I think it's fixed now, disconnect from the colab and reconnect
Error with T4.
UnboundLocalError: local variable 'save_dir' referenced before assignment
Progress:|████ | 15% 300/2000 [05:01<28:29, 1.01s/it, loss=0.0318, lr=1.72e-6]
Traceback (most recent call last):
File "/usr/local/bin/accelerate", line 8, in
That's not related to the GPU, that's a different error. EDIT: fixed, it was a mistake I made in an earlier update.
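For context, this kind of UnboundLocalError comes up whenever a local variable is assigned only on some code paths and then read on a path that skipped the assignment. The sketch below is a hypothetical reproduction, not the notebook's actual code: the function names and the save_every parameter are invented for illustration.

```python
def save_checkpoint(step, save_every):
    # Hypothetical example: save_dir is only assigned inside the branch.
    if save_every and step % save_every == 0:
        save_dir = f"checkpoint-{step}"
    return save_dir  # unbound whenever the branch above was skipped


def save_checkpoint_fixed(step, save_every):
    # Give the variable a value on every path before it is read.
    save_dir = None
    if save_every and step % save_every == 0:
        save_dir = f"checkpoint-{step}"
    return save_dir


try:
    save_checkpoint(step=300, save_every=0)
except UnboundLocalError as err:
    print("broken:", err)

print("fixed:", save_checkpoint_fixed(step=300, save_every=0))
```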
A100, another import error but similar. ImportError: cannot import name 'SpatialTransformer' from 'diffusers.models.attention'
main colab, refreshed, loaded up and TRAINING! Cheers, works for me now
@rasamaya are you using a100?
Just ran again with an A100 and still seeing the SpatialTransformer error.
I'm pretty sure the code doesn't work on the A100; I can get it to run on the regular GPU instance, though.
I'm on the regular GPU mode, not Pro this round. Not sure which GPU they gave me, and I don't think I can check while it's training.
Any updates?
Training the unet...
Traceback (most recent call last):
File "/content/diffusers/examples/dreambooth/train_dreambooth.py", line 782, in
@TheLastBen I think this might be related to this change in the diffusers library where SpatialTransformer was removed (check out src/diffusers/models/unet_2d_blocks.py). I tried modifying the notebook to point to the main fork of the diffusers library, which solved that error but created some others.
I fixed the "SpatialTransformers" error by updating the wget for A100 from "https://raw.githubusercontent.com/huggingface/diffusers/main/src/diffusers/models/attention.py" to "https://raw.githubusercontent.com/huggingface/diffusers/269109dbfbbdbe2800535239b881e96e1828a0ef/src/diffusers/models/attention.py". I updated this in all the cells (i.e. both the initial setup cell and the training cell).
I think https://github.com/huggingface/diffusers/commit/ef2ea33c3bc061fffa8bc4ccd640306ca1a1847d this change renamed SpatialTransformer, so I just pointed it at an older commit before that
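To make the difference between the two URLs explicit: the only thing that changes is the ref segment, from the moving branch name main to a fixed commit hash, so the downloaded attention.py no longer changes when upstream does. A small sketch of that idea follows; raw_github_url is a hypothetical helper, and the commit hash and file path are the ones quoted above.

```python
def raw_github_url(owner, repo, ref, path):
    """Build a raw.githubusercontent.com URL for a file at a branch, tag, or commit."""
    return f"https://raw.githubusercontent.com/{owner}/{repo}/{ref}/{path}"


ATTENTION_PATH = "src/diffusers/models/attention.py"
PINNED_COMMIT = "269109dbfbbdbe2800535239b881e96e1828a0ef"

# Tracks whatever is currently on main, so it can break without warning.
print(raw_github_url("huggingface", "diffusers", "main", ATTENTION_PATH))

# Always resolves to the same file contents.
print(raw_github_url("huggingface", "diffusers", PINNED_COMMIT, ATTENTION_PATH))
```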
Can confirm @nathanielherman 's fix works 👍
A100 -- 33f64fc Error: No module named 'xformers'
Traceback (most recent call last):
File "/content/diffusers/examples/dreambooth/train_dreambooth.py", line 18, in
try this :
%pip install -q https://github.com/TheLastBen/fast-stable-diffusion/raw/main/precompiled/A100/xformers-0.0.13.dev0-py3-none-any.whl
I'm still unsure if xformers works with A100 or not, try the above, if it works, I'll add back xformers for A100
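To confirm whether the wheel actually installed before starting a training run, a quick check along these lines can be run in a new cell. This is only a sketch: is_installed is a made-up helper, and it only tells you the module can be found, not that it works on your GPU.

```python
import importlib.util


def is_installed(module_name):
    """True if a top-level module with this name can be found, without importing it."""
    return importlib.util.find_spec(module_name) is not None


print("xformers installed:", is_installed("xformers"))
```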
I'm trying in Colab Pro and getting xformers error too
Training the unet...
Traceback (most recent call last):
File "/content/diffusers/examples/dreambooth/train_dreambooth.py", line 18, in
Create a new cell and try this:
%pip install -q https://github.com/TheLastBen/fast-stable-diffusion/raw/main/precompiled/A100/xformers-0.0.13.dev0-py3-none-any.whl
done, but at beginning of training gives: RuntimeError: cuDNN error: CUDNN_STATUS_MAPPING_ERROR
2022-11-04 08:21:51.006977: I tensorflow/core/util/util.cc:169] oneDNN custom operations are on. You may see slightly different numerical results due to floating-point round-off errors from different computation orders. To turn them off, set the environment variable TF_ENABLE_ONEDNN_OPTS=0.
Progress:| | 0% 6/2200 [00:14<52:57, 1.45s/it, loss=0.371, lr=1.99e-6] lcsarcimboldo
Traceback (most recent call last):
File "/content/diffusers/examples/dreambooth/train_dreambooth.py", line 782, in
Thanks, this works. But seems to work much slower now ( ~3 times slower).
@corbettaluigi You're probably using an old notebook, use the latest one
So you're saying that xformers worked before? On what date exactly, so I can bring back the changes?
Training the unet...
Traceback (most recent call last):
File "/content/diffusers/examples/dreambooth/train_dreambooth.py", line 18, in <module>
from diffusers import AutoencoderKL, DDPMScheduler, StableDiffusionPipeline, UNet2DConditionModel
File "/usr/local/lib/python3.7/dist-packages/diffusers/__init__.py", line 21, in <module>
from .models import AutoencoderKL, UNet2DConditionModel, UNet2DModel, VQModel
File "/usr/local/lib/python3.7/dist-packages/diffusers/models/__init__.py", line 19, in <module>
from .unet_2d import UNet2DModel
File "/usr/local/lib/python3.7/dist-packages/diffusers/models/unet_2d.py", line 11, in <module>
from .unet_blocks import UNetMidBlock2D, get_down_block, get_up_block
File "/usr/local/lib/python3.7/dist-packages/diffusers/models/unet_blocks.py", line 20, in <module>
from .attention import AttentionBlock, SpatialTransformer
File "/usr/local/lib/python3.7/dist-packages/diffusers/models/attention.py", line 24, in <module>
from ..models.embeddings import ImagePositionalEmbeddings
ImportError: cannot import name 'ImagePositionalEmbeddings' from 'diffusers.models.embeddings' (/usr/local/lib/python3.7/dist-packages/diffusers/models/embeddings.py)
Traceback (most recent call last):
File "/usr/local/bin/accelerate", line 8, in <module>
sys.exit(main())
File "/usr/local/lib/python3.7/dist-packages/accelerate/commands/accelerate_cli.py", line 43, in main
args.func(args)
File "/usr/local/lib/python3.7/dist-packages/accelerate/commands/launch.py", line 837, in launch_command
simple_launcher(args)
File "/usr/local/lib/python3.7/dist-packages/accelerate/commands/launch.py", line 354, in simple_launcher
raise subprocess.CalledProcessError(returncode=process.returncode, cmd=cmd)
subprocess.CalledProcessError: Command '['/usr/bin/python3', '/content/diffusers/examples/dreambooth/train_dreambooth.py', '--image_captions_filename', '--train_only_unet', '--Session_dir=/content/gdrive/MyDrive/Fast-Dreambooth/Sessions/AnnabBaseline3A100', '--save_starting_step=500', '--save_n_steps=0', '--pretrained_model_name_or_path=/content/stable-diffusion-v1-5', '--instance_data_dir=/content/gdrive/MyDrive/Fast-Dreambooth/Sessions/AnnabBaseline3A100/instance_images', '--output_dir=/content/models/AnnabBaseline3A100', '--instance_prompt=', '--seed=96576', '--resolution=512', '--mixed_precision=no', '--train_batch_size=1', '--gradient_accumulation_steps=1', '--use_8bit_adam', '--learning_rate=2e-6', '--lr_scheduler=polynomial', '--center_crop', '--lr_warmup_steps=0', '--max_train_steps=2000']' returned non-zero exit status 1.
Something went wrong