Model fails to save, constantly looking for files in the safety_checker folder

SeverianVoid commented 2 years ago

Describe the bug

After pulling the latest changes, every attempt to save the model throws an error saying it cannot find a file, and it is looking for that file in the "snapshots/##########/safety_checker" folder. First it was looking for scheduler_config.json, so I moved that file from the scheduler folder into the safety_checker folder, but the next time I ran it, it errored again, this time looking for diffusion_pytorch_model.bin.

The exact error message is "OSError: Error no file named diffusion_pytorch_model.bin found in directory /home/dreambooth/.cache/huggingface/diffusers/models--runwayml--stable-diffusion-v1-5/snapshots/3beed0bcb34a3d281ce27bd8a6a1efbb68eada38/safety_checker"
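
To see which folder actually holds each file, it may help to list the contents of the snapshot directory directly. The following is a minimal sketch using only the Python standard library; `list_snapshot_files` and `find_component_with` are hypothetical helper names, not part of diffusers, and the snapshot path has to be supplied by hand.

```python
from pathlib import Path


def list_snapshot_files(snapshot_dir):
    """Map each subfolder of a snapshot directory to the file names inside it.

    Assumption: the snapshot has one subfolder per pipeline component
    (e.g. scheduler, safety_checker), as the paths in the error suggest.
    """
    root = Path(snapshot_dir)
    contents = {}
    for sub in sorted(p for p in root.iterdir() if p.is_dir()):
        # Path.exists() follows symlinks, so a dangling link is left out.
        contents[sub.name] = sorted(f.name for f in sub.iterdir() if f.exists())
    return contents


def find_component_with(snapshot_dir, filename):
    """Return the names of the subfolders that contain `filename`."""
    listing = list_snapshot_files(snapshot_dir)
    return [name for name, files in listing.items() if filename in files]
```

Calling `find_component_with(snapshot_dir, "scheduler_config.json")` with the snapshot path from the error message should show whether the file is still in the scheduler folder or has ended up somewhere else, such as safety_checker.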

Reproduction

No response

Logs

No response

System Info

diffusers version: 0.4.0.dev0
Platform: Linux-5.10.102.1-microsoft-standard-WSL2-x86_64-with-glibc2.31
Python version: 3.9.13
PyTorch version (GPU?): 1.12.1+cu116 (True)
Huggingface_hub version: 0.10.0
Transformers version: 4.22.2
Using GPU in script?: yes
Using distributed or parallel set-up in script?: no

SeverianVoid commented 2 years ago

Stripped the whole repository and re-downloaded everything, and now it's not doing it anymore, so I don't know what caused it.

SeverianVoid commented 2 years ago

And after training and successfully saving a few times, running the exact same launch.sh file, it's doing it again now.

SeverianVoid commented 2 years ago

(diffusers) dreambooth@User:~/github/diffusers/examples/dreambooth$ ./ss_portrait_launch.sh
The following values were not passed to `accelerate launch` and had defaults used instead:
        `--num_cpu_threads_per_process` was set to `12` to improve out-of-box performance
To avoid this warning pass in values for each of the problematic parameters or run `accelerate config`.

===================================BUG REPORT===================================
Welcome to bitsandbytes. For bug reports, please submit your error trace to: https://github.com/TimDettmers/bitsandbytes/issues
For effortless bug reporting copy-paste your error into this form: https://docs.google.com/forms/d/e/1FAIpQLScPB8emS3Thkp66nvqwmjTEgxp8Y9ufuWTzFyr9kJ5AoI47dQ/viewform?usp=sf_link
================================================================================
CUDA SETUP: CUDA runtime path found: /usr/local/cuda/lib64/libcudart.so
CUDA SETUP: Highest compute capability among GPUs detected: 8.6
CUDA SETUP: Detected CUDA version 117
CUDA SETUP: Loading binary /home/dreambooth/anaconda3/envs/diffusers/lib/python3.9/site-packages/bitsandbytes/libbitsandbytes_cuda117.so...
Caching latents: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 556/556 [00:27<00:00, 20.34it/s]
Fetching 15 files: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 15/15 [00:00<00:00, 65331.84it/s]
You have passed a non-standard module None. We cannot verify whether it has the correct type                                                                                                                                                               | 0/15 [00:00<?, ?it/s]
Traceback (most recent call last):
  File "/home/dreambooth/github/diffusers/examples/dreambooth/train_dreambooth.py", line 805, in <module>
    main(args)
  File "/home/dreambooth/github/diffusers/examples/dreambooth/train_dreambooth.py", line 788, in main
    save_weights(global_step)
  File "/home/dreambooth/github/diffusers/examples/dreambooth/train_dreambooth.py", line 669, in save_weights
    pipeline = StableDiffusionPipeline.from_pretrained(
  File "/home/dreambooth/anaconda3/envs/diffusers/lib/python3.9/site-packages/diffusers/pipeline_utils.py", line 404, in from_pretrained
    loaded_sub_model = load_method(os.path.join(cached_folder, name), **loading_kwargs)
  File "/home/dreambooth/anaconda3/envs/diffusers/lib/python3.9/site-packages/diffusers/modeling_utils.py", line 317, in from_pretrained
    raise EnvironmentError(
OSError: Error no file named diffusion_pytorch_model.bin found in directory /home/dreambooth/.cache/huggingface/diffusers/models--runwayml--stable-diffusion-v1-5/snapshots/3beed0bcb34a3d281ce27bd8a6a1efbb68eada38/safety_checker.
Steps:  25%|███████████████████████████████████████████████████▌                                                                                                                                                          | 500/2000 [03:23<10:10,  2.46it/s, loss=0.119, lr=1e-6]
Traceback (most recent call last):
  File "/home/dreambooth/anaconda3/envs/diffusers/bin/accelerate", line 8, in <module>
    sys.exit(main())
  File "/home/dreambooth/anaconda3/envs/diffusers/lib/python3.9/site-packages/accelerate/commands/accelerate_cli.py", line 43, in main
    args.func(args)
  File "/home/dreambooth/anaconda3/envs/diffusers/lib/python3.9/site-packages/accelerate/commands/launch.py", line 837, in launch_command
    simple_launcher(args)
  File "/home/dreambooth/anaconda3/envs/diffusers/lib/python3.9/site-packages/accelerate/commands/launch.py", line 354, in simple_launcher
    raise subprocess.CalledProcessError(returncode=process.returncode, cmd=cmd)
subprocess.CalledProcessError: Command '['/home/dreambooth/anaconda3/envs/diffusers/bin/python', 'train_dreambooth.py', '--pretrained_model_name_or_path=runwayml/stable-diffusion-v1-5', '--output_dir=portrait/output', '--seed=3434554', '--resolution=512', '--train_batch_size=1', '--mixed_precision=fp16', '--use_8bit_adam', '--gradient_accumulation_steps=1', '--learning_rate=1e-6', '--lr_scheduler=constant', '--lr_warmup_steps=0', '--sample_batch_size=4', '--max_train_steps=2000', '--save_interval=500', '--save_sample_prompt=starsectorportrait of a person', '--concepts_list=ssportrait_concepts_list.json']' returned non-zero exit status 1.
(diffusers) dreambooth@User:~/github/diffusers/examples/dreambooth$ ./ss_portrait_launch.sh
The following values were not passed to `accelerate launch` and had defaults used instead:
        `--num_cpu_threads_per_process` was set to `12` to improve out-of-box performance
To avoid this warning pass in values for each of the problematic parameters or run `accelerate config`.

===================================BUG REPORT===================================
Welcome to bitsandbytes. For bug reports, please submit your error trace to: https://github.com/TimDettmers/bitsandbytes/issues
For effortless bug reporting copy-paste your error into this form: https://docs.google.com/forms/d/e/1FAIpQLScPB8emS3Thkp66nvqwmjTEgxp8Y9ufuWTzFyr9kJ5AoI47dQ/viewform?usp=sf_link
================================================================================
CUDA SETUP: CUDA runtime path found: /usr/local/cuda/lib64/libcudart.so
CUDA SETUP: Highest compute capability among GPUs detected: 8.6
CUDA SETUP: Detected CUDA version 117
CUDA SETUP: Loading binary /home/dreambooth/anaconda3/envs/diffusers/lib/python3.9/site-packages/bitsandbytes/libbitsandbytes_cuda117.so...
Caching latents: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 258/258 [00:15<00:00, 16.41it/s]
Fetching 15 files: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 15/15 [00:00<00:00, 56324.58it/s]
You have passed a non-standard module None. We cannot verify whether it has the correct type                                                                                                                                                               | 0/15 [00:00<?, ?it/s]
Traceback (most recent call last):
  File "/home/dreambooth/github/diffusers/examples/dreambooth/train_dreambooth.py", line 805, in <module>
    main(args)
  File "/home/dreambooth/github/diffusers/examples/dreambooth/train_dreambooth.py", line 788, in main
    save_weights(global_step)
  File "/home/dreambooth/github/diffusers/examples/dreambooth/train_dreambooth.py", line 669, in save_weights
    pipeline = StableDiffusionPipeline.from_pretrained(
  File "/home/dreambooth/anaconda3/envs/diffusers/lib/python3.9/site-packages/diffusers/pipeline_utils.py", line 404, in from_pretrained
    loaded_sub_model = load_method(os.path.join(cached_folder, name), **loading_kwargs)
  File "/home/dreambooth/anaconda3/envs/diffusers/lib/python3.9/site-packages/diffusers/configuration_utils.py", line 161, in from_config
    config_dict = cls.get_config_dict(pretrained_model_name_or_path=pretrained_model_name_or_path, **kwargs)
  File "/home/dreambooth/anaconda3/envs/diffusers/lib/python3.9/site-packages/diffusers/configuration_utils.py", line 217, in get_config_dict
    raise EnvironmentError(
OSError: Error no file named scheduler_config.json found in directory /home/dreambooth/.cache/huggingface/diffusers/models--runwayml--stable-diffusion-v1-5/snapshots/3beed0bcb34a3d281ce27bd8a6a1efbb68eada38/safety_checker.
Steps:  25%|███████████████████████████████████████████████████▌                                                                                                                                                          | 500/2000 [03:21<10:04,  2.48it/s, loss=0.184, lr=1e-6]
Traceback (most recent call last):
  File "/home/dreambooth/anaconda3/envs/diffusers/bin/accelerate", line 8, in <module>
    sys.exit(main())
  File "/home/dreambooth/anaconda3/envs/diffusers/lib/python3.9/site-packages/accelerate/commands/accelerate_cli.py", line 43, in main
    args.func(args)
  File "/home/dreambooth/anaconda3/envs/diffusers/lib/python3.9/site-packages/accelerate/commands/launch.py", line 837, in launch_command
    simple_launcher(args)
  File "/home/dreambooth/anaconda3/envs/diffusers/lib/python3.9/site-packages/accelerate/commands/launch.py", line 354, in simple_launcher
    raise subprocess.CalledProcessError(returncode=process.returncode, cmd=cmd)
(diffusers) dreambooth@User:~/github/diffusers/examples/dreambooth$

Two attempts to run the same launch.sh, and both error out while trying to save, with similar errors but different missing files.
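
For comparing the two runs side by side, here is a small sketch that pulls the missing file name and the directory out of each `OSError` line. The regular expression and the `parse_missing_file` helper are my own assumptions based on the wording of the two messages above, not anything diffusers provides.

```python
import re

# Pattern inferred from the two error lines in the logs above (an assumption).
ERROR_RE = re.compile(
    r"no file named (?P<file>\S+) found in directory (?P<dir>\S+?)\.?$"
)

SNAPSHOT = (
    "/home/dreambooth/.cache/huggingface/diffusers/"
    "models--runwayml--stable-diffusion-v1-5/snapshots/"
    "3beed0bcb34a3d281ce27bd8a6a1efbb68eada38"
)

# The final error line from each of the two runs.
run_1 = (
    "OSError: Error no file named diffusion_pytorch_model.bin "
    f"found in directory {SNAPSHOT}/safety_checker."
)
run_2 = (
    "OSError: Error no file named scheduler_config.json "
    f"found in directory {SNAPSHOT}/safety_checker."
)


def parse_missing_file(line):
    """Return (missing_file, directory) from an error line, or None if no match."""
    match = ERROR_RE.search(line.strip())
    if match is None:
        return None
    return match.group("file"), match.group("dir")


file_1, dir_1 = parse_missing_file(run_1)
file_2, dir_2 = parse_missing_file(run_2)
same_directory = dir_1 == dir_2      # both runs point at safety_checker
different_files = file_1 != file_2   # but each run misses a different file
```

Both runs report the same safety_checker directory with a different file each time. Together with the tracebacks (the first error is raised in modeling_utils.py, the second in configuration_utils.py), this suggests that a different loader was pointed at the safety_checker folder in each run.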

ShivamShrirao / diffusers