TheLastBen / fast-stable-diffusion

fast-stable-diffusion + DreamBooth
MIT License
7.49k stars, 1.3k forks

After training DreamBooth once and testing in the WebUI, I can't train again when I stop the WebUI #79

Open 1blackbar opened 1 year ago

1blackbar commented 1 year ago

What must I do to train again immediately? It won't run and errors out. You can easily reproduce it by training for 100 steps and then trying to train again with the same settings.

By the way, how can I control the training repeat rate with your Colab?

TheLastBen commented 1 year ago

Specify what type of error it is: take a screenshot or paste the log.

1blackbar commented 1 year ago

Here it is. I closed the WebUI because I wanted to train on new images:

The following values were not passed to accelerate launch and had defaults used instead:
    --num_processes was set to a value of 1
    --num_machines was set to a value of 1
    --mixed_precision was set to a value of 'no'
    --num_cpu_threads_per_process was set to 1 to improve out-of-box performance
To avoid this warning pass in values for each of the problematic parameters or run accelerate config.
Traceback (most recent call last):
  File "/usr/local/lib/python3.7/dist-packages/transformers/configuration_utils.py", line 609, in _get_config_dict
    user_agent=user_agent,
  File "/usr/local/lib/python3.7/dist-packages/transformers/utils/hub.py", line 297, in cached_path
    raise EnvironmentError(f"file {url_or_filename} not found")
OSError: file /content/stable-diffusion-v1-4/config.json not found

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/content/diffusers/examples/dreambooth/train_dreambooth.py", line 606, in <module>
    main()
  File "/content/diffusers/examples/dreambooth/train_dreambooth.py", line 398, in main
    args.pretrained_model_name_or_path, subfolder="text_encoder", use_auth_token=args.use_auth_token
  File "/usr/local/lib/python3.7/dist-packages/transformers/modeling_utils.py", line 1776, in from_pretrained
    **kwargs,
  File "/usr/local/lib/python3.7/dist-packages/transformers/models/clip/configuration_clip.py", line 126, in from_pretrained
    config_dict, kwargs = cls.get_config_dict(pretrained_model_name_or_path, **kwargs)
  File "/usr/local/lib/python3.7/dist-packages/transformers/configuration_utils.py", line 553, in get_config_dict
    config_dict, kwargs = cls._get_config_dict(pretrained_model_name_or_path, **kwargs)
  File "/usr/local/lib/python3.7/dist-packages/transformers/configuration_utils.py", line 642, in _get_config_dict
    f"Can't load config for '{pretrained_model_name_or_path}'. If you were trying to load it from "
OSError: Can't load config for '/content/stable-diffusion-v1-4'. If you were trying to load it from 'https://huggingface.co/models', make sure you don't have a local directory with the same name. Otherwise, make sure '/content/stable-diffusion-v1-4' is the correct path to a directory containing a config.json file
Traceback (most recent call last):
  File "/usr/local/bin/accelerate", line 8, in <module>
    sys.exit(main())
  File "/usr/local/lib/python3.7/dist-packages/accelerate/commands/accelerate_cli.py", line 43, in main
    args.func(args)
  File "/usr/local/lib/python3.7/dist-packages/accelerate/commands/launch.py", line 837, in launch_command
    simple_launcher(args)
  File "/usr/local/lib/python3.7/dist-packages/accelerate/commands/launch.py", line 354, in simple_launcher
    raise subprocess.CalledProcessError(returncode=process.returncode, cmd=cmd)
subprocess.CalledProcessError: Command '['/usr/bin/python3', '/content/diffusers/examples/dreambooth/train_dreambooth.py', '--pretrained_model_name_or_path=/content/stable-diffusion-v1-4', '--instance_data_dir=/content/data/tmnt', '--output_dir=/content/models/tmnt', '--instance_prompt=photo of tmnt', '--seed=11111', '--resolution=512', '--mixed_precision=fp16', '--train_batch_size=1', '--gradient_accumulation_steps=1', '--use_8bit_adam', '--learning_rate=5e-6', '--lr_scheduler=constant', '--lr_warmup_steps=0', '--max_train_steps=1800']' returned non-zero exit status 1.
Something went wrong

TheLastBen commented 1 year ago

It looks like you disconnected from the runtime and the original model got deleted. I will add an option to keep the original model in Google Drive (5 GB) to avoid re-downloading it.
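The traceback fails because the base model directory no longer contains the config files the loaders expect. A pre-flight check before launching training can catch this early. The sketch below is not the notebook's actual fix (the thread does not show it); it assumes the standard diffusers folder layout for Stable Diffusion v1.x and a hypothetical cached copy of the model on a mounted Google Drive, and restores the local directory from that cache when files are missing.

```python
import shutil
from pathlib import Path
from typing import List

# Files the diffusers/transformers loaders read from a Stable Diffusion v1.x
# checkpoint directory. ASSUMPTION: standard diffusers layout, not taken
# from the notebook itself.
REQUIRED_FILES = (
    "model_index.json",
    "text_encoder/config.json",
    "unet/config.json",
    "vae/config.json",
    "tokenizer/tokenizer_config.json",
)


def missing_files(model_dir: Path) -> List[str]:
    """Return the required files that are absent from model_dir."""
    return [rel for rel in REQUIRED_FILES if not (model_dir / rel).is_file()]


def ensure_model(model_dir: Path, drive_cache: Path) -> Path:
    """Ensure model_dir holds a complete model, restoring it from drive_cache if needed."""
    if not missing_files(model_dir):
        return model_dir  # Local copy is intact; nothing to do.
    if missing_files(drive_cache):
        raise FileNotFoundError(
            f"{model_dir} is incomplete and no complete cached copy exists at {drive_cache}"
        )
    # Replace the broken or missing local directory with the cached copy.
    if model_dir.exists():
        shutil.rmtree(model_dir)
    shutil.copytree(drive_cache, model_dir)
    return model_dir


# Hypothetical usage before calling accelerate launch (the Drive path is an assumption):
# ensure_model(Path("/content/stable-diffusion-v1-4"),
#              Path("/content/gdrive/MyDrive/stable-diffusion-v1-4"))
```

Copying from a cache instead of re-downloading trades about 5 GB of Drive space for not having to fetch the model again after it disappears from the runtime.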

1blackbar commented 1 year ago

No, because I reran the cell to download from Hugging Face and it still happens, but I will try the fix. Also, there's no way I disconnected; I was prompting the whole time.

To try to fix this, I changed the folder names to gibberish so it would use new folders instead of the old ones, and reran all the cells, including the dependencies. That went fine, but training still errors out. The only way to train again is to fully disconnect and restart from scratch.

Oh crap... so I disconnected to rerun and was greeted with no GPU for me despite being on Colab Pro... great. This is exactly why I want to retrain during the same session.

TheLastBen commented 1 year ago

I'll check that out shortly

TheLastBen commented 1 year ago

Fixed. Update the Colab and confirm.

1blackbar commented 1 year ago

Works! Thanks for the fix.