d8ahazard / sd_dreambooth_extension

Cannot create a new model from a .safetensors source checkpoint; the same model converted to an fp32 .ckpt also fails. #781

Closed mart-hill closed 1 year ago

mart-hill commented 1 year ago

Kindly read the entire form below and fill it out with the requested information.

Please find the following lines in the console and paste them below. If you do not provide this information, your issue will be automatically closed.

Python revision: 3.10.6 (tags/v3.10.6:9c7b4bd, Aug 1 2022, 21:53:49) [MSC v.1932 64 bit (AMD64)]
Dreambooth revision: 17c3864803ebb50615205271de687be96cfc96e8
SD-WebUI revision: ff6a5bcec1ce25aa8f08b157ea957d764be23d8d

Checking Dreambooth requirements...
[+] bitsandbytes version 0.35.0 installed.
[+] diffusers version 0.10.2 installed.
[+] transformers version 4.25.1 installed.
[+] xformers version 0.0.14.dev0 installed.
[+] torch version 1.12.1+cu116 installed.
[+] torchvision version 0.13.1+cu116 installed.

Have you read the Readme? Yes
Have you completely restarted the stable-diffusion-webUI, not just reloaded the UI? Always after updating
Have you updated Dreambooth to the latest revision? Yes
Have you updated the Stable-Diffusion-WebUI to the latest version? Yes
No, really. Please save us both some trouble and update the SD-WebUI and Extension and restart before posting this. Reply 'OK' Below to acknowledge that you did this. OK

Describe the bug

When I try to create a new Dreambooth model from a source checkpoint in .safetensors format, all I get is this wall of errors:

Loading checkpoint...
Exception setting up output: invalid load key, '\xce'.
Traceback (most recent call last):
  File "O:\AI\stable-diffusion-webui\extensions\sd_dreambooth_extension\dreambooth\sd_to_diff.py", line 800, in extract_checkpoint
    checkpoint = torch.load(ckpt_path)
  File "O:\AI\stable-diffusion-webui\extensions\sd_smartprocess\reallysafe.py", line 117, in load
    return load_with_extra(filename, *args, **kwargs)
  File "O:\AI\stable-diffusion-webui\extensions\sd_smartprocess\reallysafe.py", line 164, in load_with_extra
    return unsafe_torch_load(filename, *args, **kwargs)
  File "O:\AI\stable-diffusion-webui\extensions\sd_dreambooth_extension\reallysafe.py", line 117, in load
    return load_with_extra(filename, *args, **kwargs)
  File "O:\AI\stable-diffusion-webui\extensions\sd_dreambooth_extension\reallysafe.py", line 164, in load_with_extra
    return unsafe_torch_load(filename, *args, **kwargs)
  File "O:\AI\stable-diffusion-webui\modules\safe.py", line 106, in load
    return load_with_extra(filename, extra_handler=global_extra_handler, *args, **kwargs)
  File "O:\AI\stable-diffusion-webui\modules\safe.py", line 151, in load_with_extra
    return unsafe_torch_load(filename, *args, **kwargs)
  File "O:\AI\stable-diffusion-webui\venv\lib\site-packages\torch\serialization.py", line 713, in load
    return _legacy_load(opened_file, map_location, pickle_module, **pickle_load_args)
  File "O:\AI\stable-diffusion-webui\venv\lib\site-packages\torch\serialization.py", line 920, in _legacy_load
    magic_number = pickle_module.load(f, **pickle_load_args)
_pickle.UnpicklingError: invalid load key, '\xce'.
Pipeline or config is not set, unable to continue.
Can't load config, specify a model name!
Total VRAM: 24

...whereas a .ckpt model works fine as a source. I didn't see anything in the Readme about using the .safetensors format for a source model. Is it supported?
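The first traceback ends inside `torch.load`, which treats the file as a pickle. A .safetensors file is not a pickle: it begins with an 8-byte little-endian header length followed by a JSON header, so a first byte such as '\xce' is consistent with the unpickler reading part of that length. The sketch below is a minimal, standard-library-only illustration of telling the formats apart before picking a loader. The function name `sniff_checkpoint_format`, its return labels, and the 100 MB header cap are assumptions made for this example and are not part of the extension.

```python
import json
import struct
import zipfile
from pathlib import Path

# Assumed safety cap for this sketch, so a random file never triggers a huge read.
MAX_HEADER_BYTES = 100_000_000


def sniff_checkpoint_format(path):
    """Guess a checkpoint's on-disk format from its first bytes.

    Returns "safetensors", "torch-zip", "pickle" or "unknown".
    The labels are hypothetical names chosen for this example.
    """
    path = Path(path)
    size = path.stat().st_size
    with path.open("rb") as f:
        head = f.read(8)
        if len(head) == 8:
            # safetensors layout: u64 little-endian header length, then JSON.
            (header_len,) = struct.unpack("<Q", head)
            if 0 < header_len <= min(size - 8, MAX_HEADER_BYTES):
                try:
                    header = json.loads(f.read(header_len).decode("utf-8"))
                    if isinstance(header, dict):
                        return "safetensors"
                except ValueError:
                    # Not UTF-8 or not JSON, so not a safetensors header.
                    pass
    # Newer torch.save output is a zip archive.
    if zipfile.is_zipfile(str(path)):
        return "torch-zip"
    # Pickle protocol 2 and later starts with the PROTO opcode 0x80.
    if head[:1] == b"\x80":
        return "pickle"
    return "unknown"
```

A caller could use the result to decide whether a file should go to a pickle-based loader at all, instead of letting the unpickler fail on the first byte.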

When I converted the source model to an fp32 .ckpt, selected it as the source, and tried to create the new model again, I got this wall of errors:

Checkpoint O:\AI\stable-diffusion-webui\models\Stable-diffusion\ratnikamix_v2-fp32.ckpt has both EMA and non-EMA weights.
In this conversion only the non-EMA weights are extracted. If you want to instead extract the EMA weights (usually better for inference), please make sure to add the `--extract_ema` flag.
Exception setting up output: Error(s) in loading state_dict for UNet2DConditionModel:
        Missing key(s) in state_dict: "up_blocks.0.upsamplers.0.conv.weight", "up_blocks.0.upsamplers.0.conv.bias", "up_blocks.1.upsamplers.0.conv.weight", "up_blocks.1.upsamplers.0.conv.bias", "up_blocks.2.upsamplers.0.conv.weight", "up_blocks.2.upsamplers.0.conv.bias".
        Unexpected key(s) in state_dict: "up_blocks.0.attentions.2.conv.bias", "up_blocks.0.attentions.2.conv.weight".
Traceback (most recent call last):
  File "O:\AI\stable-diffusion-webui\extensions\sd_dreambooth_extension\dreambooth\sd_to_diff.py", line 922, in extract_checkpoint
    unet.load_state_dict(converted_unet_checkpoint)
  File "O:\AI\stable-diffusion-webui\venv\lib\site-packages\torch\nn\modules\module.py", line 1604, in load_state_dict
    raise RuntimeError('Error(s) in loading state_dict for {}:\n\t{}'.format(
RuntimeError: Error(s) in loading state_dict for UNet2DConditionModel:
        Missing key(s) in state_dict: "up_blocks.0.upsamplers.0.conv.weight", "up_blocks.0.upsamplers.0.conv.bias", "up_blocks.1.upsamplers.0.conv.weight", "up_blocks.1.upsamplers.0.conv.bias", "up_blocks.2.upsamplers.0.conv.weight", "up_blocks.2.upsamplers.0.conv.bias".
        Unexpected key(s) in state_dict: "up_blocks.0.attentions.2.conv.bias", "up_blocks.0.attentions.2.conv.weight".
Pipeline or config is not set, unable to continue.
Can't load config!
Traceback (most recent call last):
  File "O:\AI\stable-diffusion-webui\venv\lib\site-packages\gradio\routes.py", line 321, in run_predict
    output = await app.blocks.process_api(
  File "O:\AI\stable-diffusion-webui\venv\lib\site-packages\gradio\blocks.py", line 1016, in process_api
    data = self.postprocess_data(fn_index, result["prediction"], state)
  File "O:\AI\stable-diffusion-webui\venv\lib\site-packages\gradio\blocks.py", line 945, in postprocess_data
    if predictions[i] is components._Keywords.FINISHED_ITERATING:
IndexError: tuple index out of range

It's weird: I can use a Midjourney .ckpt model as a source, but not this one.
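The second error is the kind of report `load_state_dict` produces when the key names in the converted checkpoint do not match the ones the model expects. As a rough aid for comparing the two sets of names before loading, here is a minimal sketch using plain Python sets. The helper `diff_state_dict_keys` is hypothetical, and the key lists are abbreviated from the log above rather than taken from a real model.

```python
def diff_state_dict_keys(expected_keys, actual_keys):
    """Return (missing, unexpected) key names, each sorted.

    missing:    names the model expects but the checkpoint lacks.
    unexpected: names the checkpoint has but the model does not expect.
    """
    expected, actual = set(expected_keys), set(actual_keys)
    missing = sorted(expected - actual)
    unexpected = sorted(actual - expected)
    return missing, unexpected


# Key names abbreviated from the error message above, for illustration only.
expected = [
    "up_blocks.0.upsamplers.0.conv.weight",
    "up_blocks.0.upsamplers.0.conv.bias",
]
converted = [
    "up_blocks.0.attentions.2.conv.weight",
    "up_blocks.0.attentions.2.conv.bias",
]

missing, unexpected = diff_state_dict_keys(expected, converted)
for name in missing:
    print("missing:   ", name)
for name in unexpected:
    print("unexpected:", name)
```

Seeing the two lists side by side makes it easier to tell whether the converter produced a different block layout than the model configuration expects, which is one possible reading of the log here.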

I spoke too soon, though. After finally setting things up and trying to train, I got this:

Traceback (most recent call last):
  File "O:\AI\stable-diffusion-webui\extensions\sd_dreambooth_extension\scripts\dreambooth.py", line 571, in start_training
    latest_file = max(list_of_files, key=os.path.getmtime)
ValueError: max() arg is an empty sequence
Training completed, reloading SD Model.
Restored system models.
Returning result: Exception training model: 'max() arg is an empty sequence'.

At this point, I'm a bit lost. I was able to use this extension before, though, with earlier versions of both the WebUI and Dreambooth.
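The `ValueError` above comes from calling `max()` on an empty list, which suggests that no candidate files were found at that point. Python's built-in `max()` accepts a `default` argument for exactly this case. The following is a minimal sketch of that guard; the helper name `newest_file` is an assumption for this example and is not the extension's code.

```python
import os


def newest_file(paths):
    """Return the most recently modified existing file, or None if there is none."""
    # Skip names that do not exist so os.path.getmtime cannot raise.
    existing = [p for p in paths if os.path.isfile(p)]
    # default=None avoids "max() arg is an empty sequence" on an empty list.
    return max(existing, key=os.path.getmtime, default=None)
```

A caller would then check for `None` and report that no files were found, instead of failing with a traceback.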

Once I finally managed to start training, the GPU pause produced this error:

Steps:   0%|                                                                           | 0/17100 [00:00<?, ?it/s, loss=0.0667, loss_avg=0.353, lr=4.99e-6, vram_usage=15.1]Giving the GPU a break for 30.0 seconds.
Traceback (most recent call last):
  File "O:\AI\stable-diffusion-webui\extensions\sd_dreambooth_extension\scripts\dreambooth.py", line 561, in start_training
    result = main(config, use_txt2img=use_txt2img)
  File "O:\AI\stable-diffusion-webui\extensions\sd_dreambooth_extension\dreambooth\train_dreambooth.py", line 973, in main
    return inner_loop()
  File "O:\AI\stable-diffusion-webui\extensions\sd_dreambooth_extension\dreambooth\memory.py", line 116, in decorator
    return function(batch_size, grad_size, prof, *args, **kwargs)
  File "O:\AI\stable-diffusion-webui\extensions\sd_dreambooth_extension\dreambooth\train_dreambooth.py", line 957, in inner_loop
    for i in range(args.epoch_pause_time):
TypeError: 'float' object cannot be interpreted as an integer
Steps:  25%|██████████████▊                                            | 4275/17100 [00:00<00:00, 2873041.12it/s, loss=0.0667, loss_avg=0.353, lr=4.99e-6, vram_usage=15.1]
Training completed, reloading SD Model.
Restored system models.
Returning result: Exception training model: ''float' object cannot be interpreted as an integer'.

Are all of these errors connected to the txt2img option (which, from what I've read, should stay unchecked for now)?
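The last traceback is a plain Python type error: `range()` only accepts integers, and the log shows the pause time as `30.0`. Below is a minimal sketch of converting such a setting to a whole number of one-second ticks before looping. The names `whole_seconds` and `pause` are hypothetical, and rounding up is a choice made for this example, not necessarily what the extension does.

```python
import math
import time


def whole_seconds(value):
    """Convert a pause setting such as 30.0 into a non-negative int for range()."""
    seconds = float(value)
    if seconds < 0:
        raise ValueError("pause time must not be negative")
    # Round up so a fractional setting still pauses at least that long.
    return math.ceil(seconds)


def pause(value, sleep=time.sleep):
    """Sleep in one-second ticks and return the number of ticks taken."""
    ticks = 0
    for _ in range(whole_seconds(value)):
        sleep(1)
        ticks += 1
    return ticks
```

Passing a different `sleep` callable, for example one that also updates a progress bar, keeps the loop testable without actually waiting.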

Provide logs

If a crash has occurred, please provide the entire stack trace from the log, including the last few log messages before the crash occurred.

Environment

What OS? Windows 10
If Windows - WSL or native? Native
What GPU are you using? RTX 3090

Screenshots/Config

If the issue is specific to an error while training, please provide a screenshot of training parameters or the db_config.json file from /models/dreambooth/MODELNAME/db_config.json

jsbach-jung commented 1 year ago

Try my workaround and see if it fixes it for you. https://github.com/d8ahazard/sd_dreambooth_extension/discussions/794

mart-hill commented 1 year ago

> Try my workaround and see if it fixes it for you. #794

Thank you, I'll try that! (my poor SSD, sniff) 🙂

jsbach-jung commented 1 year ago

I'm actually surprised you got that far! I've never been able to train a single step when encountering this error.

Flonixcorn commented 1 year ago

#794 worked

mart-hill commented 1 year ago

> Try my workaround and see if it fixes it for you. #794

It did, indeed, but... my poor SSD, massacred with writes, sniff... 😅