TheLastBen / fast-stable-diffusion

fast-stable-diffusion + DreamBooth
MIT License

Fast Dreambooth errors #2368

Open Z0rx opened 1 year ago

Z0rx commented 1 year ago

For the past 2 days I have been getting errors whenever I try to train the UNet.

Training the UNet...
[ASCII-art banner reading "TRAINING"]

  0% 0/3500 [00:00<?, ?it/s] asosa asosa
Traceback (most recent call last):
  File "/content/diffusers/examples/dreambooth/train_dreambooth.py", line 803, in <module>
    main()
  File "/content/diffusers/examples/dreambooth/train_dreambooth.py", line 723, in main
    accelerator.clip_grad_norm_(params_to_clip, args.max_grad_norm)
  File "/usr/local/lib/python3.10/dist-packages/accelerate/accelerator.py", line 920, in clip_grad_norm_
    self.unscale_gradients()
  File "/usr/local/lib/python3.10/dist-packages/accelerate/accelerator.py", line 904, in unscale_gradients
    self.scaler.unscale_(opt)
  File "/usr/local/lib/python3.10/dist-packages/torch/cuda/amp/grad_scaler.py", line 284, in unscale_
    optimizer_state["found_inf_per_device"] = self._unscale_grads_(optimizer, inv_scale, found_inf, False)
  File "/usr/local/lib/python3.10/dist-packages/torch/cuda/amp/grad_scaler.py", line 212, in _unscale_grads_
    raise ValueError("Attempting to unscale FP16 gradients.")
ValueError: Attempting to unscale FP16 gradients.
  0% 0/3500 [00:02<?, ?it/s]
Traceback (most recent call last):
  File "/usr/local/bin/accelerate", line 8, in <module>
    sys.exit(main())
  File "/usr/local/lib/python3.10/dist-packages/accelerate/commands/accelerate_cli.py", line 43, in main
    args.func(args)
  File "/usr/local/lib/python3.10/dist-packages/accelerate/commands/launch.py", line 837, in launch_command
    simple_launcher(args)
  File "/usr/local/lib/python3.10/dist-packages/accelerate/commands/launch.py", line 354, in simple_launcher
    raise subprocess.CalledProcessError(returncode=process.returncode, cmd=cmd)
subprocess.CalledProcessError: Command '['/usr/bin/python3', '/content/diffusers/examples/dreambooth/train_dreambooth.py', '--image_captions_filename', '--train_only_unet', '--save_starting_step=500', '--save_n_steps=500', '--Session_dir=/content/gdrive/MyDrive/Fast-Dreambooth/Sessions/asosa', '--pretrained_model_name_or_path=/content/stable-diffusion-custom', '--instance_data_dir=/content/gdrive/MyDrive/Fast-Dreambooth/Sessions/asosa/instance_images', '--output_dir=/content/models/asosa', '--captions_dir=/content/gdrive/MyDrive/Fast-Dreambooth/Sessions/asosa/captions', '--instance_prompt=', '--seed=196321', '--resolution=768', '--mixed_precision=fp16', '--train_batch_size=1', '--gradient_accumulation_steps=1', '--use_8bit_adam', '--learning_rate=2e-06', '--lr_scheduler=linear', '--lr_warmup_steps=0', '--max_train_steps=3500']' returned non-zero exit status 1.

TheLastBen commented 1 year ago

use the ckpt/safetensors version of the model

Z0rx commented 1 year ago

Same error. I checked it with "safetensors" selected in the "model download cell".

TheLastBen commented 1 year ago

Remove the HF link from the cell when using a ckpt or safetensors file.

Z0rx commented 1 year ago

When I try to download a fine-tuned model from HF, it shows me this warning:

V1 /content/stable-diffusion-custom
hint: Using 'master' as the name for the initial branch. This default branch name
hint: is subject to change. To configure the initial branch name to use in all
hint: of your new repositories, which will suppress this warning, call:
hint:
hint:   git config --global init.defaultBranch <name>
hint:
hint: Names commonly chosen instead of 'master' are 'main', 'trunk' and
hint: 'development'. The just-created branch can be renamed via this command:
hint:
hint:   git branch -m <name>
Initialized empty Git repository in /content/stable-diffusion-custom/.git/
Git LFS initialized.
Updating origin
remote: Enumerating objects: 79, done.

I have been working with your code for 4 months, and this warning never showed up until 2 days ago.

I downloaded the HF model and removed the link, but it still doesn't work.

TheLastBen commented 1 year ago

I need a link to the model to test it

Z0rx commented 1 year ago

https://huggingface.co/dreamlike-art/dreamlike-photoreal-2.0/blob/main/dreamlike-photoreal-2.0.safetensors

TheLastBen commented 1 year ago

https://huggingface.co/dreamlike-art/dreamlike-photoreal-2.0/blob/main/dreamlike-photoreal-2.0.safetensors

you're using the wrong link format

this is the correct link https://huggingface.co/dreamlike-art/dreamlike-photoreal-2.0/resolve/main/dreamlike-photoreal-2.0.safetensors

use "resolve" instead of "blob"

tested it, it works fine
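
For anyone else who copies the link from the file page in the browser, the blob-to-resolve swap described above can be sketched in a few lines of Python. This is only an illustration: `blob_to_resolve` is a made-up helper name, not something the notebook provides, and it assumes the usual `/<profile>/<model>/blob/<revision>/<file>` path layout seen in the links in this thread.

```python
from urllib.parse import urlsplit, urlunsplit


def blob_to_resolve(url: str) -> str:
    """Turn a Hugging Face file-page URL (/blob/) into a direct-download URL (/resolve/).

    Hypothetical helper for illustration; URLs that do not have "blob" in the
    expected position are returned unchanged.
    """
    parts = urlsplit(url)
    segments = parts.path.split("/")
    # Assumed path layout: /<profile>/<model>/blob/<revision>/<file path...>
    # segments[0] is the empty string before the leading slash.
    if len(segments) > 3 and segments[3] == "blob":
        segments[3] = "resolve"
    return urlunsplit(parts._replace(path="/".join(segments)))


blob_url = (
    "https://huggingface.co/dreamlike-art/dreamlike-photoreal-2.0"
    "/blob/main/dreamlike-photoreal-2.0.safetensors"
)
print(blob_to_resolve(blob_url))
```

Only the fourth path segment is touched, so a model or file name that happens to contain the word "blob" is left alone.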

Z0rx commented 1 year ago

I can't paste the complete link into the text bar

"Load and finetune a model from Hugging Face, use the format "profile/model" like : runwayml/stable-diffusion-v1-5"

code snip

if Path_to_HuggingFace != "":
  if authe=="https://":
    textenc= f"{authe}huggingface.co/{Path_to_HuggingFace}/resolve/main/text_encoder/pytorch_model.bin"
    txtenc_size=urllib.request.urlopen(textenc).info().get('Content-Length', None)
  else:
    textenc= f"https://huggingface.co/{Path_to_HuggingFace}/resolve/main/text_encoder/pytorch_model.bin"

The cell is using /resolve/
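
The two kinds of input in this thread (a "profile/model" path versus a full link to a single file) are easy to mix up, so here is a rough sketch of how they could be told apart. Both function names are hypothetical and the regular expression is only a guess at what a "profile/model" path looks like; none of this is taken from the notebook except the URL pattern, which mirrors the quoted snippet.

```python
import re

# Assumed shape of a "profile/model" path such as runwayml/stable-diffusion-v1-5.
# This pattern is an illustration, not the notebook's own validation.
REPO_ID = re.compile(r"^[\w.-]+/[\w.-]+$")


def classify_model_input(value: str) -> str:
    """Hypothetical helper: say which of the two input formats a value matches."""
    value = value.strip()
    if value.startswith(("http://", "https://")):
        return "link"      # full URL to a ckpt/safetensors file
    if REPO_ID.match(value):
        return "repo_id"   # "profile/model" path
    return "unknown"


def text_encoder_url(repo_id: str) -> str:
    """Build the same /resolve/ URL pattern as the quoted snippet's else branch."""
    return f"https://huggingface.co/{repo_id}/resolve/main/text_encoder/pytorch_model.bin"


print(classify_model_input("runwayml/stable-diffusion-v1-5"))
print(text_encoder_url("runwayml/stable-diffusion-v1-5"))
```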

I tested 3 different fine-tuned models, both directly from github and after copying them to my gdrive, and all of them give me this error:

Training the UNet...
Traceback (most recent call last):
  File "/usr/local/bin/accelerate", line 5, in <module>
    from accelerate.commands.accelerate_cli import main
  File "/usr/local/lib/python3.10/dist-packages/accelerate/__init__.py", line 7, in <module>
    from .accelerator import Accelerator
  File "/usr/local/lib/python3.10/dist-packages/accelerate/accelerator.py", line 25, in <module>
    import torch
  File "/usr/local/lib/python3.10/dist-packages/torch/__init__.py", line 1476, in <module>
    from torch import func as func
  File "/usr/local/lib/python3.10/dist-packages/torch/func/__init__.py", line 1, in <module>
    from torch._functorch.eager_transforms import (
  File "/usr/local/lib/python3.10/dist-packages/torch/_functorch/eager_transforms.py", line 12, in <module>
    from torch.fx.experimental import const_fold
  File "/usr/local/lib/python3.10/dist-packages/torch/fx/experimental/const_fold.py", line 6, in <module>
    from torch.fx.passes.split_module import split_module
  File "/usr/local/lib/python3.10/dist-packages/torch/fx/passes/__init__.py", line 1, in <module>
    from . import graph_drawer
  File "/usr/local/lib/python3.10/dist-packages/torch/fx/passes/graph_drawer.py", line 13, in <module>
    import pydot
  File "/usr/local/lib/python3.10/dist-packages/pydot.py", line 15, in <module>
    import dot_parser
  File "/usr/local/lib/python3.10/dist-packages/dot_parser.py", line 14, in <module>
    from pyparsing import (
  File "/usr/local/lib/python3.10/dist-packages/pyparsing/__init__.py", line 136, in <module>
    from .helpers import *  # type: ignore[misc, assignment]
  File "/usr/local/lib/python3.10/dist-packages/pyparsing/helpers.py", line 670, in <module>
    typing.Optional[ParseAction],
  File "/usr/lib/python3.10/typing.py", line 309, in inner
    return cached(*args, **kwds)
  File "/usr/lib/python3.10/typing.py", line 1248, in __hash__
    return hash(frozenset(self.__args__))
  File "/usr/lib/python3.10/typing.py", line 1038, in __hash__
    return hash((self.__origin__, self.__args__))
KeyboardInterrupt
^C
Something went wrong

Z0rx commented 1 year ago

Training the UNet...
Traceback (most recent call last):
  File "/usr/local/lib/python3.10/dist-packages/tensorboard/compat/__init__.py", line 42, in tf
    from tensorboard.compat import notf  # noqa: F401
ImportError: cannot import name 'notf' from 'tensorboard.compat' (/usr/local/lib/python3.10/dist-packages/tensorboard/compat/__init__.py)

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "", line 1372, in _path_importer_cache
KeyError: '/usr/local/lib/python3.10/dist-packages/scipy/sparse/linalg/_eigen'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "", line 1078, in _handle_fromlist
  File "", line 241, in _call_with_frames_removed
  File "/usr/local/lib/python3.10/dist-packages/tensorflow/python/keras/engine/training_arrays_v1.py", line 37, in <module>
    from scipy.sparse import issparse  # pylint: disable=g-import-not-at-top
  File "/usr/local/lib/python3.10/dist-packages/scipy/sparse/__init__.py", line 283, in <module>
    from . import csgraph
  File "/usr/local/lib/python3.10/dist-packages/scipy/sparse/csgraph/__init__.py", line 185, in <module>
    from ._laplacian import laplacian
  File "/usr/local/lib/python3.10/dist-packages/scipy/sparse/csgraph/_laplacian.py", line 7, in <module>
    from scipy.sparse.linalg import LinearOperator
  File "/usr/local/lib/python3.10/dist-packages/scipy/sparse/linalg/__init__.py", line 123, in <module>
    from ._eigen import *
  File "/usr/local/lib/python3.10/dist-packages/scipy/sparse/linalg/_eigen/__init__.py", line 9, in <module>
    from .arpack import *
  File "", line 1027, in _find_and_load
  File "", line 1002, in _find_and_load_unlocked
  File "", line 945, in _find_spec
  File "", line 1439, in find_spec
  File "", line 1408, in _get_spec
  File "", line 1374, in _path_importer_cache
  File "", line 1346, in _path_hooks
KeyboardInterrupt

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/bin/accelerate", line 5, in <module>
    from accelerate.commands.accelerate_cli import main
  File "/usr/local/lib/python3.10/dist-packages/accelerate/__init__.py", line 7, in <module>
    from .accelerator import Accelerator
  File "/usr/local/lib/python3.10/dist-packages/accelerate/accelerator.py", line 33, in <module>
    from .tracking import LOGGER_TYPE_TO_CLASS, GeneralTracker, filter_trackers
  File "/usr/local/lib/python3.10/dist-packages/accelerate/tracking.py", line 29, in <module>
    from torch.utils import tensorboard
  File "/usr/local/lib/python3.10/dist-packages/torch/utils/tensorboard/__init__.py", line 12, in <module>
    from .writer import FileWriter, SummaryWriter  # noqa: F401
  File "/usr/local/lib/python3.10/dist-packages/torch/utils/tensorboard/writer.py", line 16, in <module>
    from ._embedding import (
  File "/usr/local/lib/python3.10/dist-packages/torch/utils/tensorboard/_embedding.py", line 9, in <module>
    _HAS_GFILE_JOIN = hasattr(tf.io.gfile, "join")
  File "/usr/local/lib/python3.10/dist-packages/tensorboard/lazy.py", line 65, in __getattr__
    return getattr(load_once(self), attr_name)
  File "/usr/local/lib/python3.10/dist-packages/tensorboard/lazy.py", line 97, in wrapper
    cache[arg] = f(arg)
  File "/usr/local/lib/python3.10/dist-packages/tensorboard/lazy.py", line 50, in load_once
    module = load_fn()
  File "/usr/local/lib/python3.10/dist-packages/tensorboard/compat/__init__.py", line 45, in tf
    import tensorflow
  File "/usr/local/lib/python3.10/dist-packages/tensorflow/__init__.py", line 37, in <module>
    from tensorflow.python.tools import module_util as _module_util
  File "/usr/local/lib/python3.10/dist-packages/tensorflow/python/__init__.py", line 45, in <module>
    from tensorflow.python.feature_column import feature_column_lib as feature_column
  File "/usr/local/lib/python3.10/dist-packages/tensorflow/python/feature_column/feature_column_lib.py", line 18, in <module>
    from tensorflow.python.feature_column.feature_column import *
  File "/usr/local/lib/python3.10/dist-packages/tensorflow/python/feature_column/feature_column.py", line 143, in <module>
    from tensorflow.python.layers import base
  File "/usr/local/lib/python3.10/dist-packages/tensorflow/python/layers/base.py", line 16, in <module>
    from tensorflow.python.keras.legacy_tf_layers import base
  File "/usr/local/lib/python3.10/dist-packages/tensorflow/python/keras/__init__.py", line 25, in <module>
    from tensorflow.python.keras import models
  File "/usr/local/lib/python3.10/dist-packages/tensorflow/python/keras/models.py", line 25, in <module>
    from tensorflow.python.keras.engine import training_v1
  File "/usr/local/lib/python3.10/dist-packages/tensorflow/python/keras/engine/training_v1.py", line 46, in <module>
    from tensorflow.python.keras.engine import training_arrays_v1
  File "", line 1078, in _handle_fromlist
KeyboardInterrupt
^C
Something went wrong

Maybe this is useful?

TheLastBen commented 1 year ago

Take a screenshot of the model cell

Z0rx commented 1 year ago

https://codefile.io/f/oGcNhTqRD5

TheLastBen commented 1 year ago

I did instruct you not to use the path to Hugging Face and to put the safetensors link in the link input box instead: https://huggingface.co/dreamlike-art/dreamlike-photoreal-2.0/resolve/main/dreamlike-photoreal-2.0.safetensors

Z0rx commented 1 year ago

I can download the model now, but training still doesn't work for me and I don't know why.

Training the UNet...
Traceback (most recent call last):
  File "/usr/local/lib/python3.10/dist-packages/tensorboard/compat/__init__.py", line 42, in tf
    from tensorboard.compat import notf  # noqa: F401
ImportError: cannot import name 'notf' from 'tensorboard.compat' (/usr/local/lib/python3.10/dist-packages/tensorboard/compat/__init__.py)

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "", line 1027, in _find_and_load
  File "", line 992, in _find_and_load_unlocked
  File "", line 241, in _call_with_frames_removed
  File "/usr/local/lib/python3.10/dist-packages/keras/layers/__init__.py", line 20, in <module>
    from keras.engine.base_preprocessing_layer import PreprocessingLayer
  File "/usr/local/lib/python3.10/dist-packages/keras/engine/base_preprocessing_layer.py", line 21, in <module>
    from keras.engine import data_adapter
  File "/usr/local/lib/python3.10/dist-packages/keras/engine/data_adapter.py", line 44, in <module>
    import pandas as pd
  File "/usr/local/lib/python3.10/dist-packages/pandas/__init__.py", line 141, in <module>
    from pandas.io.api import (
  File "/usr/local/lib/python3.10/dist-packages/pandas/io/api.py", line 6, in <module>
    from pandas.io.excel import (
  File "/usr/local/lib/python3.10/dist-packages/pandas/io/excel/__init__.py", line 6, in <module>
    from pandas.io.excel._odswriter import ODSWriter as _ODSWriter
  File "", line 1027, in _find_and_load
  File "", line 1006, in _find_and_load_unlocked
  File "", line 688, in _load_unlocked
  File "", line 879, in exec_module
  File "", line 1012, in get_code
  File "", line 672, in _compile_bytecode
KeyboardInterrupt

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/bin/accelerate", line 5, in <module>
    from accelerate.commands.accelerate_cli import main
  File "/usr/local/lib/python3.10/dist-packages/accelerate/__init__.py", line 7, in <module>
    from .accelerator import Accelerator
  File "/usr/local/lib/python3.10/dist-packages/accelerate/accelerator.py", line 33, in <module>
    from .tracking import LOGGER_TYPE_TO_CLASS, GeneralTracker, filter_trackers
  File "/usr/local/lib/python3.10/dist-packages/accelerate/tracking.py", line 29, in <module>
    from torch.utils import tensorboard
  File "/usr/local/lib/python3.10/dist-packages/torch/utils/tensorboard/__init__.py", line 12, in <module>
    from .writer import FileWriter, SummaryWriter  # noqa: F401
  File "/usr/local/lib/python3.10/dist-packages/torch/utils/tensorboard/writer.py", line 16, in <module>
    from ._embedding import (
  File "/usr/local/lib/python3.10/dist-packages/torch/utils/tensorboard/_embedding.py", line 9, in <module>
    _HAS_GFILE_JOIN = hasattr(tf.io.gfile, "join")
  File "/usr/local/lib/python3.10/dist-packages/tensorboard/lazy.py", line 65, in __getattr__
    return getattr(load_once(self), attr_name)
  File "/usr/local/lib/python3.10/dist-packages/tensorboard/lazy.py", line 97, in wrapper
    cache[arg] = f(arg)
  File "/usr/local/lib/python3.10/dist-packages/tensorboard/lazy.py", line 50, in load_once
    module = load_fn()
  File "/usr/local/lib/python3.10/dist-packages/tensorboard/compat/__init__.py", line 45, in tf
    import tensorflow
  File "/usr/local/lib/python3.10/dist-packages/tensorflow/__init__.py", line 476, in <module>
    _keras._load()
  File "/usr/local/lib/python3.10/dist-packages/tensorflow/python/util/lazy_loader.py", line 41, in _load
    module = importlib.import_module(self.__name__)
  File "/usr/lib/python3.10/importlib/__init__.py", line 126, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
  File "/usr/local/lib/python3.10/dist-packages/keras/__init__.py", line 21, in <module>
    from keras import models
  File "/usr/local/lib/python3.10/dist-packages/keras/models/__init__.py", line 18, in <module>
    from keras.engine.functional import Functional
  File "/usr/local/lib/python3.10/dist-packages/keras/engine/functional.py", line 34, in <module>
    from keras.engine import training as training_lib
  File "/usr/local/lib/python3.10/dist-packages/keras/engine/training.py", line 32, in <module>
    from keras.engine import compile_utils
  File "/usr/local/lib/python3.10/dist-packages/keras/engine/compile_utils.py", line 24, in <module>
    from keras import metrics as metrics_mod
  File "/usr/local/lib/python3.10/dist-packages/keras/metrics/__init__.py", line 84, in <module>
    from keras.metrics.confusion_metrics import AUC
  File "/usr/local/lib/python3.10/dist-packages/keras/metrics/confusion_metrics.py", line 22, in <module>
    from keras import activations
  File "/usr/local/lib/python3.10/dist-packages/keras/activations.py", line 21, in <module>
    import keras.layers.activation as activation_layers
  File "", line 1024, in _find_and_load
  File "", line 174, in __exit__
  File "", line 134, in release
KeyboardInterrupt
^C
Something went wrong