bitsandbytes-foundation / bitsandbytes

Accessible large language models via k-bit quantization for PyTorch.
https://huggingface.co/docs/bitsandbytes/main/en/index
MIT License

One week trying to make kohya work #766

Closed nawnawaynbee closed 9 months ago

nawnawaynbee commented 1 year ago

I would first like to thank bmaltais for the work he has done, but I can't get kohya to work. I've been trying for a week now without any success. I've already tried replacing the .py files in the 3 libraries, switching the optimizer to "AdamW", and running versions 21.7.10 and 7.5, all without success. I have an RTX 3090 GPU and I'm on Windows. I've gone back to the latest version and I keep getting the error message below. After searching a lot of forums (sometimes Korean ones), I hope a savior can help me.

I always get this message:

```
create LoRA for U-Net: 192 modules.
enable LoRA for text encoder
enable LoRA for U-Net
prepare optimizer, data loader etc.

===================================BUG REPORT===================================
Welcome to bitsandbytes. For bug reports, please submit your error trace to:
https://github.com/TimDettmers/bitsandbytes/issues
For effortless bug reporting copy-paste your error into this form:
https://docs.google.com/forms/d/e/1FAIpQLScPB8emS3Thkp66nvqwmjTEgxp8Y9ufuWTzFyr9kJ5AoI47dQ/viewform?usp=sf_link

CUDA SETUP: Loading binary C:\AI\kohya\kohya_ss\venv\lib\site-packages\bitsandbytes\libbitsandbytes_cuda116.dll...
use 8-bit AdamW optimizer | {}
running training / 学習開始
  num train images * repeats / 学習画像の数×繰り返し回数: 2800
  num reg images / 正則化画像の数: 0
  num batches per epoch / 1epochのバッチ数: 1400
  num epochs / epoch数: 1
  batch size per device / バッチサイズ: 2
  gradient accumulation steps / 勾配を合計するステップ数 = 1
  total optimization steps / 学習ステップ数: 1400
steps:   0%|          | 0/1400 [00:00<?, ?it/s]
Traceback (most recent call last):
  C:\AI\kohya\kohya_ss\train_network.py:990 in <module>
    trainer.train(args)
  C:\AI\kohya\kohya_ss\train_network.py:683 in train
    accelerator.init_trackers(
  C:\AI\kohya\kohya_ss\venv\lib\site-packages\accelerate\accelerator.py:564 in _inner
    return PartialState().on_main_process(function)(*args, **kwargs)
  C:\AI\kohya\kohya_ss\venv\lib\site-packages\accelerate\accelerator.py:2093 in init_trackers
    tracker_init(project_name, self.logging_dir, **init_kwargs.get(s
  C:\AI\kohya\kohya_ss\venv\lib\site-packages\accelerate\tracking.py:83 in execute_on_main_process
    return PartialState().on_main_process(function)(self, *args, **kwargs)
  C:\AI\kohya\kohya_ss\venv\lib\site-packages\accelerate\tracking.py:190 in __init__
    self.writer = tensorboard.SummaryWriter(self.logging_dir, **kwargs)
  C:\AI\kohya\kohya_ss\venv\lib\site-packages\torch\utils\tensorboard\writer.py:247 in __init__
    self._get_file_writer()
  C:\AI\kohya\kohya_ss\venv\lib\site-packages\torch\utils\tensorboard\writer.py:277 in _get_file_writer
    self.file_writer = FileWriter(
  C:\AI\kohya\kohya_ss\venv\lib\site-packages\torch\utils\tensorboard\writer.py:76 in __init__
    self.event_writer = EventFileWriter(
  C:\AI\kohya\kohya_ss\venv\lib\site-packages\tensorboard\summary\writer\event_file_writer.py:72 in __init__
    tf.io.gfile.makedirs(logdir)
  C:\AI\kohya\kohya_ss\venv\lib\site-packages\tensorflow\python\lib\io\file_io.py:513 in recursive_create_dir_v2
    _pywrap_file_io.RecursivelyCreateDir(compat.path_to_bytes(path))
FailedPreconditionError: C:/Users/Aynbee/Documents/Création Lora/Clayn/image/log is not a directory
steps:   0%|          | 0/1400 [00:00<?, ?it/s]
Traceback (most recent call last):
  C:\Users\Aynbee\AppData\Local\Programs\Python\Python310\lib\runpy.py:196 in _run_module_as_main
    return _run_code(code, main_globals, None, "__main__", mod_spec)
  C:\Users\Aynbee\AppData\Local\Programs\Python\Python310\lib\runpy.py:86 in _run_code
    exec(code, run_globals)
  in <module>:7
    sys.exit(main())
  C:\AI\kohya\kohya_ss\venv\lib\site-packages\accelerate\commands\accelerate_cli.py:45 in main
    args.func(args)
  C:\AI\kohya\kohya_ss\venv\lib\site-packages\accelerate\commands\launch.py:918 in launch_command
    simple_launcher(args)
  C:\AI\kohya\kohya_ss\venv\lib\site-packages\accelerate\commands\launch.py:580 in simple_launcher
    raise subprocess.CalledProcessError(returncode=process.returncode, cmd=cmd)
CalledProcessError: Command '['C:\AI\kohya\kohya_ss\venv\Scripts\python.exe', './train_network.py',
'--pretrained_model_name_or_path=C:/AI/stable-diffusion-webui/models/Stable-diffusion/realisticVisionV51_v51VAE.safetensors',
'--train_data_dir=C:/Users/Aynbee/Documents/Création Lora/Clayn/image/img', '--resolution=512,512',
'--output_dir=C:/Users/Aynbee/Documents/Création Lora/Clayn/image/model',
'--logging_dir=C:/Users/Aynbee/Documents/Création Lora/Clayn/image/log', '--network_alpha=128',
'--save_model_as=safetensors', '--network_module=networks.lora', '--text_encoder_lr=5e-05', '--unet_lr=0.0001',
'--network_dim=128', '--output_name=clayn', '--lr_scheduler_num_cycles=1', '--no_half_vae', '--learning_rate=0.0001',
'--lr_scheduler=constant', '--train_batch_size=2', '--max_train_steps=1400', '--save_every_n_epochs=1',
'--mixed_precision=bf16', '--save_precision=bf16', '--seed=1234', '--caption_extension=.txt', '--cache_latents',
'--optimizer_type=AdamW8bit', '--max_data_loader_n_workers=1', '--clip_skip=2', '--bucket_reso_steps=64', '--xformers',
'--bucket_no_upscale', '--noise_offset=0.0']' returned non-zero exit status 1.
```
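The innermost frame is `tf.io.gfile.makedirs` refusing to create the `--logging_dir` path. A minimal sketch to test that call in isolation, assuming the failure reproduces outside kohya (the path is the one from the traceback; substitute your own `--logging_dir` value):

```python
# Check whether TensorFlow's gfile can create the TensorBoard logging
# directory that kohya/accelerate passes to SummaryWriter. The path below
# is taken from the traceback above; replace it with your own value.
import tensorflow as tf

log_dir = "C:/Users/Aynbee/Documents/Création Lora/Clayn/image/log"

try:
    tf.io.gfile.makedirs(log_dir)  # same call that fails in event_file_writer.py
    print("created, is dir:", tf.io.gfile.isdir(log_dir))
except tf.errors.OpError as err:   # FailedPreconditionError is a subclass of OpError
    print("gfile could not create the directory:", err)
```

If this small script fails the same way, the problem is the path itself (not kohya or bitsandbytes), which points toward the workaround in the comment below.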

nawnawaynbee commented 1 year ago

EDIT: if you have the same mistake as me, just put the images at the top of the local disk (C: for me). Shame on me.
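For reference, a sketch of that workaround: copy the training folder to a plain-ASCII path near the drive root, then point `--train_data_dir`, `--output_dir`, and `--logging_dir` at it. The destination path below is a made-up example, not from the original report.

```python
# Copy the LoRA training folder to an accent-free path at the root of C:
# before launching kohya; the destination directory is hypothetical.
import shutil
from pathlib import Path

src = Path("C:/Users/Aynbee/Documents/Création Lora/Clayn/image")
dst = Path("C:/lora/clayn/image")  # made-up accent-free destination

dst.parent.mkdir(parents=True, exist_ok=True)
shutil.copytree(src, dst, dirs_exist_ok=True)  # dirs_exist_ok needs Python 3.8+
print("copied to", dst)
```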

github-actions[bot] commented 10 months ago

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.