bitsandbytes-foundation / bitsandbytes

Accessible large language models via k-bit quantization for PyTorch.
https://huggingface.co/docs/bitsandbytes/main/en/index
MIT License

Similar issue trying to create textual inversion with 3070 with both fp16 and bf16 #798

Closed gromegrom closed 9 months ago

gromegrom commented 1 year ago

prepare tokenizers
prepare accelerator
loading model for process 0/1
load StableDiffusion checkpoint: C:/Users/grome/stable-diffusion-webui/models/Stable-diffusion/sd_xl_base_1.0_0.9vae.safetensors
building U-Net
loading U-Net from checkpoint
U-Net:
building text encoders
loading text encoders from checkpoint
text encoder 1:
text encoder 2:
building VAE
loading VAE from checkpoint
VAE:
token length for init words is not same to num_vectors_per_token, init words is repeated or truncated / 初期化単語のトークン長がnum_vectors_per_tokenと合わないため、繰り返しまたは切り捨てが発生します: tokenizer 1, length 3
token length for init words is not same to num_vectors_per_token, init words is repeated or truncated / 初期化単語のトークン長がnum_vectors_per_tokenと合わないため、繰り返しまたは切り捨てが発生します: tokenizer 2, length 3
tokens are added for tokenizer 1: [49408]
tokens are added for tokenizer 2: [49408]
create embeddings for 1 tokens, for muscleman
Use DreamBooth method.
prepare images.
found directory C:\kohya_ss\test000001\img\20_chris hemsworth man contains 3 image files
No caption file found for 3 images. Training will continue without captions for these images. If class token exists, it will be used. / 3枚の画像にキャプションファイルが見つかりませんでした。これらの画像についてはキャプションなしで学習を続行します。class tokenが存在する場合はそれを使います。
C:\kohya_ss\test000001\img\20_chris hemsworth man\2023-09-28 15_23_09.649+0100.jpg
C:\kohya_ss\test000001\img\20_chris hemsworth man\20230802_162140.jpg
C:\kohya_ss\test000001\img\20_chris hemsworth man\20230819_180411.jpg
found directory C:\kohya_ss\test000001\reg\1_man contains 6 image files
No caption file found for 6 images. Training will continue without captions for these images. If class token exists, it will be used. / 6枚の画像にキャプションファイルが見つかりませんでした。これらの画像についてはキャプションなしで学習を続行します。class tokenが存在する場合はそれを使います。
C:\kohya_ss\test000001\reg\1_man\0000.png
C:\kohya_ss\test000001\reg\1_man\0012.png
C:\kohya_ss\test000001\reg\1_man\0026.png
C:\kohya_ss\test000001\reg\1_man\0043.png
C:\kohya_ss\test000001\reg\1_man\0059.png
C:\kohya_ss\test000001\reg\1_man\0069.png... and 1 more
60 train images with repeating.
6 reg images.
[Dataset 0]
  batch_size: 1
  resolution: (1024, 1024)
  enable_bucket: True
  min_bucket_reso: 256
  max_bucket_reso: 2048
  bucket_reso_steps: 64
  bucket_no_upscale: True

[Subset 0 of Dataset 0]
  image_dir: "C:\kohya_ss\test000001\img\20_chris hemsworth man"
  image_count: 3
  num_repeats: 20
  shuffle_caption: False
  keep_tokens: 0
  caption_dropout_rate: 0.0
  caption_dropout_every_n_epoches: 0
  caption_tag_dropout_rate: 0.0
  caption_prefix: None
  caption_suffix: None
  color_aug: False
  flip_aug: False
  face_crop_aug_range: None
  random_crop: False
  token_warmup_min: 1,
  token_warmup_step: 0,
  is_reg: False
  class_tokens: chris hemsworth man
  caption_extension: .caption

[Subset 1 of Dataset 0]
  image_dir: "C:\kohya_ss\test000001\reg\1_man"
  image_count: 6
  num_repeats: 1
  shuffle_caption: False
  keep_tokens: 0
  caption_dropout_rate: 0.0
  caption_dropout_every_n_epoches: 0
  caption_tag_dropout_rate: 0.0
  caption_prefix: None
  caption_suffix: None
  color_aug: False
  flip_aug: False
  face_crop_aug_range: None
  random_crop: False
  token_warmup_min: 1,
  token_warmup_step: 0,
  is_reg: True
  class_tokens: man
  caption_extension: .caption

[Dataset 0]
loading image sizes.
100%|██████████████████████████████████████████████████████████████████████████████████| 9/9 [00:00<00:00, 1800.04it/s]
make buckets
min_bucket_reso and max_bucket_reso are ignored if bucket_no_upscale is set, because bucket reso is defined by image size automatically / bucket_no_upscaleが指定された場合は、bucketの解像度は画像サイズから自動計算されるため、min_bucket_resoとmax_bucket_resoは無視されます
number of images (including repeats) / 各bucketの画像枚数(繰り返し回数を含む)
bucket 0: resolution (704, 1280), count: 20
bucket 1: resolution (768, 1152), count: 20
bucket 2: resolution (1024, 1024), count: 60
bucket 3: resolution (1088, 832), count: 20
mean ar error (without repeats): 0.008814220608011106
Enable xformers for U-Net
A matching Triton is not available, some optimizations will not be enabled.
Error caught was: No module named 'triton'
[Dataset 0]
caching latents.
checking cache validity...
100%|██████████████████████████████████████████████████████████████████████████████████| 9/9 [00:00<00:00, 1285.72it/s]
caching latents...
0it [00:00, ?it/s]
prepare optimizer, data loader etc.
use Adafactor optimizer | {'scale_parameter': False, 'relative_step': False, 'warmup_init': False}
because max_grad_norm is set, clip_grad_norm is enabled. consider set to 0 / max_grad_normが設定されているためclip_grad_normが有効になります。0に設定して無効にしたほうがいいかもしれません
constant_with_warmup will be good / スケジューラはconstant_with_warmupが良いかもしれません

Traceback (most recent call last):

C:\kohya_ss\sdxl_train_textual_inversion.py:133 in <module>

    130     args = train_util.read_config_from_file(args, parser)
    131
    132     trainer = SdxlTextualInversionTrainer()
  ❱ 133     trainer.train(args)
    134

C:\kohya_ss\train_textual_inversion.py:443 in train

    440             # text_encoder.text_model.embeddings.token_embedding.requires_grad_(True)
    441
    442         unet.requires_grad_(False)
  ❱ 443         unet.to(accelerator.device, dtype=weight_dtype)
    444         if args.gradient_checkpointing:  # according to TI example in Diffusers, train i
    445             # TODO U-Netをオリジナルに置き換えたのでいらないはずなので、後で確認して消す
    446             unet.train()

C:\Python310\lib\site-packages\torch\nn\modules\module.py:1145 in to

    1142                             non_blocking, memory_format=convert_to_format)
    1143             return t.to(device, dtype if t.is_floating_point() or t.is_complex() else No
    1144
  ❱ 1145         return self._apply(convert)
    1146
    1147     def register_full_backward_pre_hook(
    1148         self,

C:\Python310\lib\site-packages\torch\nn\modules\module.py:797 in _apply

    794
    795     def _apply(self, fn):
    796         for module in self.children():
  ❱ 797             module._apply(fn)
    798
    799         def compute_should_use_set_data(tensor, tensor_applied):
    800             if torch._has_compatible_shallow_copy_type(tensor, tensor_applied):

[the same frame, module.py:797 in _apply, repeats once for each further level of nested submodules]

C:\Python310\lib\site-packages\torch\nn\modules\module.py:820 in _apply

    817             # track autograd history of param_applied, so we have to use
    818             # with torch.no_grad():
    819             with torch.no_grad():
  ❱ 820                 param_applied = fn(param)
    821             should_use_set_data = compute_should_use_set_data(param, param_applied)
    822             if should_use_set_data:
    823                 param.data = param_applied

C:\Python310\lib\site-packages\torch\nn\modules\module.py:1143 in convert

    1140             if convert_to_format is not None and t.dim() in (4, 5):
    1141                 return t.to(device, dtype if t.is_floating_point() or t.is_complex() els
    1142                             non_blocking, memory_format=convert_to_format)
  ❱ 1143             return t.to(device, dtype if t.is_floating_point() or t.is_complex() else No
    1144
    1145         return self._apply(convert)
    1146

OutOfMemoryError: CUDA out of memory. Tried to allocate 26.00 MiB (GPU 0; 8.00 GiB total capacity; 7.04 GiB already allocated; 0 bytes free; 7.30 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF

Traceback (most recent call last):

C:\Python310\lib\runpy.py:196 in _run_module_as_main

    193     main_globals = sys.modules["__main__"].__dict__
    194     if alter_argv:
    195         sys.argv[0] = mod_spec.origin
  ❱ 196     return _run_code(code, main_globals, None,
    197                     "__main__", mod_spec)
    198
    199 def run_module(mod_name, init_globals=None,

C:\Python310\lib\runpy.py:86 in _run_code

    83                     loader = loader,
    84                     package = pkg_name,
    85                     spec = mod_spec)
  ❱ 86     exec(code, run_globals)
    87     return run_globals
    88
    89 def _run_module_code(code, init_globals=None,

in <module>:7

    4 from accelerate.commands.accelerate_cli import main
    5 if __name__ == '__main__':
    6     sys.argv[0] = re.sub(r'(-script\.pyw|\.exe)?$', '', sys.argv[0])
  ❱ 7     sys.exit(main())
    8

C:\Python310\lib\site-packages\accelerate\commands\accelerate_cli.py:45 in main

    42         exit(1)
    43
    44     # Run
  ❱ 45     args.func(args)
    46
    47
    48 if __name__ == "__main__":

C:\Python310\lib\site-packages\accelerate\commands\launch.py:918 in launch_command

    915     elif defaults is not None and defaults.compute_environment == ComputeEnvironment.AMA
    916         sagemaker_launcher(defaults, args)
    917     else:
  ❱ 918         simple_launcher(args)
    919
    920
    921 def main():

C:\Python310\lib\site-packages\accelerate\commands\launch.py:580 in simple_launcher

    577     process.wait()
    578     if process.returncode != 0:
    579         if not args.quiet:
  ❱ 580             raise subprocess.CalledProcessError(returncode=process.returncode, cmd=cmd)
    581         else:
    582             sys.exit(1)
    583

CalledProcessError: Command '['C:\Python310\python.exe', './sdxl_train_textual_inversion.py', '--enable_bucket', '--min_bucket_reso=256', '--max_bucket_reso=2048', '--pretrained_model_name_or_path=C:/Users/grome/stable-diffusion-webui/models/Stable-diffusion/sd_xl_base_1.0_0.9vae.safetensors', '--train_data_dir=C:/kohya_ss/test000001/img', '--reg_data_dir=C:/kohya_ss/test000001/reg', '--resolution=1024,1024', '--output_dir=C:/kohya_ss/test000001', '--save_model_as=safetensors', '--output_name=furidrums', '--lr_scheduler_num_cycles=1', '--max_data_loader_n_workers=0', '--no_half_vae', '--learning_rate=0.0003', '--lr_scheduler=constant', '--train_batch_size=1', '--max_train_steps=120', '--save_every_n_epochs=1', '--mixed_precision=fp16', '--save_precision=fp16', '--cache_latents', '--cache_latents_to_disk', '--optimizer_type=Adafactor', '--optimizer_args', 'scale_parameter=False', 'relative_step=False', 'warmup_init=False', '--max_data_loader_n_workers=0', '--bucket_reso_steps=64', '--xformers', '--bucket_no_upscale', '--noise_offset=0.0', '--token_string=muscleman', '--init_word=*furidrums', '--num_vectors_per_token=1']' returned non-zero exit status 1.
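
For anyone trying to read the numbers in the OutOfMemoryError above, here is a rough sketch that pulls the figures out of the message and compares reserved against allocated memory. It is not part of kohya_ss, PyTorch, or bitsandbytes: the helper name `parse_cuda_oom` and the regular expressions are made up for illustration, and they assume the message layout shown in this log, which may differ in other PyTorch versions.

```python
import re

# The error text copied from the log above (only the part with the numbers).
OOM_MESSAGE = (
    "CUDA out of memory. Tried to allocate 26.00 MiB (GPU 0; 8.00 GiB total capacity; "
    "7.04 GiB already allocated; 0 bytes free; 7.30 GiB reserved in total by PyTorch)"
)

_UNITS = {"bytes": 1, "KiB": 2**10, "MiB": 2**20, "GiB": 2**30}
_SIZE = r"([\d.]+) (bytes|KiB|MiB|GiB)"


def parse_cuda_oom(message):
    """Hypothetical helper: return the sizes named in the message, in bytes."""
    patterns = {
        "requested": rf"Tried to allocate {_SIZE}",
        "total": rf"{_SIZE} total capacity",
        "allocated": rf"{_SIZE} already allocated",
        "free": rf"{_SIZE} free",
        "reserved": rf"{_SIZE} reserved",
    }
    stats = {}
    for name, pattern in patterns.items():
        match = re.search(pattern, message)
        if match is None:
            raise ValueError(f"could not find '{name}' in the message")
        value, unit = match.groups()
        stats[name] = float(value) * _UNITS[unit]
    return stats


stats = parse_cuda_oom(OOM_MESSAGE)
# Memory PyTorch has reserved from the driver but not handed out to tensors.
slack_gib = (stats["reserved"] - stats["allocated"]) / 2**30
print(f"reserved - allocated = {slack_gib:.2f} GiB")  # prints 0.26 GiB for this log
```

The message's own hint about max_split_size_mb (set through the PYTORCH_CUDA_ALLOC_CONF environment variable) is aimed at the case where reserved memory is much larger than allocated memory. Here the gap is only about 0.26 GiB out of 8 GiB, which suggests, though it does not prove, that the card is simply full while the U-Net is being moved to the GPU rather than badly fragmented.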

github-actions[bot] commented 10 months ago

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.