bmaltais / kohya_ss


Torch not compiled with CUDA enabled on M1 mac #883

Closed Aniket22156 closed 7 months ago

Aniket22156 commented 1 year ago

Folder 100_heer: 26 images found
Folder 100_heer: 2600 steps
Total steps: 2600
Train batch size: 1
Gradient accumulation steps: 1.0
Epoch: 1
Regulatization factor: 1
max_train_steps (2600 / 1 / 1.0 * 1 * 1) = 2600
stop_text_encoder_training = 0
lr_warmup_steps = 260
accelerate launch --num_cpu_threads_per_process=2 "train_network.py" --enable_bucket --pretrained_model_name_or_path="runwayml/stable-diffusion-v1-5" --train_data_dir="/Users/aniketsharma/Documents/Sharma/image" --resolution=512,512 --output_dir="/Users/aniketsharma/Documents/Sharma/model" --logging_dir="/Users/aniketsharma/Documents/Sharma/log" --network_alpha="1" --save_model_as=safetensors --network_module=networks.lora --text_encoder_lr=5e-05 --unet_lr=0.0001 --network_dim=8 --output_name="last" --lr_scheduler_num_cycles="1" --learning_rate="0.0001" --lr_scheduler="cosine" --lr_warmup_steps="260" --train_batch_size="1" --max_train_steps="2600" --save_every_n_epochs="1" --mixed_precision="no" --save_precision="float" --cache_latents --optimizer_type="AdamW8bit" --max_data_loader_n_workers="0" --bucket_reso_steps=64 --mem_eff_attn --xformers --bucket_no_upscale
prepare tokenizer
Use DreamBooth method.
prepare images.
found directory /Users/aniketsharma/Documents/Sharma/image/100_heer contains 26 image files
2600 train images with repeating.
0 reg images.
no regularization images / 正則化画像が見つかりませんでした
[Dataset 0]
  batch_size: 1
  resolution: (512, 512)
  enable_bucket: True
  min_bucket_reso: 256
  max_bucket_reso: 1024
  bucket_reso_steps: 64
  bucket_no_upscale: True

[Subset 0 of Dataset 0]
  image_dir: "/Users/aniketsharma/Documents/Sharma/image/100_heer"
  image_count: 26
  num_repeats: 100
  shuffle_caption: False
  keep_tokens: 0
  caption_dropout_rate: 0.0
  caption_dropout_every_n_epoches: 0
  caption_tag_dropout_rate: 0.0
  color_aug: False
  flip_aug: False
  face_crop_aug_range: None
  random_crop: False
  token_warmup_min: 1, token_warmup_step: 0,
  is_reg: False
  class_tokens: heer
  caption_extension: .caption

[Dataset 0]
loading image sizes.
100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████| 26/26 [00:00<00:00, 2470.48it/s]
make buckets
min_bucket_reso and max_bucket_reso are ignored if bucket_no_upscale is set, because bucket reso is defined by image size automatically / bucket_no_upscaleが指定された場合は、bucketの解像度は画像サイズから自動計算されるため、min_bucket_resoとmax_bucket_resoは無視されます
number of images (including repeats) / 各bucketの画像枚数(繰り返し回数を含む)
bucket 0: resolution (512, 512), count: 2600
mean ar error (without repeats): 0.0
prepare accelerator
/Users/aniketsharma/Documents/taining/kohya_ss/venv/lib/python3.10/site-packages/accelerate/accelerator.py:249: FutureWarning: logging_dir is deprecated and will be removed in version 0.18.0 of 🤗 Accelerate. Use project_dir instead.
  warnings.warn(
Using accelerator 0.15.0 or above.
loading model for process 0/1
load Diffusers pretrained models: runwayml/stable-diffusion-v1-5
Fetching 15 files: 100%|██████████████████████████████████████████████████████████████████████████████████████████| 15/15 [00:00<00:00, 37673.39it/s]
/Users/aniketsharma/Documents/taining/kohya_ss/venv/lib/python3.10/site-packages/safetensors/torch.py:98: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
  with safe_open(filename, framework="pt", device=device) as f:
/Users/aniketsharma/Documents/taining/kohya_ss/venv/lib/python3.10/site-packages/torch/_utils.py:776: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
  return self.fget.__get__(instance, owner)()
/Users/aniketsharma/Documents/taining/kohya_ss/venv/lib/python3.10/site-packages/torch/storage.py:899: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
  storage = cls(wrap_storage=untyped_storage)
/Users/aniketsharma/Documents/taining/kohya_ss/venv/lib/python3.10/site-packages/transformers/modeling_utils.py:402: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
  with safe_open(checkpoint_file, framework="pt") as f:
You have disabled the safety checker for <class 'diffusers.pipelines.stable_diffusion.pipeline_stable_diffusion.StableDiffusionPipeline'> by passing safety_checker=None. Ensure that you abide to the conditions of the Stable Diffusion license and do not expose unfiltered results in services or applications open to the public. Both the diffusers team and Hugging Face strongly recommend to keep the safety filter enabled in all public facing circumstances, disabling it only for use-cases that involve analyzing network behavior or auditing its results. For more information, please have a look at https://github.com/huggingface/diffusers/pull/254 .
CrossAttention.forward has been replaced to FlashAttention (not xformers)
[Dataset 0]
caching latents.
100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 26/26 [00:09<00:00, 2.69it/s]
import network module: networks.lora
create LoRA network. base dim (rank): 8, alpha: 1.0
create LoRA for Text Encoder: 72 modules.
create LoRA for U-Net: 192 modules.
enable LoRA for text encoder
enable LoRA for U-Net
prepare optimizer, data loader etc.

===================================BUG REPORT===================================
Welcome to bitsandbytes. For bug reports, please run

python -m bitsandbytes

and submit this information together with your error trace to: https://github.com/TimDettmers/bitsandbytes/issues

bin /Users/aniketsharma/Documents/taining/kohya_ss/venv/lib/python3.10/site-packages/bitsandbytes/libbitsandbytes_cpu.so
/Users/aniketsharma/Documents/taining/kohya_ss/venv/lib/python3.10/site-packages/bitsandbytes/cextension.py:33: UserWarning: The installed version of bitsandbytes was compiled without GPU support. 8-bit optimizers, 8-bit multiplication, and GPU quantization are unavailable.
  warn("The installed version of bitsandbytes was compiled without GPU support. "
CUDA SETUP: Loading binary /Users/aniketsharma/Documents/taining/kohya_ss/venv/lib/python3.10/site-packages/bitsandbytes/libbitsandbytes_cpu.so...
dlopen(/Users/aniketsharma/Documents/taining/kohya_ss/venv/lib/python3.10/site-packages/bitsandbytes/libbitsandbytes_cpu.so, 0x0006): tried: '/Users/aniketsharma/Documents/taining/kohya_ss/venv/lib/python3.10/site-packages/bitsandbytes/libbitsandbytes_cpu.so' (not a mach-o file), '/System/Volumes/Preboot/Cryptexes/OS/Users/aniketsharma/Documents/taining/kohya_ss/venv/lib/python3.10/site-packages/bitsandbytes/libbitsandbytes_cpu.so' (no such file), '/Users/aniketsharma/Documents/taining/kohya_ss/venv/lib/python3.10/site-packages/bitsandbytes/libbitsandbytes_cpu.so' (not a mach-o file)
use 8-bit AdamW optimizer | {}
running training / 学習開始
  num train images * repeats / 学習画像の数×繰り返し回数: 2600
  num reg images / 正則化画像の数: 0
  num batches per epoch / 1epochのバッチ数: 2600
  num epochs / epoch数: 1
  batch size per device / バッチサイズ: 1
  gradient accumulation steps / 勾配を合計するステップ数 = 1
  total optimization steps / 学習ステップ数: 2600
steps: 0%| | 0/2600 [00:00<?, ?it/s]
epoch 1/1
Traceback (most recent call last):
  File "/Users/aniketsharma/Documents/taining/kohya_ss/train_network.py", line 783, in <module>
    train(args)
  File "/Users/aniketsharma/Documents/taining/kohya_ss/train_network.py", line 634, in train
    optimizer.step()
  File "/Users/aniketsharma/Documents/taining/kohya_ss/venv/lib/python3.10/site-packages/accelerate/optimizer.py", line 140, in step
    self.optimizer.step(closure)
  File "/Users/aniketsharma/Documents/taining/kohya_ss/venv/lib/python3.10/site-packages/torch/optim/lr_scheduler.py", line 69, in wrapper
    return wrapped(*args, **kwargs)
  File "/Users/aniketsharma/Documents/taining/kohya_ss/venv/lib/python3.10/site-packages/torch/optim/optimizer.py", line 280, in wrapper
    out = func(*args, **kwargs)
  File "/Users/aniketsharma/Documents/taining/kohya_ss/venv/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/Users/aniketsharma/Documents/taining/kohya_ss/venv/lib/python3.10/site-packages/bitsandbytes/optim/optimizer.py", line 263, in step
    self.update_step(group, p, gindex, pindex)
  File "/Users/aniketsharma/Documents/taining/kohya_ss/venv/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/Users/aniketsharma/Documents/taining/kohya_ss/venv/lib/python3.10/site-packages/bitsandbytes/optim/optimizer.py", line 504, in update_step
    F.optimizer_update_8bit_blockwise(
  File "/Users/aniketsharma/Documents/taining/kohya_ss/venv/lib/python3.10/site-packages/bitsandbytes/functional.py", line 972, in optimizer_update_8bit_blockwise
    prev_device = pre_call(g.device)
  File "/Users/aniketsharma/Documents/taining/kohya_ss/venv/lib/python3.10/site-packages/bitsandbytes/functional.py", line 317, in pre_call
    prev_device = torch.cuda.current_device()
  File "/Users/aniketsharma/Documents/taining/kohya_ss/venv/lib/python3.10/site-packages/torch/cuda/__init__.py", line 674, in current_device
    _lazy_init()
  File "/Users/aniketsharma/Documents/taining/kohya_ss/venv/lib/python3.10/site-packages/torch/cuda/__init__.py", line 239, in _lazy_init
    raise AssertionError("Torch not compiled with CUDA enabled")
AssertionError: Torch not compiled with CUDA enabled
steps: 0%| | 0/2600 [00:05<?, ?it/s]
Traceback (most recent call last):
  File "/Users/aniketsharma/Documents/taining/kohya_ss/venv/bin/accelerate", line 8, in <module>
    sys.exit(main())
  File "/Users/aniketsharma/Documents/taining/kohya_ss/venv/lib/python3.10/site-packages/accelerate/commands/accelerate_cli.py", line 45, in main
    args.func(args)
  File "/Users/aniketsharma/Documents/taining/kohya_ss/venv/lib/python3.10/site-packages/accelerate/commands/launch.py", line 923, in launch_command
    simple_launcher(args)
  File "/Users/aniketsharma/Documents/taining/kohya_ss/venv/lib/python3.10/site-packages/accelerate/commands/launch.py", line 579, in simple_launcher
    raise subprocess.CalledProcessError(returncode=process.returncode, cmd=cmd)
subprocess.CalledProcessError: Command '['/Users/aniketsharma/Documents/taining/kohya_ss/venv/bin/python', 'train_network.py', '--enable_bucket', '--pretrained_model_name_or_path=runwayml/stable-diffusion-v1-5', '--train_data_dir=/Users/aniketsharma/Documents/Sharma/image', '--resolution=512,512', '--output_dir=/Users/aniketsharma/Documents/Sharma/model', '--logging_dir=/Users/aniketsharma/Documents/Sharma/log', '--network_alpha=1', '--save_model_as=safetensors', '--network_module=networks.lora', '--text_encoder_lr=5e-05', '--unet_lr=0.0001', '--network_dim=8', '--output_name=last', '--lr_scheduler_num_cycles=1', '--learning_rate=0.0001', '--lr_scheduler=cosine', '--lr_warmup_steps=260', '--train_batch_size=1', '--max_train_steps=2600', '--save_every_n_epochs=1', '--mixed_precision=no', '--save_precision=float', '--cache_latents', '--optimizer_type=AdamW8bit', '--max_data_loader_n_workers=0', '--bucket_reso_steps=64', '--mem_eff_attn', '--xformers', '--bucket_no_upscale']' returned non-zero exit status 1.
^CKeyboard interruption in main thread... closing server.

andupotorac commented 1 year ago

Follow this guide (translate it to English): https://planaria.page/blog/?p=671

You might need to run this at some point: conda install -n base conda=23.1.0

Also activate the venv before calling accelerate config: . ./venv/bin/activate

When you get to editing the parameters, xformers and memory-efficient attention will need to be on if you want the LoRA training to take less than hundreds of hours.
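A minimal sketch of that setup sequence from a terminal, assuming the kohya_ss checkout is the current directory and the venv already exists (the exact accelerate config prompts vary by version):

# activate the project's virtualenv so "accelerate" and "python" resolve
# to the venv copies rather than any system-wide install
. ./venv/bin/activate

# optional, per the note above: pin conda in the base environment
conda install -n base conda=23.1.0

# regenerate the accelerate configuration interactively; on an M1/M2 there
# is no CUDA device, so answer the prompts for a single, non-distributed,
# non-GPU machine
accelerate config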

Aniket22156 commented 1 year ago

Can you tell me more in depth? I tried to run accelerate config but it returns:

./default_config.yaml: line 1: command_file:: command not found
./default_config.yaml: line 2: commands:: command not found
./default_config.yaml: line 3: compute_environment:: command not found
./default_config.yaml: line 4: deepspeed_config:: command not found
./default_config.yaml: line 5: distributed_type:: command not found
./default_config.yaml: line 6: downcast_bf16:: command not found
./default_config.yaml: line 7: dynamo_backend:: command not found
./default_config.yaml: line 8: fsdp_config:: command not found
./default_config.yaml: line 9: gpu_ids:: command not found
./default_config.yaml: line 10: machine_rank:: command not found
./default_config.yaml: line 11: main_process_ip:: command not found
./default_config.yaml: line 12: main_process_port:: command not found
./default_config.yaml: line 13: main_training_function:: command not found
./default_config.yaml: line 14: megatron_lm_config:: command not found
./default_config.yaml: line 15: mixed_precision:: command not found
./default_config.yaml: line 16: num_machines:: command not found
./default_config.yaml: line 17: num_processes:: command not found
./default_config.yaml: line 18: rdzv_backend:: command not found
./default_config.yaml: line 19: same_network:: command not found
./default_config.yaml: line 20: tpu_name:: command not found
./default_config.yaml: line 21: tpu_zone:: command not found
./default_config.yaml: line 22: use_cpu:: command not found

andupotorac commented 1 year ago

> Follow this guide (translate it to English): https://planaria.page/blog/?p=671 [...]

> Can you tell me more in depth? I tried to run accelerate config but it returns: ./default_config.yaml: line 1: command_file:: command not found [...]

Did you activate your environment first from within Kohya? . ./venv/bin/activate

Aniket22156 commented 1 year ago

> Did you activate your environment first from within Kohya? . ./venv/bin/activate

yes

andupotorac commented 1 year ago

Feel free to overwrite yours with this one (make sure it's the one saved in the Hugging Face cache): https://rentry.org/85cps
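For anyone hunting for that file: accelerate writes its default config to ~/.cache/huggingface/accelerate/default_config.yaml by default. A minimal sketch of replacing it with a single-process, CPU-only config (the keys are the ones listed in the error output above; the values are illustrative assumptions for an M1 without CUDA, not the contents of the linked paste):

# back up the current config, then write a single-machine, CPU-only one
CFG=~/.cache/huggingface/accelerate/default_config.yaml
cp "$CFG" "$CFG.bak" 2>/dev/null
cat > "$CFG" <<'EOF'
compute_environment: LOCAL_MACHINE
distributed_type: 'NO'
downcast_bf16: 'no'
dynamo_backend: 'NO'
machine_rank: 0
main_training_function: main
mixed_precision: 'no'
num_machines: 1
num_processes: 1
rdzv_backend: static
same_network: true
use_cpu: true
EOF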

justostoll commented 1 year ago

I am also getting this issue on a Mac M1 when I start training, even though I have not selected options for a GPU in the settings:

00:16:19-704016 INFO accelerate launch --num_cpu_threads_per_process=8
"./train_network.py" --enable_bucket
--min_bucket_reso=256 --max_bucket_reso=2048
--pretrained_model_name_or_path="runwayml/stable-diffusion-v1-5"
--train_data_dir="/Volumes/EXT04005/Miscellaneous/Videos/TED/Misc/Others/Training"
--resolution="512,512"
--output_dir="/Users/user/stable-diffusion-webui/embeddings"
--logging_dir="/Volumes/EXT04005/Miscellaneous/Videos/TED/Misc/Others/Training/100_Training/logs"
--network_alpha="128"
--save_model_as=safetensors
--network_module=networks.lora --text_encoder_lr=5e-05 --unet_lr=0.0001 --network_dim=128
--output_name="Output_v1"
--lr_scheduler_num_cycles="1" --no_half_vae
--learning_rate="0.0001" --lr_scheduler="constant"
--train_batch_size="2" --max_train_steps="500"
--save_every_n_epochs="1" --mixed_precision="no"
--save_precision="float" --seed="1234"
--caption_extension=".txt" --cache_latents
--optimizer_type="AdamW8bit"
--max_data_loader_n_workers="1" --bucket_reso_steps=64 --bucket_no_upscale --noise_offset=0.0

Traceback (most recent call last):
  /Users/User/kohya_ss/./train_network.py:990 in <module>
    trainer.train(args)
  /Users/User/kohya_ss/./train_network.py:803 in train
    optimizer.step()
  /Users/User/kohya_ss/venv/lib/python3.10/site-packages/accelerate/optimizer.py:140 in step
    self.optimizer.step(closure)
  /Users/User/kohya_ss/venv/lib/python3.10/site-packages/torch/optim/lr_scheduler.py:69 in wrapper
    return wrapped(*args, **kwargs)
  /Users/User/kohya_ss/venv/lib/python3.10/site-packages/torch/optim/optimizer.py:280 in wrapper
    out = func(*args, **kwargs)
  /Users/User/kohya_ss/venv/lib/python3.10/site-packages/torch/utils/_contextlib.py:115 in decorate_context
    return func(*args, **kwargs)
  /Users/User/kohya_ss/venv/lib/python3.10/site-packages/bitsandbytes/optim/optimizer.py:269 in step
    self.update_step(group, p, gindex, pindex)
  /Users/User/kohya_ss/venv/lib/python3.10/site-packages/torch/utils/_contextlib.py:115 in decorate_context
    return func(*args, **kwargs)
  /Users/User/kohya_ss/venv/lib/python3.10/site-packages/bitsandbytes/optim/optimizer.py:517 in update_step
    F.optimizer_update_8bit_blockwise(
  /Users/User/kohya_ss/venv/lib/python3.10/site-packages/bitsandbytes/functional.py:1278 in optimizer_update_8bit_blockwise
    prev_device = pre_call(g.device)
  /Users/User/kohya_ss/venv/lib/python3.10/site-packages/bitsandbytes/functional.py:415 in pre_call
    prev_device = torch.cuda.current_device()
  /Users/User/kohya_ss/venv/lib/python3.10/site-packages/torch/cuda/__init__.py:674 in current_device
    _lazy_init()
  /Users/User/kohya_ss/venv/lib/python3.10/site-packages/torch/cuda/__init__.py:239 in _lazy_init
    raise AssertionError("Torch not compiled with CUDA enabled")
AssertionError: Torch not compiled with CUDA enabled

steps: 0%| | 0/500 [00:10<?, ?it/s]

Traceback (most recent call last):
  /Users/User/kohya_ss/venv/bin/accelerate:8 in <module>
    sys.exit(main())
  /Users/User/kohya_ss/venv/lib/python3.10/site-packages/accelerate/commands/accelerate_cli.py:45 in main
    args.func(args)
  /Users/User/kohya_ss/venv/lib/python3.10/site-packages/accelerate/commands/launch.py:918 in launch_command
    simple_launcher(args)
  /Users/User/kohya_ss/venv/lib/python3.10/site-packages/accelerate/commands/launch.py:580 in simple_launcher
    raise subprocess.CalledProcessError(returncode=process.returncode, cmd=cmd)
CalledProcessError: Command '['/Users/User/kohya_ss/venv/bin/python', './train_network.py', '--enable_bucket', '--min_bucket_reso=256', '--max_bucket_reso=2048', '--pretrained_model_name_or_path=runwayml/stable-diffusion-v1-5', '--train_data_dir=/Volumes/EXT04005/Miscellaneous/Videos/TED/Misc/Others/Training', '--resolution=512,512', '--output_dir=/Users/User/stable-diffusion-webui/embeddings', '--logging_dir=/Volumes/EXT04005/Miscellaneous/Videos/TED/Misc/Others/Training/100_Training/logs', '--network_alpha=128', '--save_model_as=safetensors', '--network_module=networks.lora', '--text_encoder_lr=5e-05', '--unet_lr=0.0001', '--network_dim=128', '--output_name=Output_v1', '--lr_scheduler_num_cycles=1', '--no_half_vae', '--learning_rate=0.0001', '--lr_scheduler=constant', '--train_batch_size=2', '--max_train_steps=500', '--save_every_n_epochs=1', '--mixed_precision=no', '--save_precision=float', '--seed=1234', '--caption_extension=.txt', '--cache_latents', '--optimizer_type=AdamW8bit', '--max_data_loader_n_workers=1', '--bucket_reso_steps=64', '--bucket_no_upscale', '--noise_offset=0.0']' returned non-zero exit status 1.

bneigher commented 10 months ago

same

ourcolour commented 6 months ago

+1

ourcolour commented 6 months ago

I tried it this way on my MacBook Pro M2:

Mixed Precision: no
Save Precision: float
Optimizer: AdamW
Advanced Configuration: uncheck xformers
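For reference, those GUI settings map roughly onto the accelerate launch flags already shown in this thread: keep --mixed_precision="no" and --save_precision="float", switch --optimizer_type="AdamW8bit" to --optimizer_type="AdamW" so the CUDA-only bitsandbytes path is never reached, and drop --xformers / --mem_eff_attn. A trimmed sketch (the paths are placeholders, and the remaining LoRA flags from the original command still apply):

# same launcher as above, with the Apple-Silicon-friendly settings applied:
# plain AdamW optimizer, no 8-bit quantization, no xformers
accelerate launch --num_cpu_threads_per_process=2 "train_network.py" \
  --pretrained_model_name_or_path="runwayml/stable-diffusion-v1-5" \
  --train_data_dir="/path/to/image" \
  --output_dir="/path/to/model" \
  --network_module=networks.lora \
  --network_dim=8 --network_alpha="1" \
  --mixed_precision="no" --save_precision="float" \
  --optimizer_type="AdamW" \
  --cache_latents
  # ...plus the scheduler/learning-rate flags from the original command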