Closed · lambertlu closed this issue 1 year ago
Got this error message today: OutOfMemoryError: CUDA out of memory, in step 5.5 (Start Training). It was fine yesterday. resolution: (768, 768)

```
Loading settings from /content/LoRA/config/config_file.toml... /content/LoRA/config/config_file
prepare tokenizer
Downloading (…)olve/main/vocab.json: 100% 961k/961k [00:00<00:00, 1.14MB/s]
Downloading (…)olve/main/merges.txt: 100% 525k/525k [00:00<00:00, 758kB/s]
Downloading (…)cial_tokens_map.json: 100% 389/389 [00:00<00:00, 84.3kB/s]
Downloading (…)okenizer_config.json: 100% 905/905 [00:00<00:00, 225kB/s]
update token length: 225
Load dataset config from /content/LoRA/config/dataset_config.toml
prepare images.
found directory /content/LoRA/train_data contains 24 image files
240 train images with repeating.
0 reg images.
no regularization images / 正則化画像が見つかりませんでした
[Dataset 0]
  batch_size: 6
  resolution: (768, 768)
  enable_bucket: True
  min_bucket_reso: 320
  max_bucket_reso: 1280
  bucket_reso_steps: 64
  bucket_no_upscale: False

  [Subset 0 of Dataset 0]
    image_dir: "/content/LoRA/train_data"
    image_count: 24
    num_repeats: 10
    shuffle_caption: True
    keep_tokens: 0
    caption_dropout_rate: 0
    caption_dropout_every_n_epoches: 0
    caption_tag_dropout_rate: 0
    color_aug: False
    flip_aug: False
    face_crop_aug_range: None
    random_crop: False
    token_warmup_min: 1,
    token_warmup_step: 0,
    is_reg: False
    class_tokens: mksks style
    caption_extension: .txt

[Dataset 0]
loading image sizes.
100% 24/24 [00:00<00:00, 2216.52it/s]
make buckets
number of images (including repeats) / 各bucketの画像枚数(繰り返し回数を含む)
bucket 0: resolution (768, 768), count: 240
mean ar error (without repeats): 0.0
prepare accelerator
Using accelerator 0.15.0 or above.
loading model for process 0/1
load StableDiffusion checkpoint
loading u-net:
loading vae:
Downloading (…)lve/main/config.json: 100% 4.52k/4.52k [00:00<00:00, 1.32MB/s]
Downloading pytorch_model.bin: 100% 1.71G/1.71G [00:51<00:00, 33.1MB/s]
loading text encoder:
Replace CrossAttention.forward to use xformers
[Dataset 0]
caching latents.
100% 6/6 [00:12<00:00, 2.00s/it]
import network module: networks.lora
create LoRA network. base dim (rank): 32, alpha: 16
create LoRA for Text Encoder: 72 modules.
create LoRA for U-Net: 192 modules.
enable LoRA for text encoder
enable LoRA for U-Net
prepare optimizer, data loader etc.
CUDA SETUP: CUDA runtime path found: /usr/local/cuda-11.8/targets/x86_64-linux/lib/libcudart.so
CUDA SETUP: Highest compute capability among GPUs detected: 7.5
CUDA SETUP: Detected CUDA version 118
CUDA SETUP: Loading binary /usr/local/lib/python3.9/dist-packages/bitsandbytes/libbitsandbytes_cuda118.so...
use 8-bit AdamW optimizer | {}
override steps. steps for 20 epochs is / 指定エポックまでのステップ数: 800
running training / 学習開始
  num train images * repeats / 学習画像の数×繰り返し回数: 240
  num reg images / 正則化画像の数: 0
  num batches per epoch / 1epochのバッチ数: 40
  num epochs / epoch数: 20
  batch size per device / バッチサイズ: 6
  gradient accumulation steps / 勾配を合計するステップ数 = 1
  total optimization steps / 学習ステップ数: 800
steps:   0% 0/800 [00:00<?, ?it/s]epoch 1/20

Traceback (most recent call last):

/content/kohya-trainer/train_network.py:752 in <module>
    749 │ args = parser.parse_args()
    750 │ args = train_util.read_config_from_file(args, parser)
    751 │
  ❱ 752 │ train(args)
    753

/content/kohya-trainer/train_network.py:583 in train
    580 │ │ │ │
    581 │ │ │ │ # Predict the noise residual
    582 │ │ │ │ with accelerator.autocast():
  ❱ 583 │ │ │ │ │ noise_pred = unet(noisy_latents, timesteps, encode
    584 │ │ │ │
    585 │ │ │ │ if args.v_parameterization:
    586 │ │ │ │ │ # v-parameterization training

/usr/local/lib/python3.9/dist-packages/torch/nn/modules/module.py:1501 in _call_impl
    1498 │ │ if not (self._backward_hooks or self._backward_pre_hooks or s
    1499 │ │ │ │ or _global_backward_pre_hooks or _global_backward_hoo
    1500 │ │ │ │ or _global_forward_hooks or _global_forward_pre_hooks
  ❱ 1501 │ │ │ return forward_call(*args, **kwargs)
    1502 │ │ # Do not call functions when jit is used
    1503 │ │ full_backward_hooks, non_full_backward_hooks = [], []
    1504 │ │ backward_pre_hooks = []

/usr/local/lib/python3.9/dist-packages/accelerate/utils/operations.py:490 in __call__
    487 │ │ update_wrapper(self, model_forward)
    488 │
    489 │ def __call__(self, *args, **kwargs):
  ❱ 490 │ │ return convert_to_fp32(self.model_forward(*args, **kwargs))
    491 │
    492 │ def __getstate__(self):
    493 │ │ raise pickle.PicklingError(

/usr/local/lib/python3.9/dist-packages/torch/amp/autocast_mode.py:14 in decorate_autocast
    11 │ @functools.wraps(func)
    12 │ def decorate_autocast(*args, **kwargs):
    13 │ │ with autocast_instance:
  ❱ 14 │ │ │ return func(*args, **kwargs)
    15 │ decorate_autocast.__script_unsupported = '@autocast() decorator is
    16 │ return decorate_autocast
    17

/usr/local/lib/python3.9/dist-packages/diffusers/models/unet_2d_condition.py:407 in forward
    404 │ │ │ │ upsample_size = down_block_res_samples[-1].shape[2:]
    405 │ │ │ │
    406 │ │ │ if hasattr(upsample_block, "has_cross_attention") and upsa
  ❱ 407 │ │ │ │ sample = upsample_block(
    408 │ │ │ │ │ hidden_states=sample,
    409 │ │ │ │ │ temb=emb,
    410 │ │ │ │ │ res_hidden_states_tuple=res_samples,

/usr/local/lib/python3.9/dist-packages/torch/nn/modules/module.py:1501 in _call_impl
    1498 │ │ if not (self._backward_hooks or self._backward_pre_hooks or s
    1499 │ │ │ │ or _global_backward_pre_hooks or _global_backward_hoo
    1500 │ │ │ │ or _global_forward_hooks or _global_forward_pre_hooks
  ❱ 1501 │ │ │ return forward_call(*args, **kwargs)
    1502 │ │ # Do not call functions when jit is used
    1503 │ │ full_backward_hooks, non_full_backward_hooks = [], []
    1504 │ │ backward_pre_hooks = []

/usr/local/lib/python3.9/dist-packages/diffusers/models/unet_2d_blocks.py:1202 in forward
    1199 │ │ │ │ │ create_custom_forward(attn, return_dict=False), h
    1200 │ │ │ │ )[0]
    1201 │ │ │ else:
  ❱ 1202 │ │ │ │ hidden_states = resnet(hidden_states, temb)
    1203 │ │ │ │ hidden_states = attn(hidden_states, encoder_hidden_st
    1204 │ │ │
    1205 │ │ if self.upsamplers is not None:

/usr/local/lib/python3.9/dist-packages/torch/nn/modules/module.py:1501 in _call_impl
    1498 │ │ if not (self._backward_hooks or self._backward_pre_hooks or s
    1499 │ │ │ │ or _global_backward_pre_hooks or _global_backward_hoo
    1500 │ │ │ │ or _global_forward_hooks or _global_forward_pre_hooks
  ❱ 1501 │ │ │ return forward_call(*args, **kwargs)
    1502 │ │ # Do not call functions when jit is used
    1503 │ │ full_backward_hooks, non_full_backward_hooks = [], []
    1504 │ │ backward_pre_hooks = []

/usr/local/lib/python3.9/dist-packages/diffusers/models/resnet.py:450 in forward
    447 │ def forward(self, input_tensor, temb):
    448 │ │ hidden_states = input_tensor
    449 │ │
  ❱ 450 │ │ hidden_states = self.norm1(hidden_states)
    451 │ │ hidden_states = self.nonlinearity(hidden_states)
    452 │ │
    453 │ │ if self.upsample is not None:

/usr/local/lib/python3.9/dist-packages/torch/nn/modules/module.py:1501 in _call_impl
    1498 │ │ if not (self._backward_hooks or self._backward_pre_hooks or s
    1499 │ │ │ │ or _global_backward_pre_hooks or _global_backward_hoo
    1500 │ │ │ │ or _global_forward_hooks or _global_forward_pre_hooks
  ❱ 1501 │ │ │ return forward_call(*args, **kwargs)
    1502 │ │ # Do not call functions when jit is used
    1503 │ │ full_backward_hooks, non_full_backward_hooks = [], []
    1504 │ │ backward_pre_hooks = []

/usr/local/lib/python3.9/dist-packages/torch/nn/modules/normalization.py:273 in forward
    270 │ │ │ init.zeros_(self.bias)
    271 │
    272 │ def forward(self, input: Tensor) -> Tensor:
  ❱ 273 │ │ return F.group_norm(
    274 │ │ │ input, self.num_groups, self.weight, self.bias, self.eps)
    275 │
    276 │ def extra_repr(self) -> str:

/usr/local/lib/python3.9/dist-packages/torch/nn/functional.py:2530 in group_norm
    2527 │ if input.dim() < 2:
    2528 │ │ raise RuntimeError(f"Expected at least 2 dimensions for input
    2529 │ _verify_batch_size([input.size(0) * input.size(1) // num_groups,
  ❱ 2530 │ return torch.group_norm(input, num_groups, weight, bias, eps, tor
    2531
    2532
    2533 def local_response_norm(input: Tensor, size: int, alpha: float = 1e-4

OutOfMemoryError: CUDA out of memory. Tried to allocate 204.00 MiB (GPU 0; 14.75 GiB total capacity; 13.09 GiB already allocated; 30.81 MiB free; 13.44 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
steps:   0% 0/800 [00:02<?, ?it/s]

Traceback (most recent call last):

/usr/local/bin/accelerate:8 in <module>
    5 from accelerate.commands.accelerate_cli import main
    6 if __name__ == '__main__':
    7 │ sys.argv[0] = re.sub(r'(-script\.pyw|\.exe)?$', '', sys.argv[0])
  ❱ 8 │ sys.exit(main())
    9

/usr/local/lib/python3.9/dist-packages/accelerate/commands/accelerate_cli.py:45 in main
    42 │ │ exit(1)
    43 │
    44 │ # Run
  ❱ 45 │ args.func(args)
    46
    47
    48 if __name__ == "__main__":

/usr/local/lib/python3.9/dist-packages/accelerate/commands/launch.py:1104 in launch_command
    1101 │ elif defaults is not None and defaults.compute_environment == Com
    1102 │ │ sagemaker_launcher(defaults, args)
    1103 │ else:
  ❱ 1104 │ │ simple_launcher(args)
    1105
    1106
    1107 def main():

/usr/local/lib/python3.9/dist-packages/accelerate/commands/launch.py:567 in simple_launcher
    564 │ process = subprocess.Popen(cmd, env=current_env)
    565 │ process.wait()
    566 │ if process.returncode != 0:
  ❱ 567 │ │ raise subprocess.CalledProcessError(returncode=process.return
    568
    569
    570 def multi_gpu_launcher(args):

CalledProcessError: Command '['/usr/bin/python3', 'train_network.py', '--sample_prompts=/content/LoRA/config/sample_prompt.txt', '--dataset_config=/content/LoRA/config/dataset_config.toml', '--config_file=/content/LoRA/config/config_file.toml']' returned non-zero exit status 1.
```
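Worth noting before the replies: the tail of the OOM message itself points at a second knob, max_split_size_mb. A minimal sketch of setting it, assuming training is launched from a Python (Colab) cell; the value 128 is an arbitrary starting point for illustration, not something from this thread:

```python
import os

# The OOM message above suggests tuning max_split_size_mb when reserved
# memory far exceeds allocated memory (fragmentation). It must be set
# before PyTorch initializes CUDA, i.e. before the training process starts.
# 128 MiB is an assumed starting value; adjust empirically.
os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "max_split_size_mb:128"
```

That said, this only helps with fragmentation. Here the log shows 13.09 GiB already allocated out of 14.75 GiB total, so the batch simply doesn't fit on the card, and shrinking it (as the replies below suggest) is the primary fix.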
Reduce your batch size down to 1, then work your way up.
Reduce your batch size to either 4 or 5.
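For context, here is the arithmetic behind the numbers in the log above (24 images × 10 repeats, 20 epochs). A small sketch using only figures from that log: per-step activation memory scales roughly linearly with batch size, while the total step count scales inversely.

```python
# All inputs come from the training log above; nothing here is specific
# to kohya-trainer beyond those numbers.
images, repeats, epochs = 24, 10, 20

for batch_size in (6, 4, 3, 1):
    steps_per_epoch = images * repeats // batch_size
    total_steps = steps_per_epoch * epochs
    print(f"batch_size={batch_size}: {steps_per_epoch} steps/epoch, "
          f"{total_steps} total steps")

# batch_size=6 -> 40 steps/epoch, 800 total (matches the failing run);
# batch_size=3 -> 80 steps/epoch, 1600 total, at roughly half the
# activation memory per optimization step.
```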
Thanks for the replies. I turned it down to 3 and it works now. Have a nice day!
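For anyone who wants to keep the effective batch at 6 while still fitting in memory: the log shows gradient accumulation steps = 1, and raising it is the usual trade. A minimal sketch of the relationship; it is general, not specific to this trainer, and accumulation_steps = 2 is an assumed value for illustration, not something tested in this thread:

```python
# Gradient accumulation trades wall-clock steps for memory: per-step
# activation memory scales with batch_size, while gradient quality
# follows the effective batch (batch_size * accumulation_steps).
batch_size = 3           # the value that fit in memory in this thread
accumulation_steps = 2   # assumed for illustration

effective_batch = batch_size * accumulation_steps
print(f"effective batch size: {effective_batch}")  # 6, same as the OOM run
```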