Hi. I'm trying to run the Dreambooth LoRA training in a Kaggle notebook with the P100 GPU accelerator.
I'm getting the error below while executing the 5.5. Start Training code cell.
I tried changing the xFormers version (0.0.16, 0.0.17, 0.0.18, 0.0.19...), but nothing solves the issue.
Currently Torch 2 is being used. I don't know if there's a way to change it without breaking everything, so I haven't tested with Torch 1.x.
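For context, swapping the xFormers wheel between attempts amounts to something like the cell below (a rough sketch, not the notebook's actual install cell; `--no-deps` is only there so pip doesn't touch the preinstalled Torch 2.0.0+cu118):

```
# Hypothetical install cell: pin a specific xFormers wheel without
# letting pip upgrade/downgrade the preinstalled torch build.
!pip install -q --no-deps xformers==0.0.18

# Then check which attention kernels that wheel actually exposes on this GPU:
!python -m xformers.info
```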
This is the output of !python -m xformers.info:
xFormers 0.0.18
memory_efficient_attention.cutlassF: unavailable
memory_efficient_attention.cutlassB: unavailable
memory_efficient_attention.flshattF: unavailable
memory_efficient_attention.flshattB: unavailable
memory_efficient_attention.smallkF: unavailable
memory_efficient_attention.smallkB: unavailable
memory_efficient_attention.tritonflashattF: available
memory_efficient_attention.tritonflashattB: available
indexing.scaled_index_addF: unavailable
indexing.scaled_index_addB: unavailable
indexing.index_select: unavailable
swiglu.dual_gemm_silu: unavailable
swiglu.gemm_fused_operand_sum: unavailable
swiglu.fused.p.cpp: not built
is_triton_available: True
is_functorch_available: False
pytorch.version: 2.0.0
pytorch.cuda: available
gpu.compute_capability: 6.0
gpu.name: Tesla P100-PCIE-16GB
build.info: available
build.cuda_version: 1108
build.python_version: 3.10.10
build.torch_version: 2.0.0+cu118
build.env.TORCH_CUDA_ARCH_LIST: 5.0+PTX 6.0 6.1 7.0 7.5 8.0 8.6
build.env.XFORMERS_BUILD_TYPE: Release
build.env.XFORMERS_ENABLE_DEBUG_ASSERTIONS: None
build.env.NVCC_FLAGS: None
build.env.XFORMERS_PACKAGE_FROM: wheel-v0.0.18
source.privacy: open source
The actual error:
/opt/conda/lib/python3.10/site-packages/scipy/__init__.py:146: UserWarning: A NumPy version >=1.16.5 and <1.23.0 is required for this version of SciPy (detected version 1.23.5
warnings.warn(f"A NumPy version >={np_minversion} and <{np_maxversion}"
/opt/conda/lib/python3.10/site-packages/tensorflow_io/python/ops/__init__.py:98: UserWarning: unable to load libtensorflow_io_plugins.so: unable to open file: libtensorflow_io_plugins.so, from paths: ['/opt/conda/lib/python3.10/site-packages/tensorflow_io/python/ops/libtensorflow_io_plugins.so']
caused by: ['/opt/conda/lib/python3.10/site-packages/tensorflow_io/python/ops/libtensorflow_io_plugins.so: undefined symbol: _ZN3tsl6StatusC1EN10tensorflow5error4CodeESt17basic_string_viewIcSt11char_traitsIcEENS_14SourceLocationE']
warnings.warn(f"unable to load libtensorflow_io_plugins.so: {e}")
/opt/conda/lib/python3.10/site-packages/tensorflow_io/python/ops/__init__.py:104: UserWarning: file system plugins are not loaded: unable to open file: libtensorflow_io.so, from paths: ['/opt/conda/lib/python3.10/site-packages/tensorflow_io/python/ops/libtensorflow_io.so']
caused by: ['/opt/conda/lib/python3.10/site-packages/tensorflow_io/python/ops/libtensorflow_io.so: undefined symbol: _ZTVN10tensorflow13GcsFileSystemE']
warnings.warn(f"file system plugins are not loaded: {e}")
/opt/conda/lib/python3.10/site-packages/scipy/__init__.py:146: UserWarning: A NumPy version >=1.16.5 and <1.23.0 is required for this version of SciPy (detected version 1.23.5
warnings.warn(f"A NumPy version >={np_minversion} and <{np_maxversion}"
/opt/conda/lib/python3.10/site-packages/tensorflow_io/python/ops/__init__.py:98: UserWarning: unable to load libtensorflow_io_plugins.so: unable to open file: libtensorflow_io_plugins.so, from paths: ['/opt/conda/lib/python3.10/site-packages/tensorflow_io/python/ops/libtensorflow_io_plugins.so']
caused by: ['/opt/conda/lib/python3.10/site-packages/tensorflow_io/python/ops/libtensorflow_io_plugins.so: undefined symbol: _ZN3tsl6StatusC1EN10tensorflow5error4CodeESt17basic_string_viewIcSt11char_traitsIcEENS_14SourceLocationE']
warnings.warn(f"unable to load libtensorflow_io_plugins.so: {e}")
/opt/conda/lib/python3.10/site-packages/tensorflow_io/python/ops/__init__.py:104: UserWarning: file system plugins are not loaded: unable to open file: libtensorflow_io.so, from paths: ['/opt/conda/lib/python3.10/site-packages/tensorflow_io/python/ops/libtensorflow_io.so']
caused by: ['/opt/conda/lib/python3.10/site-packages/tensorflow_io/python/ops/libtensorflow_io.so: undefined symbol: _ZTVN10tensorflow13GcsFileSystemE']
warnings.warn(f"file system plugins are not loaded: {e}")
Loading settings from /kaggle/working/LoRA/config/config_file.toml...
/kaggle/working/LoRA/config/config_file
prepare tokenizer
Downloading (…)olve/main/vocab.json: 100%|███| 961k/961k [00:00<00:00, 2.96MB/s]
Downloading (…)olve/main/merges.txt: 100%|███| 525k/525k [00:00<00:00, 2.14MB/s]
Downloading (…)cial_tokens_map.json: 100%|██████| 389/389 [00:00<00:00, 332kB/s]
Downloading (…)okenizer_config.json: 100%|██████| 905/905 [00:00<00:00, 778kB/s]
update token length: 225
Load dataset config from /kaggle/working/LoRA/config/dataset_config.toml
prepare images.
found directory /kaggle/input/data-img/upscaled_v2_prepr contains 92 image files
920 train images with repeating.
0 reg images.
no regularization images / 正則化画像が見つかりませんでした
[Dataset 0]
batch_size: 1
resolution: (512, 512)
enable_bucket: True
min_bucket_reso: 256
max_bucket_reso: 1024
bucket_reso_steps: 64
bucket_no_upscale: False
[Subset 0 of Dataset 0]
image_dir: "/kaggle/input/data-img/upscaled_v2_prepr"
image_count: 92
num_repeats: 10
shuffle_caption: True
keep_tokens: 0
caption_dropout_rate: 0
caption_dropout_every_n_epoches: 0
caption_tag_dropout_rate: 0
color_aug: False
flip_aug: False
face_crop_aug_range: None
random_crop: False
token_warmup_min: 1,
token_warmup_step: 0,
is_reg: False
class_tokens: mksks
caption_extension: .txt
[Dataset 0]
loading image sizes.
100%|██████████████████████████████████████████| 92/92 [00:00<00:00, 118.18it/s]
make buckets
number of images (including repeats) / 各bucketの画像枚数(繰り返し回数を含む)
bucket 0: resolution (512, 512), count: 920
mean ar error (without repeats): 0.0
prepare accelerator
Using accelerator 0.15.0 or above.
loading model for process 0/1
load StableDiffusion checkpoint
loading u-net: <All keys matched successfully>
loading vae: <All keys matched successfully>
Downloading (…)lve/main/config.json: 100%|█| 4.52k/4.52k [00:00<00:00, 3.35MB/s]
Downloading pytorch_model.bin: 100%|███████| 1.71G/1.71G [00:30<00:00, 55.3MB/s]
loading text encoder: <All keys matched successfully>
load VAE: /kaggle/working/vae/stablediffusion.vae.pt
additional VAE loaded
Replace CrossAttention.forward to use xformers
[Dataset 0]
caching latents.
100%|███████████████████████████████████████████| 23/23 [00:18<00:00, 1.22it/s]
import network module: networks.lora
create LoRA network. base dim (rank): 32, alpha: 16
create LoRA for Text Encoder: 72 modules.
create LoRA for U-Net: 192 modules.
enable LoRA for text encoder
enable LoRA for U-Net
prepare optimizer, data loader etc.
===================================BUG REPORT===================================
Welcome to bitsandbytes. For bug reports, please submit your error trace to: https://github.com/TimDettmers/bitsandbytes/issues
For effortless bug reporting copy-paste your error into this form: https://docs.google.com/forms/d/e/1FAIpQLScPB8emS3Thkp66nvqwmjTEgxp8Y9ufuWTzFyr9kJ5AoI47dQ/viewform?usp=sf_link
================================================================================
CUDA SETUP: CUDA runtime path found: /opt/conda/lib/libcudart.so
CUDA SETUP: Highest compute capability among GPUs detected: 6.0
CUDA SETUP: Detected CUDA version 117
CUDA SETUP: Loading binary /opt/conda/lib/python3.10/site-packages/bitsandbytes/libbitsandbytes_cuda117_nocublaslt.so...
use 8-bit AdamW optimizer | {}
override steps. steps for 5 epochs is / 指定エポックまでのステップ数: 4600
running training / 学習開始
num train images * repeats / 学習画像の数×繰り返し回数: 920
num reg images / 正則化画像の数: 0
num batches per epoch / 1epochのバッチ数: 920
num epochs / epoch数: 5
batch size per device / バッチサイズ: 1
gradient accumulation steps / 勾配を合計するステップ数 = 1
total optimization steps / 学習ステップ数: 4600
steps: 0%| | 0/4600 [00:00<?, ?it/s]epoch 1/5
╭───────────────────── Traceback (most recent call last) ──────────────────────╮
│ /kaggle/working/kohya-trainer/train_network.py:752 in <module> │
│ │
│ 749 │ args = parser.parse_args() │
│ 750 │ args = train_util.read_config_from_file(args, parser) │
│ 751 │ │
│ ❱ 752 │ train(args) │
│ 753 │
│ │
│ /kaggle/working/kohya-trainer/train_network.py:583 in train │
│ │
│ 580 │ │ │ │ │
│ 581 │ │ │ │ # Predict the noise residual │
│ 582 │ │ │ │ with accelerator.autocast(): │
│ ❱ 583 │ │ │ │ │ noise_pred = unet(noisy_latents, timesteps, encode │
│ 584 │ │ │ │ │
│ 585 │ │ │ │ if args.v_parameterization: │
│ 586 │ │ │ │ │ # v-parameterization training │
│ │
│ /opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py:1501 in │
│ _call_impl │
│ │
│ 1498 │ │ if not (self._backward_hooks or self._backward_pre_hooks or s │
│ 1499 │ │ │ │ or _global_backward_pre_hooks or _global_backward_hoo │
│ 1500 │ │ │ │ or _global_forward_hooks or _global_forward_pre_hooks │
│ ❱ 1501 │ │ │ return forward_call(*args, **kwargs) │
│ 1502 │ │ # Do not call functions when jit is used │
│ 1503 │ │ full_backward_hooks, non_full_backward_hooks = [], [] │
│ 1504 │ │ backward_pre_hooks = [] │
│ │
│ /opt/conda/lib/python3.10/site-packages/accelerate/utils/operations.py:490 │
│ in __call__ │
│ │
│ 487 │ │ update_wrapper(self, model_forward) │
│ 488 │ │
│ 489 │ def __call__(self, *args, **kwargs): │
│ ❱ 490 │ │ return convert_to_fp32(self.model_forward(*args, **kwargs)) │
│ 491 │ │
│ 492 │ def __getstate__(self): │
│ 493 │ │ raise pickle.PicklingError( │
│ │
│ /opt/conda/lib/python3.10/site-packages/torch/amp/autocast_mode.py:14 in │
│ decorate_autocast │
│ │
│ 11 │ @functools.wraps(func) │
│ 12 │ def decorate_autocast(*args, **kwargs): │
│ 13 │ │ with autocast_instance: │
│ ❱ 14 │ │ │ return func(*args, **kwargs) │
│ 15 │ decorate_autocast.__script_unsupported = '@autocast() decorator is │
│ 16 │ return decorate_autocast │
│ 17 │
│ │
│ /opt/conda/lib/python3.10/site-packages/diffusers/models/unet_2d_condition.p │
│ y:381 in forward │
│ │
│ 378 │ │ down_block_res_samples = (sample,) │
│ 379 │ │ for downsample_block in self.down_blocks: │
│ 380 │ │ │ if hasattr(downsample_block, "has_cross_attention") and do │
│ ❱ 381 │ │ │ │ sample, res_samples = downsample_block( │
│ 382 │ │ │ │ │ hidden_states=sample, │
│ 383 │ │ │ │ │ temb=emb, │
│ 384 │ │ │ │ │ encoder_hidden_states=encoder_hidden_states, │
│ │
│ /opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py:1501 in │
│ _call_impl │
│ │
│ 1498 │ │ if not (self._backward_hooks or self._backward_pre_hooks or s │
│ 1499 │ │ │ │ or _global_backward_pre_hooks or _global_backward_hoo │
│ 1500 │ │ │ │ or _global_forward_hooks or _global_forward_pre_hooks │
│ ❱ 1501 │ │ │ return forward_call(*args, **kwargs) │
│ 1502 │ │ # Do not call functions when jit is used │
│ 1503 │ │ full_backward_hooks, non_full_backward_hooks = [], [] │
│ 1504 │ │ backward_pre_hooks = [] │
│ │
│ /opt/conda/lib/python3.10/site-packages/diffusers/models/unet_2d_blocks.py:6 │
│ 07 in forward │
│ │
│ 604 │ │ │ │ │ return custom_forward │
│ 605 │ │ │ │ │
│ 606 │ │ │ │ hidden_states = torch.utils.checkpoint.checkpoint(cre │
│ ❱ 607 │ │ │ │ hidden_states = torch.utils.checkpoint.checkpoint( │
│ 608 │ │ │ │ │ create_custom_forward(attn, return_dict=False), h │
│ 609 │ │ │ │ )[0] │
│ 610 │ │ │ else: │
│ │
│ /opt/conda/lib/python3.10/site-packages/torch/utils/checkpoint.py:249 in │
│ checkpoint │
│ │
│ 246 │ │ raise ValueError("Unexpected keyword arguments: " + ",".join(a │
│ 247 │ │
│ 248 │ if use_reentrant: │
│ ❱ 249 │ │ return CheckpointFunction.apply(function, preserve, *args) │
│ 250 │ else: │
│ 251 │ │ return _checkpoint_without_reentrant( │
│ 252 │ │ │ function, │
│ │
│ /opt/conda/lib/python3.10/site-packages/torch/autograd/function.py:506 in │
│ apply │
│ │
│ 503 │ │ if not torch._C._are_functorch_transforms_active(): │
│ 504 │ │ │ # See NOTE: [functorch vjp and autograd interaction] │
│ 505 │ │ │ args = _functorch.utils.unwrap_dead_wrappers(args) │
│ ❱ 506 │ │ │ return super().apply(*args, **kwargs) # type: ignore[misc │
│ 507 │ │ │
│ 508 │ │ if cls.setup_context == _SingleLevelFunction.setup_context: │
│ 509 │ │ │ raise RuntimeError( │
│ │
│ /opt/conda/lib/python3.10/site-packages/torch/utils/checkpoint.py:107 in │
│ forward │
│ │
│ 104 │ │ ctx.save_for_backward(*tensor_inputs) │
│ 105 │ │ │
│ 106 │ │ with torch.no_grad(): │
│ ❱ 107 │ │ │ outputs = run_function(*args) │
│ 108 │ │ return outputs │
│ 109 │ │
│ 110 │ @staticmethod │
│ │
│ /opt/conda/lib/python3.10/site-packages/diffusers/models/unet_2d_blocks.py:6 │
│ 00 in custom_forward │
│ │
│ 597 │ │ │ │ def create_custom_forward(module, return_dict=None): │
│ 598 │ │ │ │ │ def custom_forward(*inputs): │
│ 599 │ │ │ │ │ │ if return_dict is not None: │
│ ❱ 600 │ │ │ │ │ │ │ return module(*inputs, return_dict=return │
│ 601 │ │ │ │ │ │ else: │
│ 602 │ │ │ │ │ │ │ return module(*inputs) │
│ 603 │
│ │
│ /opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py:1501 in │
│ _call_impl │
│ │
│ 1498 │ │ if not (self._backward_hooks or self._backward_pre_hooks or s │
│ 1499 │ │ │ │ or _global_backward_pre_hooks or _global_backward_hoo │
│ 1500 │ │ │ │ or _global_forward_hooks or _global_forward_pre_hooks │
│ ❱ 1501 │ │ │ return forward_call(*args, **kwargs) │
│ 1502 │ │ # Do not call functions when jit is used │
│ 1503 │ │ full_backward_hooks, non_full_backward_hooks = [], [] │
│ 1504 │ │ backward_pre_hooks = [] │
│ │
│ /opt/conda/lib/python3.10/site-packages/diffusers/models/attention.py:216 in │
│ forward │
│ │
│ 213 │ │ │
│ 214 │ │ # 2. Blocks │
│ 215 │ │ for block in self.transformer_blocks: │
│ ❱ 216 │ │ │ hidden_states = block(hidden_states, context=encoder_hidde │
│ 217 │ │ │
│ 218 │ │ # 3. Output │
│ 219 │ │ if self.is_input_continuous: │
│ │
│ /opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py:1501 in │
│ _call_impl │
│ │
│ 1498 │ │ if not (self._backward_hooks or self._backward_pre_hooks or s │
│ 1499 │ │ │ │ or _global_backward_pre_hooks or _global_backward_hoo │
│ 1500 │ │ │ │ or _global_forward_hooks or _global_forward_pre_hooks │
│ ❱ 1501 │ │ │ return forward_call(*args, **kwargs) │
│ 1502 │ │ # Do not call functions when jit is used │
│ 1503 │ │ full_backward_hooks, non_full_backward_hooks = [], [] │
│ 1504 │ │ backward_pre_hooks = [] │
│ │
│ /opt/conda/lib/python3.10/site-packages/diffusers/models/attention.py:484 in │
│ forward │
│ │
│ 481 │ │ if self.only_cross_attention: │
│ 482 │ │ │ hidden_states = self.attn1(norm_hidden_states, context) + │
│ 483 │ │ else: │
│ ❱ 484 │ │ │ hidden_states = self.attn1(norm_hidden_states) + hidden_st │
│ 485 │ │ │
│ 486 │ │ if self.attn2 is not None: │
│ 487 │ │ │ # 2. Cross-Attention │
│ │
│ /opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py:1501 in │
│ _call_impl │
│ │
│ 1498 │ │ if not (self._backward_hooks or self._backward_pre_hooks or s │
│ 1499 │ │ │ │ or _global_backward_pre_hooks or _global_backward_hoo │
│ 1500 │ │ │ │ or _global_forward_hooks or _global_forward_pre_hooks │
│ ❱ 1501 │ │ │ return forward_call(*args, **kwargs) │
│ 1502 │ │ # Do not call functions when jit is used │
│ 1503 │ │ full_backward_hooks, non_full_backward_hooks = [], [] │
│ 1504 │ │ backward_pre_hooks = [] │
│ │
│ /kaggle/working/kohya-trainer/library/train_util.py:1792 in forward_xformers │
│ │
│ 1789 │ │ q = q.contiguous() │
│ 1790 │ │ k = k.contiguous() │
│ 1791 │ │ v = v.contiguous() │
│ ❱ 1792 │ │ out = xformers.ops.memory_efficient_attention(q, k, v, attn_b │
│ 1793 │ │ │
│ 1794 │ │ out = rearrange(out, "b n h d -> b n (h d)", h=h) │
│ 1795 │
│ │
│ /opt/conda/lib/python3.10/site-packages/xformers/ops/fmha/__init__.py:196 in │
│ memory_efficient_attention │
│ │
│ 193 │ │ and options. │
│ 194 │ :return: multi-head attention Tensor with shape ``[B, Mq, H, Kv]`` │
│ 195 │ """ │
│ ❱ 196 │ return _memory_efficient_attention( │
│ 197 │ │ Inputs( │
│ 198 │ │ │ query=query, key=key, value=value, p=p, attn_bias=attn_bia │
│ 199 │ │ ), │
│ │
│ /opt/conda/lib/python3.10/site-packages/xformers/ops/fmha/__init__.py:294 in │
│ _memory_efficient_attention │
│ │
│ 291 ) -> torch.Tensor: │
│ 292 │ # fast-path that doesn't require computing the logsumexp for backw │
│ 293 │ if all(x.requires_grad is False for x in [inp.query, inp.key, inp. │
│ ❱ 294 │ │ return _memory_efficient_attention_forward( │
│ 295 │ │ │ inp, op=op[0] if op is not None else None │
│ 296 │ │ ) │
│ 297 │
│ │
│ /opt/conda/lib/python3.10/site-packages/xformers/ops/fmha/__init__.py:310 in │
│ _memory_efficient_attention_forward │
│ │
│ 307 │ inp.validate_inputs() │
│ 308 │ output_shape = inp.normalize_bmhk() │
│ 309 │ if op is None: │
│ ❱ 310 │ │ op = _dispatch_fw(inp) │
│ 311 │ else: │
│ 312 │ │ _ensure_op_supports_or_raise(ValueError, "memory_efficient_att │
│ 313 │
│ │
│ /opt/conda/lib/python3.10/site-packages/xformers/ops/fmha/dispatch.py:98 in │
│ _dispatch_fw │
│ │
│ 95 │ if _is_triton_fwd_fastest(inp): │
│ 96 │ │ priority_list_ops.remove(triton.FwOp) │
│ 97 │ │ priority_list_ops.insert(0, triton.FwOp) │
│ ❱ 98 │ return _run_priority_list( │
│ 99 │ │ "memory_efficient_attention_forward", priority_list_ops, inp │
│ 100 │ ) │
│ 101 │
│ │
│ /opt/conda/lib/python3.10/site-packages/xformers/ops/fmha/dispatch.py:73 in │
│ _run_priority_list │
│ │
│ 70 {textwrap.indent(_format_inputs_description(inp), ' ')}""" │
│ 71 │ for op, not_supported in zip(priority_list, not_supported_reasons) │
│ 72 │ │ msg += "\n" + _format_not_supported_reasons(op, not_supported) │
│ ❱ 73 │ raise NotImplementedError(msg) │
│ 74 │
│ 75 │
│ 76 def _dispatch_fw(inp: Inputs) -> Type[AttentionFwOpBase]: │
╰──────────────────────────────────────────────────────────────────────────────╯
NotImplementedError: No operator found for `memory_efficient_attention_forward`
with inputs:
query : shape=(1, 4096, 8, 40) (torch.float16)
key : shape=(1, 4096, 8, 40) (torch.float16)
value : shape=(1, 4096, 8, 40) (torch.float16)
attn_bias : <class 'NoneType'>
p : 0.0
`cutlassF` is not supported because:
xFormers wasn't build with CUDA support
Operator wasn't built - see `python -m xformers.info` for more info
`flshattF` is not supported because:
xFormers wasn't build with CUDA support
Operator wasn't built - see `python -m xformers.info` for more info
requires a GPU with compute capability > 7.5
`tritonflashattF` is not supported because:
xFormers wasn't build with CUDA support
requires A100 GPU
`smallkF` is not supported because:
xFormers wasn't build with CUDA support
dtype=torch.float16 (supported: {torch.float32})
max(query.shape[-1] != value.shape[-1]) > 32
Operator wasn't built - see `python -m xformers.info` for more info
unsupported embed per head: 40
steps: 0%| | 0/4600 [00:01<?, ?it/s]
╭───────────────────── Traceback (most recent call last) ──────────────────────╮
│ /opt/conda/bin/accelerate:8 in <module> │
│ │
│ 5 from accelerate.commands.accelerate_cli import main │
│ 6 if __name__ == '__main__': │
│ 7 │ sys.argv[0] = re.sub(r'(-script\.pyw|\.exe)?$', '', sys.argv[0]) │
│ ❱ 8 │ sys.exit(main()) │
│ 9 │
│ │
│ /opt/conda/lib/python3.10/site-packages/accelerate/commands/accelerate_cli.p │
│ y:45 in main │
│ │
│ 42 │ │ exit(1) │
│ 43 │ │
│ 44 │ # Run │
│ ❱ 45 │ args.func(args) │
│ 46 │
│ 47 │
│ 48 if __name__ == "__main__": │
│ │
│ /opt/conda/lib/python3.10/site-packages/accelerate/commands/launch.py:1104 │
│ in launch_command │
│ │
│ 1101 │ elif defaults is not None and defaults.compute_environment == Com │
│ 1102 │ │ sagemaker_launcher(defaults, args) │
│ 1103 │ else: │
│ ❱ 1104 │ │ simple_launcher(args) │
│ 1105 │
│ 1106 │
│ 1107 def main(): │
│ │
│ /opt/conda/lib/python3.10/site-packages/accelerate/commands/launch.py:567 in │
│ simple_launcher │
│ │
│ 564 │ process = subprocess.Popen(cmd, env=current_env) │
│ 565 │ process.wait() │
│ 566 │ if process.returncode != 0: │
│ ❱ 567 │ │ raise subprocess.CalledProcessError(returncode=process.return │
│ 568 │
│ 569 │
│ 570 def multi_gpu_launcher(args): │
╰──────────────────────────────────────────────────────────────────────────────╯
CalledProcessError: Command '['/opt/conda/bin/python3.10', 'train_network.py',
'--sample_prompts=/kaggle/working/LoRA/config/sample_prompt.txt',
'--dataset_config=/kaggle/working/LoRA/config/dataset_config.toml',
'--config_file=/kaggle/working/LoRA/config/config_file.toml']' returned non-zero
exit status 1.
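In case it helps narrow things down, the failure can be reproduced outside the trainer with a minimal call that mirrors the shapes and dtype from the traceback. This is just a sketch I put together from the error message, not part of the notebook:

```python
import torch
import xformers.ops as xops

# Same shape/dtype as the failing call in the traceback:
# query/key/value: (1, 4096, 8, 40), torch.float16, attn_bias=None, p=0.0
q = torch.randn(1, 4096, 8, 40, device="cuda", dtype=torch.float16)
k = torch.randn_like(q)
v = torch.randn_like(q)

# On the P100 (compute capability 6.0) with this wheel, this raises the same
# NotImplementedError, since no CUDA-built operator is available for fp16 inputs.
out = xops.memory_efficient_attention(q, k, v, attn_bias=None, p=0.0)
```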
Any help is much appreciated! Thank you.