INFO Building CLIP flux_utils.py:74
INFO Loading state dict from /sd_model/clip/sd3/clip_l.safetensors flux_utils.py:167
2024-09-25 19:53:06 INFO Loaded CLIP: flux_utils.py:170
INFO Loading state dict from /sd_model/clip/sd3/t5xxl_fp8_e4m3fn.safetensors flux_utils.py:215
2024-09-25 19:53:09 INFO Loaded T5xxl: flux_utils.py:218
INFO Loaded fp8 T5XXL model flux_train_network.py:101
INFO Building AutoEncoder flux_utils.py:62
INFO Loading state dict from /sd_model/vae/ae.safetensors flux_utils.py:66
2024-09-25 19:53:10 INFO Loaded AE: flux_utils.py:69
import network module: networks.lora_flux
2024-09-25 19:53:11 INFO [Dataset 0] train_util.py:2329
INFO caching latents with caching strategy. train_util.py:989
INFO checking cache validity... train_util.py:999
100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 396/396 [00:03<00:00, 124.20it/s]
2024-09-25 19:53:14 INFO no latents to cache train_util.py:1039
2024-09-25 19:53:15 INFO move vae and unet to cpu to save memory flux_train_network.py:208
Traceback (most recent call last):
File "/app/lora-scripts/./scripts/dev/flux_train_network.py", line 519, in &lt;module&gt;
trainer.train(args)
File "/app/lora-scripts/scripts/dev/train_network.py", line 402, in train
self.cache_text_encoder_outputs_if_needed(args, accelerator, unet, vae, text_encoders, train_dataset_group, weight_dtype)
File "/app/lora-scripts/./scripts/dev/flux_train_network.py", line 212, in cache_text_encoder_outputs_if_needed
unet.to("cpu")
File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1174, in to
return self._apply(convert)
File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 780, in _apply
module._apply(fn)
File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 805, in _apply
param_applied = fn(param)
File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1167, in convert
raise NotImplementedError(
NotImplementedError: Cannot copy out of meta tensor; no data! Please use torch.nn.Module.to_empty() instead of torch.nn.Module.to() when moving module from meta to a different device.
19:53:16-580989 ERROR Training failed / 训练失败
Using the Docker image nvcr.io/nvidia/pytorch:24.07-py3 as the base image and following the install script, training fails to start.
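For reference, the NotImplementedError in the traceback is PyTorch's standard behavior when a module still holds "meta" parameters (shape/dtype only, no storage) and `Module.to()` tries to copy them. A minimal standalone sketch of the failure mode and the `to_empty()` workaround the error message suggests (using a plain `nn.Linear`, not the actual Flux UNet):

```python
import torch
import torch.nn as nn

# Construct a module on the "meta" device: parameters carry shape and dtype
# metadata but no actual data. Model loaders sometimes leave modules in this
# state when weights were never materialized (e.g. low-memory init paths).
with torch.device("meta"):
    m = nn.Linear(4, 4)

# .to() must copy parameter data, which a meta tensor does not have,
# so PyTorch raises NotImplementedError (the same error as in the log).
try:
    m.to("cpu")
except NotImplementedError as e:
    print("to() failed:", e)

# to_empty() instead allocates *uninitialized* storage on the target device.
# Real weights must still be loaded afterwards (e.g. via load_state_dict).
m = m.to_empty(device="cpu")
print(m.weight.device)
```

In this trace the failure suggests the UNet's weights were never fully materialized before `unet.to("cpu")` was called, which usually points at an fp8/low-memory loading option or an install/version mismatch rather than a fix you should apply by swapping in `to_empty()` yourself.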