Akegarasu / lora-scripts

LoRA & Dreambooth training scripts & GUI using kohya-ss's trainer, for diffusion models.
GNU Affero General Public License v3.0

My 4090 runs out of VRAM no matter how small I set the resolution and batch size. #261

Closed · 18716536833mm closed this issue 11 months ago

18716536833mm commented 11 months ago

```
Loading settings from F:\lora-scripts-v1.6.2\config\autosave\20231011-160109.toml...
F:\lora-scripts-v1.6.2\config\autosave\20231011-160109
clip_skip will be unexpected / SDXL学習ではclip_skipは動作しません
prepare tokenizers
update token length: 255
Using DreamBooth method.
prepare images.
found directory F:\AI\xunlian\10_Paohce2 contains 32 image files
320 train images with repeating.
0 reg images.
no regularization images / 正則化画像が見つかりませんでした
[Dataset 0]
  batch_size: 1
  resolution: (1344, 896)
  enable_bucket: False

  [Subset 0 of Dataset 0]
    image_dir: "F:\AI\xunlian\10_Paohce2"
    image_count: 32
    num_repeats: 10
    shuffle_caption: True
    keep_tokens: 0
    caption_dropout_rate: 0.0
    caption_dropout_every_n_epoches: 0
    caption_tag_dropout_rate: 0.0
    caption_prefix: None
    caption_suffix: None
    color_aug: False
    flip_aug: False
    face_crop_aug_range: None
    random_crop: False
    token_warmup_min: 1, token_warmup_step: 0
    is_reg: False
    class_tokens: Paohce2
    caption_extension: .txt

[Dataset 0]
loading image sizes.
100%|████████████████████████████████████████████████████████████████████████████████| 32/32 [00:00<00:00, 8000.10it/s]
prepare dataset
prepare accelerator
loading model for process 0/1
load StableDiffusion checkpoint: F:/sd-webui-aki-v4.4/models/Stable-diffusion/sd_xl_base_1.0.safetensors
building U-Net
loading U-Net from checkpoint
U-Net:
building text encoders
loading text encoders from checkpoint
text encoder 1:
text encoder 2:
building VAE
loading VAE from checkpoint
VAE:
load VAE: F:\lora-scripts-v1.6.2\sd-models\sdxl_vae.safetensors
additional VAE loaded
Disable Diffusers' xformers
Enable xformers for U-Net
[Dataset 0] caching latents.
checking cache validity...
100%|████████████████████████████████████████████████████████████████████████████████| 32/32 [00:00<00:00, 1454.41it/s]
caching latents...
0it [00:00, ?it/s]
number of models: 1
number of trainable parameters: 2567463684
prepare optimizer, data loader etc.
bin F:\lora-scripts-v1.6.2\python\lib\site-packages\bitsandbytes\libbitsandbytes_cuda118.dll
use 8-bit Lion optimizer | {}
override steps. steps for 20 epochs is / 指定エポックまでのステップ数: 6400
running training / 学習開始
  num examples / サンプル数: 320
  num batches per epoch / 1epochのバッチ数: 320
  num epochs / epoch数: 20
  batch size per device / バッチサイズ: 1
  gradient accumulation steps / 勾配を合計するステップ数 = 1
  total optimization steps / 学習ステップ数: 6400
steps:   0%|          | 0/6400 [00:00<?, ?it/s]
epoch 1/20
Traceback (most recent call last):
  File "F:\lora-scripts-v1.6.2\sd-scripts\sdxl_train.py", line 753, in <module>
    train(args)
  File "F:\lora-scripts-v1.6.2\sd-scripts\sdxl_train.py", line 547, in train
    noise_pred = unet(noisy_latents, timesteps, text_embedding, vector_embedding)
  File "F:\lora-scripts-v1.6.2\python\lib\site-packages\torch\nn\modules\module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "F:\lora-scripts-v1.6.2\python\lib\site-packages\accelerate\utils\operations.py", line 636, in forward
    return model_forward(*args, **kwargs)
  File "F:\lora-scripts-v1.6.2\python\lib\site-packages\accelerate\utils\operations.py", line 624, in __call__
    return convert_to_fp32(self.model_forward(*args, **kwargs))
  File "F:\lora-scripts-v1.6.2\python\lib\site-packages\torch\amp\autocast_mode.py", line 14, in decorate_autocast
    return func(*args, **kwargs)
  File "F:\lora-scripts-v1.6.2\sd-scripts\library\sdxl_original_unet.py", line 1088, in forward
    h = call_module(module, h, emb, context)
  File "F:\lora-scripts-v1.6.2\sd-scripts\library\sdxl_original_unet.py", line 1073, in call_module
    x = layer(x, context)
  File "F:\lora-scripts-v1.6.2\python\lib\site-packages\torch\nn\modules\module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "F:\lora-scripts-v1.6.2\sd-scripts\library\sdxl_original_unet.py", line 728, in forward
    hidden_states = block(hidden_states, context=encoder_hidden_states, timestep=timestep)
  File "F:\lora-scripts-v1.6.2\python\lib\site-packages\torch\nn\modules\module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "F:\lora-scripts-v1.6.2\sd-scripts\library\sdxl_original_unet.py", line 651, in forward
    output = self.forward_body(hidden_states, context, timestep)
  File "F:\lora-scripts-v1.6.2\sd-scripts\library\sdxl_original_unet.py", line 633, in forward_body
    hidden_states = self.ff(self.norm3(hidden_states)) + hidden_states
  File "F:\lora-scripts-v1.6.2\python\lib\site-packages\torch\nn\modules\module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "F:\lora-scripts-v1.6.2\sd-scripts\library\sdxl_original_unet.py", line 577, in forward
    hidden_states = module(hidden_states)
  File "F:\lora-scripts-v1.6.2\python\lib\site-packages\torch\nn\modules\module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "F:\lora-scripts-v1.6.2\sd-scripts\library\sdxl_original_unet.py", line 555, in forward
    hidden_states, gate = self.proj(hidden_states).chunk(2, dim=-1)
  File "F:\lora-scripts-v1.6.2\python\lib\site-packages\torch\nn\modules\module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "F:\lora-scripts-v1.6.2\python\lib\site-packages\torch\nn\modules\linear.py", line 114, in forward
    return F.linear(input, self.weight, self.bias)
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 26.00 MiB (GPU 0; 23.99 GiB total capacity; 18.38 GiB already allocated; 2.11 GiB free; 19.11 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
steps:   0%|          | 0/6400 [00:24<?, ?it/s]
Traceback (most recent call last):
  File "F:\lora-scripts-v1.6.2\python\lib\runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "F:\lora-scripts-v1.6.2\python\lib\runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "F:\lora-scripts-v1.6.2\python\lib\site-packages\accelerate\commands\launch.py", line 996, in <module>
    main()
  File "F:\lora-scripts-v1.6.2\python\lib\site-packages\accelerate\commands\launch.py", line 992, in main
    launch_command(args)
  File "F:\lora-scripts-v1.6.2\python\lib\site-packages\accelerate\commands\launch.py", line 986, in launch_command
    simple_launcher(args)
  File "F:\lora-scripts-v1.6.2\python\lib\site-packages\accelerate\commands\launch.py", line 628, in simple_launcher
    raise subprocess.CalledProcessError(returncode=process.returncode, cmd=cmd)
subprocess.CalledProcessError: Command '['F:
```

I also tried 512px training images and they all failed. Swapping optimizers and so on did nothing, and I updated the GPU driver too; it errors every time. I have no idea what's going on. Hoping Akegarasu (秋叶大佬) can point me in the right direction, I'm really lost.
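Worth noting: the log line `number of trainable parameters: 2567463684` means this run is a full SDXL fine-tune (U-Net plus both text encoders), not a LoRA, so lowering the resolution or batch size alone cannot help much. Below is a sketch of the memory-saving overrides the kohya-ss/sd-scripts documentation suggests for full SDXL fine-tuning on a 24 GB card; whether every key is passed through unchanged by this GUI is an assumption.

```toml
# Sketch: memory-saving overrides for a full SDXL fine-tune on 24 GB.
# Keys follow kohya-ss/sd-scripts conventions; verify the GUI forwards them.
gradient_checkpointing = true
mixed_precision = "fp16"
full_fp16 = true                  # keep gradients in fp16 too, instead of fp32
cache_latents = true
cache_latents_to_disk = true
optimizer_type = "Adafactor"      # far smaller optimizer state than AdamW/Lion
optimizer_args = [ "scale_parameter=False", "relative_step=False", "warmup_init=False" ]
lr_scheduler = "constant_with_warmup"
train_batch_size = 1
```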

18716536833mm commented 11 months ago

```
prepare dataset
prepare accelerator
loading model for process 0/1
load StableDiffusion checkpoint: F:/sd-webui-aki-v4.4/models/Stable-diffusion/sd_xl_base_1.0.safetensors
building U-Net
loading U-Net from checkpoint
U-Net:
building text encoders
loading text encoders from checkpoint
text encoder 1:
text encoder 2:
building VAE
loading VAE from checkpoint
VAE:
load VAE: F:\lora-scripts-v1.6.2\sd-models\sdxl_vae.safetensors
additional VAE loaded
Disable Diffusers' xformers
Enable xformers for U-Net
[Dataset 0] caching latents.
checking cache validity...
100%|████████████████████████████████████████████████████████████████████████████████| 32/32 [00:00<00:00, 1333.32it/s]
caching latents...
0it [00:00, ?it/s]
number of models: 1
number of trainable parameters: 2567463684
prepare optimizer, data loader etc.
bin F:\lora-scripts-v1.6.2\python\lib\site-packages\bitsandbytes\libbitsandbytes_cuda118.dll
use 8-bit AdamW optimizer | {}
override steps. steps for 20 epochs is / 指定エポックまでのステップ数: 6400
running training / 学習開始
  num examples / サンプル数: 320
  num batches per epoch / 1epochのバッチ数: 320
  num epochs / epoch数: 20
  batch size per device / バッチサイズ: 1
  gradient accumulation steps / 勾配を合計するステップ数 = 1
  total optimization steps / 学習ステップ数: 6400
steps:   0%|          | 0/6400 [00:00<?, ?it/s]
epoch 1/20
Traceback (most recent call last):
  File "F:\lora-scripts-v1.6.2\sd-scripts\sdxl_train.py", line 753, in <module>
    train(args)
  File "F:\lora-scripts-v1.6.2\sd-scripts\sdxl_train.py", line 567, in train
    accelerator.backward(loss)
  File "F:\lora-scripts-v1.6.2\python\lib\site-packages\accelerate\accelerator.py", line 1983, in backward
    self.scaler.scale(loss).backward(**kwargs)
  File "F:\lora-scripts-v1.6.2\python\lib\site-packages\torch\_tensor.py", line 487, in backward
    torch.autograd.backward(
  File "F:\lora-scripts-v1.6.2\python\lib\site-packages\torch\autograd\__init__.py", line 200, in backward
    Variable._execution_engine.run_backward(  # Calls into the C++ engine to run the backward pass
  File "F:\lora-scripts-v1.6.2\python\lib\site-packages\torch\autograd\function.py", line 274, in apply
    return user_fn(self, *args)
  File "F:\lora-scripts-v1.6.2\python\lib\site-packages\torch\utils\checkpoint.py", line 157, in backward
    torch.autograd.backward(outputs_with_grad, args_with_grad)
  File "F:\lora-scripts-v1.6.2\python\lib\site-packages\torch\autograd\__init__.py", line 200, in backward
    Variable._execution_engine.run_backward(  # Calls into the C++ engine to run the backward pass
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 14.00 MiB (GPU 0; 23.99 GiB total capacity; 17.23 GiB already allocated; 3.08 GiB free; 18.07 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
steps:   0%|          | 0/6400 [00:24<?, ?it/s]
Traceback (most recent call last):
  File "F:\lora-scripts-v1.6.2\python\lib\runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "F:\lora-scripts-v1.6.2\python\lib\runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "F:\lora-scripts-v1.6.2\python\lib\site-packages\accelerate\commands\launch.py", line 996, in <module>
    main()
  File "F:\lora-scripts-v1.6.2\python\lib\site-packages\accelerate\commands\launch.py", line 992, in main
    launch_command(args)
  File "F:\lora-scripts-v1.6.2\python\lib\site-packages\accelerate\commands\launch.py", line 986, in launch_command
    simple_launcher(args)
  File "F:\lora-scripts-v1.6.2\python\lib\site-packages\accelerate\commands\launch.py", line 628, in simple_launcher
    raise subprocess.CalledProcessError(returncode=process.returncode, cmd=cmd)
subprocess.CalledProcessError: Command '['F:\lora-scripts-v1.6.2\python\python.exe', './sd-scripts/sdxl_train.py', '--config_file', 'F:\lora-scripts-v1.6.2\config\autosave\20231011-164646.toml']' returned non-zero exit status 1.
16:47:22-766548 ERROR    Training failed / 训练失败
```

It still blows up with quantization (the 8-bit optimizer) and gradient checkpointing turned on. The run above was my final throw-everything-at-it attempt, haha.
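A rough budget (an estimate, assuming standard accelerate fp16 mixed-precision behavior) shows why toggling these switches is not enough. With `mixed_precision = "fp16"` but `full_fp16 = false`, the 2 567 463 684 parameters keep fp32 master weights (about 10.3 GB) and accumulate fp32 gradients (another 10.3 GB), and 8-bit AdamW adds two 1-byte state tensors per parameter (about 5.1 GB): roughly 25.7 GB before a single activation is allocated, already past the 4090's 23.99 GiB. This also explains why the `max_split_size_mb` hint in the error message cannot rescue the run; fragmentation tuning helps when the budget is merely tight, not when it is structurally exceeded.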

18716536833mm commented 11 months ago

I've toggled every setting on and off and tried all the optimizers; nothing gets training to run normally.

18716536833mm commented 11 months ago

[screenshots: not rendered]

```toml
model_train_type = "sdxl-finetune"
pretrained_model_name_or_path = "F:/sd-webui-aki-v4.4/models/Stable-diffusion/sd_xl_base_1.0.safetensors"
vae = "F:\lora-scripts-v1.6.2\sd-models\sdxl_vae.safetensors"
v2 = false
train_data_dir = "F:/AI/xunlian/"
resolution = "1344,896"
enable_bucket = false
min_bucket_reso = 64
max_bucket_reso = 1_344
bucket_reso_steps = 32
output_name = "paochexl"
output_dir = "F:/AI/MOXING"
save_model_as = "safetensors"
save_precision = "fp16"
save_every_n_epochs = 5
max_train_epochs = 20
train_batch_size = 1
gradient_checkpointing = true
learning_rate = 0.00001
lr_scheduler = "cosine_with_restarts"
lr_warmup_steps = 20
lr_scheduler_num_cycles = 1
optimizer_type = "AdamW8bit"
log_with = "tensorboard"
logging_dir = "F:\AI\rizhi"
caption_extension = ".txt"
shuffle_caption = true
weighted_captions = false
keep_tokens = 0
max_token_length = 255
seed = 1_337
clip_skip = 2
no_token_padding = false
mixed_precision = "fp16"
full_fp16 = false
xformers = true
lowram = false
cache_latents = true
cache_latents_to_disk = true
persistent_data_loader_workers = false
multi_gpu = false
sample_sampler = "ddim"
sample_prompts = "1high-tech motorcyclepao_che,best quality,(Professionally color graded),trending on artstation,(scifi),Ultra detailed,rich colors,8K,HDR,Octane Render,Redshift,Unreal Engine 5,atmosphere,amazing depth,rich colors, watermark, username, blurry, --w 768 --h 512 --l 7 --s 24 --d 1337"
unet_lr = 0
text_encoder_lr = 0
```
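The config confirms the cause: `model_train_type = "sdxl-finetune"` trains the full model, the LoRA-specific rates are zeroed out (`unet_lr = 0`, `text_encoder_lr = 0`), and the log already warned that `clip_skip = 2` is ignored for SDXL. If the goal is a LoRA, as the repo name suggests, switching the training type sidesteps the VRAM problem entirely, since only the small adapter weights train. A sketch follows, with the caveat that the exact `model_train_type` value for LoRA mode is an assumption mirroring the GUI's naming; the `network_*` keys follow sd-scripts.

```toml
# Sketch: train an SDXL LoRA instead of a full fine-tune.
# "sdxl-lora" is an assumed value mirroring "sdxl-finetune"; check the GUI's presets.
model_train_type = "sdxl-lora"
network_module = "networks.lora"   # sd-scripts LoRA implementation
network_dim = 32                   # adapter rank; a few million trainable params, not 2.57 B
network_alpha = 16
unet_lr = 1e-4                     # LoRA learning rates are actually used in this mode
text_encoder_lr = 1e-5
clip_skip = 1                      # the log warns clip_skip does not apply to SDXL
```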

RickyWang111 commented 11 months ago

It's probably an issue with xformers being enabled; the installation here is the problem. I'm hitting the same error and am still digging.

RickyWang111 commented 11 months ago

Bro, I looked at the install script; change this one spot and reinstall the venv and it works. The original install pulls wheels for Python 3.10, but I'm running 3.11. Change this part of install-cn.ps1 and xformers works:

```powershell
$install_torch = Read-Host "是否需要安装 Torch+xformers? [y/n] (默认为 y)"
if ($install_torch -eq "y" -or $install_torch -eq "Y" -or $install_torch -eq "") {
    pip install torch -f https://mirror.sjtu.edu.cn/pytorch-wheels/torch_stable.html -i https://mirror.baidu.com/pypi/simple
    Check "torch 安装失败,请删除 venv 文件夹后重新运行。"
    pip install -U -I --no-deps xformers
    Check "xformers 安装失败。"
}
```
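After reinstalling, a quick sanity check from inside the venv is `python -c "import torch, xformers; print(torch.__version__, xformers.__version__, torch.cuda.is_available())"`; an xformers wheel built against a different torch/CUDA build usually fails right at import. Note, though, that a broken xformers install typically crashes with an import or kernel error rather than a clean `torch.cuda.OutOfMemoryError`, so for the original report the fine-tune-vs-LoRA configuration above is still the first thing to rule out.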