bmaltais / kohya_ss

Apache License 2.0

Can't train SDXL #2388

Open SteVoit opened 5 months ago

SteVoit commented 5 months ago

Hi there, I'm trying to train an SDXL model on 2x RTX 4090 under Windows. No matter what I do, I always get stuck on this error:

RuntimeError: Trying to create tensor with negative dimension -1727503612: [-1727503612]

I have tried without multi-GPU, and I have tried the most recent kohya_ss as well as kohya_ss 24.03.

Both versions stop on the same error.

Can anyone help with this?

Here is my training command:

accelerate launch --num_cpu_threads_per_process=2 "./sdxl_train.py" --enable_bucket --min_bucket_reso=256 --max_bucket_reso=1024 --pretrained_model_name_or_path="C:/Users/User/Downloads/dreamshaperXL_v21TurboDPMSDE.safetensors" --train_data_dir="C:/AI/cats/img" --resolution="1024,1024" --output_dir="C:/AI/cats/model" --logging_dir="C:/AI/cats/log" --save_model_as=safetensors --output_name="lastxl" --lr_scheduler_num_cycles="75" --max_data_loader_n_workers="0" --learning_rate_te1="0.0" --learning_rate_te2="0.0" --learning_rate="3e-07" --lr_scheduler="cosine" --lr_warmup_steps="7838" --train_batch_size="1" --max_train_steps="78375" --save_every_n_epochs="10" --mixed_precision="bf16" --save_precision="bf16" --caption_extension=".caption" --optimizer_type="AdamW8bit" --max_data_loader_n_workers="0" --bucket_reso_steps=64 --shuffle_caption --xformers --noise_offset=0.0

Traceback (most recent call last):
  File "C:\AI\kohya_223_bnbupdate\sdxl_train.py", line 775, in <module>
    train(args)
  File "C:\AI\kohya_223_bnbupdate\sdxl_train.py", line 399, in train
    unet = accelerator.prepare(unet)
  File "C:\AI\kohya_223_bnbupdate\venv\lib\site-packages\accelerate\accelerator.py", line 1284, in prepare
    result = tuple(
  File "C:\AI\kohya_223_bnbupdate\venv\lib\site-packages\accelerate\accelerator.py", line 1285, in <genexpr>
    self._prepare_one(obj, first_pass=True, device_placement=d) for obj, d in zip(args, device_placement)
  File "C:\AI\kohya_223_bnbupdate\venv\lib\site-packages\accelerate\accelerator.py", line 1090, in _prepare_one
    return self.prepare_model(obj, device_placement=device_placement)
  File "C:\AI\kohya_223_bnbupdate\venv\lib\site-packages\accelerate\accelerator.py", line 1429, in prepare_model
    model = torch.nn.parallel.DistributedDataParallel(
  File "C:\AI\kohya_223_bnbupdate\venv\lib\site-packages\torch\nn\parallel\distributed.py", line 688, in __init__
    self._ddp_init_helper(
  File "C:\AI\kohya_223_bnbupdate\venv\lib\site-packages\torch\nn\parallel\distributed.py", line 825, in _ddp_init_helper
    self.reducer = dist.Reducer(
RuntimeError: Trying to create tensor with negative dimension -1727503612: [-1727503612]
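A note on the number in the error: a negative tensor size of this magnitude looks like a count that has wrapped around a signed 32-bit integer. That is an interpretation, not something the traceback or this thread confirms. The sketch below only does the arithmetic; `wrap_int32` and `unwrap_int32` are hypothetical helper names and are not part of PyTorch, accelerate, or sd-scripts.

```python
# Hypothetical helpers for illustration only; not part of any library in the traceback.

def wrap_int32(n: int) -> int:
    """Interpret n modulo 2**32 as a signed 32-bit integer."""
    n &= 0xFFFFFFFF
    return n - (1 << 32) if n >= (1 << 31) else n


def unwrap_int32(n: int) -> int:
    """Recover the smallest non-negative value that wraps to the signed 32-bit n."""
    return n & 0xFFFFFFFF


reported = -1727503612            # value from the RuntimeError message
candidate = unwrap_int32(reported)

print(candidate)                  # 2567463684
print(wrap_int32(candidate))      # -1727503612, i.e. the reported value again
```

The recovered value, about 2.57 billion, is in the range usually quoted for the SDXL UNet's parameter count (roughly 2.6 billion). That would fit a size computed over the whole model when `DistributedDataParallel` builds its `Reducer`, but it remains an assumption. On 64-bit Windows the C `long` type is 32 bits wide, which is one place such a wrap could come from; that too is a guess.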

bmaltais commented 5 months ago

I wish I could help, but I do not have a multi-GPU setup. I think the current sd-scripts does not support multi-GPU training under Windows and will only run that way under Linux.
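To check what the comment above suggests, it can help to see what the local PyTorch install reports about distributed support. The following is a minimal sketch, assuming it is run inside the same venv used for training. `describe_environment` is a hypothetical helper name; the `torch` calls it makes (`torch.cuda.device_count`, `torch.distributed.is_available`, `torch.distributed.is_nccl_available`) are existing PyTorch functions. It does not prove that sd-scripts supports or rejects multi-GPU on a given platform.

```python
import importlib.util
import platform


def describe_environment() -> dict:
    """Collect basic facts relevant to multi-GPU training (hypothetical helper)."""
    info = {
        "os": platform.system(),
        "torch_installed": importlib.util.find_spec("torch") is not None,
    }
    if info["torch_installed"]:
        import torch
        import torch.distributed as dist

        info["cuda_devices"] = torch.cuda.device_count()
        info["distributed_available"] = dist.is_available()
        # is_nccl_available is only safe to call when distributed support is built in.
        info["nccl_available"] = dist.is_available() and dist.is_nccl_available()
    return info


if __name__ == "__main__":
    for key, value in describe_environment().items():
        print(f"{key}: {value}")
```

If `nccl_available` is reported as `False`, multi-GPU training would have to rely on a different backend such as gloo, which may behave differently from the Linux setup the scripts are usually run on.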