bmaltais / kohya_ss

Apache License 2.0

GPU utilization is too low when IPEX is used #2487

Open DDXDB opened 4 months ago

DDXDB commented 4 months ago

I followed the steps below to install IPEX-based kohya_ss:

git clone https://github.com/bmaltais/kohya_ss.git

cd kohya_ss

.\setup.bat

 kohya_ss setup menu:
1

7

.\venv\Scripts\activate.bat

python -m pip uninstall torch torchvision torchaudio torchtext functorch xformers -y

pip install https://github.com/Nuullll/intel-extension-for-pytorch/releases/download/v2.1.10%2Bxpu/intel_extension_for_pytorch-2.1.10+xpu-cp310-cp310-win_amd64.whl https://github.com/Nuullll/intel-extension-for-pytorch/releases/download/v2.1.10%2Bxpu/torch-2.1.0a0+cxx11.abi-cp310-cp310-win_amd64.whl https://github.com/Nuullll/intel-extension-for-pytorch/releases/download/v2.1.10%2Bxpu/torchaudio-2.1.0a0+cxx11.abi-cp310-cp310-win_amd64.whl https://github.com/Nuullll/intel-extension-for-pytorch/releases/download/v2.1.10%2Bxpu/torchvision-0.16.0a0+cxx11.abi-cp310-cp310-win_amd64.whl

.\setup.bat

5

*This machine
*No distributed training
no
yes
no
no
all
bf16

The following computer configurations have been tested:

PC1
OS: Windows 11 23H2 22631.3527
CPU: R5 5600X
GPU: ARC A770 & A750 (the A750 has been disabled)
GPU Driver: gfx_win_101.5448.exe
PC2
OS: Windows 11 23H2 22631.3527
CPU: I3 12100f
GPU: ARC A380
GPU Driver: gfx_win_101.5445.exe
This is a brand-new computer with only standard Windows 11 and the other necessary environments installed.

It appears to run without problems or errors, but regardless of the computer, GPU usage fluctuates between 5% and 30% and cannot be maintained at 100%. The A770 only manages about 7 s/it. In March I was able to reach 100% utilization with the same configuration, and training was very fast.

I use the following settings for training: LoRA_BasicSettings_ipex (1).json

DDXDB commented 4 months ago

It has been confirmed that the problem is caused by multiple buckets with different resolutions. I resized the images to 512x512 and 512x1024, and utilization is now above 80%.

bmaltais commented 4 months ago

Please raise this with kohya directly on his sd-scripts repo. Perhaps he can do something about it.

Disty0 commented 4 months ago

It JIT-compiles for each resolution, which takes time. It should run at 80%-100% usage once the JIT compile has completed for every resolution in the dataset.
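
Since the slowdown scales with the number of distinct resolutions, it can help to count them before training. The following is only a minimal sketch under stated assumptions: the `sizes` list and the `count_resolutions` helper are hypothetical, and it does not reproduce the bucketing logic of sd-scripts. It just groups a list of (width, height) pairs so you can see how many different shapes the dataset would present.

```python
from collections import Counter

def count_resolutions(sizes):
    """Return a Counter mapping (width, height) -> number of images.

    `sizes` is any iterable of (width, height) pairs. This is a
    hypothetical helper, not part of kohya_ss or sd-scripts.
    """
    return Counter((int(w), int(h)) for w, h in sizes)

# Hypothetical image sizes; in practice these would be read from the dataset.
sizes = [(512, 512), (512, 1024), (512, 512), (768, 512)]

buckets = count_resolutions(sizes)
for (w, h), n in sorted(buckets.items()):
    print(f"{w}x{h}: {n} image(s)")
print(f"{len(buckets)} distinct resolution(s)")
```

Fewer distinct resolutions means fewer per-resolution compile passes before utilization settles, which matches the improvement seen after resizing everything to 512x512 and 512x1024.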