Closed Rcorthy closed 5 months ago
... but I checked and made certain that my computer is CUDA-capable and has the latest CUDA installed. I've tried manually identifying my GPU in the setup and using a fresh install.
If you're using the latest CUDA toolkit version 12, that will not work. The components that kohya_ss installed by default are currently dependent on CUDA toolkit version 11.8.
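For anyone wanting to sanity-check this: prebuilt wheels such as Torch 2.1.2+cu118 are tied to a CUDA major version, so a 12.x toolkit won't satisfy a cu118 build. A minimal sketch of that compatibility check (the helper names here are made up for illustration, not kohya_ss code):

```python
def cuda_major(version: str) -> int:
    """Extract the major component of a CUDA version string, e.g. '12.5' -> 12."""
    return int(version.split(".")[0])

def toolkit_matches(installed: str, required: str = "11.8") -> bool:
    """Hypothetical check: binaries built against cu118 need an 11.x toolkit."""
    return cuda_major(installed) == cuda_major(required)
```

So `toolkit_matches("12.5")` is false even though 12.5 is "newer": the major version has to line up, not just be greater.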
I tried with 11.8 before trying the newest version, and went ahead and tried it again just in case. I'm getting what appear to be the same errors. Could it be that my system is not powerful enough anymore? I'm running on an Nvidia Quadro T1200 with 4GB of VRAM.
I tried it on an older laptop with an Nvidia GTX 1060. That one is older than a Quadro T1200 and WD14 was able to run using onnx perfectly fine. Didn't even need to install CUDA Toolkit.
Now I'm thinking that if your system was able to run it before, it should still be able to run it now. What's your Nvidia driver version? Is it reasonably recent too?
My driver is the most recent one for my card (version 555.85). If I remember right, updating my driver was one of the things I tried when I first troubleshot this. I just double-checked my CUDA, though, and it's still on version 12.5. I'll have to try reverting to 11.8 again and see if it actually works this time.
Tried again with CUDA 11.8 actually installed this time. Seems to be the same issue still.
Does the gui startup log indicate what GPU was detected on your machine?
Mine looks like this:
```
INFO Kohya_ss GUI version: v24.1.4
INFO Submodule initialized and updated.
INFO nVidia toolkit detected
INFO Torch 2.1.2+cu118
INFO Torch backend: nVidia CUDA 11.8 cuDNN 8700
INFO Torch detected GPU: NVIDIA GeForce RTX 4090 VRAM 24215 Arch (8, 9) Cores 128
INFO Python version is 3.10.13 (main, Sep 11 2023, 13:44:35) [GCC 11.2.0]
INFO Verifying modules installation status from /home/user/kohya_ss/requirements_linux.txt...
INFO Verifying modules installation status from requirements.txt...
```
Looks like it is:

```
21:05:09-724719 INFO Kohya_ss GUI version: v24.1.4
21:05:12-751116 INFO Submodule initialized and updated.
21:05:12-766119 INFO nVidia toolkit detected
21:05:18-048407 INFO Torch 2.1.2+cu118
21:05:18-129426 INFO Torch backend: nVidia CUDA 11.8 cuDNN 8700
21:05:18-133428 INFO Torch detected GPU: NVIDIA T1200 Laptop GPU VRAM 4096 Arch (7, 5) Cores 16
21:05:18-141428 INFO Python version is 3.10.11 (tags/v3.10.11:7d4cc5a, Apr 5 2023, 00:38:17) [MSC v.1929 64 bit (AMD64)]
21:05:18-144425 INFO Verifying modules installation status from requirements_pytorch_windows.txt...
21:05:18-146425 INFO Verifying modules installation status from requirements_windows.txt...
21:05:18-149435 INFO Verifying modules installation status from requirements.txt...
21:05:35-483629 INFO headless: False
21:05:35-534630 INFO Using shell=True when running external commands...
```
The log seems normal.
Next idea: can you look for the folder at C:\Users\yourname\.cache\huggingface\accelerate
and see if there is a file called default_config.yaml
in there? If that yaml file is there, just delete it (or move/rename/etc).
Deleted the file and tried again without onnx. I got the exact same error message I was getting before when not using onnx.
What's the result for running it with onnx?
Good sir, you are a saint. Running with onnx and deleting the .yaml worked. I really appreciate your help.
Excellent
I recently updated to the latest version and have been unable to use the WD14 tagger ever since. I am getting different errors when I try with and without onnx.

When I try with onnx, I think the error is telling me that onnx can't recognize a CUDA-capable device, but I checked and made certain that my computer is CUDA-capable and has the latest CUDA installed. I've tried manually identifying my GPU in the setup and using a fresh install. The error text for this case is the first one attached.

When I try without onnx, I seem to be getting something about Keras 3 only supporting certain file types. I was kind of able to decipher the other error text, but I haven't the slightest idea what this one's about. The error text for this case is the second one attached.

It may be worth noting that on other GUIs like Automatic1111 and ComfyUI, I can get the tagging models to work just fine, but I really prefer kohya. Any guidance would be greatly appreciated.
. . . CASE 1: WITH ONNX
```
EP Error D:\a\_work\1\s\onnxruntime\core\providers\cuda\cuda_call.cc:121 onnxruntime::CudaCall D:\a\_work\1\s\onnxruntime\core\providers\cuda\cuda_call.cc:114 onnxruntime::CudaCall CUDA failure 100: no CUDA-capable device is detected ; GPU=-727787712 ; hostname=RHIT-PW01EGB5 ; file=D:\a\_work\1\s\onnxruntime\core\providers\cuda\cuda_execution_provider.cc ; line=245 ; expr=cudaSetDevice(info.device_id);
 when using ['CUDAExecutionProvider']
Falling back to ['CUDAExecutionProvider', 'CPUExecutionProvider'] and retrying.
Traceback (most recent call last):
  File "C:\Users\billipem\AIUI\kohya_ss\venv\lib\site-packages\onnxruntime\capi\onnxruntime_inference_collection.py", line 419, in __init__
    self._create_inference_session(providers, provider_options, disabled_optimizers)
  File "C:\Users\billipem\AIUI\kohya_ss\venv\lib\site-packages\onnxruntime\capi\onnxruntime_inference_collection.py", line 483, in _create_inference_session
    sess.initialize_session(providers, provider_options, disabled_optimizers)
RuntimeError: D:\a\_work\1\s\onnxruntime\core\providers\cuda\cuda_call.cc:121 onnxruntime::CudaCall D:\a\_work\1\s\onnxruntime\core\providers\cuda\cuda_call.cc:114 onnxruntime::CudaCall CUDA failure 100: no CUDA-capable device is detected ; GPU=-727787712 ; hostname=RHIT-PW01EGB5 ; file=D:\a\_work\1\s\onnxruntime\core\providers\cuda\cuda_execution_provider.cc ; line=245 ; expr=cudaSetDevice(info.device_id);

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "C:\Users\billipem\AIUI\kohya_ss\sd-scripts\finetune\tag_images_by_wd14_tagger.py", line 514, in <module>
    main(args)
  File "C:\Users\billipem\AIUI\kohya_ss\sd-scripts\finetune\tag_images_by_wd14_tagger.py", line 154, in main
    ort_sess = ort.InferenceSession(
  File "C:\Users\billipem\AIUI\kohya_ss\venv\lib\site-packages\onnxruntime\capi\onnxruntime_inference_collection.py", line 432, in __init__
    raise fallback_error from e
  File "C:\Users\billipem\AIUI\kohya_ss\venv\lib\site-packages\onnxruntime\capi\onnxruntime_inference_collection.py", line 427, in __init__
    self._create_inference_session(self._fallback_providers, None)
  File "C:\Users\billipem\AIUI\kohya_ss\venv\lib\site-packages\onnxruntime\capi\onnxruntime_inference_collection.py", line 483, in _create_inference_session
    sess.initialize_session(providers, provider_options, disabled_optimizers)
RuntimeError: D:\a\_work\1\s\onnxruntime\core\providers\cuda\cuda_call.cc:121 onnxruntime::CudaCall D:\a\_work\1\s\onnxruntime\core\providers\cuda\cuda_call.cc:114 onnxruntime::CudaCall CUDA failure 100: no CUDA-capable device is detected ; GPU=-727787712 ; hostname=RHIT-PW01EGB5 ; file=D:\a\_work\1\s\onnxruntime\core\providers\cuda\cuda_execution_provider.cc ; line=245 ; expr=cudaSetDevice(info.device_id);

Traceback (most recent call last):
  File "C:\Program Files\Python310\lib\runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "C:\Program Files\Python310\lib\runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "C:\Users\billipem\AIUI\kohya_ss\venv\Scripts\accelerate.EXE\__main__.py", line 7, in <module>
  File "C:\Users\billipem\AIUI\kohya_ss\venv\lib\site-packages\accelerate\commands\accelerate_cli.py", line 47, in main
    args.func(args)
  File "C:\Users\billipem\AIUI\kohya_ss\venv\lib\site-packages\accelerate\commands\launch.py", line 1017, in launch_command
    simple_launcher(args)
  File "C:\Users\billipem\AIUI\kohya_ss\venv\lib\site-packages\accelerate\commands\launch.py", line 637, in simple_launcher
    raise subprocess.CalledProcessError(returncode=process.returncode, cmd=cmd)
subprocess.CalledProcessError: Command '['C:\Users\billipem\AIUI\kohya_ss\venv\Scripts\python.exe', 'C:/Users/billipem/AIUI/kohya_ss/sd-scripts/finetune/tag_images_by_wd14_tagger.py', '--batch_size', '1', '--caption_extension', '.txt', '--caption_separator', ', ', '--debug', '--frequency_tags', '--general_threshold', '0.3', '--max_data_loader_n_workers', '2', '--onnx', '--remove_underscore', '--repo_id', 'SmilingWolf/wd-v1-4-convnextv2-tagger-v2', '--thresh', '0.4', 'C:/Users/billipem/AIUI/Training_Data/BSQ/New']' returned non-zero exit status 1.
```

23:33:22-474728 INFO ...captioning done
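The "Falling back to ['CUDAExecutionProvider', 'CPUExecutionProvider'] and retrying" line in that log is onnxruntime's provider fallback at work. A toy sketch of that selection order (not the actual onnxruntime implementation, just the idea):

```python
def pick_providers(cuda_usable: bool) -> list:
    """Mimic the fallback order in the log: try CUDA first, keep CPU as a safety net."""
    providers = ["CPUExecutionProvider"]
    if cuda_usable:
        providers.insert(0, "CUDAExecutionProvider")
    return providers
```

In this failure the retry still lists CUDAExecutionProvider first, which is why the same CUDA failure 100 shows up twice before the session gives up.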
. . . CASE 2: WITHOUT ONNX
```
Traceback (most recent call last):
  File "C:\Users\billipem\AIUI\kohya_ss\sd-scripts\finetune\tag_images_by_wd14_tagger.py", line 514, in <module>
    main(args)
  File "C:\Users\billipem\AIUI\kohya_ss\sd-scripts\finetune\tag_images_by_wd14_tagger.py", line 165, in main
    model = load_model(f"{model_location}")
  File "C:\Users\billipem\AIUI\kohya_ss\venv\lib\site-packages\keras\src\saving\saving_api.py", line 193, in load_model
    raise ValueError(
ValueError: File format not supported: filepath=wd14_tagger_model\SmilingWolf_wd-v1-4-convnextv2-tagger-v2. Keras 3 only supports V3 `.keras` files and legacy H5 format files (`.h5` extension). Note that the legacy SavedModel format is not supported by `load_model()` in Keras 3. In order to reload a TensorFlow SavedModel as an inference-only layer in Keras 3, use `keras.layers.TFSMLayer(wd14_tagger_model\SmilingWolf_wd-v1-4-convnextv2-tagger-v2, call_endpoint='serving_default')` (note that your `call_endpoint` might have a different name).

Traceback (most recent call last):
  File "C:\Program Files\Python310\lib\runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "C:\Program Files\Python310\lib\runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "C:\Users\billipem\AIUI\kohya_ss\venv\Scripts\accelerate.EXE\__main__.py", line 7, in <module>
  File "C:\Users\billipem\AIUI\kohya_ss\venv\lib\site-packages\accelerate\commands\accelerate_cli.py", line 47, in main
    args.func(args)
  File "C:\Users\billipem\AIUI\kohya_ss\venv\lib\site-packages\accelerate\commands\launch.py", line 1017, in launch_command
    simple_launcher(args)
  File "C:\Users\billipem\AIUI\kohya_ss\venv\lib\site-packages\accelerate\commands\launch.py", line 637, in simple_launcher
    raise subprocess.CalledProcessError(returncode=process.returncode, cmd=cmd)
subprocess.CalledProcessError: Command '['C:\Users\billipem\AIUI\kohya_ss\venv\Scripts\python.exe', 'C:/Users/billipem/AIUI/kohya_ss/sd-scripts/finetune/tag_images_by_wd14_tagger.py', '--batch_size', '1', '--caption_extension', '.txt', '--caption_separator', ', ', '--debug', '--frequency_tags', '--general_threshold', '0.3', '--max_data_loader_n_workers', '2', '--remove_underscore', '--repo_id', 'SmilingWolf/wd-v1-4-convnextv2-tagger-v2', '--thresh', '0.4', 'C:/Users/billipem/AIUI/Training_Data/BSQ/New']' returned non-zero exit status 1.
```

23:44:54-058312 INFO ...captioning done