bmaltais / kohya_ss


Pagefile issues at training Step0 #235

Closed: VisionaryMind closed this issue 1 year ago

VisionaryMind commented 1 year ago

I am unable to get beyond training Step 0. It pauses for ~15 seconds, then generates the following error:

Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "C:\Users\VisMind\AppData\Local\Programs\Python\Python310\lib\multiprocessing\spawn.py", line 116, in spawn_main
    exitcode = _main(fd, parent_sentinel)
  File "C:\Users\VisMind\AppData\Local\Programs\Python\Python310\lib\multiprocessing\spawn.py", line 125, in _main
    prepare(preparation_data)
  File "C:\Users\VisMind\AppData\Local\Programs\Python\Python310\lib\multiprocessing\spawn.py", line 236, in prepare
    _fixup_main_from_path(data['init_main_from_path'])
  File "C:\Users\VisMind\AppData\Local\Programs\Python\Python310\lib\multiprocessing\spawn.py", line 287, in _fixup_main_from_path
    main_content = runpy.run_path(main_path,
  File "C:\Users\VisMind\AppData\Local\Programs\Python\Python310\lib\runpy.py", line 289, in run_path
    return _run_module_code(code, init_globals, run_name,
  File "C:\Users\VisMind\AppData\Local\Programs\Python\Python310\lib\runpy.py", line 96, in _run_module_code
    _run_code(code, mod_globals, init_globals,
  File "C:\Users\VisMind\AppData\Local\Programs\Python\Python310\lib\runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "D:\downloads\kohya_ss\train_network.py", line 1, in <module>
    from torch.cuda.amp import autocast
  File "D:\downloads\kohya_ss\venv\lib\site-packages\torch\__init__.py", line 129, in <module>
    raise err
OSError: [WinError 1455] The paging file is too small for this operation to complete. Error loading "D:\downloads\kohya_ss\venv\lib\site-packages\torch\lib\cufft64_10.dll" or one of its dependencies.

I've looked around for a solution, and someone suggested taking the number of workers down to 1. I tried decreasing the Number of CPU Threads per Core parameter instead, but it made no difference. Has anyone run into this before?
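For context: WinError 1455 is Windows reporting that the commit limit (physical RAM plus pagefile) was exhausted, not that VRAM or physical RAM ran out; torch's CUDA DLLs reserve a large amount of committed memory when they load. The traceback above shows the failure inside multiprocessing's spawn_main, i.e., inside a freshly spawned DataLoader worker process that is re-importing torch. That is why the "number of workers" suggestion, rather than CPU threads per core, is the relevant knob: each spawned worker commits its own copy of those DLL reservations. In kohya's sd-scripts the corresponding option appears to be --max_data_loader_n_workers. Below is a minimal sketch of the same knob in plain PyTorch (not kohya's actual training code), just to show the mechanism:

```python
# Minimal sketch (plain PyTorch, not kohya's code): num_workers=0 keeps
# data loading in the main process, so no spawned worker has to re-import
# torch and re-commit its DLL reservations, the step that fails above.
import torch
from torch.utils.data import DataLoader, TensorDataset

dataset = TensorDataset(torch.zeros(16, 3, 64, 64))  # stand-in dataset
loader = DataLoader(dataset, batch_size=4, num_workers=0)

for (batch,) in loader:
    print(batch.shape)  # torch.Size([4, 3, 64, 64])
```

Each additional worker multiplies the commit cost of importing torch, so dropping workers to 0 or 1 is the quickest way to shrink the footprint without touching the pagefile.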

VisionaryMind commented 1 year ago

Finally got this working by shutting down unneeded memory-resident apps. I trained 20 768x768 images on an RTX 2070 Super with 8GB VRAM. Took about 2 hours, 30 minutes with GPU maxing out to 95%. Unfortunately, the end result, used in prompt, looks absolutely nothing like the input images. I gave it 20 examples of a Caucasian male subject, and the prompts (with the Lora injection) yield mostly Asians and Latinos. Not close, by a long shot. At least with the hypernetworks, there was a vague resemblance to the subject. Lora seems to have a preference for certain types of faces.
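Shutting down memory-resident apps works because it frees commit charge; the more durable fix usually suggested for this error is enlarging the Windows pagefile (System Properties > Advanced > Performance > Settings > Advanced > Virtual memory), ideally on the drive holding the venv. Here is a small Windows-only diagnostic sketch, assuming nothing beyond the standard library, for checking how much commit headroom is left before launching training:

```python
# Windows-only diagnostic sketch: query the commit limit that
# WinError 1455 complains about. GlobalMemoryStatusEx reports the total
# and remaining commit charge (RAM + pagefile), which is what torch's
# DLLs reserve against when they load.
import ctypes

class MEMORYSTATUSEX(ctypes.Structure):
    _fields_ = [
        ("dwLength", ctypes.c_ulong),
        ("dwMemoryLoad", ctypes.c_ulong),
        ("ullTotalPhys", ctypes.c_ulonglong),
        ("ullAvailPhys", ctypes.c_ulonglong),
        ("ullTotalPageFile", ctypes.c_ulonglong),  # commit limit
        ("ullAvailPageFile", ctypes.c_ulonglong),  # commit still available
        ("ullTotalVirtual", ctypes.c_ulonglong),
        ("ullAvailVirtual", ctypes.c_ulonglong),
        ("ullAvailExtendedVirtual", ctypes.c_ulonglong),
    ]

status = MEMORYSTATUSEX()
status.dwLength = ctypes.sizeof(MEMORYSTATUSEX)
ctypes.windll.kernel32.GlobalMemoryStatusEx(ctypes.byref(status))

gib = 1024 ** 3
print(f"Commit limit:     {status.ullTotalPageFile / gib:.1f} GiB")
print(f"Commit available: {status.ullAvailPageFile / gib:.1f} GiB")
```

If "Commit available" is only a few GiB, torch plus several spawned DataLoader workers will likely hit WinError 1455 again, regardless of how much physical RAM is free.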