NVlabs / stylegan3

Official PyTorch implementation of StyleGAN3

Cannot train on a custom dataset #163

Open roxy0230 opened 2 years ago

roxy0230 commented 2 years ago

I ran a test training just to verify that I can make this architecture work. Everything worked up to the train.py step (the dataset was generated as described in the documentation, and it is deliberately small because I only wanted to see whether training runs). But every time I get `OSError: [WinError 1455] The paging file is too small for this operation to complete. Error loading "E:\Anaconda\envs\stylegan3-test2\lib\site-packages\torch\lib\with_some_dll" or one of its dependencies.` I tried increasing the paging file size, reducing the number of workers, and other suggestions I saw on forums, but nothing worked. If I run the training with 1 GPU, it of course runs out of memory. I also reinstalled torch because I thought the problem might be with the DLLs, but it still did not work. Can someone help me?

**To Reproduce**
Steps to reproduce the behavior:

  1. I ran the command `python train.py --outdir=training-runs --data=art-processed_256x256.zip --cfg=stylegan3-r --gpus=8 --batch=32 --gamma=0.5`
  2. See error:

```
Output directory:    training-runs\00015-stylegan3-r-art-processed_256x256-gpus8-batch32-gamma0.5
Number of GPUs:      8
Batch size:          32 images
Training duration:   25000 kimg
Dataset path:        art-processed_256x256.zip
Dataset size:        250 images
Dataset resolution:  256
Dataset labels:      False
Dataset x-flips:     False
```

```
Creating output directory...
Launching processes...
Loading training set...
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "E:\Anaconda\envs\stylegan3-test2\lib\multiprocessing\spawn.py", line 116, in spawn_main
    exitcode = _main(fd, parent_sentinel)
  File "E:\Anaconda\envs\stylegan3-test2\lib\multiprocessing\spawn.py", line 125, in _main
    prepare(preparation_data)
  File "E:\Anaconda\envs\stylegan3-test2\lib\multiprocessing\spawn.py", line 236, in prepare
    _fixup_main_from_path(data['init_main_from_path'])
  File "E:\Anaconda\envs\stylegan3-test2\lib\multiprocessing\spawn.py", line 287, in _fixup_main_from_path
    main_content = runpy.run_path(main_path,
  File "E:\Anaconda\envs\stylegan3-test2\lib\runpy.py", line 268, in run_path
    return _run_module_code(code, init_globals, run_name,
  File "E:\Anaconda\envs\stylegan3-test2\lib\runpy.py", line 97, in _run_module_code
    _run_code(code, mod_globals, init_globals,
  File "E:\Anaconda\envs\stylegan3-test2\lib\runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "E:\Proiecte_Dizertatie\stylegan3\train.py", line 17, in <module>
    import torch
  File "E:\Anaconda\envs\stylegan3-test2\lib\site-packages\torch\__init__.py", line 126, in <module>
    raise err
OSError: [WinError 1455] The paging file is too small for this operation to complete. Error loading "E:\Anaconda\envs\stylegan3-test2\lib\site-packages\torch\lib\cudnn_cnn_infer64_8.dll" or one of its dependencies.
```

The other spawned worker processes print the same traceback, interleaved with each other. Some fail at `import torch` (train.py line 17), others earlier, at `import click` (train.py line 13) or while Python initializes its `site` module, with these errors:

```
OSError: [WinError 1455] The paging file is too small for this operation to complete. Error loading "E:\Anaconda\envs\stylegan3-test2\lib\site-packages\torch\lib\cudnn_adv_train64_8.dll" or one of its dependencies.
OSError: [WinError 1455] The paging file is too small for this operation to complete. Error loading "E:\Anaconda\envs\stylegan3-test2\lib\site-packages\torch\lib\cublas64_11.dll" or one of its dependencies.
OSError: [WinError 1455] The paging file is too small for this operation to complete
ImportError: DLL load failed while importing _ctypes: The paging file is too small for this operation to complete.
ImportError: DLL load failed while importing _socket: The paging file is too small for this operation to complete.
MemoryError
Error processing line 1 of E:\Anaconda\envs\stylegan3-test2\lib\site-packages\matplotlib-3.4.2-py3.9-nspkg.pth:
Fatal Python error: init_import_site: Failed to import the site module
Fatal Python error: init_sys_streams: can't initialize sys standard streams
```
```
Traceback (most recent call last):
  File "E:\Proiecte_Dizertatie\stylegan3\train.py", line 286, in <module>
    main() # pylint: disable=no-value-for-parameter
  File "E:\Anaconda\envs\stylegan3-test2\lib\site-packages\click\core.py", line 1128, in __call__
    return self.main(*args, **kwargs)
  File "E:\Anaconda\envs\stylegan3-test2\lib\site-packages\click\core.py", line 1053, in main
    rv = self.invoke(ctx)
  File "E:\Anaconda\envs\stylegan3-test2\lib\site-packages\click\core.py", line 1395, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "E:\Anaconda\envs\stylegan3-test2\lib\site-packages\click\core.py", line 754, in invoke
    return __callback(*args, **kwargs)
  File "E:\Proiecte_Dizertatie\stylegan3\train.py", line 281, in main
    launch_training(c=c, desc=desc, outdir=opts.outdir, dry_run=opts.dry_run)
  File "E:\Proiecte_Dizertatie\stylegan3\train.py", line 98, in launch_training
    torch.multiprocessing.spawn(fn=subprocess_fn, args=(c, temp_dir), nprocs=c.num_gpus)
  File "E:\Anaconda\envs\stylegan3-test2\lib\site-packages\torch\multiprocessing\spawn.py", line 240, in spawn
    return start_processes(fn, args, nprocs, join, daemon, start_method='spawn')
  File "E:\Anaconda\envs\stylegan3-test2\lib\site-packages\torch\multiprocessing\spawn.py", line 198, in start_processes
    while not context.join():
  File "E:\Anaconda\envs\stylegan3-test2\lib\site-packages\torch\multiprocessing\spawn.py", line 160, in join
    raise ProcessRaisedException(msg, error_index, failed_process.pid)
torch.multiprocessing.spawn.ProcessRaisedException:

-- Process 4 terminated with the following error:
Traceback (most recent call last):
  File "E:\Anaconda\envs\stylegan3-test2\lib\site-packages\torch\multiprocessing\spawn.py", line 69, in _wrap
    fn(i, *args)
  File "E:\Proiecte_Dizertatie\stylegan3\train.py", line 47, in subprocess_fn
    training_loop.training_loop(rank=rank, **c)
  File "E:\Proiecte_Dizertatie\stylegan3\training\training_loop.py", line 152, in training_loop
    G = dnnlib.util.construct_class_by_name(**G_kwargs, **common_kwargs).train().requires_grad_(False).to(device) # subclass of torch.nn.Module
  File "E:\Anaconda\envs\stylegan3-test2\lib\site-packages\torch\nn\modules\module.py", line 907, in to
    return self._apply(convert)
  File "E:\Anaconda\envs\stylegan3-test2\lib\site-packages\torch\nn\modules\module.py", line 578, in _apply
    module._apply(fn)
  File "E:\Anaconda\envs\stylegan3-test2\lib\site-packages\torch\nn\modules\module.py", line 578, in _apply
    module._apply(fn)
  File "E:\Anaconda\envs\stylegan3-test2\lib\site-packages\torch\nn\modules\module.py", line 578, in _apply
    module._apply(fn)
  File "E:\Anaconda\envs\stylegan3-test2\lib\site-packages\torch\nn\modules\module.py", line 601, in _apply
    param_applied = fn(param)
  File "E:\Anaconda\envs\stylegan3-test2\lib\site-packages\torch\nn\modules\module.py", line 905, in convert
    return t.to(device, dtype if t.is_floating_point() or t.is_complex() else None, non_blocking)
RuntimeError: CUDA error: invalid device ordinal
CUDA kernel errors might be asynchronously reported at some other API call,so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
```

Please copy&paste text instead of screenshots for better searchability.

**Expected behavior**
A clear and concise description of what you expected to happen.

**Screenshots**
If applicable, add screenshots to help explain your problem.

**Desktop (please complete the following information):**
- OS: Windows 11
- PyTorch version: 1.11.0
- CUDA toolkit version: CUDA 11.3
- NVIDIA driver version:
- GPU: RTX 3060
- Docker: No
PDillis commented 2 years ago

Do you have one GPU or 8 GPUs? Your training command uses --gpus=8, but the error you are getting (RuntimeError: CUDA error: invalid device ordinal) indicates that some of these GPUs are not available on your machine (see e.g. this). If you do have 8 GPUs, try running the following in a Python shell:

import torch
print(f'No. available GPUs: {torch.cuda.device_count()}')

and verify that you get 8, in case some GPUs are not being detected. If you are running out of GPU memory, lower the per-GPU batch size, e.g. --batch-gpu=8 or whatever fits. Per-GPU memory use is governed by --batch-gpu: --batch is the total batch size, and any difference between it and --batch-gpu × --gpus is made up by gradient accumulation.
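The arithmetic relating --batch, --gpus and --batch-gpu can be sketched as follows. `plan_batches` is a hypothetical helper, not part of stylegan3; its divisibility checks are an assumption about the constraints the training script enforces, and `available_gpus` stands in for the `torch.cuda.device_count()` value printed by the snippet above.

```python
def plan_batches(batch, gpus, batch_gpu=None, available_gpus=None):
    """Return (batch_gpu, accumulation_steps) for a hypothetical training setup.

    batch          -- total batch size per optimizer step (--batch)
    gpus           -- number of GPUs requested (--gpus)
    batch_gpu      -- images per GPU per forward/backward pass (--batch-gpu);
                      defaults to batch // gpus, i.e. no accumulation
    available_gpus -- e.g. torch.cuda.device_count(); requesting more GPUs than
                      this is what produces "invalid device ordinal"
    """
    if available_gpus is not None and gpus > available_gpus:
        raise ValueError(f"--gpus={gpus} but only {available_gpus} GPU(s) detected")
    if batch % gpus != 0:
        raise ValueError("--batch must be a multiple of --gpus")
    if batch_gpu is None:
        batch_gpu = batch // gpus
    if batch % (gpus * batch_gpu) != 0:
        raise ValueError("--batch must be a multiple of --gpus times --batch-gpu")
    # Each optimizer step runs this many passes of batch_gpu images on every GPU,
    # so activation memory depends on batch_gpu rather than on batch.
    accumulation_steps = batch // (gpus * batch_gpu)
    return batch_gpu, accumulation_steps
```

For example, `plan_batches(32, 1, batch_gpu=8)` returns `(8, 4)`: one GPU, 8 images per pass, 4 accumulated passes per step. `plan_batches(32, 8, available_gpus=1)` raises a ValueError instead of failing later inside CUDA.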

Lastly, --cfg=stylegan3-r is the heaviest model; do you really need it? For your GPU (a 3060), I'd recommend stylegan3-t or stylegan2, depending on whether you are interested in the former's translation equivariance. stylegan3-r takes a long time to train, even on the newest, highest-capacity cards.

roxy0230 commented 2 years ago

I had run it on one GPU before, and it did say there wasn't enough memory, so I tried again with a smaller --batch-gpu as you suggested. Training started and produced the first fake image and the init fake image, but then execution stopped. The output looks like this:

Setting up augmentation...
Distributing across 1 GPUs...
Setting up training phases...
Exporting sample images...
Initializing logs...
Skipping tfevents export: No module named 'tensorboard'
Training for 25000 kimg...

tick 0 kimg 0.0 time 2m 32s sec/tick 24.2 sec/kimg 756.85 maintenance 128.2 cpumem 4.77 gpumem 4.79 reserved 5.17 augment 0.000
Evaluating metrics...
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "E:\Anaconda\envs\stylegan3\lib\multiprocessing\spawn.py", line 116, in spawn_main
    exitcode = _main(fd, parent_sentinel)
  File "E:\Anaconda\envs\stylegan3\lib\multiprocessing\spawn.py", line 125, in _main
    prepare(preparation_data)
  File "E:\Anaconda\envs\stylegan3\lib\multiprocessing\spawn.py", line 236, in prepare
    _fixup_main_from_path(data['init_main_from_path'])
  File "E:\Anaconda\envs\stylegan3\lib\multiprocessing\spawn.py", line 287, in _fixup_main_from_path
    main_content = runpy.run_path(main_path,
  File "E:\Anaconda\envs\stylegan3\lib\runpy.py", line 265, in run_path
    return _run_module_code(code, init_globals, run_name,
  File "E:\Anaconda\envs\stylegan3\lib\runpy.py", line 97, in _run_module_code
    _run_code(code, mod_globals, init_globals,
  File "E:\Anaconda\envs\stylegan3\lib\runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "E:\Proiecte_Dizertatie\stylegan3\train.py", line 17, in <module>
    import torch
  File "E:\Anaconda\envs\stylegan3\lib\site-packages\torch\__init__.py", line 123, in <module>
    raise err
OSError: [WinError 1455] The paging file is too small for this operation to complete. Error loading "E:\Anaconda\envs\stylegan3\lib\site-packages\torch\lib\caffe2_detectron_ops_gpu.dll" or one of its dependencies.
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "E:\Anaconda\envs\stylegan3\lib\multiprocessing\spawn.py", line 116, in spawn_main
    exitcode = _main(fd, parent_sentinel)
  File "E:\Anaconda\envs\stylegan3\lib\multiprocessing\spawn.py", line 125, in _main
    prepare(preparation_data)
  File "E:\Anaconda\envs\stylegan3\lib\multiprocessing\spawn.py", line 236, in prepare
    _fixup_main_from_path(data['init_main_from_path'])
  File "E:\Anaconda\envs\stylegan3\lib\multiprocessing\spawn.py", line 287, in _fixup_main_from_path
    main_content = runpy.run_path(main_path,
  File "E:\Anaconda\envs\stylegan3\lib\runpy.py", line 265, in run_path
    return _run_module_code(code, init_globals, run_name,
  File "E:\Anaconda\envs\stylegan3\lib\runpy.py", line 97, in _run_module_code
    _run_code(code, mod_globals, init_globals,
  File "E:\Anaconda\envs\stylegan3\lib\runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "E:\Proiecte_Dizertatie\stylegan3\train.py", line 17, in <module>
    import torch
  File "E:\Anaconda\envs\stylegan3\lib\site-packages\torch\__init__.py", line 123, in <module>
    raise err
OSError: [WinError 1455] The paging file is too small for this operation to complete. Error loading "E:\Anaconda\envs\stylegan3\lib\site-packages\torch\lib\caffe2_detectron_ops_gpu.dll" or one of its dependencies.
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "E:\Anaconda\envs\stylegan3\lib\multiprocessing\spawn.py", line 116, in spawn_main
    exitcode = _main(fd, parent_sentinel)
  File "E:\Anaconda\envs\stylegan3\lib\multiprocessing\spawn.py", line 125, in _main
    prepare(preparation_data)
  File "E:\Anaconda\envs\stylegan3\lib\multiprocessing\spawn.py", line 236, in prepare
    _fixup_main_from_path(data['init_main_from_path'])
  File "E:\Anaconda\envs\stylegan3\lib\multiprocessing\spawn.py", line 287, in _fixup_main_from_path
    main_content = runpy.run_path(main_path,
  File "E:\Anaconda\envs\stylegan3\lib\runpy.py", line 265, in run_path
    return _run_module_code(code, init_globals, run_name,
  File "E:\Anaconda\envs\stylegan3\lib\runpy.py", line 97, in _run_module_code
    _run_code(code, mod_globals, init_globals,
  File "E:\Anaconda\envs\stylegan3\lib\runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "E:\Proiecte_Dizertatie\stylegan3\train.py", line 17, in <module>
    import torch
  File "E:\Anaconda\envs\stylegan3\lib\site-packages\torch\__init__.py", line 123, in <module>
    raise err
OSError: [WinError 1455] The paging file is too small for this operation to complete. Error loading "E:\Anaconda\envs\stylegan3\lib\site-packages\torch\lib\caffe2_detectron_ops_gpu.dll" or one of its dependencies.
Traceback (most recent call last):
  File "E:\Anaconda\envs\stylegan3\lib\site-packages\torch\utils\data\dataloader.py", line 986, in _try_get_data
    data = self._data_queue.get(timeout=timeout)
  File "E:\Anaconda\envs\stylegan3\lib\queue.py", line 178, in get
    raise Empty
_queue.Empty

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "train.py", line 286, in <module>
    main() # pylint: disable=no-value-for-parameter
  File "E:\Anaconda\envs\stylegan3\lib\site-packages\click\core.py", line 1128, in __call__
    return self.main(*args, **kwargs)
  File "E:\Anaconda\envs\stylegan3\lib\site-packages\click\core.py", line 1053, in main
    rv = self.invoke(ctx)
  File "E:\Anaconda\envs\stylegan3\lib\site-packages\click\core.py", line 1395, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "E:\Anaconda\envs\stylegan3\lib\site-packages\click\core.py", line 754, in invoke
    return __callback(*args, **kwargs)
  File "train.py", line 281, in main
    launch_training(c=c, desc=desc, outdir=opts.outdir, dry_run=opts.dry_run)
  File "train.py", line 96, in launch_training
    subprocess_fn(rank=0, c=c, temp_dir=temp_dir)
  File "train.py", line 47, in subprocess_fn
    training_loop.training_loop(rank=rank, **c)
  File "E:\Proiecte_Dizertatie\stylegan3\training\training_loop.py", line 380, in training_loop
    result_dict = metric_main.calc_metric(metric=metric, G=snapshot_data['G_ema'],
  File "E:\Proiecte_Dizertatie\stylegan3\metrics\metric_main.py", line 48, in calc_metric
    results = _metric_dict[metric](opts)
  File "E:\Proiecte_Dizertatie\stylegan3\metrics\metric_main.py", line 88, in fid50k_full
    fid = frechet_inception_distance.compute_fid(opts, max_real=None, num_gen=50000)
  File "E:\Proiecte_Dizertatie\stylegan3\metrics\frechet_inception_distance.py", line 25, in compute_fid
    mu_real, sigma_real = metric_utils.compute_feature_stats_for_dataset(
  File "E:\Proiecte_Dizertatie\stylegan3\metrics\metric_utils.py", line 231, in compute_feature_stats_for_dataset
    for images, _labels in torch.utils.data.DataLoader(dataset=dataset, sampler=item_subset, batch_size=batch_size, **data_loader_kwargs):
  File "E:\Anaconda\envs\stylegan3\lib\site-packages\torch\utils\data\dataloader.py", line 517, in __next__
    data = self._next_data()
  File "E:\Anaconda\envs\stylegan3\lib\site-packages\torch\utils\data\dataloader.py", line 1182, in _next_data
    idx, data = self._get_data()
  File "E:\Anaconda\envs\stylegan3\lib\site-packages\torch\utils\data\dataloader.py", line 1138, in _get_data
    success, data = self._try_get_data()
  File "E:\Anaconda\envs\stylegan3\lib\site-packages\torch\utils\data\dataloader.py", line 999, in _try_get_data
    raise RuntimeError('DataLoader worker (pid(s) {}) exited unexpectedly'.format(pids_str)) from e
RuntimeError: DataLoader worker (pid(s) 13436, 17768, 3676) exited unexpectedly

PDillis commented 2 years ago

Do you need to run the metrics? Unless it's for an academic setting (or you use them to decide when to stop training; I use the fakes outputs themselves for that), I'd suggest --metrics=none, which I have set permanently. However, the OSError you are getting is perhaps the strongest indicator that something else is going on. Still, check whether disabling the metrics at least lets you start training!
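Some context on the OSError, offered as an interpretation rather than a confirmed diagnosis: on Windows, DataLoader workers are started with the spawn method, so every worker re-imports train.py and therefore torch (the tracebacks above fail at `import torch` on train.py line 17), and a commonly reported cause of WinError 1455 is that each of these imports loads the large CUDA DLLs again, exhausting the paging file. The final RuntimeError lists three worker PIDs, and metric_utils.py line 231 passes `**data_loader_kwargs` into the DataLoader, so fewer (or zero) workers for that loader is one thing to try if the metrics are needed. `safe_loader_kwargs` below is a hypothetical helper, not stylegan3 code; where to feed its result into your checkout is left for you to confirm. It omits `prefetch_factor` when `num_workers` is 0 because newer PyTorch releases reject that combination with a ValueError.

```python
import sys


def safe_loader_kwargs(num_workers=3, prefetch_factor=2, pin_memory=True,
                       platform=None, windows_workers=0):
    """Build torch.utils.data.DataLoader keyword arguments (hypothetical helper).

    On Windows ('win32') the worker count is capped at windows_workers, because each
    worker is a spawned process that re-imports the main script and torch.
    """
    platform = sys.platform if platform is None else platform
    if platform == "win32":
        num_workers = min(num_workers, windows_workers)
    kwargs = {"pin_memory": pin_memory, "num_workers": num_workers}
    if num_workers > 0:
        # prefetch_factor only applies when worker processes are used.
        kwargs["prefetch_factor"] = prefetch_factor
    return kwargs
```

Usage would look like `torch.utils.data.DataLoader(dataset, batch_size=64, **safe_loader_kwargs())`. With zero workers, images are loaded in the main process, so metric evaluation gets slower but no extra processes have to import torch.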

roxy0230 commented 2 years ago

It worked after I took out the metrics, but I do need them for academic purposes. Is there a way to make them work?

AmitMY commented 1 year ago

I also need the metrics and ran into the same issue.

Solution: I run this via Docker, and adding --ipc=host to the container solved it.
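Why --ipc=host helps in Docker: PyTorch DataLoader workers hand batches to the main process through shared memory, and Docker containers get a 64 MB /dev/shm by default; --ipc=host (or a larger --shm-size) removes that limit. The sketch below is a hypothetical pre-flight check for a Linux container; the function names and the 1 GiB threshold are arbitrary assumptions, not a documented requirement.

```python
import os
import shutil

DOCKER_DEFAULT_SHM_BYTES = 64 * 1024 * 1024  # Docker's default /dev/shm size


def shm_total_bytes(path="/dev/shm"):
    """Total size of the shared-memory mount in bytes, or None if the path is absent."""
    if not os.path.isdir(path):
        return None
    return shutil.disk_usage(path).total


def shm_looks_too_small(total_bytes, min_bytes=1 << 30):
    """Heuristic: True if the size is known and below min_bytes (1 GiB by default)."""
    return total_bytes is not None and total_bytes < min_bytes


if __name__ == "__main__":
    total = shm_total_bytes()
    if shm_looks_too_small(total):
        print(f"/dev/shm is only {total / 2**20:.0f} MiB; "
              "consider `docker run --ipc=host` or a larger `--shm-size`.")
```

The check only reads the mount size; it does not change anything, so it is safe to run before launching train.py.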

jasuriy commented 2 months ago

@roxy0230 Hi, did you fix the issue? Were you able to train the model on your own dataset?