NVlabs / stylegan3

Official PyTorch implementation of StyleGAN3

Training tool crashes #194

Open Redrrx opened 2 years ago

Redrrx commented 2 years ago

I'm having quite the issue: the training tool crashes.

To Reproduce
Steps to reproduce the behavior:

  1. This is what I run: `python train.py --outdir=training-runs --cfg stylegan3-t --data=datasets/ready --gpus=1 --batch=10 --gamma=8.0 --mirror=1`

  2. See error:

    {
      "G_kwargs": {"class_name": "training.networks_stylegan3.Generator", "z_dim": 512, "w_dim": 512, "mapping_kwargs": {"num_layers": 2}, "channel_base": 32768, "channel_max": 512, "magnitude_ema_beta": 0.9996534864594093},
      "D_kwargs": {"class_name": "training.networks_stylegan2.Discriminator", "block_kwargs": {"freeze_layers": 0}, "mapping_kwargs": {}, "epilogue_kwargs": {"mbstd_group_size": 4}, "channel_base": 32768, "channel_max": 512},
      "G_opt_kwargs": {"class_name": "torch.optim.Adam", "betas": [0, 0.99], "eps": 1e-08, "lr": 0.0025},
      "D_opt_kwargs": {"class_name": "torch.optim.Adam", "betas": [0, 0.99], "eps": 1e-08, "lr": 0.002},
      "loss_kwargs": {"class_name": "training.loss.StyleGAN2Loss", "r1_gamma": 8.0},
      "data_loader_kwargs": {"pin_memory": true, "prefetch_factor": 2, "num_workers": 3},
      "training_set_kwargs": {"class_name": "training.dataset.ImageFolderDataset", "path": "datasets/ready", "use_labels": false, "max_size": 3, "xflip": true, "resolution": 256, "random_seed": 0},
      "num_gpus": 1,
      "batch_size": 10,
      "batch_gpu": 10,
      "metrics": ["fid50k_full"],
      "total_kimg": 25000,
      "kimg_per_tick": 4,
      "image_snapshot_ticks": 50,
      "network_snapshot_ticks": 50,
      "random_seed": 0,
      "ema_kimg": 3.125,
      "augment_kwargs": {"class_name": "training.augment.AugmentPipe", "xflip": 1, "rotate90": 1, "xint": 1, "scale": 1, "rotate": 1, "aniso": 1, "xfrac": 1, "brightness": 1, "contrast": 1, "lumaflip": 1, "hue": 1, "saturation": 1},
      "ada_target": 0.6,
      "run_dir": "training-runs\00004-stylegan3-t-ready-gpus1-batch10-gamma8"
    }

    Output directory:    training-runs\00004-stylegan3-t-ready-gpus1-batch10-gamma8
    Number of GPUs:      1
    Batch size:          10 images
    Training duration:   25000 kimg
    Dataset path:        datasets/ready
    Dataset size:        3 images
    Dataset resolution:  256
    Dataset labels:      False
    Dataset x-flips:     True

    Creating output directory...
    Launching processes...
    Loading training set...

    Num images:  6
    Image shape: [3, 256, 256]
    Label shape: [0]

    Constructing networks...
    Setting up PyTorch plugin "bias_act_plugin"... Done.
    Setting up PyTorch plugin "filtered_lrelu_plugin"... Done.

    Generator                     Parameters  Buffers  Output shape         Datatype
    ---                           ---         ---      ---                  ---
    mapping.fc0                   262656      -        [10, 512]            float32
    mapping.fc1                   262656      -        [10, 512]            float32
    mapping                       -           512      [10, 16, 512]        float32
    synthesis.input.affine        2052        -        [10, 4]              float32
    synthesis.input               262144      1545     [10, 512, 36, 36]    float32
    synthesis.L0_36_512.affine    262656      -        [10, 512]            float32
    synthesis.L0_36_512           2359808     25       [10, 512, 36, 36]    float32
    synthesis.L1_36_512.affine    262656      -        [10, 512]            float32
    synthesis.L1_36_512           2359808     25       [10, 512, 36, 36]    float32
    synthesis.L2_36_512.affine    262656      -        [10, 512]            float32
    synthesis.L2_36_512           2359808     25       [10, 512, 36, 36]    float32
    synthesis.L3_52_512.affine    262656      -        [10, 512]            float32
    synthesis.L3_52_512           2359808     37       [10, 512, 52, 52]    float16
    synthesis.L4_52_512.affine    262656      -        [10, 512]            float32
    synthesis.L4_52_512           2359808     25       [10, 512, 52, 52]    float16
    synthesis.L5_84_512.affine    262656      -        [10, 512]            float32
    synthesis.L5_84_512           2359808     37       [10, 512, 84, 84]    float16
    synthesis.L6_84_512.affine    262656      -        [10, 512]            float32
    synthesis.L6_84_512           2359808     25       [10, 512, 84, 84]    float16
    synthesis.L7_148_512.affine   262656      -        [10, 512]            float32
    synthesis.L7_148_512          2359808     37       [10, 512, 148, 148]  float16
    synthesis.L8_148_512.affine   262656      -        [10, 512]            float32
    synthesis.L8_148_512          2359808     25       [10, 512, 148, 148]  float16
    synthesis.L9_148_362.affine   262656      -        [10, 512]            float32
    synthesis.L9_148_362          1668458     25       [10, 362, 148, 148]  float16
    synthesis.L10_276_256.affine  185706      -        [10, 362]            float32
    synthesis.L10_276_256         834304      37       [10, 256, 276, 276]  float16
    synthesis.L11_276_181.affine  131328      -        [10, 256]            float32
    synthesis.L11_276_181         417205      25       [10, 181, 276, 276]  float16
    synthesis.L12_276_128.affine  92853       -        [10, 181]            float32
    synthesis.L12_276_128         208640      25       [10, 128, 276, 276]  float16
    synthesis.L13_256_128.affine  65664       -        [10, 128]            float32
    synthesis.L13_256_128         147584      25       [10, 128, 256, 256]  float16
    synthesis.L14_256_3.affine    65664       -        [10, 128]            float32
    synthesis.L14_256_3           387         1        [10, 3, 256, 256]    float16
    synthesis                     -           -        [10, 3, 256, 256]    float32
    ---                           ---         ---      ---                  ---
    Total                         28472133    2456     -                    -

    Setting up PyTorch plugin "upfirdn2d_plugin"... Done.

    Traceback (most recent call last):
      File "C:\Users\John PC\PycharmProjects\Project\train.py", line 286, in <module>
        main() # pylint: disable=no-value-for-parameter
      File "C:\ProgramData\Anaconda3\envs\stylegan3\lib\site-packages\click\core.py", line 1128, in __call__
        return self.main(*args, **kwargs)
      File "C:\ProgramData\Anaconda3\envs\stylegan3\lib\site-packages\click\core.py", line 1053, in main
        rv = self.invoke(ctx)
      File "C:\ProgramData\Anaconda3\envs\stylegan3\lib\site-packages\click\core.py", line 1395, in invoke
        return ctx.invoke(self.callback, **ctx.params)
      File "C:\ProgramData\Anaconda3\envs\stylegan3\lib\site-packages\click\core.py", line 754, in invoke
        return __callback(*args, **kwargs)
      File "C:\Users\John PC\PycharmProjects\Project\train.py", line 281, in main
        launch_training(c=c, desc=desc, outdir=opts.outdir, dry_run=opts.dry_run)
      File "C:\Users\John PC\PycharmProjects\Project\train.py", line 96, in launch_training
        subprocess_fn(rank=0, c=c, temp_dir=temp_dir)
      File "C:\Users\John PC\PycharmProjects\Project\train.py", line 47, in subprocess_fn
        training_loop.training_loop(rank=rank, **c)
      File "C:\Users\John PC\PycharmProjects\Project\training\training_loop.py", line 169, in training_loop
        misc.print_module_summary(D, [img, c])
      File "C:\Users\John PC\PycharmProjects\Project\torch_utils\misc.py", line 216, in print_module_summary
        outputs = module(*inputs)
      File "C:\ProgramData\Anaconda3\envs\stylegan3\lib\site-packages\torch\nn\modules\module.py", line 1148, in _call_impl
        result = forward_call(*input, **kwargs)
      File "C:\Users\John PC\PycharmProjects\Project\training\networks_stylegan2.py", line 827, in forward
        x = self.b4(x, img, cmap)
      File "C:\ProgramData\Anaconda3\envs\stylegan3\lib\site-packages\torch\nn\modules\module.py", line 1148, in _call_impl
        result = forward_call(*input, **kwargs)
      File "C:\Users\John PC\PycharmProjects\Project\training\networks_stylegan2.py", line 750, in forward
        x = self.mbstd(x)
      File "C:\ProgramData\Anaconda3\envs\stylegan3\lib\site-packages\torch\nn\modules\module.py", line 1148, in _call_impl
        result = forward_call(*input, **kwargs)
      File "C:\Users\John PC\PycharmProjects\Project\training\networks_stylegan2.py", line 688, in forward
        y = x.reshape(G, -1, F, c, H, W)
    RuntimeError: shape '[4, -1, 1, 512, 4, 4]' is invalid for input of size 81920

Desktop:

These are my conda packages:

    Name                     Version              Build                     Channel
    blas                     1.0                  mkl
    brotlipy                 0.7.0                py39h2bbff1b_1003
    ca-certificates          2022.6.15            h5b45459_0                conda-forge
    certifi                  2022.6.15            py39hcbf5309_0            conda-forge
    cffi                     1.15.1               py39h2bbff1b_0
    charset-normalizer       2.0.4                pyhd3eb1b0_0
    click                    8.0.4                py39haa95532_0
    colorama                 0.4.5                py39haa95532_0
    cryptography             37.0.1               py39h21b164f_0
    cudatoolkit              11.3.1               h59b6b97_2
    freetype                 2.10.4               hd328e21_0
    glfw                     2.2.0                pypi_0                    pypi
    idna                     3.3                  pyhd3eb1b0_0
    imageio-ffmpeg           0.4.3                pypi_0                    pypi
    imgui                    1.3.0                pypi_0                    pypi
    intel-openmp             2022.0.0             haa95532_3663
    jpeg                     9e                   h2bbff1b_0
    lerc                     3.0                  hd77b12b_0
    libblas                  3.9.0                16_win64_mkl              cctbx202208
    libcblas                 3.9.0                16_win64_mkl              cctbx202208
    libdeflate               1.8                  h2bbff1b_5
    liblapack                3.9.0                16_win64_mkl              cctbx202208
    libpng                   1.6.37               h2a8f88b_0
    libtiff                  4.4.0                h8a3f274_0
    libuv                    1.40.0               he774522_0
    libwebp                  1.2.2                h2bbff1b_0
    lz4-c                    1.9.3                h2bbff1b_1
    m2w64-gcc-libgfortran    5.3.0                6                         conda-forge
    m2w64-gcc-libs           5.3.0                7                         conda-forge
    m2w64-gcc-libs-core      5.3.0                7                         conda-forge
    m2w64-gmp                6.1.0                2                         conda-forge
    m2w64-libwinpthread-git  5.0.0.4634.697f757   2                         conda-forge
    mkl                      2022.1.0             h6a75c08_874              cctbx202208
    msys2-conda-epoch        20160418             1                         conda-forge
    ninja                    1.10.2               haa95532_5
    ninja-base               1.10.2               h6d14046_5
    numpy                    1.21.6               py39h6331f09_0            cctbx202208
    olefile                  0.46                 pyhd3eb1b0_0
    openssl                  1.1.1q               h8ffe710_0                conda-forge
    pillow                   8.3.1                py39h8f6046a_0
    pip                      22.1.2               py39haa95532_0
    psutil                   5.9.0                py39h2bbff1b_0
    pycparser                2.21                 pyhd3eb1b0_0
    pyopengl                 3.1.5                pypi_0                    pypi
    pyopenssl                22.0.0               pyhd3eb1b0_0
    pysocks                  1.7.1                py39haa95532_0
    pyspng                   0.1.0                pypi_0                    pypi
    python                   3.9.12               h6244533_0
    python_abi               3.9                  2_cp39                    cctbx202208
    pytorch                  1.12.1               py3.9_cuda11.3_cudnn8_0   pytorch
    pytorch-mutex            1.0                  cuda                      pytorch
    requests                 2.26.0               pyhd3eb1b0_0
    scipy                    1.7.1                py39hc0c34ad_0            conda-forge
    setuptools               63.4.1               py39haa95532_0
    sqlite                   3.39.2               h2bbff1b_0
    tbb                      2021.5.0             h2d74725_1                cctbx202208
    tk                       8.6.12               h2bbff1b_0
    tqdm                     4.62.2               pyhd3eb1b0_1
    typing_extensions        4.3.0                py39haa95532_0
    tzdata                   2022a                hda174b7_0
    urllib3                  1.26.11              py39haa95532_0
    vc                       14.2                 h21ff451_1
    vs2015_runtime           14.27.29016          h5e58377_2
    wheel                    0.37.1               pyhd3eb1b0_0
    win_inet_pton            1.1.0                py39haa95532_0
    wincertstore             0.2                  py39haa95532_2
    xz                       5.2.5                h8cc25b3_1
    zlib                     1.2.12               h8cc25b3_2
    zstd                     1.5.2                h19a0ad4_0

coxfuture commented 2 years ago

I also ran into this while setting up my environment. I'll have to check when I get home, but I remember it being a versioning thing. Check your versions of pytorch and scipy.
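
For example, a quick way to confirm what the environment actually resolves (run inside the activated `stylegan3` conda env; this only reports versions, it is not a known-good combination):

```python
# Report the library versions that train.py will import from this environment.
import scipy
import torch

print("torch:", torch.__version__)      # the conda list above reports 1.12.1
print("cuda :", torch.version.cuda)     # the conda list above reports cudatoolkit 11.3
print("scipy:", scipy.__version__)      # the conda list above reports 1.7.1
```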

lbq779660843 commented 1 year ago

I met the same problem and recalculated G based on G * n * F * c * H * W = input_size at line 653 in networks_stylegan2.py. In your case, you can try changing that line to something like `G = 81920 / 512 / 4 / 4  # 10`.
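
For context, the reshape that raises the error lives in the MinibatchStdLayer of networks_stylegan2.py. Below is a minimal standalone sketch of the same arithmetic (not the library code itself); the variable names mirror the layer and the numbers come from the error message above:

```python
import torch

# Values taken from the failing run: the discriminator's 4x4 epilogue sees a
# [10, 512, 4, 4] activation because --batch=10 was used on a single GPU.
N, C, H, W = 10, 512, 4, 4
x = torch.zeros(N, C, H, W)            # 10 * 512 * 4 * 4 = 81920 elements

G = min(4, N)                          # group size: min(mbstd_group_size=4, N) = 4
F = 1                                  # number of std-feature channels
c = C // F                             # 512

try:
    # 81920 / (4 * 1 * 512 * 4 * 4) = 2.5, so the batch cannot be split into
    # whole groups of 4 and the reshape fails.
    y = x.reshape(G, -1, F, c, H, W)
except RuntimeError as err:
    print(err)  # shape '[4, -1, 1, 512, 4, 4]' is invalid for input of size 81920
```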

joinforcookies commented 1 year ago

Try adjusting the batch size. For your command, `python train.py --outdir=training-runs --cfg stylegan3-t --data=datasets/ready --gpus=1 --batch=10 --gamma=8.0 --mirror=1`, try changing `--batch=10` to `--batch=12`. I had this issue as well after forgetting that I had changed my batch size from 16 to 14; when I changed it back from 14 to 16, it worked. The batch size is the number of samples processed simultaneously in each forward and backward pass, so it directly sets the leading dimension of the tensors inside the network. The minibatch-std layer in the discriminator splits that dimension into groups of `mbstd_group_size` (4 in the config dumped above), which is why batch sizes divisible by 4, such as 12 and 16, reshape cleanly while 10 and 14 do not. I hope this helps anyone who's facing this issue. A rough pre-flight check is sketched below.
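
The sketch below assumes the default `mbstd_group_size` of 4 shown in the training config above; `batch_gpu` and `mbstd_group_size` here are illustrative variables, not train.py command-line options:

```python
# The discriminator epilogue reshapes the batch into groups of
# min(mbstd_group_size, batch), so the per-GPU batch must divide evenly.
batch_gpu = 12            # e.g. --batch=12 with --gpus=1
mbstd_group_size = 4      # from "epilogue_kwargs": {"mbstd_group_size": 4}

group = min(mbstd_group_size, batch_gpu)
if batch_gpu % group != 0:
    raise ValueError(
        f"per-GPU batch ({batch_gpu}) is not divisible by the minibatch-std "
        f"group size ({group}); pick a batch like 12 or 16 instead of 10 or 14"
    )
print("batch size OK")
```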