Error at Tick 1: Either Evaluating Metrics or the irrelevant alert in PyTorch kicks to Windows problem reporting

Passingbyposts commented 3 years ago

Describe the bug Crashing at Tick 0

To Reproduce
(base) PS C:\Users\Dunwo> conda activate stylegantry
(stylegantry) PS C:\Users\Dunwo> cd temp
(stylegantry) PS C:\Users\Dunwo\temp> cd .\stylegan2-ada-pytorch\
(stylegantry) PS C:\Users\Dunwo\temp\stylegan2-ada-pytorch> python train.py --data C:\Ganoutput --outdir C:\GanResults

Training options: { "num_gpus": 1, "image_snapshot_ticks": 50, "network_snapshot_ticks": 50, "metrics": [ "fid50k_full" ], "random_seed": 0, "training_set_kwargs": { "class_name": "training.dataset.ImageFolderDataset", "path": "C:\Ganoutput", "use_labels": false, "max_size": 13439, "xflip": false, "resolution": 512 }, "data_loader_kwargs": { "pin_memory": true, "num_workers": 3, "prefetch_factor": 2 }, "G_kwargs": { "class_name": "training.networks.Generator", "z_dim": 512, "w_dim": 512, "mapping_kwargs": { "num_layers": 2 }, "synthesis_kwargs": { "channel_base": 32768, "channel_max": 512, "num_fp16_res": 4, "conv_clamp": 256 } }, "D_kwargs": { "class_name": "training.networks.Discriminator", "block_kwargs": {}, "mapping_kwargs": {}, "epilogue_kwargs": { "mbstd_group_size": 4 }, "channel_base": 32768, "channel_max": 512, "num_fp16_res": 4, "conv_clamp": 256 }, "G_opt_kwargs": { "class_name": "torch.optim.Adam", "lr": 0.0025, "betas": [ 0, 0.99 ], "eps": 1e-08 }, "D_opt_kwargs": { "class_name": "torch.optim.Adam", "lr": 0.0025, "betas": [ 0, 0.99 ], "eps": 1e-08 }, "loss_kwargs": { "class_name": "training.loss.StyleGAN2Loss", "r1_gamma": 6.5536 }, "total_kimg": 25000, "batch_size": 8, "batch_gpu": 8, "ema_kimg": 2.5, "ema_rampup": 0.05, "ada_target": 0.6, "augment_kwargs": { "class_name": "training.augment.AugmentPipe", "xflip": 1, "rotate90": 1, "xint": 1, "scale": 1, "rotate": 1, "aniso": 1, "xfrac": 1, "brightness": 1, "contrast": 1, "lumaflip": 1, "hue": 1, "saturation": 1 }, "run_dir": "C:\GanResults\00014-Ganoutput-auto1" }

Output directory: C:\GanResults\00014-Ganoutput-auto1
Training data: C:\Ganoutput
Training duration: 25000 kimg
Number of GPUs: 1
Number of images: 13439
Image resolution: 512
Conditional model: False
Dataset x-flips: False

Creating output directory...
Launching processes...
Loading training set...

Num images: 13439
Image shape: [3, 512, 512]
Label shape: [0]

Constructing networks...
Setting up PyTorch plugin "bias_act_plugin"... Done.
Setting up PyTorch plugin "upfirdn2d_plugin"... Done.

Generator             Parameters  Buffers  Output shape         Datatype
mapping.fc0           262656      -        [8, 512]             float32
mapping.fc1           262656      -        [8, 512]             float32
mapping               -           512      [8, 16, 512]         float32
synthesis.b4.conv1    2622465     32       [8, 512, 4, 4]       float32
synthesis.b4.torgb    264195      -        [8, 3, 4, 4]         float32
synthesis.b4:0        8192        16       [8, 512, 4, 4]       float32
synthesis.b4:1        -           -        [8, 512, 4, 4]       float32
synthesis.b8.conv0    2622465     80       [8, 512, 8, 8]       float32
synthesis.b8.conv1    2622465     80       [8, 512, 8, 8]       float32
synthesis.b8.torgb    264195      -        [8, 3, 8, 8]         float32
synthesis.b8:0        -           16       [8, 512, 8, 8]       float32
synthesis.b8:1        -           -        [8, 512, 8, 8]       float32
synthesis.b16.conv0   2622465     272      [8, 512, 16, 16]     float32
synthesis.b16.conv1   2622465     272      [8, 512, 16, 16]     float32
synthesis.b16.torgb   264195      -        [8, 3, 16, 16]       float32
synthesis.b16:0       -           16       [8, 512, 16, 16]     float32
synthesis.b16:1       -           -        [8, 512, 16, 16]     float32
synthesis.b32.conv0   2622465     1040     [8, 512, 32, 32]     float32
synthesis.b32.conv1   2622465     1040     [8, 512, 32, 32]     float32
synthesis.b32.torgb   264195      -        [8, 3, 32, 32]       float32
synthesis.b32:0       -           16       [8, 512, 32, 32]     float32
synthesis.b32:1       -           -        [8, 512, 32, 32]     float32
synthesis.b64.conv0   2622465     4112     [8, 512, 64, 64]     float16
synthesis.b64.conv1   2622465     4112     [8, 512, 64, 64]     float16
synthesis.b64.torgb   264195      -        [8, 3, 64, 64]       float16
synthesis.b64:0       -           16       [8, 512, 64, 64]     float16
synthesis.b64:1       -           -        [8, 512, 64, 64]     float32
synthesis.b128.conv0  1442561     16400    [8, 256, 128, 128]   float16
synthesis.b128.conv1  721409      16400    [8, 256, 128, 128]   float16
synthesis.b128.torgb  132099      -        [8, 3, 128, 128]     float16
synthesis.b128:0      -           16       [8, 256, 128, 128]   float16
synthesis.b128:1      -           -        [8, 256, 128, 128]   float32
synthesis.b256.conv0  426369      65552    [8, 128, 256, 256]   float16
synthesis.b256.conv1  213249      65552    [8, 128, 256, 256]   float16
synthesis.b256.torgb  66051       -        [8, 3, 256, 256]     float16
synthesis.b256:0      -           16       [8, 128, 256, 256]   float16
synthesis.b256:1      -           -        [8, 128, 256, 256]   float32
synthesis.b512.conv0  139457      262160   [8, 64, 512, 512]    float16
synthesis.b512.conv1  69761       262160   [8, 64, 512, 512]    float16
synthesis.b512.torgb  33027       -        [8, 3, 512, 512]     float16
synthesis.b512:0      -           16       [8, 64, 512, 512]    float16
synthesis.b512:1      -           -        [8, 64, 512, 512]    float32
Total                 28700647    699904   -                    -

Discriminator         Parameters  Buffers  Output shape         Datatype
b512.fromrgb          256         16       [8, 64, 512, 512]    float16
b512.skip             8192        16       [8, 128, 256, 256]   float16
b512.conv0            36928       16       [8, 64, 512, 512]    float16
b512.conv1            73856       16       [8, 128, 256, 256]   float16
b512                  -           16       [8, 128, 256, 256]   float16
b256.skip             32768       16       [8, 256, 128, 128]   float16
b256.conv0            147584      16       [8, 128, 256, 256]   float16
b256.conv1            295168      16       [8, 256, 128, 128]   float16
b256                  -           16       [8, 256, 128, 128]   float16
b128.skip             131072      16       [8, 512, 64, 64]     float16
b128.conv0            590080      16       [8, 256, 128, 128]   float16
b128.conv1            1180160     16       [8, 512, 64, 64]     float16
b128                  -           16       [8, 512, 64, 64]     float16
b64.skip              262144      16       [8, 512, 32, 32]     float16
b64.conv0             2359808     16       [8, 512, 64, 64]     float16
b64.conv1             2359808     16       [8, 512, 32, 32]     float16
b64                   -           16       [8, 512, 32, 32]     float16
b32.skip              262144      16       [8, 512, 16, 16]     float32
b32.conv0             2359808     16       [8, 512, 32, 32]     float32
b32.conv1             2359808     16       [8, 512, 16, 16]     float32
b32                   -           16       [8, 512, 16, 16]     float32
b16.skip              262144      16       [8, 512, 8, 8]       float32
b16.conv0             2359808     16       [8, 512, 16, 16]     float32
b16.conv1             2359808     16       [8, 512, 8, 8]       float32
b16                   -           16       [8, 512, 8, 8]       float32
b8.skip               262144      16       [8, 512, 4, 4]       float32
b8.conv0              2359808     16       [8, 512, 8, 8]       float32
b8.conv1              2359808     16       [8, 512, 4, 4]       float32
b8                    -           16       [8, 512, 4, 4]       float32
b4.mbstd              -           -        [8, 513, 4, 4]       float32
b4.conv               2364416     16       [8, 512, 4, 4]       float32
b4.fc                 4194816     -        [8, 512]             float32
b4.out                513         -        [8, 1]               float32
Total                 28982849    480      -                    -

Setting up augmentation...
Distributing across 1 GPUs...
Setting up training phases...
Exporting sample images...
Initializing logs...
Training for 25000 kimg...

tick 0 kimg 0.0 time 51s sec/tick 6.2 sec/kimg 773.03 maintenance 44.9 cpumem 3.61 gpumem 14.76 augment 0.000
Evaluating metrics...
C:\Users\Dunwo\AppData\Local\Programs\Python\Python39\lib\site-packages\torch\nn\modules\module.py:1051: UserWarning: Named tensors and all their associated APIs are an experimental feature and subject to change. Please do not use them for anything important until they are released as stable. (Triggered internally at ..\c10/core/TensorImpl.h:1156.)
  return forward_call(*input, **kwargs)
(stylegantry) PS C:\Users\Dunwo\temp\stylegan2-ada-pytorch>

Please copy&paste text instead of screenshots for better searchability.

Expected behavior At this stage I'm expecting GPU usage to ramp up and tick 1 and later ticks to follow. I don't think there should be any Windows problem reporting.

Screenshots It generates the first tick and log (screenshot: ceyp538d2co71), but it's hard to tell whether the crash happens when it begins evaluating metrics or when the irrelevant warning comes up. (screenshot: niiqqtkn2co71) As soon as it gets here, a Windows Problem Reporting entry appears in Task Manager, but there is no pop-up or alert, and then nothing: no debug output, no errors. It's as if the run has been aborted.

Desktop (please complete the following information):

OS: [ Windows 10]
PyTorch version (1.9.0)
CUDA toolkit version (e.g., CUDA 11.1)
NVIDIA driver version 471.96
GPU [ RTX 3090]
Docker: Did not use docker
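The environment details listed above can also be gathered from inside the same conda environment, which avoids mismatches between what is installed and what is reported. The following is a minimal sketch; the helper name `collect_env` is hypothetical, and it assumes PyTorch may or may not be importable.

```python
import importlib.util
import platform
import sys


def collect_env():
    """Gather the version details an issue report usually asks for."""
    info = {
        "os": platform.platform(),
        "python": sys.version.split()[0],
        "torch": None,
        "cuda": None,
        "gpu": None,
    }
    # Only import torch if it is installed, so the helper itself never fails.
    if importlib.util.find_spec("torch") is not None:
        import torch

        info["torch"] = torch.__version__
        info["cuda"] = torch.version.cuda  # CUDA version torch was built against
        if torch.cuda.is_available():
            info["gpu"] = torch.cuda.get_device_name(0)
    return info


if __name__ == "__main__":
    for key, value in collect_env().items():
        print(f"{key}: {value}")
```

PyTorch also ships its own report, `python -m torch.utils.collect_env`, which prints a fuller summary including the driver version.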

Additional context I'm new to this but willing to learn, and not afraid to google my own problems and troubleshoot. The issue here is that there is no debug output or error alert at all, so I have nothing to go on.
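When a Python process disappears without a traceback, as described above, the standard library's `faulthandler` module can sometimes record where the interpreter was when it died. The following is a minimal sketch; the helper name `enable_crash_dump` and the log file name are hypothetical, and there is no guarantee it captures anything for this particular crash.

```python
import faulthandler


def enable_crash_dump(log_path="crash_traceback.log"):
    """Ask Python to write a traceback if the interpreter hits a fatal error."""
    # The file must stay open for the lifetime of the process, so return it
    # to the caller instead of closing it here.
    log_file = open(log_path, "w")
    faulthandler.enable(file=log_file, all_threads=True)
    return log_file
```

Calling `enable_crash_dump()` near the top of the training script keeps the log file open for the whole run. Without editing any code, `python -X faulthandler train.py ...` enables the same handler and writes to stderr.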

wexin-c commented 3 years ago

I encountered the same error.

Loading training set...

Num images: 2008 Image shape: [3, 512, 512] Label shape: [0]

Constructing networks... Setting up PyTorch plugin "bias_act_plugin"... D:\anaconda\envs\pytorch\lib\site-packages\torch\utils\cpp_extension.py:305: UserWarning: Error checking compiler version for cl: 'utf-8' codec can't decode byte 0xd3 in position 0: invalid continuation byte warnings.warn(f'Error checking compiler version for {compiler}: {error}') Done. Setting up PyTorch plugin "upfirdn2d_plugin"... D:\anaconda\envs\pytorch\lib\site-packages\torch\utils\cpp_extension.py:305: UserWarning: Error checking compiler version for cl: 'utf-8' codec can't decode byte 0xd3 in position 0: invalid continuation byte warnings.warn(f'Error checking compiler version for {compiler}: {error}') Done.

Generator Parameters Buffers Output shape Datatype

mapping.fc0 262656 - [8, 512] float32 mapping.fc1 262656 - [8, 512] float32 mapping - 512 [8, 16, 512] float32 synthesis.b4.conv1 2622465 32 [8, 512, 4, 4] float32 synthesis.b4.torgb 264195 - [8, 3, 4, 4] float32 synthesis.b4:0 8192 16 [8, 512, 4, 4] float32 synthesis.b4:1 - - [8, 512, 4, 4] float32 synthesis.b8.conv0 2622465 80 [8, 512, 8, 8] float32 synthesis.b8.conv1 2622465 80 [8, 512, 8, 8] float32 synthesis.b8.torgb 264195 - [8, 3, 8, 8] float32 synthesis.b8:0 - 16 [8, 512, 8, 8] float32 synthesis.b8:1 - - [8, 512, 8, 8] float32 synthesis.b16.conv0 2622465 272 [8, 512, 16, 16] float32 synthesis.b16.conv1 2622465 272 [8, 512, 16, 16] float32 synthesis.b16.torgb 264195 - [8, 3, 16, 16] float32 synthesis.b16:0 - 16 [8, 512, 16, 16] float32 synthesis.b16:1 - - [8, 512, 16, 16] float32 synthesis.b32.conv0 2622465 1040 [8, 512, 32, 32] float32 synthesis.b32.conv1 2622465 1040 [8, 512, 32, 32] float32 synthesis.b32.torgb 264195 - [8, 3, 32, 32] float32 synthesis.b32:0 - 16 [8, 512, 32, 32] float32 synthesis.b32:1 - - [8, 512, 32, 32] float32 synthesis.b64.conv0 2622465 4112 [8, 512, 64, 64] float16 synthesis.b64.conv1 2622465 4112 [8, 512, 64, 64] float16 synthesis.b64.torgb 264195 - [8, 3, 64, 64] float16 synthesis.b64:0 - 16 [8, 512, 64, 64] float16 synthesis.b64:1 - - [8, 512, 64, 64] float32 synthesis.b128.conv0 1442561 16400 [8, 256, 128, 128] float16 synthesis.b128.conv1 721409 16400 [8, 256, 128, 128] float16 synthesis.b128.torgb 132099 - [8, 3, 128, 128] float16 synthesis.b128:0 - 16 [8, 256, 128, 128] float16 synthesis.b128:1 - - [8, 256, 128, 128] float32 synthesis.b256.conv0 426369 65552 [8, 128, 256, 256] float16 synthesis.b256.conv1 213249 65552 [8, 128, 256, 256] float16 synthesis.b256.torgb 66051 - [8, 3, 256, 256] float16 synthesis.b256:0 - 16 [8, 128, 256, 256] float16 synthesis.b256:1 - - [8, 128, 256, 256] float32 synthesis.b512.conv0 139457 262160 [8, 64, 512, 512] float16 synthesis.b512.conv1 69761 262160 [8, 64, 512, 512] float16 
synthesis.b512.torgb 33027 - [8, 3, 512, 512] float16 synthesis.b512:0 - 16 [8, 64, 512, 512] float16 synthesis.b512:1 - - [8, 64, 512, 512] float32

Total 28700647 699904 - -

Discriminator Parameters Buffers Output shape Datatype

b512.fromrgb 256 16 [8, 64, 512, 512] float16 b512.skip 8192 16 [8, 128, 256, 256] float16 b512.conv0 36928 16 [8, 64, 512, 512] float16 b512.conv1 73856 16 [8, 128, 256, 256] float16 b512 - 16 [8, 128, 256, 256] float16 b256.skip 32768 16 [8, 256, 128, 128] float16 b256.conv0 147584 16 [8, 128, 256, 256] float16 b256.conv1 295168 16 [8, 256, 128, 128] float16 b256 - 16 [8, 256, 128, 128] float16 b128.skip 131072 16 [8, 512, 64, 64] float16 b128.conv0 590080 16 [8, 256, 128, 128] float16 b128.conv1 1180160 16 [8, 512, 64, 64] float16 b128 - 16 [8, 512, 64, 64] float16 b64.skip 262144 16 [8, 512, 32, 32] float16 b64.conv0 2359808 16 [8, 512, 64, 64] float16 b64.conv1 2359808 16 [8, 512, 32, 32] float16 b64 - 16 [8, 512, 32, 32] float16 b32.skip 262144 16 [8, 512, 16, 16] float32 b32.conv0 2359808 16 [8, 512, 32, 32] float32 b32.conv1 2359808 16 [8, 512, 16, 16] float32 b32 - 16 [8, 512, 16, 16] float32 b16.skip 262144 16 [8, 512, 8, 8] float32 b16.conv0 2359808 16 [8, 512, 16, 16] float32 b16.conv1 2359808 16 [8, 512, 8, 8] float32 b16 - 16 [8, 512, 8, 8] float32 b8.skip 262144 16 [8, 512, 4, 4] float32 b8.conv0 2359808 16 [8, 512, 8, 8] float32 b8.conv1 2359808 16 [8, 512, 4, 4] float32 b8 - 16 [8, 512, 4, 4] float32 b4.mbstd - - [8, 513, 4, 4] float32 b4.conv 2364416 16 [8, 512, 4, 4] float32 b4.fc 4194816 - [8, 512] float32 b4.out 513 - [8, 1] float32

Total 28982849 480 - -

Setting up augmentation... Distributing across 1 GPUs... Setting up training phases... Exporting sample images... Initializing logs... Skipping tfevents export: No module named 'tensorboard' Training for 25000 kimg...

tick 0 kimg 0.0 time 1m 07s sec/tick 7.6 sec/kimg 3792.39 maintenance 59.2 cpumem 3.80 gpumem 10.22 augment 0.000
Evaluating metrics...
D:\anaconda\envs\pytorch\lib\site-packages\torch\nn\modules\module.py:1051: UserWarning: Named tensors and all their associated APIs are an experimental feature and subject to change. Please do not use them for anything important until they are released as stable. (Triggered internally at ..\c10/core/TensorImpl.h:1156.)
  return forward_call(*input, **kwargs)
Traceback (most recent call last):
  File "E:\Pytorch\stylegan2-ada-pytorch-main\train.py", line 539, in <module>
    main() # pylint: disable=no-value-for-parameter
  File "D:\anaconda\envs\pytorch\lib\site-packages\click\core.py", line 1137, in __call__
    return self.main(*args, **kwargs)
  File "D:\anaconda\envs\pytorch\lib\site-packages\click\core.py", line 1062, in main
    rv = self.invoke(ctx)
  File "D:\anaconda\envs\pytorch\lib\site-packages\click\core.py", line 1404, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "D:\anaconda\envs\pytorch\lib\site-packages\click\core.py", line 763, in invoke
    return __callback(*args, **kwargs)
  File "D:\anaconda\envs\pytorch\lib\site-packages\click\decorators.py", line 26, in new_func
    return f(get_current_context(), *args, **kwargs)
  File "E:\Pytorch\stylegan2-ada-pytorch-main\train.py", line 532, in main
    subprocess_fn(rank=0, args=args, temp_dir=temp_dir)
  File "E:\Pytorch\stylegan2-ada-pytorch-main\train.py", line 384, in subprocess_fn
    training_loop.training_loop(rank=rank, **args)
  File "E:\Pytorch\stylegan2-ada-pytorch-main\training\training_loop.py", line 374, in training_loop
    result_dict = metric_main.calc_metric(metric=metric, G=snapshot_data['G_ema'],
  File "E:\Pytorch\stylegan2-ada-pytorch-main\metrics\metric_main.py", line 45, in calc_metric
    results = _metric_dict[metric](opts)
  File "E:\Pytorch\stylegan2-ada-pytorch-main\metrics\metric_main.py", line 85, in fid50k_full
    fid = frechet_inception_distance.compute_fid(opts, max_real=None, num_gen=50000)
  File "E:\Pytorch\stylegan2-ada-pytorch-main\metrics\frechet_inception_distance.py", line 25, in compute_fid
    mu_real, sigma_real = metric_utils.compute_feature_stats_for_dataset(
  File "E:\Pytorch\stylegan2-ada-pytorch-main\metrics\metric_utils.py", line 218, in compute_feature_stats_for_dataset
    features = detector(images.to(opts.device), **detector_kwargs)
  File "D:\anaconda\envs\pytorch\lib\site-packages\torch\nn\modules\module.py", line 1051, in _call_impl
    return forward_call(*input, **kwargs)
RuntimeError: MALFORMED INPUT: lanes dont match

What is the structure of your training set images?

wexin-c commented 3 years ago

(pytorch) E:\Pytorch\stylegan2-ada-pytorch-main>python train.py --outdir=~/training-runs --data ./dataset/Black_PP_out.zip --gpus=1

Training options: { "num_gpus": 1, "image_snapshot_ticks": 50, "network_snapshot_ticks": 50, "metrics": [ "fid50k_full" ], "random_seed": 0, "training_set_kwargs": { "class_name": "training.dataset.ImageFolderDataset", "path": "./dataset/Black_PP_out.zip", "use_labels": false, "max_size": 2008, "xflip": false, "resolution": 512 }, "data_loader_kwargs": { "pin_memory": true, "num_workers": 0, "prefetch_factor": 2 }, "G_kwargs": { "class_name": "training.networks.Generator", "z_dim": 512, "w_dim": 512, "mapping_kwargs": { "num_layers": 2 }, "synthesis_kwargs": { "channel_base": 32768, "channel_max": 512, "num_fp16_res": 4, "conv_clamp": 256 } }, "D_kwargs": { "class_name": "training.networks.Discriminator", "block_kwargs": {}, "mapping_kwargs": {}, "epilogue_kwargs": { "mbstd_group_size": 4 }, "channel_base": 32768, "channel_max": 512, "num_fp16_res": 4, "conv_clamp": 256 }, "G_opt_kwargs": { "class_name": "torch.optim.Adam", "lr": 0.0025, "betas": [ 0, 0.99 ], "eps": 1e-08 }, "D_opt_kwargs": { "class_name": "torch.optim.Adam", "lr": 0.0025, "betas": [ 0, 0.99 ], "eps": 1e-08 }, "loss_kwargs": { "class_name": "training.loss.StyleGAN2Loss", "r1_gamma": 6.5536 }, "total_kimg": 25000, "batch_size": 2, "batch_gpu": 8, "ema_kimg": 2.5, "ema_rampup": 0.05, "ada_target": 0.6, "augment_kwargs": { "class_name": "training.augment.AugmentPipe", "xflip": 1, "rotate90": 1, "xint": 1, "scale": 1, "rotate": 1, "aniso": 1, "xfrac": 1, "brightness": 1, "contrast": 1, "lumaflip": 1, "hue": 1, "saturation": 1 }, "run_dir": "~/training-runs\00000-Black_PP_out-auto1" }

Output directory: ~/training-runs\00000-Black_PP_out-auto1 Training data: ./dataset/Black_PP_out.zip Training duration: 25000 kimg Number of GPUs: 1 Number of images: 2008 Image resolution: 512 Conditional model: False Dataset x-flips: False

Creating output directory... Launching processes... Loading training set...

Num images: 2008 Image shape: [3, 512, 512] Label shape: [0]

Constructing networks... Setting up PyTorch plugin "bias_act_plugin"... D:\anaconda\envs\pytorch\lib\site-packages\torch\utils\cpp_extension.py:305: UserWarning: Error checking compiler version for cl: 'utf-8' codec can't decode byte 0xd3 in position 0: invalid continuation byte warnings.warn(f'Error checking compiler version for {compiler}: {error}') Done. Setting up PyTorch plugin "upfirdn2d_plugin"... D:\anaconda\envs\pytorch\lib\site-packages\torch\utils\cpp_extension.py:305: UserWarning: Error checking compiler version for cl: 'utf-8' codec can't decode byte 0xd3 in position 0: invalid continuation byte warnings.warn(f'Error checking compiler version for {compiler}: {error}') Done.

Generator Parameters Buffers Output shape Datatype

mapping.fc0 262656 - [8, 512] float32 mapping.fc1 262656 - [8, 512] float32 mapping - 512 [8, 16, 512] float32 synthesis.b4.conv1 2622465 32 [8, 512, 4, 4] float32 synthesis.b4.torgb 264195 - [8, 3, 4, 4] float32 synthesis.b4:0 8192 16 [8, 512, 4, 4] float32 synthesis.b4:1 - - [8, 512, 4, 4] float32 synthesis.b8.conv0 2622465 80 [8, 512, 8, 8] float32 synthesis.b8.conv1 2622465 80 [8, 512, 8, 8] float32 synthesis.b8.torgb 264195 - [8, 3, 8, 8] float32 synthesis.b8:0 - 16 [8, 512, 8, 8] float32 synthesis.b8:1 - - [8, 512, 8, 8] float32 synthesis.b16.conv0 2622465 272 [8, 512, 16, 16] float32 synthesis.b16.conv1 2622465 272 [8, 512, 16, 16] float32 synthesis.b16.torgb 264195 - [8, 3, 16, 16] float32 synthesis.b16:0 - 16 [8, 512, 16, 16] float32 synthesis.b16:1 - - [8, 512, 16, 16] float32 synthesis.b32.conv0 2622465 1040 [8, 512, 32, 32] float32 synthesis.b32.conv1 2622465 1040 [8, 512, 32, 32] float32 synthesis.b32.torgb 264195 - [8, 3, 32, 32] float32 synthesis.b32:0 - 16 [8, 512, 32, 32] float32 synthesis.b32:1 - - [8, 512, 32, 32] float32 synthesis.b64.conv0 2622465 4112 [8, 512, 64, 64] float16 synthesis.b64.conv1 2622465 4112 [8, 512, 64, 64] float16 synthesis.b64.torgb 264195 - [8, 3, 64, 64] float16 synthesis.b64:0 - 16 [8, 512, 64, 64] float16 synthesis.b64:1 - - [8, 512, 64, 64] float32 synthesis.b128.conv0 1442561 16400 [8, 256, 128, 128] float16 synthesis.b128.conv1 721409 16400 [8, 256, 128, 128] float16 synthesis.b128.torgb 132099 - [8, 3, 128, 128] float16 synthesis.b128:0 - 16 [8, 256, 128, 128] float16 synthesis.b128:1 - - [8, 256, 128, 128] float32 synthesis.b256.conv0 426369 65552 [8, 128, 256, 256] float16 synthesis.b256.conv1 213249 65552 [8, 128, 256, 256] float16 synthesis.b256.torgb 66051 - [8, 3, 256, 256] float16 synthesis.b256:0 - 16 [8, 128, 256, 256] float16 synthesis.b256:1 - - [8, 128, 256, 256] float32 synthesis.b512.conv0 139457 262160 [8, 64, 512, 512] float16 synthesis.b512.conv1 69761 262160 [8, 64, 512, 512] float16 
synthesis.b512.torgb 33027 - [8, 3, 512, 512] float16 synthesis.b512:0 - 16 [8, 64, 512, 512] float16 synthesis.b512:1 - - [8, 64, 512, 512] float32

Total 28700647 699904 - -

Discriminator Parameters Buffers Output shape Datatype

b512.fromrgb 256 16 [8, 64, 512, 512] float16 b512.skip 8192 16 [8, 128, 256, 256] float16 b512.conv0 36928 16 [8, 64, 512, 512] float16 b512.conv1 73856 16 [8, 128, 256, 256] float16 b512 - 16 [8, 128, 256, 256] float16 b256.skip 32768 16 [8, 256, 128, 128] float16 b256.conv0 147584 16 [8, 128, 256, 256] float16 b256.conv1 295168 16 [8, 256, 128, 128] float16 b256 - 16 [8, 256, 128, 128] float16 b128.skip 131072 16 [8, 512, 64, 64] float16 b128.conv0 590080 16 [8, 256, 128, 128] float16 b128.conv1 1180160 16 [8, 512, 64, 64] float16 b128 - 16 [8, 512, 64, 64] float16 b64.skip 262144 16 [8, 512, 32, 32] float16 b64.conv0 2359808 16 [8, 512, 64, 64] float16 b64.conv1 2359808 16 [8, 512, 32, 32] float16 b64 - 16 [8, 512, 32, 32] float16 b32.skip 262144 16 [8, 512, 16, 16] float32 b32.conv0 2359808 16 [8, 512, 32, 32] float32 b32.conv1 2359808 16 [8, 512, 16, 16] float32 b32 - 16 [8, 512, 16, 16] float32 b16.skip 262144 16 [8, 512, 8, 8] float32 b16.conv0 2359808 16 [8, 512, 16, 16] float32 b16.conv1 2359808 16 [8, 512, 8, 8] float32 b16 - 16 [8, 512, 8, 8] float32 b8.skip 262144 16 [8, 512, 4, 4] float32 b8.conv0 2359808 16 [8, 512, 8, 8] float32 b8.conv1 2359808 16 [8, 512, 4, 4] float32 b8 - 16 [8, 512, 4, 4] float32 b4.mbstd - - [8, 513, 4, 4] float32 b4.conv 2364416 16 [8, 512, 4, 4] float32 b4.fc 4194816 - [8, 512] float32 b4.out 513 - [8, 1] float32

Total 28982849 480 - -

Setting up augmentation... Distributing across 1 GPUs... Setting up training phases... Exporting sample images... Initializing logs... Skipping tfevents export: No module named 'tensorboard' Training for 25000 kimg...

tick 0 kimg 0.0 time 1m 02s sec/tick 7.5 sec/kimg 3749.52 maintenance 54.4 cpumem 3.80 gpumem 10.22 augment 0.000
Evaluating metrics...
D:\anaconda\envs\pytorch\lib\site-packages\torch\nn\modules\module.py:1051: UserWarning: Named tensors and all their associated APIs are an experimental feature and subject to change. Please do not use them for anything important until they are released as stable. (Triggered internally at ..\c10/core/TensorImpl.h:1156.)
  return forward_call(*input, **kwargs)

Passingbyposts commented 3 years ago

Hi Wexin,

I'm using 512 by 512 images, and I generated the training set via python dataset_tool.py --source C:\Ganinput --dest C:\Ganoutput

That sorted the images and created the right folder levels, and training even generates the first pkl and the real and fake images. It just crashes at tick 1 without any errors or debug output.

That said, I ran the dataset tool only once, on the first attempt, and never recreated the dataset because I thought it would be fine. Is that a common place for errors?

I got the error 'RuntimeError: MALFORMED INPUT: lanes dont match' once during the dozens of uninstalls and reinstalls I did, but only in the one where I was trying to run an older PyTorch to see if that fixed it.
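To answer the question about dataset structure concretely, the converted dataset can be summarized with the standard library alone. The following is a minimal sketch; the helper name `summarize_dataset` is hypothetical, and the assumption that the converted dataset carries a `dataset.json` metadata file should be checked against the actual output of `dataset_tool.py`. It only inspects layout, not image resolution or channel count.

```python
import collections
import os
import zipfile


def summarize_dataset(path):
    """Count files by extension in a dataset folder or .zip archive."""
    if os.path.isdir(path):
        names = [
            os.path.relpath(os.path.join(root, fname), path)
            for root, _dirs, files in os.walk(path)
            for fname in files
        ]
    elif zipfile.is_zipfile(path):
        with zipfile.ZipFile(path) as archive:
            # Skip directory entries, which end with a slash.
            names = [n for n in archive.namelist() if not n.endswith("/")]
    else:
        raise ValueError(f"not a directory or zip archive: {path}")

    counts = collections.Counter(
        os.path.splitext(name)[1].lower() for name in names
    )
    return {
        "total_files": len(names),
        "by_extension": dict(counts),
        # Assumption: the converted dataset keeps its metadata in dataset.json.
        "has_dataset_json": any(
            os.path.basename(n) == "dataset.json" for n in names
        ),
    }
```

For example, `summarize_dataset(r"C:\Ganoutput")` would report how many image files were written and whether any unexpected file types ended up in the folder.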

wexin-c commented 3 years ago

I tried the older PyTorch version. It works.

wexin-c commented 3 years ago

But there are still many problems, such as how to set pretrained weights. I'm new to this. Could you leave some contact information so we can communicate with each other?

Passingbyposts commented 3 years ago

Aha! Going down to 1.8.0, like in your screenshot, seems to have worked: https://github.com/NVlabs/stylegan2-ada-pytorch/issues/182#issuecomment-922429521 I guess the irrelevant message in the later version triggers werfault.exe in Windows, which kills the run.

The only question I have left is that it's running, but it doesn't seem to put a large load on my 3090 graphics card. Is a load this low normal for others?
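One way to check the actual GPU load is to query `nvidia-smi` directly rather than relying on a system monitor, whose default graphs may not reflect compute work. The following is a minimal sketch; the helper names are hypothetical, and it assumes `nvidia-smi` is on the PATH and returns plain numeric fields for the queried properties.

```python
import subprocess

# Query utilization and memory as bare CSV numbers, one line per GPU.
QUERY = [
    "nvidia-smi",
    "--query-gpu=utilization.gpu,memory.used,memory.total",
    "--format=csv,noheader,nounits",
]


def parse_gpu_line(line):
    """Turn one CSV line from nvidia-smi into a dict of integers."""
    util, used, total = (int(field.strip()) for field in line.split(","))
    return {"util_percent": util, "mem_used_mib": used, "mem_total_mib": total}


def read_gpu_stats():
    """Run nvidia-smi once and return one parsed entry per GPU."""
    output = subprocess.run(
        QUERY, capture_output=True, text=True, check=True
    ).stdout
    return [parse_gpu_line(line) for line in output.splitlines() if line.strip()]
```

Calling `read_gpu_stats()` a few times while training is running gives a rough picture of whether the card is actually busy.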

wexin-c commented 3 years ago

I am new to this. How do you set your batch size and number of workers?

wexin-c commented 3 years ago

These are my settings: batch size = 2, worknum = 0. It seems to take a long time to start training again! I think there should be pretrained weights to speed up training?
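Batch size, worker count, and pretrained weights are set on the `train.py` command line. The following is a minimal sketch that assembles such a command; the helper name `build_train_command` is hypothetical, and the flag names (`--batch`, `--workers`, `--resume`, `--metrics`) are written from memory of the script's options, so they should be confirmed with `python train.py --help`.

```python
import shlex


def build_train_command(data, outdir, gpus=1, batch=None, workers=None,
                        resume=None, metrics=None):
    """Assemble an argument list for train.py from optional overrides."""
    cmd = ["python", "train.py", f"--outdir={outdir}", f"--data={data}",
           f"--gpus={gpus}"]
    # Only pass the flags that were explicitly requested, in a fixed order.
    optional = {"batch": batch, "workers": workers,
                "resume": resume, "metrics": metrics}
    for flag, value in optional.items():
        if value is not None:
            cmd.append(f"--{flag}={value}")
    return cmd


if __name__ == "__main__":
    command = build_train_command(
        "./dataset/Black_PP_out.zip", "training-runs",
        batch=8, workers=3, resume="ffhq512",
    )
    print(shlex.join(command))
```

Here `resume="ffhq512"` is meant as an example of starting from pretrained weights instead of from scratch, and `metrics="none"` would skip the metric evaluation step where the crash in this thread occurs, which can help isolate it.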

wexin-c commented 3 years ago

Passingbyposts commented 3 years ago

Not sure. I haven't found an active community to discuss things in right now. I'm currently in https://discord.gg/learnaitogether until I find something more.

The current issue is resolved, though.

SOLUTION: If you get the werfault.exe error, it appears to be caused by PyTorch 1.8.2 and above on RTX 30-series cards.

Install a previous version instead. 1.8.0 worked for me and solved it:

'conda install pytorch==1.8.0 torchvision==0.9.0 torchaudio==0.8.0 cudatoolkit=11.1 -c pytorch -c conda-forge'
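After reinstalling, it is worth confirming that the environment really picked up the pinned version. The following is a minimal sketch of such a check; the helper names are hypothetical, and `KNOWN_GOOD` reflects only the version reported to work in this thread, not an official compatibility statement.

```python
import re

KNOWN_GOOD = (1, 8, 0)  # the PyTorch version reported to work in this thread


def parse_version(version):
    """Extract (major, minor, patch) from strings such as '1.9.0+cu111'."""
    match = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    major, minor, patch = match.groups()
    return int(major), int(minor), int(patch or 0)


def matches_known_good(version):
    """Return True if the version equals the one reported to work."""
    return parse_version(version) == KNOWN_GOOD
```

In the activated environment, `matches_known_good(torch.__version__)` should return True after the downgrade.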
