CUDA_ERROR_OUT_OF_MEMORY

ahmedshingaly commented 4 years ago

Thank you a lot for this repository and tutorial.

I am getting a CUDA out-of-memory error.

My log is `dnnlib: Running training.training_loop.training_loop() on localhost...
C:\ProgramData\Anaconda3\envs\old_tensorflow\lib\site-packages\tensorflow\python\framework\dtypes.py:516: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  _np_qint8 = np.dtype([("qint8", np.int8, 1)])
C:\ProgramData\Anaconda3\envs\old_tensorflow\lib\site-packages\tensorflow\python\framework\dtypes.py:517: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  _np_quint8 = np.dtype([("quint8", np.uint8, 1)])
C:\ProgramData\Anaconda3\envs\old_tensorflow\lib\site-packages\tensorflow\python\framework\dtypes.py:518: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  _np_qint16 = np.dtype([("qint16", np.int16, 1)])
C:\ProgramData\Anaconda3\envs\old_tensorflow\lib\site-packages\tensorflow\python\framework\dtypes.py:519: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  _np_quint16 = np.dtype([("quint16", np.uint16, 1)])
C:\ProgramData\Anaconda3\envs\old_tensorflow\lib\site-packages\tensorflow\python\framework\dtypes.py:520: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  _np_qint32 = np.dtype([("qint32", np.int32, 1)])
C:\ProgramData\Anaconda3\envs\old_tensorflow\lib\site-packages\tensorflow\python\framework\dtypes.py:525: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  np_resource = np.dtype([("resource", np.ubyte, 1)])
C:\ProgramData\Anaconda3\envs\old_tensorflow\lib\site-packages\tensorboard\compat\tensorflow_stub\dtypes.py:541: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  _np_qint8 = np.dtype([("qint8", np.int8, 1)])
C:\ProgramData\Anaconda3\envs\old_tensorflow\lib\site-packages\tensorboard\compat\tensorflow_stub\dtypes.py:542: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  _np_quint8 = np.dtype([("quint8", np.uint8, 1)])
C:\ProgramData\Anaconda3\envs\old_tensorflow\lib\site-packages\tensorboard\compat\tensorflow_stub\dtypes.py:543: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  _np_qint16 = np.dtype([("qint16", np.int16, 1)])
C:\ProgramData\Anaconda3\envs\old_tensorflow\lib\site-packages\tensorboard\compat\tensorflow_stub\dtypes.py:544: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  _np_quint16 = np.dtype([("quint16", np.uint16, 1)])
C:\ProgramData\Anaconda3\envs\old_tensorflow\lib\site-packages\tensorboard\compat\tensorflow_stub\dtypes.py:545: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  _np_qint32 = np.dtype([("qint32", np.int32, 1)])
C:\ProgramData\Anaconda3\envs\old_tensorflow\lib\site-packages\tensorboard\compat\tensorflow_stub\dtypes.py:550: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  np_resource = np.dtype([("resource", np.ubyte, 1)])
Streaming data using training.dataset.TFRecordDataset...
Dataset shape = [3, 1024, 1024]
Dynamic range = [0, 255]
Label size = 0
Loading networks from "results\00001-pretrained\network-snapshot-10000.pkl"...
Setting up TensorFlow plugin "fused_bias_act.cu": Preprocessing... Loading... Done.
Setting up TensorFlow plugin "upfirdn_2d.cu": Preprocessing... Loading... Done.

G Params OutputShape WeightShape

latents_in - (?, 512) -
labels_in - (?, 0) -
lod - () -
dlatent_avg - (512,) -
G_mapping/latents_in - (?, 512) -
G_mapping/labels_in - (?, 0) -
G_mapping/Normalize - (?, 512) -
G_mapping/Dense0 262656 (?, 512) (512, 512)
G_mapping/Dense1 262656 (?, 512) (512, 512)
G_mapping/Dense2 262656 (?, 512) (512, 512)
G_mapping/Dense3 262656 (?, 512) (512, 512)
G_mapping/Dense4 262656 (?, 512) (512, 512)
G_mapping/Dense5 262656 (?, 512) (512, 512)
G_mapping/Dense6 262656 (?, 512) (512, 512)
G_mapping/Dense7 262656 (?, 512) (512, 512)
G_mapping/Broadcast - (?, 18, 512) -
G_mapping/dlatents_out - (?, 18, 512) -
Truncation/Lerp - (?, 18, 512) -
G_synthesis/dlatents_in - (?, 18, 512) -
G_synthesis/4x4/Const 8192 (?, 512, 4, 4) (1, 512, 4, 4)
G_synthesis/4x4/Conv 2622465 (?, 512, 4, 4) (3, 3, 512, 512)
G_synthesis/4x4/ToRGB 264195 (?, 3, 4, 4) (1, 1, 512, 3)
G_synthesis/8x8/Conv0_up 2622465 (?, 512, 8, 8) (3, 3, 512, 512)
G_synthesis/8x8/Conv1 2622465 (?, 512, 8, 8) (3, 3, 512, 512)
G_synthesis/8x8/Upsample - (?, 3, 8, 8) -
G_synthesis/8x8/ToRGB 264195 (?, 3, 8, 8) (1, 1, 512, 3)
G_synthesis/16x16/Conv0_up 2622465 (?, 512, 16, 16) (3, 3, 512, 512)
G_synthesis/16x16/Conv1 2622465 (?, 512, 16, 16) (3, 3, 512, 512)
G_synthesis/16x16/Upsample - (?, 3, 16, 16) -
G_synthesis/16x16/ToRGB 264195 (?, 3, 16, 16) (1, 1, 512, 3)
G_synthesis/32x32/Conv0_up 2622465 (?, 512, 32, 32) (3, 3, 512, 512)
G_synthesis/32x32/Conv1 2622465 (?, 512, 32, 32) (3, 3, 512, 512)
G_synthesis/32x32/Upsample - (?, 3, 32, 32) -
G_synthesis/32x32/ToRGB 264195 (?, 3, 32, 32) (1, 1, 512, 3)
G_synthesis/64x64/Conv0_up 2622465 (?, 512, 64, 64) (3, 3, 512, 512)
G_synthesis/64x64/Conv1 2622465 (?, 512, 64, 64) (3, 3, 512, 512)
G_synthesis/64x64/Upsample - (?, 3, 64, 64) -
G_synthesis/64x64/ToRGB 264195 (?, 3, 64, 64) (1, 1, 512, 3)
G_synthesis/128x128/Conv0_up 1442561 (?, 256, 128, 128) (3, 3, 512, 256)
G_synthesis/128x128/Conv1 721409 (?, 256, 128, 128) (3, 3, 256, 256)
G_synthesis/128x128/Upsample - (?, 3, 128, 128) -
G_synthesis/128x128/ToRGB 132099 (?, 3, 128, 128) (1, 1, 256, 3)
G_synthesis/256x256/Conv0_up 426369 (?, 128, 256, 256) (3, 3, 256, 128)
G_synthesis/256x256/Conv1 213249 (?, 128, 256, 256) (3, 3, 128, 128)
G_synthesis/256x256/Upsample - (?, 3, 256, 256) -
G_synthesis/256x256/ToRGB 66051 (?, 3, 256, 256) (1, 1, 128, 3)
G_synthesis/512x512/Conv0_up 139457 (?, 64, 512, 512) (3, 3, 128, 64)
G_synthesis/512x512/Conv1 69761 (?, 64, 512, 512) (3, 3, 64, 64)
G_synthesis/512x512/Upsample - (?, 3, 512, 512) -
G_synthesis/512x512/ToRGB 33027 (?, 3, 512, 512) (1, 1, 64, 3)
G_synthesis/1024x1024/Conv0_up 51297 (?, 32, 1024, 1024) (3, 3, 64, 32)
G_synthesis/1024x1024/Conv1 25665 (?, 32, 1024, 1024) (3, 3, 32, 32)
G_synthesis/1024x1024/Upsample - (?, 3, 1024, 1024) -
G_synthesis/1024x1024/ToRGB 16515 (?, 3, 1024, 1024) (1, 1, 32, 3)
G_synthesis/images_out - (?, 3, 1024, 1024) -
G_synthesis/noise0 - (1, 1, 4, 4) -
G_synthesis/noise1 - (1, 1, 8, 8) -
G_synthesis/noise2 - (1, 1, 8, 8) -
G_synthesis/noise3 - (1, 1, 16, 16) -
G_synthesis/noise4 - (1, 1, 16, 16) -
G_synthesis/noise5 - (1, 1, 32, 32) -
G_synthesis/noise6 - (1, 1, 32, 32) -
G_synthesis/noise7 - (1, 1, 64, 64) -
G_synthesis/noise8 - (1, 1, 64, 64) -
G_synthesis/noise9 - (1, 1, 128, 128) -
G_synthesis/noise10 - (1, 1, 128, 128) -
G_synthesis/noise11 - (1, 1, 256, 256) -
G_synthesis/noise12 - (1, 1, 256, 256) -
G_synthesis/noise13 - (1, 1, 512, 512) -
G_synthesis/noise14 - (1, 1, 512, 512) -
G_synthesis/noise15 - (1, 1, 1024, 1024) -
G_synthesis/noise16 - (1, 1, 1024, 1024) -
images_out - (?, 3, 1024, 1024) -

Total 30370060

D Params OutputShape WeightShape

images_in - (?, 3, 1024, 1024) -
labels_in - (?, 0) -
1024x1024/FromRGB 128 (?, 32, 1024, 1024) (1, 1, 3, 32)
1024x1024/Conv0 9248 (?, 32, 1024, 1024) (3, 3, 32, 32)
1024x1024/Conv1_down 18496 (?, 64, 512, 512) (3, 3, 32, 64)
1024x1024/Skip 2048 (?, 64, 512, 512) (1, 1, 32, 64)
512x512/Conv0 36928 (?, 64, 512, 512) (3, 3, 64, 64)
512x512/Conv1_down 73856 (?, 128, 256, 256) (3, 3, 64, 128)
512x512/Skip 8192 (?, 128, 256, 256) (1, 1, 64, 128)
256x256/Conv0 147584 (?, 128, 256, 256) (3, 3, 128, 128)
256x256/Conv1_down 295168 (?, 256, 128, 128) (3, 3, 128, 256)
256x256/Skip 32768 (?, 256, 128, 128) (1, 1, 128, 256)
128x128/Conv0 590080 (?, 256, 128, 128) (3, 3, 256, 256)
128x128/Conv1_down 1180160 (?, 512, 64, 64) (3, 3, 256, 512)
128x128/Skip 131072 (?, 512, 64, 64) (1, 1, 256, 512)
64x64/Conv0 2359808 (?, 512, 64, 64) (3, 3, 512, 512)
64x64/Conv1_down 2359808 (?, 512, 32, 32) (3, 3, 512, 512)
64x64/Skip 262144 (?, 512, 32, 32) (1, 1, 512, 512)
32x32/Conv0 2359808 (?, 512, 32, 32) (3, 3, 512, 512)
32x32/Conv1_down 2359808 (?, 512, 16, 16) (3, 3, 512, 512)
32x32/Skip 262144 (?, 512, 16, 16) (1, 1, 512, 512)
16x16/Conv0 2359808 (?, 512, 16, 16) (3, 3, 512, 512)
16x16/Conv1_down 2359808 (?, 512, 8, 8) (3, 3, 512, 512)
16x16/Skip 262144 (?, 512, 8, 8) (1, 1, 512, 512)
8x8/Conv0 2359808 (?, 512, 8, 8) (3, 3, 512, 512)
8x8/Conv1_down 2359808 (?, 512, 4, 4) (3, 3, 512, 512)
8x8/Skip 262144 (?, 512, 4, 4) (1, 1, 512, 512)
4x4/MinibatchStddev - (?, 513, 4, 4) -
4x4/Conv 2364416 (?, 512, 4, 4) (3, 3, 513, 512)
4x4/Dense0 4194816 (?, 512) (8192, 512)
Output 513 (?, 1) (512, 1)
scores_out - (?, 1) -

Total 29012513

Building TensorFlow graph...
Initializing logs...
Training for 25000 kimg...

Traceback (most recent call last):
  File "C:\ProgramData\Anaconda3\envs\old_tensorflow\lib\site-packages\tensorflow\python\client\session.py", line 1356, in _do_call
    return fn(*args)
  File "C:\ProgramData\Anaconda3\envs\old_tensorflow\lib\site-packages\tensorflow\python\client\session.py", line 1341, in _run_fn
    options, feed_dict, fetch_list, target_list, run_metadata)
  File "C:\ProgramData\Anaconda3\envs\old_tensorflow\lib\site-packages\tensorflow\python\client\session.py", line 1429, in _call_tf_sessionrun
    run_metadata)
tensorflow.python.framework.errors_impl.ResourceExhaustedError: OOM when allocating tensor with shape[2,3,3,512,512] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc
  [[{{node GPU0/G_loss/G/G_synthesis/8x8/Conv0_up/Square}}]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "run_training.py", line 201, in <module>
    main()
  File "run_training.py", line 196, in main
    run(**vars(args))
  File "run_training.py", line 127, in run
    dnnlib.submit_run(**kwargs)
  File "C:\Users\USER6459\Documents\python\stylegan2\dnnlib\submission\submit.py", line 343, in submit_run
    return farm.submit(submit_config, host_run_dir)
  File "C:\Users\USER6459\Documents\python\stylegan2\dnnlib\submission\internal\local.py", line 22, in submit
    return run_wrapper(submit_config)
  File "C:\Users\USER6459\Documents\python\stylegan2\dnnlib\submission\submit.py", line 280, in run_wrapper
    run_func_obj(**submit_config.run_func_kwargs)
  File "C:\Users\USER6459\Documents\python\stylegan2\training\training_loop.py", line 302, in training_loop
    tflib.run(G_train_op, feed_dict)
  File "C:\Users\USER6459\Documents\python\stylegan2\dnnlib\tflib\tfutil.py", line 31, in run
    return tf.get_default_session().run(*args, **kwargs)
  File "C:\ProgramData\Anaconda3\envs\old_tensorflow\lib\site-packages\tensorflow\python\client\session.py", line 950, in run
    run_metadata_ptr)
  File "C:\ProgramData\Anaconda3\envs\old_tensorflow\lib\site-packages\tensorflow\python\client\session.py", line 1173, in _run
    feed_dict_tensor, options, run_metadata)
  File "C:\ProgramData\Anaconda3\envs\old_tensorflow\lib\site-packages\tensorflow\python\client\session.py", line 1350, in _do_run
    run_metadata)
  File "C:\ProgramData\Anaconda3\envs\old_tensorflow\lib\site-packages\tensorflow\python\client\session.py", line 1370, in _do_call
    raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.ResourceExhaustedError: OOM when allocating tensor with shape[2,3,3,512,512] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc
  [[node GPU0/G_loss/G/G_synthesis/8x8/Conv0_up/Square (defined at <string>:104) ]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.

Errors may have originated from an input operation.
Input Source operations connected to node GPU0/G_loss/G/G_synthesis/8x8/Conv0_up/Square:
  GPU0/G_loss/G/G_synthesis/8x8/Conv0_up/mul_3 (defined at <string>:100)

Original stack trace for 'GPU0/G_loss/G/G_synthesis/8x8/Conv0_up/Square':
  File "run_training.py", line 201, in <module>
    main()
  File "run_training.py", line 196, in main
    run(**vars(args))
  File "run_training.py", line 127, in run
    dnnlib.submit_run(**kwargs)
  File "C:\Users\USER6459\Documents\python\stylegan2\dnnlib\submission\submit.py", line 343, in submit_run
    return farm.submit(submit_config, host_run_dir)
  File "C:\Users\USER6459\Documents\python\stylegan2\dnnlib\submission\internal\local.py", line 22, in submit
    return run_wrapper(submit_config)
  File "C:\Users\USER6459\Documents\python\stylegan2\dnnlib\submission\submit.py", line 280, in run_wrapper
    run_func_obj(**submit_config.run_func_kwargs)
  File "C:\Users\USER6459\Documents\python\stylegan2\training\training_loop.py", line 223, in training_loop
    G_loss, G_reg = dnnlib.util.call_func_by_name(G=G_gpu, D=D_gpu, opt=G_opt, training_set=training_set, minibatch_size=minibatch_gpu_in, **G_loss_args)
  File "C:\Users\USER6459\Documents\python\stylegan2\dnnlib\util.py", line 256, in call_func_by_name
    return func_obj(*args, **kwargs)
  File "C:\Users\USER6459\Documents\python\stylegan2\training\loss.py", line 152, in G_logistic_ns_pathreg
    fake_images_out, fake_dlatents_out = G.get_output_for(latents, labels, is_training=True, return_dlatents=True)
  File "C:\Users\USER6459\Documents\python\stylegan2\dnnlib\tflib\network.py", line 221, in get_output_for
    out_expr = self._build_func(*final_inputs, **build_kwargs)
  File "<string>", line 238, in G_main
  File "C:\Users\USER6459\Documents\python\stylegan2\dnnlib\tflib\network.py", line 221, in get_output_for
    out_expr = self._build_func(*final_inputs, **build_kwargs)
  File "<string>", line 498, in G_synthesis_stylegan2
  File "<string>", line 468, in block
  File "<string>", line 455, in layer
  File "<string>", line 104, in modulated_conv2d_layer
  File "C:\ProgramData\Anaconda3\envs\old_tensorflow\lib\site-packages\tensorflow\python\ops\gen_math_ops.py", line 10698, in square
    "Square", x=x, name=name)
  File "C:\ProgramData\Anaconda3\envs\old_tensorflow\lib\site-packages\tensorflow\python\framework\op_def_library.py", line 788, in _apply_op_helper
    op_def=op_def)
  File "C:\ProgramData\Anaconda3\envs\old_tensorflow\lib\site-packages\tensorflow\python\util\deprecation.py", line 507, in new_func
    return func(*args, **kwargs)
  File "C:\ProgramData\Anaconda3\envs\old_tensorflow\lib\site-packages\tensorflow\python\framework\ops.py", line 3616, in create_op
    op_def=op_def)
  File "C:\ProgramData\Anaconda3\envs\old_tensorflow\lib\site-packages\tensorflow\python\framework\ops.py", line 2005, in __init__
    self._traceback = tf_stack.extract_stack()

` My tfrecord size is

Any idea how to solve this problem? Thank you in advance.

ahmedshingaly commented 4 years ago

Here is the GPU information for my second computer. I had no luck running your repository on either of them. You replied to my YouTube comment saying that I should have a 16GB GPU. Are the specifications below not enough?
[Image: clip_20200522110041104]

dvschultz commented 4 years ago

11GB is probably too small depending on what you’re training, especially if you’re running additional processes on it.
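For a sense of scale, here is a rough back-of-the-envelope sketch in plain Python (not part of the repository). The shapes are copied from the log above; the helper name `tensor_mib` is made up for illustration, and the batch size of 2 is an assumption read off the leading dimension of the failing tensor.

```python
from math import prod

BYTES_PER_FLOAT32 = 4  # the OOM message reports "type float", i.e. 32-bit


def tensor_mib(shape, bytes_per_elem=BYTES_PER_FLOAT32):
    """Size of a dense tensor with the given shape, in MiB (hypothetical helper)."""
    return prod(shape) * bytes_per_elem / 2**20


# Shape copied from the OOM message: shape[2,3,3,512,512].
failed = tensor_mib((2, 3, 3, 512, 512))

# One activation at the highest resolution, (?, 32, 1024, 1024) in the layer
# table, with ? assumed to be 2 to match the failing tensor's leading dimension.
act_1024 = tensor_mib((2, 32, 1024, 1024))

print(f"failed allocation:        {failed:.0f} MiB")
print(f"one 1024x1024 activation: {act_1024:.0f} MiB")
```

If these numbers are right, the allocation that failed is only about 18 MiB, which suggests the card was already nearly full when this op ran, while each 1024x1024 activation costs roughly 256 MiB at that batch size. Treat this as an estimate only: it ignores gradients, optimizer state, and allocator overhead.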

ahmedshingaly commented 4 years ago

I see, thank you very much @dvschultz. I have another question:

How can I create a StyleGAN model with shape (1, 18, 512)?

My StyleGAN model produces shape (1, 12, 512), so I cannot find its latent representation with the encoder developed by Puzer because of the shape difference.

In more detail: my model produces shape (1, 12, 512) using https://github.com/NVlabs/stylegan, but when I use the StyleGAN encoder (https://github.com/Puzer/stylegan-encoder) to find the latent space, it requires (1, 18, 512). Do you have any idea how I can produce a model with shape (1, 18, 512) instead of (1, 12, 512)?

dvschultz commented 4 years ago

What size output is your model? As I recall, only 1024 does (1,18,512). Smaller resolutions will generate smaller shapes. Many of the encoders are only set up to work with FFHQ and its 1024^2 resolution.
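As a quick sanity check on that relationship, here is a small sketch. It assumes the usual StyleGAN layout of two style inputs per resolution block from 4x4 upward, i.e. `2 * log2(resolution) - 2` layers; the function names are hypothetical and not taken from either repository.

```python
import math


def num_style_layers(resolution):
    """Number of per-layer dlatent vectors for a square power-of-two resolution."""
    log2_res = int(math.log2(resolution))
    if resolution < 4 or 2 ** log2_res != resolution:
        raise ValueError("resolution must be a power of two >= 4")
    return 2 * log2_res - 2


def resolution_for_layers(num_layers):
    """Inverse mapping: output resolution implied by a dlatent layer count."""
    if num_layers < 2 or num_layers % 2 != 0:
        raise ValueError("layer count must be an even number >= 2")
    return 2 ** ((num_layers + 2) // 2)


for res in (128, 256, 512, 1024):
    print(res, (1, num_style_layers(res), 512))

print("12 layers ->", resolution_for_layers(12))
```

Under that assumption, a (1, 12, 512) dlatent would correspond to a 128x128 model and (1, 18, 512) to 1024x1024, which matches the `G_mapping/Broadcast` shape of (?, 18, 512) in the log earlier in this thread.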

dvschultz / stylegan2

CUDA_ERROR_OUT_OF_MEMORY #8