ROCm / tensorflow-upstream

TensorFlow ROCm port
https://tensorflow.org
Apache License 2.0

tensorflow.python.framework.errors_impl.InternalError: Failed copying input tensor from /job:localhost/replica:0/task:0/device:CPU:0 to /job:localhost/replica:0/task:0/device:GPU:0 in order to run _EagerConst: Dst tensor is not initialized. #1469

Open neqkir opened 3 years ago

neqkir commented 3 years ago

An RNN that trains fine on the CPU fails on the GPU with what looks like an out-of-memory error.

I am running this code: https://github.com/neqkir/bible-like-text-generation/blob/main/word-based/word_rnn_bible_lstm.py

Iteration 1
2021-10-15 10:43:35.330781: W tensorflow/core/framework/cpu_allocator_impl.cc:80] Allocation of 43731715200 exceeds 10% of free system memory.
2021-10-15 10:43:51.203149: W tensorflow/core/common_runtime/bfc_allocator.cc:457] Allocator (GPU_0_bfc) ran out of memory trying to allocate 40.73GiB (rounded to 43731715328)requested by op _EagerConst
If the cause is memory fragmentation maybe the environment variable 'TF_GPU_ALLOCATOR=cuda_malloc_async' will improve the situation.
Current allocation summary follows.
2021-10-15 10:43:51.203277: I tensorflow/core/common_runtime/bfc_allocator.cc:1004] BFCAllocator dump for GPU_0_bfc
2021-10-15 10:43:51.203290: I tensorflow/core/common_runtime/bfc_allocator.cc:1011] Bin (256):  Total Chunks: 14, Chunks in use: 14. 3.5KiB allocated for chunks. 3.5KiB in use in bin. 64B client-requested in use in bin.
2021-10-15 10:43:51.203311: I tensorflow/core/common_runtime/bfc_allocator.cc:1011] Bin (512):  Total Chunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2021-10-15 10:43:51.203343: I tensorflow/core/common_runtime/bfc_allocator.cc:1011] Bin (1024): Total Chunks: 1, Chunks in use: 1. 1.2KiB allocated for chunks. 1.2KiB in use in bin. 1.0KiB client-requested in use in bin.
2021-10-15 10:43:51.203361: I tensorflow/core/common_runtime/bfc_allocator.cc:1011] Bin (2048): Total Chunks: 1, Chunks in use: 0. 2.8KiB allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2021-10-15 10:43:51.203370: I tensorflow/core/common_runtime/bfc_allocator.cc:1011] Bin (4096): Total Chunks: 1, Chunks in use: 0. 7.5KiB allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2021-10-15 10:43:51.203378: I tensorflow/core/common_runtime/bfc_allocator.cc:1011] Bin (8192): Total Chunks: 2, Chunks in use: 2. 16.0KiB allocated for chunks. 16.0KiB in use in bin. 16.0KiB client-requested in use in bin.
2021-10-15 10:43:51.203385: I tensorflow/core/common_runtime/bfc_allocator.cc:1011] Bin (16384): Total Chunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2021-10-15 10:43:51.203391: I tensorflow/core/common_runtime/bfc_allocator.cc:1011] Bin (32768): Total Chunks: 1, Chunks in use: 1. 63.5KiB allocated for chunks. 63.5KiB in use in bin. 63.4KiB client-requested in use in bin.
2021-10-15 10:43:51.203397: I tensorflow/core/common_runtime/bfc_allocator.cc:1011] Bin (65536): Total Chunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2021-10-15 10:43:51.203404: I tensorflow/core/common_runtime/bfc_allocator.cc:1011] Bin (131072):        Total Chunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2021-10-15 10:43:51.203410: I tensorflow/core/common_runtime/bfc_allocator.cc:1011] Bin (262144):        Total Chunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2021-10-15 10:43:51.203417: I tensorflow/core/common_runtime/bfc_allocator.cc:1011] Bin (524288):        Total Chunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2021-10-15 10:43:51.203424: I tensorflow/core/common_runtime/bfc_allocator.cc:1011] Bin (1048576):       Total Chunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2021-10-15 10:43:51.203430: I tensorflow/core/common_runtime/bfc_allocator.cc:1011] Bin (2097152):       Total Chunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2021-10-15 10:43:51.203437: I tensorflow/core/common_runtime/bfc_allocator.cc:1011] Bin (4194304):       Total Chunks: 4, Chunks in use: 3. 19.91MiB allocated for chunks. 14.98MiB in use in bin. 12.00MiB client-requested in use in bin.
2021-10-15 10:43:51.203443: I tensorflow/core/common_runtime/bfc_allocator.cc:1011] Bin (8388608):       Total Chunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2021-10-15 10:43:51.203450: I tensorflow/core/common_runtime/bfc_allocator.cc:1011] Bin (16777216):      Total Chunks: 1, Chunks in use: 1. 31.68MiB allocated for chunks. 31.68MiB in use in bin. 31.68MiB client-requested in use in bin.
2021-10-15 10:43:51.203456: I tensorflow/core/common_runtime/bfc_allocator.cc:1011] Bin (33554432):      Total Chunks: 1, Chunks in use: 0. 63.36MiB allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2021-10-15 10:43:51.203464: I tensorflow/core/common_runtime/bfc_allocator.cc:1011] Bin (67108864):      Total Chunks: 1, Chunks in use: 1. 126.72MiB allocated for chunks. 126.72MiB in use in bin. 126.72MiB client-requested in use in bin.
2021-10-15 10:43:51.203473: I tensorflow/core/common_runtime/bfc_allocator.cc:1011] Bin (134217728):     Total Chunks: 1, Chunks in use: 0. 138.40MiB allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2021-10-15 10:43:51.203481: I tensorflow/core/common_runtime/bfc_allocator.cc:1011] Bin (268435456):     Total Chunks: 1, Chunks in use: 0. 30.62GiB allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2021-10-15 10:43:51.203490: I tensorflow/core/common_runtime/bfc_allocator.cc:1027] Bin for 40.73GiB was 256.00MiB, Chunk State:
2021-10-15 10:43:51.203506: I tensorflow/core/common_runtime/bfc_allocator.cc:1033]   Size: 30.62GiB | Requested Size: 0B | in_use: 0 | bin_num: 20, prev:   Size: 126.72MiB | Requested Size: 126.72MiB | in_use: 1 | bin_num: -1
2021-10-15 10:43:51.203512: I tensorflow/core/common_runtime/bfc_allocator.cc:1040] Next region of size 33281802240
2021-10-15 10:43:51.203520: I tensorflow/core/common_runtime/bfc_allocator.cc:1060] InUse at 7fe540000000 of size 256 next 1
2021-10-15 10:43:51.203525: I tensorflow/core/common_runtime/bfc_allocator.cc:1060] InUse at 7fe540000100 of size 1280 next 2
2021-10-15 10:43:51.203531: I tensorflow/core/common_runtime/bfc_allocator.cc:1060] InUse at 7fe540000600 of size 256 next 3
2021-10-15 10:43:51.203536: I tensorflow/core/common_runtime/bfc_allocator.cc:1060] InUse at 7fe540000700 of size 256 next 4
2021-10-15 10:43:51.203542: I tensorflow/core/common_runtime/bfc_allocator.cc:1060] InUse at 7fe540000800 of size 256 next 5
2021-10-15 10:43:51.203547: I tensorflow/core/common_runtime/bfc_allocator.cc:1060] InUse at 7fe540000900 of size 256 next 6
2021-10-15 10:43:51.203552: I tensorflow/core/common_runtime/bfc_allocator.cc:1060] InUse at 7fe540000a00 of size 256 next 9
2021-10-15 10:43:51.203561: I tensorflow/core/common_runtime/bfc_allocator.cc:1060] InUse at 7fe540000b00 of size 256 next 12
2021-10-15 10:43:51.203567: I tensorflow/core/common_runtime/bfc_allocator.cc:1060] InUse at 7fe540000c00 of size 256 next 10
2021-10-15 10:43:51.203575: I tensorflow/core/common_runtime/bfc_allocator.cc:1060] Free  at 7fe540000d00 of size 7680 next 15
2021-10-15 10:43:51.203582: I tensorflow/core/common_runtime/bfc_allocator.cc:1060] InUse at 7fe540002b00 of size 8192 next 16
2021-10-15 10:43:51.203588: I tensorflow/core/common_runtime/bfc_allocator.cc:1060] InUse at 7fe540004b00 of size 256 next 20
2021-10-15 10:43:51.203594: I tensorflow/core/common_runtime/bfc_allocator.cc:1060] InUse at 7fe540004c00 of size 256 next 17
2021-10-15 10:43:51.203600: I tensorflow/core/common_runtime/bfc_allocator.cc:1060] InUse at 7fe540004d00 of size 256 next 26
2021-10-15 10:43:51.203605: I tensorflow/core/common_runtime/bfc_allocator.cc:1060] InUse at 7fe540004e00 of size 256 next 27
2021-10-15 10:43:51.203611: I tensorflow/core/common_runtime/bfc_allocator.cc:1060] InUse at 7fe540004f00 of size 256 next 28
2021-10-15 10:43:51.203618: I tensorflow/core/common_runtime/bfc_allocator.cc:1060] Free  at 7fe540005000 of size 2816 next 19
2021-10-15 10:43:51.203624: I tensorflow/core/common_runtime/bfc_allocator.cc:1060] InUse at 7fe540005b00 of size 8192 next 21
2021-10-15 10:43:51.203629: I tensorflow/core/common_runtime/bfc_allocator.cc:1060] InUse at 7fe540007b00 of size 65024 next 23
2021-10-15 10:43:51.203639: I tensorflow/core/common_runtime/bfc_allocator.cc:1060] Free  at 7fe540017900 of size 5165568 next 22
2021-10-15 10:43:51.203646: I tensorflow/core/common_runtime/bfc_allocator.cc:1060] InUse at 7fe540504b00 of size 7323392 next 11
2021-10-15 10:43:51.203652: I tensorflow/core/common_runtime/bfc_allocator.cc:1060] InUse at 7fe540c00a00 of size 256 next 14
2021-10-15 10:43:51.203659: I tensorflow/core/common_runtime/bfc_allocator.cc:1060] InUse at 7fe540c00b00 of size 4194304 next 13
2021-10-15 10:43:51.203664: I tensorflow/core/common_runtime/bfc_allocator.cc:1060] InUse at 7fe541000b00 of size 4194304 next 18
2021-10-15 10:43:51.203670: I tensorflow/core/common_runtime/bfc_allocator.cc:1060] Free  at 7fe541400b00 of size 66437120 next 25
2021-10-15 10:43:51.203676: I tensorflow/core/common_runtime/bfc_allocator.cc:1060] InUse at 7fe54535cb00 of size 33218560 next 24
2021-10-15 10:43:51.203682: I tensorflow/core/common_runtime/bfc_allocator.cc:1060] Free  at 7fe54730ab00 of size 145120768 next 7
2021-10-15 10:43:51.203690: I tensorflow/core/common_runtime/bfc_allocator.cc:1060] InUse at 7fe54fd70900 of size 132874240 next 8
2021-10-15 10:43:51.203697: I tensorflow/core/common_runtime/bfc_allocator.cc:1060] Free  at 7fe557c28900 of size 32883177216 next 18446744073709551615
2021-10-15 10:43:51.203703: I tensorflow/core/common_runtime/bfc_allocator.cc:1065]      Summary of in-use Chunks by size:
2021-10-15 10:43:51.203711: I tensorflow/core/common_runtime/bfc_allocator.cc:1068] 14 Chunks of size 256 totalling 3.5KiB
2021-10-15 10:43:51.203717: I tensorflow/core/common_runtime/bfc_allocator.cc:1068] 1 Chunks of size 1280 totalling 1.2KiB
2021-10-15 10:43:51.203724: I tensorflow/core/common_runtime/bfc_allocator.cc:1068] 2 Chunks of size 8192 totalling 16.0KiB
2021-10-15 10:43:51.203729: I tensorflow/core/common_runtime/bfc_allocator.cc:1068] 1 Chunks of size 65024 totalling 63.5KiB
2021-10-15 10:43:51.203736: I tensorflow/core/common_runtime/bfc_allocator.cc:1068] 2 Chunks of size 4194304 totalling 8.00MiB
2021-10-15 10:43:51.203742: I tensorflow/core/common_runtime/bfc_allocator.cc:1068] 1 Chunks of size 7323392 totalling 6.98MiB
2021-10-15 10:43:51.203747: I tensorflow/core/common_runtime/bfc_allocator.cc:1068] 1 Chunks of size 33218560 totalling 31.68MiB
2021-10-15 10:43:51.203756: I tensorflow/core/common_runtime/bfc_allocator.cc:1068] 1 Chunks of size 132874240 totalling 126.72MiB
2021-10-15 10:43:51.203763: I tensorflow/core/common_runtime/bfc_allocator.cc:1072] Sum Total of in-use chunks: 173.46MiB
2021-10-15 10:43:51.203769: I tensorflow/core/common_runtime/bfc_allocator.cc:1074] total_region_allocated_bytes_: 33281802240 memory_limit_: 33281802240 available bytes: 0 curr_region_allocation_bytes_: 66563604480
2021-10-15 10:43:51.203781: I tensorflow/core/common_runtime/bfc_allocator.cc:1080] Stats:
Limit:                     33281802240
InUse:                       181891072
MaxInUse:                    398624768
NumAllocs:                          56
MaxAllocSize:                132874240
Reserved:                            0
PeakReserved:                        0
LargestFreeBlock:                    0

2021-10-15 10:43:51.203793: W tensorflow/core/common_runtime/bfc_allocator.cc:468] **__________________________________________________________________________________________________
Traceback (most recent call last):
  File "rnn.py", line 104, in <module>
    model.fit(X, y, batch_size=128, epochs = 42)
  File "/usr/local/lib/python3.6/dist-packages/keras/engine/training.py", line 1148, in fit
    steps_per_execution=self._steps_per_execution)
  File "/usr/local/lib/python3.6/dist-packages/keras/engine/data_adapter.py", line 1383, in get_data_handler
    return DataHandler(*args, **kwargs)
  File "/usr/local/lib/python3.6/dist-packages/keras/engine/data_adapter.py", line 1150, in __init__
    model=model)
  File "/usr/local/lib/python3.6/dist-packages/keras/engine/data_adapter.py", line 230, in __init__
    x, y, sample_weights = _process_tensorlike((x, y, sample_weights))
  File "/usr/local/lib/python3.6/dist-packages/keras/engine/data_adapter.py", line 1031, in _process_tensorlike
    inputs = tf.nest.map_structure(_convert_numpy_and_scipy, inputs)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/util/nest.py", line 869, in map_structure
    structure[0], [func(*x) for x in entries],
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/util/nest.py", line 869, in <listcomp>
    structure[0], [func(*x) for x in entries],
  File "/usr/local/lib/python3.6/dist-packages/keras/engine/data_adapter.py", line 1026, in _convert_numpy_and_scipy
    return tf.convert_to_tensor(x, dtype=dtype)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/util/dispatch.py", line 206, in wrapper
    return target(*args, **kwargs)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/framework/ops.py", line 1431, in convert_to_tensor_v2_with_dispatch
    value, dtype=dtype, dtype_hint=dtype_hint, name=name)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/framework/ops.py", line 1441, in convert_to_tensor_v2
    as_ref=False)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/profiler/trace.py", line 163, in wrapped
    return func(*args, **kwargs)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/framework/ops.py", line 1566, in convert_to_tensor
    ret = conversion_func(value, dtype=dtype, name=name, as_ref=as_ref)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/framework/tensor_conversion_registry.py", line 52, in _default_conversion_function
    return constant_op.constant(value, dtype, name=name)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/framework/constant_op.py", line 272, in constant
    allow_broadcast=True)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/framework/constant_op.py", line 283, in _constant_impl
    return _constant_eager_impl(ctx, value, dtype, shape, verify_shape)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/framework/constant_op.py", line 308, in _constant_eager_impl
    t = convert_to_eager_tensor(value, ctx, dtype)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/framework/constant_op.py", line 106, in convert_to_eager_tensor
    return ops.EagerTensor(value, ctx.device_name, dtype)
**tensorflow.python.framework.errors_impl.InternalError: Failed copying input tensor from /job:localhost/replica:0/task:0/device:CPU:0 to /job:localhost/replica:0/task:0/device:GPU:0 in order to run _EagerConst: Dst tensor is not initialized.**
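
For scale, a sketch of what the traceback points at: Keras converts the full numpy input into a single eager tensor on the GPU before any batching happens. The dimensions below are illustrative, not taken from the script:

```python
import numpy as np
import tensorflow as tf

# Illustrative sizes only: a word-level one-hot array reaches ~40 GiB with,
# e.g., 30k sequences x 40 timesteps x 9k vocabulary x 4-byte floats.
X = np.zeros((30_000, 40, 9_000), dtype=np.float32)

# On a 32 GiB GPU, this one-shot host-to-device copy is where
# "Dst tensor is not initialized" surfaces.
t = tf.convert_to_tensor(X)
```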

System information

deven-amd commented 3 years ago

Per the posted log, the model seems to be attempting to allocate over 40 GB of GPU memory:

2021-10-15 10:43:51.203149: W tensorflow/core/common_runtime/bfc_allocator.cc:457] Allocator (GPU_0_bfc) ran out of memory trying to allocate 40.73GiB (rounded to 43731715328)requested by op _EagerConst

We need to figure out what is triggering this memory allocation.

@neqkir, could you rerun with the env var "MIOPEN_ENABLE_LOGGING=1" and post the resulting log file here?

thanks
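
One way to set the variable (a sketch; exporting it in the shell before launching the script works just as well):

```python
import os

# MIOpen reads the variable at initialization, so set it before
# TensorFlow (and thus MIOpen) is imported.
os.environ["MIOPEN_ENABLE_LOGGING"] = "1"

import tensorflow as tf
```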

jayfurmanek commented 3 years ago

It looks like the arrays you are working with are built from the entire corpus. When using an accelerator you have to think a bit more about how to handle your data if you have large amounts of it.

Your MI100s have 32 GB of memory onboard, so you are blowing past that. It also looks like you have two, so you'll want to make the most of both of them.

Take a look at a few of these guides to help you refactor your code a bit for GPU acceleration. You will likely want to create a tf.data dataset and use a mirrored strategy to make the best use of your hardware; a rough sketch follows the links.

https://www.tensorflow.org/guide/gpu
https://www.tensorflow.org/guide/distributed_training
https://www.tensorflow.org/guide/data
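
A minimal sketch of that shape (assuming X and y as built in the script, with build_model() standing in for the LSTM model defined there):

```python
import tensorflow as tf

# Use every visible GPU (both MI100s here).
strategy = tf.distribute.MirroredStrategy()

# Stream batches to the device instead of handing fit() one giant array.
dataset = (tf.data.Dataset.from_tensor_slices((X, y))
           .shuffle(10_000)
           .batch(128)
           .prefetch(tf.data.AUTOTUNE))

with strategy.scope():
    model = build_model()  # hypothetical helper: the LSTM from the script
    model.compile(optimizer="adam", loss="categorical_crossentropy")

model.fit(dataset, epochs=42)
```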

reyhaneh-92 commented 2 years ago

I am facing the same issue. @neqkir, were you able to solve it? I would appreciate it if you could post your solution here.

bilal-umar commented 1 year ago

@neqkir @reyhaneh-92 I am facing the same issue; please post an update here if you were able to solve it.

mdtalibahmad commented 1 year ago

I am also facing the same issue. Moreover, the same code was working with TensorFlow 2.4 and started throwing this error after I upgraded to TensorFlow 2.10.

bilal-umar commented 1 year ago

@mdtalibahmad I was able to resolve the issue after completely uninstalling CUDA, Python, and all dependencies and reinstalling everything with the correct versions. I even installed the Visual C++ update for the latest CUDA.

MaoMakara commented 1 year ago

I have the same problem: when I run my code on my local computer in a Jupyter notebook it works, but when I move the same code to a server with the same environment I get this error. Please help me resolve it; thank you for your valuable time.

MaoMakara commented 1 year ago

Use this code after loading your libraries to let TensorFlow grow GPU memory on demand instead of reserving it all up front:

```python
gpus = tf.config.list_physical_devices('GPU')
if gpus:
    try:
        # Currently, memory growth needs to be the same across GPUs
        for gpu in gpus:
            tf.config.experimental.set_memory_growth(gpu, True)
        logical_gpus = tf.config.list_logical_devices('GPU')
        print(len(gpus), "Physical GPUs,", len(logical_gpus), "Logical GPUs")
    except RuntimeError as e:
        # Memory growth must be set before GPUs have been initialized
        print(e)
```
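
A related knob (an assumption on my part, from the same tf.config API) is to hard-cap how much memory TensorFlow may reserve on a GPU instead of letting it grow:

```python
import tensorflow as tf

gpus = tf.config.list_physical_devices('GPU')
if gpus:
    # Must run before the GPUs are initialized, like memory growth above.
    # memory_limit is in MiB; 8192 caps TensorFlow at 8 GiB on the first GPU.
    tf.config.set_logical_device_configuration(
        gpus[0],
        [tf.config.LogicalDeviceConfiguration(memory_limit=8192)])
```
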
KerimM-bit commented 1 year ago

> @mdtalibahmad I was able to resolve the issue after completely uninstalling CUDA, Python, and all dependencies and reinstalling everything with the correct versions. I even installed the Visual C++ update for the latest CUDA.

Hello, what is the correct version? I'm using Python 3.8, TF 2.10, CUDA 11.2, and cuDNN 8.1.0 and still get the same issue. Could you elaborate on which versions worked for you?

MaryamOstadsharif commented 1 year ago

> @mdtalibahmad I was able to resolve the issue after completely uninstalling CUDA, Python, and all dependencies and reinstalling everything with the correct versions. I even installed the Visual C++ update for the latest CUDA.

Please list the versions of all dependencies that work correctly for you.

mervess commented 9 months ago

Modifying the batch size fixed the error for me.

PetroleumEngineer commented 9 months ago

> Modifying the batch size fixed the error for me.

Did you reduce or increase the batch size?

mervess commented 9 months ago

> Did you reduce or increase the batch size?

Increased it.
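
For reference, the batch size in the original script is set in the fit call, so the change discussed above is a one-liner (256 is just an example value; model, X, and y are as in the script):

```python
# Was batch_size=128 in the traceback above; tune the value for your GPU.
model.fit(X, y, batch_size=256, epochs=42)
```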