Hvass-Labs / TensorFlow-Tutorials

TensorFlow Tutorials with YouTube Videos
MIT License

out of memory ERROR... #95

Closed jiapei100 closed 5 years ago

jiapei100 commented 5 years ago

What is the minimum GPU required to run this tutorial?

✗ nvidia-smi
Sun Dec 30 19:35:43 2018
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 410.78       Driver Version: 410.78       CUDA Version: 10.0     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce GTX 980M    Off  | 00000000:01:00.0  On |                  N/A |
| N/A   28C    P8     7W /  N/A |   3834MiB /  4035MiB |      1%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|    0      1842      G   /usr/lib/xorg/Xorg                            15MiB |
|    0      1952      G   /usr/bin/gnome-shell                          48MiB |
|    0      2923      G   /usr/lib/xorg/Xorg                           170MiB |
|    0      3039      G   /usr/bin/gnome-shell                          51MiB |
|    0      3801      G   ...uest-channel-token=14688497433927674620   166MiB |
|    0     23367      C   /usr/bin/python                             3260MiB |
|    0     28276      G   ...-token=8CC4669488D477BE118BBC69F71B724E    72MiB |
+-----------------------------------------------------------------------------+

✗ python 01_Simple_Linear_Model.py
1.12.0-rc0
Size of:
- Training-set: 55000
- Validation-set: 5000
- Test-set: 10000
2018-12-30 19:24:55.118940: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:993] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2018-12-30 19:24:55.119329: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1432] Found device 0 with properties:
name: GeForce GTX 980M major: 5 minor: 2 memoryClockRate(GHz): 1.1265
pciBusID: 0000:01:00.0
totalMemory: 3.94GiB freeMemory: 59.12MiB
2018-12-30 19:24:55.119345: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1511] Adding visible gpu devices: 0
Traceback (most recent call last):
  File "01_Simple_Linear_Model.py", line 303, in <module>
    session = tf.Session()
  File "/home/jiapei/.local/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1551, in __init__
    super(Session, self).__init__(target, graph, config=config)
  File "/home/jiapei/.local/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 676, in __init__
    self._session = tf_session.TF_NewSessionRef(self._graph._c_graph, opts)
tensorflow.python.framework.errors_impl.InternalError: CUDA runtime implicit initialization on GPU:0 failed. Status: out of memory

Cheers Pei
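
A note on reading the report above: the TensorFlow log says `freeMemory: 59.12MiB` out of `totalMemory: 3.94GiB`, and the `nvidia-smi` listing (captured about ten minutes later) shows a `/usr/bin/python` process holding 3260MiB. One possible reading, not confirmed anywhere in this thread, is that the card was already nearly full when `tf.Session()` was created. The sketch below only extracts the two figures from that log line; the helper name `parse_gpu_memory` is hypothetical and the line format is assumed to match the TensorFlow 1.12 output quoted above.

```python
import re

# Excerpt copied from the TensorFlow log in the report above.
LOG_LINE = "totalMemory: 3.94GiB freeMemory: 59.12MiB"

# Conversion factors from each unit to MiB.
_TO_MIB = {"KiB": 1 / 1024, "MiB": 1.0, "GiB": 1024.0}


def parse_gpu_memory(text):
    """Return (total_mib, free_mib) found in a TensorFlow 1.x device log line.

    Hypothetical helper: assumes the 'totalMemory: <n><unit>' and
    'freeMemory: <n><unit>' format shown in the log above.
    """
    pattern = r"(totalMemory|freeMemory): ([\d.]+)(KiB|MiB|GiB)"
    found = {key: float(value) * _TO_MIB[unit]
             for key, value, unit in re.findall(pattern, text)}
    return found["totalMemory"], found["freeMemory"]


total_mib, free_mib = parse_gpu_memory(LOG_LINE)
share = 100 * free_mib / total_mib
print(f"{free_mib:.0f} MiB free of {total_mib:.0f} MiB ({share:.1f}%)")
```

If the free figure is this small before any model has been built, checking which process is holding the memory (the `Processes` table in `nvidia-smi`) is probably a more useful first step than changing the tutorial code.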

winstonma commented 5 years ago

Hi @jiapei100

I guess @Hvass-Labs uses a GPU with 8 GB of memory. I had the same issue before; decreasing batch_size and rerunning the whole notebook fixed it for me.

Hope this helps
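
To illustrate why a smaller batch_size can help, here is a rough sketch of how the memory for one input batch scales with the batch size. It assumes flattened 28x28 MNIST images (784 values each) stored as 32-bit floats; the function name `batch_bytes` and its defaults are illustrative assumptions, not taken from the tutorial code.

```python
def batch_bytes(batch_size, num_features=784, bytes_per_value=4):
    """Bytes needed to hold one input batch.

    Assumptions: each example is a flat vector of `num_features` values
    (784 for a 28x28 image) stored as float32 (4 bytes per value).
    """
    return batch_size * num_features * bytes_per_value


# Memory for the input batch grows linearly with batch_size.
for batch_size in (1000, 100, 10):
    kib = batch_bytes(batch_size) / 1024
    print(f"batch_size={batch_size}: {kib:.1f} KiB per input batch")
```

This only covers memory used while feeding batches during training. The traceback in the original report fails earlier, at `tf.Session()`, so a smaller batch may not be enough in that particular case.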

Hvass-Labs commented 5 years ago

@winstonma Thanks for answering.

Many of these tutorials were made on a laptop PC with 8 GB of RAM and no GPU. After Tutorial 16 or so, I started using a GTX 1070 with 8 GB of RAM.
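
Since the early tutorials were made without a GPU, one option on a card with little free memory is to run them on the CPU. A minimal sketch, assuming the standard CUDA environment variable `CUDA_VISIBLE_DEVICES` is honoured by the installed TensorFlow build; the helper name `hide_gpus` is hypothetical.

```python
import os


def hide_gpus():
    """Hide all CUDA devices from libraries imported after this call.

    Setting CUDA_VISIBLE_DEVICES to an invalid index such as "-1" leaves
    no visible GPU, so CUDA-based libraries fall back to the CPU.
    """
    os.environ["CUDA_VISIBLE_DEVICES"] = "-1"


hide_gpus()

# Import TensorFlow only after the variable is set, for example:
# import tensorflow as tf
print("CUDA_VISIBLE_DEVICES =", os.environ["CUDA_VISIBLE_DEVICES"])
```

The same effect can be had from the shell by setting the variable on the command line before `python 01_Simple_Linear_Model.py`.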