Closed: airdine closed this issue 4 years ago
You don't say whether you are using the GPU or the CPU.
By default, deeper-stacker uses the GPU, so you will need enough GPU memory for it to work.
Could you share the output of the following commands?
$ nvcc --version
$ nvidia-smi
Hey, thanks for your reply!
I was using the GPU and hadn't noticed it because, as you said, deeper-stacker uses it by default, sorry.
Here is the config output:
$ lsb_release -a
LSB Version: core-9.20170808ubuntu1-noarch:security-9.20170808ubuntu1-noarch
Distributor ID: Ubuntu
Description: Ubuntu 18.04.4 LTS
Release: 18.04
Codename: bionic
$ nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2018 NVIDIA Corporation
Built on Wed_Apr_11_23:16:29_CDT_2018
Cuda compilation tools, release 9.2, V9.2.88
$ nvidia-smi
Sat Apr  4 19:57:07 2020
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 435.21       Driver Version: 435.21       CUDA Version: 10.1     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce GTX 106...  Off  | 00000000:65:00.0  On |                  N/A |
|  0%   51C    P0    28W / 120W |    179MiB /  3016MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|    0      1410      G   /usr/lib/xorg/Xorg                            18MiB |
|    0      1446      G   /usr/bin/gnome-shell                           9MiB |
|    0      2605      G   /usr/lib/xorg/Xorg                            71MiB |
|    0      2698      G   /usr/bin/gnome-shell                          76MiB |
+-----------------------------------------------------------------------------+
I think your GPU doesn't have enough memory.
$ nvidia-smi -l 1
Launch this in one terminal and, in the other terminal, launch the neural network training. Watch whether the memory fills up completely; if the Lua script fails at the moment the memory is exhausted, it is an out-of-memory problem.
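To make that comparison easier, the used/total memory figures can be pulled out of each nvidia-smi line and logged. A minimal sketch, shown here on a canned line mirroring the table above (a real run would pipe `nvidia-smi -l 1 | grep MiB` through the same sed expression instead of using a hard-coded sample):

```shell
# Sketch: extract "used / total" GPU memory from one line of nvidia-smi output.
# The sample line is hard-coded for illustration only.
sample='|  0%   51C    P0    28W / 120W |    179MiB /  3016MiB |      0%      Default |'
used=$(printf '%s\n' "$sample"  | sed -n 's/.* \([0-9][0-9]*\)MiB \/ *\([0-9][0-9]*\)MiB.*/\1/p')
total=$(printf '%s\n' "$sample" | sed -n 's/.* \([0-9][0-9]*\)MiB \/ *\([0-9][0-9]*\)MiB.*/\2/p')
echo "used=${used}MiB total=${total}MiB"
# prints: used=179MiB total=3016MiB
```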
Thank you,
I'll try that as soon as my data is generated, and I'll post the result.
I'll try with the CPU too and comment on the result if I can.
I'll just try without generating more data, and I won't use the CPU. 👍
Hello,
Here is my feedback: I reduced the number of generated data files and ran:
$ th Training/main_train.lua 4
Loading Net Builder
103328 all good files
Erreur de segmentation (core dumped)   [French locale: "Segmentation fault (core dumped)"]
It's still the same. While the training part is running, I watch:
$ nvidia-smi -l 1
Memory-Usage climbs to 981MiB / 3016MiB, then the training stops and Memory-Usage drops back to 300MiB (gdm3 usage).
Do you still think it's an out-of-memory issue?
Thank you for your interest.
Did you check this issue? Maybe it is related to your problem: https://github.com/happypepper/DeepHoldem/issues/8
Hello, thanks for the link.
I didn't find anything that helps with my issue, except doing a fresh OS install, which I haven't tried yet.
After looking at this comment: https://github.com/happypepper/DeepHoldem/issues/8#issuecomment-466355094
Which OS do you use to run this?
Thank you for your advice.
Try the Ubuntu 16.04 OS, or an nvidia-docker image based on 16.04.
Hey,
A fresh 16.04 install fixed it; I really don't know what was wrong with my 18.04.
Thank you very much, I can close this issue.
Hello,
I see the same issue closed, but it doesn't help me understand what's wrong.
After generating data and converting it, I tried to train the model, but it seems like nothing happens.
Apart from the segmentation fault, I don't get any error or result, and I don't know which log file I could check either. The script seems to stop in the file train.lua at line 61.
Does anyone know what the problem could be?
Thanks in advance.