NVIDIA / DIGITS

Deep Learning GPU Training System
https://developer.nvidia.com/digits
BSD 3-Clause "New" or "Revised" License

NetworkVisualizationError + ModelConstructionError (text classification) #1653

Open user3549 opened 7 years ago

user3549 commented 7 years ago

Hi, I'm testing the text classification plugin in DIGITS, following this example: https://github.com/NVIDIA/DIGITS/tree/master/examples/text-classification Using the same algorithm and data, I get these errors:

NetworkVisualizationError:
-------------------------
u"\x1b[?1034h2017-05-25 13:40:50 [INFO ] creating data readers\t\nUsing CuDNN backend\t\n/home/xx/torch/install/bin/lua: /tmp/tmpY8WJTL.lua:38: attempt to call field 'OneHot' (a nil value)\nstack traceback:\n\t/tmp/tmpY8WJTL.lua:38: in function 'network_func'\n\t...e/xx/caffe-nv/digits/digits/tools/torch/main.lua:288: in main chunk\n\t[C]: in function 'dofile'\n\t.../torch/install/lib/luarocks/rocks/trepl/scm-1/bin/th:150: in main chunk\n\t[C]: in ?\n"
Model Construction: 
-------------------------
ERROR: .../torch/install/share/lua/5.2/threads/threads.lua:183: [thread 2 callback] .../torch/install/share/lua/5.2/pb/standard/buffer.lua:40: attempt to call global 'module' (a nil value)
----------------------
 * Debugger is active!
----------------------
2017-05-25 13:25:30 [20170525-132529-8df7] [INFO ] Train Torch Model task started.
2017-05-25 13:25:30 [20170525-132529-8df7] [INFO ] Task subprocess args: "/home/xx/torch/install/bin/th /home/xx/caffe-nv/digits/digits/tools/torch/wrapper.lua main.lua --network=model --epoch=1 --networkDirectory=/home/xx/caffe-nv/digits/digits/jobs/20170525-132529-8df7 --save=/home/xx/caffe-nv/digits/digits/jobs/20170525-132529-8df7 --snapshotPrefix=snapshot --snapshotInterval=1.0 --learningRate=0.01 --policy=exp --dbbackend=lmdb --train=/home/xx/caffe-nv/digits/digits/jobs/20170525-124059-ae11/train_db/features --validation=/home/xx/caffe-nv/digits/digits/jobs/20170525-124059-ae11/val_db/features --gamma=0.98 --shuffle=yes --subtractMean=none --optimization=sgd --interval=0.25 --type=cuda --augFlip=none --augQuadRot=none --augHSVh=0 --augHSVs=0 --augHSVv=0"
2017-05-25 13:25:31 [20170525-132529-8df7] [WARNING] Train Torch Model unrecognized output: 2017-05-25 13:25:31 [INFO ] creating data readers
2017-05-25 13:25:31 [20170525-132529-8df7] [ERROR] Train Torch Model: .../torch/install/share/lua/5.2/threads/threads.lua:183: [thread 2 callback] ...a/torch/install/share/lua/5.2/pb/standard/buffer.lua:40: attempt to call global 'module' (a nil value)
2017-05-25 13:25:31 [20170525-132529-8df7] [WARNING] Train Torch Model unrecognized output: stack traceback:
2017-05-25 13:25:31 [20170525-132529-8df7] [WARNING] Train Torch Model unrecognized output: ...a/torch/install/share/lua/5.2/pb/standard/buffer.lua:40: in main chunk
2017-05-25 13:25:31 [20170525-132529-8df7] [WARNING] Train Torch Model unrecognized output: [C]: in function 'require'
2017-05-25 13:25:31 [20170525-132529-8df7] [WARNING] Train Torch Model unrecognized output: ...haa/torch/install/share/lua/5.2/pb/standard/pack.lua:34: in main chunk
2017-05-25 13:25:31 [20170525-132529-8df7] [WARNING] Train Torch Model unrecognized output: [C]: in function 'require'
2017-05-25 13:25:31 [20170525-132529-8df7] [WARNING] Train Torch Model unrecognized output: .../torch/install/share/lua/5.2/pb/standard.lua:31: in main chunk
2017-05-25 13:25:31 [20170525-132529-8df7] [WARNING] Train Torch Model unrecognized output: [C]: in function 'require'
2017-05-25 13:25:31 [20170525-132529-8df7] [WARNING] Train Torch Model unrecognized output: /home/xx/torch/install/share/lua/5.2/pb.lua:98: in function 'get_backend'
2017-05-25 13:25:31 [20170525-132529-8df7] [WARNING] Train Torch Model unrecognized output: /home/xx/torch/install/share/lua/5.2/pb.lua:107: in main chunk
2017-05-25 13:25:31 [20170525-132529-8df7] [WARNING] Train Torch Model unrecognized output: [C]: in function 'require'
2017-05-25 13:25:31 [20170525-132529-8df7] [WARNING] Train Torch Model unrecognized output: .../xx/caffe-nv/digits/digits/tools/torch/utils.lua:216: in function 'check_require'
2017-05-25 13:25:31 [20170525-132529-8df7] [WARNING] Train Torch Model unrecognized output: ...e/xx/caffe-nv/digits/digits/tools/torch/data.lua:377: in function 'new'
2017-05-25 13:25:31 [20170525-132529-8df7] [WARNING] Train Torch Model unrecognized output: ...e/xx/caffe-nv/digits/digits/tools/torch/data.lua:729: in function <...e/xx/caffe-nv/digits/digits/tools/torch/data.lua:724>
2017-05-25 13:25:31 [20170525-132529-8df7] [WARNING] Train Torch Model unrecognized output: (...tail calls...)
2017-05-25 13:25:31 [20170525-132529-8df7] [WARNING] Train Torch Model unrecognized output: [C]: in function 'xpcall'
2017-05-25 13:25:31 [20170525-132529-8df7] [WARNING] Train Torch Model unrecognized output: .../torch/install/share/lua/5.2/threads/threads.lua:234: in function 'callback'
2017-05-25 13:25:31 [20170525-132529-8df7] [WARNING] Train Torch Model unrecognized output: .../torch/install/share/lua/5.2/threads/queue.lua:65: in function <.../torch/install/share/lua/5.2/threads/queue.lua:41>
2017-05-25 13:25:31 [20170525-132529-8df7] [WARNING] Train Torch Model unrecognized output: [C]: in function 'pcall'
2017-05-25 13:25:31 [20170525-132529-8df7] [WARNING] Train Torch Model unrecognized output: .../torch/install/share/lua/5.2/threads/queue.lua:40: in function 'dojob'
2017-05-25 13:25:31 [20170525-132529-8df7] [WARNING] Train Torch Model unrecognized output: [string "  local Queue = require 'threads.queue'..."]:13: in main chunk
2017-05-25 13:25:31 [20170525-132529-8df7] [WARNING] Train Torch Model unrecognized output: DIGITS Lua Error
2017-05-25 13:25:31 [20170525-132529-8df7] [WARNING] Train Torch Model unrecognized output: stack traceback:
2017-05-25 13:25:31 [20170525-132529-8df7] [WARNING] Train Torch Model unrecognized output: [C]: in function 'error'
2017-05-25 13:25:31 [20170525-132529-8df7] [WARNING] Train Torch Model unrecognized output: .../torch/install/share/lua/5.2/threads/threads.lua:183: in function 'dojob'
2017-05-25 13:25:31 [20170525-132529-8df7] [WARNING] Train Torch Model unrecognized output: .../torch/install/share/lua/5.2/threads/threads.lua:264: in function 'synchronize'
2017-05-25 13:25:31 [20170525-132529-8df7] [WARNING] Train Torch Model unrecognized output: .../torch/install/share/lua/5.2/threads/threads.lua:142: in function 'specific'
2017-05-25 13:25:31 [20170525-132529-8df7] [WARNING] Train Torch Model unrecognized output: .../torch/install/share/lua/5.2/threads/threads.lua:125: in function <.../torch/install/share/lua/5.2/threads/threads.lua:36>
2017-05-25 13:25:31 [20170525-132529-8df7] [WARNING] Train Torch Model unrecognized output: (...tail calls...)
2017-05-25 13:25:31 [20170525-132529-8df7] [WARNING] Train Torch Model unrecognized output: ...e/xx/caffe-nv/digits/digits/tools/torch/data.lua:722: in function 'new'
2017-05-25 13:25:31 [20170525-132529-8df7] [WARNING] Train Torch Model unrecognized output: ...e/xx/caffe-nv/digits/digits/tools/torch/main.lua:229: in main chunk
2017-05-25 13:25:31 [20170525-132529-8df7] [WARNING] Train Torch Model unrecognized output: [C]: in function 'dofile'
2017-05-25 13:25:31 [20170525-132529-8df7] [WARNING] Train Torch Model unrecognized output: .../caffe-nv/digits/digits/tools/torch/wrapper.lua:25: in function <.../caffe-nv/digits/digits/tools/torch/wrapper.lua:25>
2017-05-25 13:25:31 [20170525-132529-8df7] [WARNING] Train Torch Model unrecognized output: [C]: in function 'xpcall'
2017-05-25 13:25:31 [20170525-132529-8df7] [WARNING] Train Torch Model unrecognized output: ../caffe-nv/digits/digits/tools/torch/wrapper.lua:25: in main chunk
2017-05-25 13:25:31 [20170525-132529-8df7] [WARNING] Train Torch Model unrecognized output: [C]: in function 'dofile'
2017-05-25 13:25:31 [20170525-132529-8df7] [WARNING] Train Torch Model unrecognized output: .../torch/install/lib/luarocks/rocks/trepl/scm-1/bin/th:150: in main chunk
2017-05-25 13:25:31 [20170525-132529-8df7] [WARNING] Train Torch Model unrecognized output: [C]: in ?
2017-05-25 13:25:31 [20170525-132529-8df7] [ERROR] Train Torch Model task failed with error code 1

user3549 commented 7 years ago

Just to confirm: I have installed all the extra dependencies, including lua-pb, as described in https://github.com/NVIDIA/DIGITS/blob/master/docs/BuildTorch.md

aaranmcguire commented 7 years ago

I'm also having this issue. I've installed all the dependencies I could find in the documentation. Here's a Dockerfile: https://gist.github.com/aaranmcguire/35b1cb4e197dc2a0c3e8262c2329453c

aaranmcguire commented 7 years ago

@user3549 Just found out you need to re-install nn (luarocks install nn). It seems the version installed by Torch does not have the nn.OneHot module.
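
One way to confirm whether the reinstall took effect is to ask the Torch interpreter directly whether `nn.OneHot` is defined. The following is a minimal sketch, not part of DIGITS: it assumes the `th` launcher is on the PATH and accepts `-e` to execute a Lua statement, and the helper names `lua_check_statement` and `torch_has_field` are hypothetical.

```python
import shutil
import subprocess


def lua_check_statement(package, field):
    """Build a Lua statement that prints 'true' or 'false' for package.field."""
    return "require '{0}'; print({0}.{1} ~= nil)".format(package, field)


def torch_has_field(package, field, interpreter="th"):
    """Return True/False if the check ran, or None if it could not be run.

    Assumption: `<interpreter> -e <statement>` executes a Lua statement.
    """
    exe = shutil.which(interpreter)
    if exe is None:
        # Interpreter not found on PATH, so nothing can be concluded.
        return None
    result = subprocess.run(
        [exe, "-e", lua_check_statement(package, field)],
        capture_output=True,
        text=True,
    )
    if result.returncode != 0:
        # For example, the package itself failed to load.
        return None
    lines = result.stdout.strip().splitlines()
    # The interpreter may print terminal control codes first, so only
    # the end of the last line is inspected.
    return bool(lines) and lines[-1].strip().endswith("true")
```

Calling `torch_has_field("nn", "OneHot")` before and after `luarocks install nn` would show whether the reinstall changed anything. A result of `None` means the check itself could not run, which is different from the module being absent.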

user3549 commented 7 years ago

@aaranmcguire Thanks! I re-installed nn and dpnn and still have the same error:

user3549 commented 7 years ago

@lukeyeager Do you have any clues to solve this error? Thanks!

aaranmcguire commented 7 years ago

@user3549 Can you post the whole error?

user3549 commented 7 years ago

@aaranmcguire It's the same error that I mentioned above. The error occurs when I run the construction of the model, following this example: https://github.com/NVIDIA/DIGITS/tree/master/examples/text-classification. Here is the whole error again:

Train Torch Model Error
Initialized at 11:03:53 AM (1 second)
Running at 11:03:54 AM (0 seconds)
Error at 11:03:55 AM (Total - 2 seconds)
ERROR: torch-new/install/share/lua/5.2/threads/threads.lua:183: [thread 3 callback] torch-new/install/share/lua/5.2/pb/standard/buffer.lua:40: attempt to call global 'module' (a nil value)
Last output: torch-new/install/share/lua/5.2/threads/threads.lua:183: [thread 3 callback] torch-new/install/share/lua/5.2/pb/standard/buffer.lua:40: attempt to call global 'module' (a nil value)
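
The messages in this thread contain two different "attempt to call ... (a nil value)" errors, which is easy to miss in the long logs. As a rough sketch (the regular expression and the `nil_calls` helper are illustrative assumptions, not DIGITS tooling), the missing symbols can be pulled out of the pasted text like this:

```python
import re

# Matches Lua runtime errors of the form:
#   <file>.lua:<line>: attempt to call field|global '<name>' (a nil value)
NIL_CALL = re.compile(
    r"(?P<file>[^\s:\]]+\.lua):(?P<line>\d+): "
    r"attempt to call (?P<kind>field|global) '(?P<name>[^']+)' \(a nil value\)"
)


def nil_calls(log_text):
    """Return distinct (kind, name, file, line) tuples in order of appearance."""
    seen = []
    for m in NIL_CALL.finditer(log_text):
        entry = (m["kind"], m["name"], m["file"], int(m["line"]))
        if entry not in seen:
            seen.append(entry)
    return seen


# Excerpts copied from the error messages posted in this thread.
visualization_error = (
    "/home/xx/torch/install/bin/lua: /tmp/tmpY8WJTL.lua:38: "
    "attempt to call field 'OneHot' (a nil value)"
)
training_error = (
    "torch-new/install/share/lua/5.2/threads/threads.lua:183: [thread 3 callback] "
    "torch-new/install/share/lua/5.2/pb/standard/buffer.lua:40: "
    "attempt to call global 'module' (a nil value)"
)

for kind, name, path, line in nil_calls(visualization_error + "\n" + training_error):
    print(f"{kind} '{name}' is nil at {path}:{line}")
```

The two failures name different missing symbols: the field `OneHot`, and the global `module` called from lua-pb's `buffer.lua`. Reinstalling `nn` would therefore be expected to affect only the first. The `lua/5.2` in the paths suggests a Lua 5.2 build, where the `module` function is deprecated and only defined when compatibility support is compiled in. Whether that is the cause here is not confirmed anywhere in the thread.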

aaranmcguire commented 7 years ago

@user3549 You may need to check the output from the console, as I don't think this is the whole error message. Also, are you running this in Docker?

user3549 commented 7 years ago

No, I'm not using docker. The output from the console is what I put in my first message, here is the error again:

user3549 commented 7 years ago

@aaranmcguire I guess there is no solution to fix the error?

augustoicaro commented 6 years ago

@user3549 Same problem here after following all the steps to build and configure Torch. Did you solve this?

evertonaleixo commented 6 years ago

I am having the same problem.

Has anyone found a solution?