flashlight / wav2letter

Facebook AI Research's Automatic Speech Recognition Toolkit
https://github.com/facebookresearch/wav2letter/wiki

While training, error "attempt to call field 'WeightNorm' (a nil value)" occurred #51

Closed bolt163 closed 6 years ago

bolt163 commented 6 years ago

While running the training step, I get the error "attempt to call field 'WeightNorm' (a nil value)". Has anybody hit the same error? What is the problem? There were no errors during the earlier installation steps for cudnn, cunn and torch.

luajit ~/wav2letter/train.lua --train -rundir ~/experiments -runname hello_librispeech -arch ~/wav2letter/arch/librispeech-glu-highdropout -lr 0.1 -lrcrit 0.0005 -gpu 1 -linseg 1 -linlr 0 -linlrcrit 0.005 -onorm target -nthread 6 -dictdir ~/librispeech-proc -datadir ~/librispeech-proc -train train-clean-100+train-clean-360+train-other-500 -valid dev-clean+dev-other -test test-clean+test-other -gpu 1 -sqnorm -mfsc -melfloor 1 -surround "|" -replabel 2 -progress -wnorm -normclamp 0.2 -momentum 0.9 -weightdecay 1e-05


| experiment path: /data1/experiments/hello_librispeech
| experiment runidx: 1
| number of classes (network) = 30
[Network spec]
C NCHANNEL 400 13 1 GLU DO 0.2
C 200 440 14 1 GLU DO 0.214
C 220 484 15 1 GLU DO 0.22898
C 242 532 16 1 GLU DO 0.2450086
C 266 584 17 1 GLU DO 0.262159202
C 292 642 18 1 GLU DO 0.28051034614
C 321 706 19 1 GLU DO 0.30014607037
C 353 776 20 1 GLU DO 0.321156295296
C 388 852 21 1 GLU DO 0.343637235966
C 426 936 22 1 GLU DO 0.367691842484
C 468 1028 23 1 GLU DO 0.393430271458
C 514 1130 24 1 GLU DO 0.42097039046
C 565 1242 25 1 GLU DO 0.450438317792
C 621 1366 26 1 GLU DO 0.481969000038
C 683 1502 27 1 GLU DO 0.51570683004
C 751 1652 28 1 GLU DO 0.551806308143
C 826 1816 29 1 GLU DO 0.590432749713
C 908 1816 1 1 GLU DO 0.590432749713
C 908 NLABEL 1 1

luajit: .../usr/share/lua/5.1/wav2letter/runtime/netutils.lua:25: attempt to call field 'WeightNorm' (a nil value)
stack traceback:
    .../usr/share/lua/5.1/wav2letter/runtime/netutils.lua:25: in function 'TemporalConvolution'
    .../usr/share/lua/5.1/wav2letter/runtime/netutils.lua:209: in function 'create'
    /data1/wav2letter/train.lua:368: in main chunk
    [C]: at 0x00406020

bolt163 commented 6 years ago

I have solved the problem myself. To the owners and distributors of this project: I think there should be friendlier and more detailed documentation (e.g. an FAQ) so the project can spread more easily.
There are many pitfalls in deploying the model, at almost every step, and they will trap developers who are interested in the project and wear down their enthusiasm.

vineelpratap commented 6 years ago

Hi, we are working on making the whole system easier to experiment with by providing a Dockerfile and removing external dependencies as much as possible.

Also, can you comment here on how you solved this issue, so that it can help others? Thanks!

bolt163 commented 6 years ago

Thanks very much for the reply, vineelpratap.

For the LuaJIT + LuaRocks installation step on Linux, I followed the steps in README.md:

LuaJIT + LuaRocks
The following installs luaJIT and luarocks locally in $HOME/usr. If you want a system-wide installation, remove the -DCMAKE_INSTALL_PREFIX=$HOME/usr option.

git clone https://github.com/torch/luajit-rocks.git
cd luajit-rocks
mkdir build; cd build
cmake .. -DCMAKE_INSTALL_PREFIX=$HOME/usr -DWITH_LUAJIT21=OFF
make -j 4
make install
cd ../..

After installing, I started the LuaJIT interpreter and manually required the nn module: nn.WeightNorm was already nil. I tried reinstalling the nn module with luarocks install, but the problem remained.

This means that all the deployment work done after that step is wasted: the pitfall is already there in the LuaJIT/LuaRocks setup, and it is an odd one. (When I follow the same steps on macOS, it works: the nn module's WeightNorm is a table, not nil.)
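A quick way to reproduce the check described above, before starting a long training run, is to ask the interpreter directly for the type of nn.WeightNorm. This is only a minimal sketch: the helper name check_weightnorm is hypothetical (it is not part of wav2letter or Torch), and it assumes the interpreter you train with is reachable on your PATH.

```shell
# Hypothetical helper (not part of wav2letter): report whether the nn module
# seen by a given Lua interpreter provides WeightNorm.
# Usage: check_weightnorm [interpreter]   (defaults to luajit)
check_weightnorm() {
    lua_bin="${1:-luajit}"
    if ! command -v "$lua_bin" >/dev/null 2>&1; then
        echo "interpreter not found: $lua_bin"
        return 2
    fi
    # Per the thread, a working install reports "table" and a broken one "nil".
    "$lua_bin" -e "require 'nn'; print('nn.WeightNorm is ' .. type(nn.WeightNorm))"
}
```

Running `check_weightnorm luajit` with the same luajit binary that is used for train.lua shows which nn installation that interpreter actually loads.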

So if anybody hits the same problem, the solution is to reinstall Torch rather than follow the LuaJIT + LuaRocks install step in README.md.

Note: my Linux platform is CentOS.
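The comment does not record the exact commands used to reinstall Torch. One common route, assumed here, is the torch/distro installer. The sketch below is illustrative only: TORCH_DIR is a placeholder location, the run and reinstall_torch helpers are hypothetical, and DRY_RUN defaults to 1 so that the commands are printed rather than executed.

```shell
# Sketch only: reinstall Torch via the torch/distro installer (an assumption;
# the thread does not say which method was used).
TORCH_DIR="${TORCH_DIR:-$HOME/torch}"   # placeholder install location
DRY_RUN="${DRY_RUN:-1}"                 # 1 = print commands, anything else = run them

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

reinstall_torch() {
    run git clone https://github.com/torch/distro.git "$TORCH_DIR" --recursive &&
    run cd "$TORCH_DIR" &&
    run bash install-deps &&   # installs system packages; may ask for sudo
    run ./install.sh           # builds LuaJIT, LuaRocks and the core Torch packages
}
```

Calling `reinstall_torch` as-is only lists the four steps; set `DRY_RUN=0` to actually run them, then check nn.WeightNorm again with the newly installed interpreter.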