Closed bamos closed 8 years ago
@ananghudaya - https://github.com/teradeep/demo-apps/issues/4 indicates this is an architecture issue: a limitation of torch.load that I wasn't aware of, but which is clearly described in the documentation at https://github.com/torch/torch7/blob/master/doc/serialization.md. I saved the binary model on x86_64 and I think it's only compatible with x86_64. Are you using 32-bit x86 or ARM?
I've saved the model in ASCII format. Can you download and unxz it from here?
$ md5sum nn4.v1.ascii.t7
735723e2c9cc4eefc00a7df34c9a4d3b nn4.v1.ascii.t7
Try loading it with:
$ th
th> require 'nn'
th> require 'dpnn'
th> net = torch.load('nn4.v1.ascii.t7', 'ascii')
If this works, I think you'll just need to replace nn4.v1.t7 with nn4.v1.ascii.t7 in the Python demos and add 'ascii' to the torch.load call in https://github.com/cmusatyalab/openface/blob/master/openface/openface_server.lua.
Thanks @bamos
Still no luck in getting it right. I've downloaded and verified the ASCII model. Here is the output:
th> net = torch.load('nn4.v1.ascii.t7', 'ascii')
cannot open <nn4.v1.ascii.t7> in mode r at /home/ananghudaya/torch/pkg/torch/lib/TH/THDiskFile.c:484
stack traceback:
[C]: at 0xb720afc0
[C]: in function 'DiskFile'
...e/ananghudaya/torch/install/share/lua/5.1/torch/File.lua:309: in function 'load'
[string "net = torch.load('nn4.v1.ascii.t7', 'ascii')"]:1: in main chunk
[C]: in function 'xpcall'
...e/ananghudaya/torch/install/share/lua/5.1/trepl/init.lua:648: in function 'repl'
...daya/torch/install/lib/luarocks/rocks/trepl/scm-1/bin/th:185: in main chunk
[C]: at 0x0804d6d0
I'm using a 32-bit machine.
Hi Anang, this error looks like Torch can't find the file. Did you unxz it and check the md5sum?
-Brandon.
Hi @bamos,
Yes, I did. The md5 checksum is the same, and I have placed the file in the same folder as the other models.
Please double check the path to the model. The error message you're getting is the same error message I get for incorrect paths.
th> model = torch.load('/tmp/does-not-exist.t7')
cannot open </tmp/does-not-exist.t7> in mode r at /home/bamos/torch/pkg/torch/lib/TH/THDiskFile.c:484
stack traceback:
[C]: at 0x7f4389ef2a90
[C]: in function 'DiskFile'
/home/bamos/torch/install/share/lua/5.1/torch/File.lua:292: in function 'load'
[string "model = torch.load('/tmp/does-not-exist.t7')"]:1: in main chunk
[C]: in function 'xpcall'
/home/bamos/torch/install/share/lua/5.1/trepl/init.lua:648: in function 'repl'
...amos/torch/install/lib/luarocks/rocks/trepl/scm-1/bin/th:185: in main chunk
[C]: at 0x00406670
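The path and checksum checks discussed above can also be scripted. The following is a minimal sketch, assuming Python 3; the helper name check_model_file is made up for illustration, and the checksum is the one posted earlier in this thread for nn4.v1.ascii.t7.

```python
import hashlib
import os


def check_model_file(path, expected_md5):
    """Return a list of problems found; an empty list means the file looks fine."""
    problems = []
    abs_path = os.path.abspath(path)
    if not os.path.isfile(abs_path):
        # Print the working directory too, since relative paths depend on it.
        problems.append("not found: {} (cwd is {})".format(abs_path, os.getcwd()))
        return problems
    if not os.access(abs_path, os.R_OK):
        problems.append("not readable: {}".format(abs_path))
        return problems
    md5 = hashlib.md5()
    with open(abs_path, "rb") as f:
        # Hash in 1 MiB chunks so large model files are not read into memory at once.
        for chunk in iter(lambda: f.read(1 << 20), b""):
            md5.update(chunk)
    if md5.hexdigest() != expected_md5:
        problems.append(
            "md5 mismatch: got {}, expected {}".format(md5.hexdigest(), expected_md5)
        )
    return problems


if __name__ == "__main__":
    # Checksum of nn4.v1.ascii.t7 as posted earlier in this thread.
    for problem in check_model_file(
        "nn4.v1.ascii.t7", "735723e2c9cc4eefc00a7df34c9a4d3b"
    ):
        print(problem)
```

Running it from the same directory in which th is started shows whether a relative path resolves where it is expected to.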
The ASCII model loads in about 30-45 seconds for me and the x86 binary model loads in a few seconds. When we transition to a Lua server in #4 instead of a Lua subprocess, I'll add a fallback mechanism so that only users who are not on 64-bit x86 pay the 30-second penalty, and only the first time they start the server, not every time they run a new Python program using OpenFace.
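One way such a fallback could choose between the two files is sketched below. This is only an illustration of the idea, not the mechanism OpenFace ended up with: the function name pick_model and the architecture test are assumptions, and only the two file names come from this thread.

```python
import platform
import struct


def pick_model(machine, pointer_bits,
               binary_name="nn4.v1.t7", ascii_name="nn4.v1.ascii.t7"):
    """Return (filename, torch_load_mode) for the given architecture.

    The fast binary file is used only on 64-bit x86; everything else
    falls back to the slower but portable ASCII file.
    """
    if machine in ("x86_64", "AMD64") and pointer_bits == 64:
        return binary_name, "binary"
    return ascii_name, "ascii"


if __name__ == "__main__":
    # Pointer size of the running interpreter, in bits.
    bits = struct.calcsize("P") * 8
    print(pick_model(platform.machine(), bits))
```

The second element of the result is meant to be passed as the format argument of torch.load on the Lua side.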
I faced the same problem: Torch can't load nn4.v1.ascii.t7. I downloaded nn4.v1.ascii.t7 and checked the md5. As @bamos said, this can be caused by an incorrect path, but I tried an absolute path and it still showed "cannot open".
Hi @snowlord - strange! Can you (or @ananghudaya) try saving a small file in binary format, then loading it? Then doing the same with an ASCII-formatted file?
/tmp$ th
th> t = torch.Tensor(10)
th> torch.save('test-binary.t7', t)
th> t2 = torch.load('test-binary.t7')
th> torch.save('test-ascii.t7', t, 'ascii')
th> t3 = torch.load('test-ascii.t7', 'ascii')
th> t:eq(t2):all()
true
th> t:eq(t3):all()
true
If this works, can you then try doing it in a different directory that's not your current working directory?
Hi @bamos, I switched to a 64-bit x86 machine and checked the md5 of the model file. It now shows a different problem.
th> torch.load('./models/openface/nn4.v1.t7')
/usr/local/share/lua/5.1/torch/File.lua:294: unknown object
stack traceback:
[C]: in function 'error'
/usr/local/share/lua/5.1/torch/File.lua:294: in function 'readObject'
/usr/local/share/lua/5.1/torch/File.lua:240: in function 'readObject'
/usr/local/share/lua/5.1/torch/File.lua:288: in function 'readObject'
/usr/local/share/lua/5.1/torch/File.lua:272: in function 'readObject'
/usr/local/share/lua/5.1/torch/File.lua:288: in function 'readObject'
/usr/local/share/lua/5.1/torch/File.lua:288: in function 'readObject'
/usr/local/share/lua/5.1/torch/File.lua:272: in function 'readObject'
/usr/local/share/lua/5.1/torch/File.lua:319: in function 'load'
[string "_RESULT={torch.load('./models/openface/nn4.v1..."]:1: in main chunk
[C]: in function 'xpcall'
/usr/local/share/lua/5.1/trepl/init.lua:650: in function 'repl'
/usr/local/lib/luarocks/rocks/trepl/scm-1/bin/th:199: in main chunk
[C]: at 0x00406260
Hi @snowlord - interesting you're seeing that on 64-bit x86. Somebody in this thread on the torch mailing list got a similar "unknown object" error and said it was an architecture issue: https://groups.google.com/forum/#!msg/torch7/zNNdXATZxlA/z5A2HocVCgAJ
Does the ascii model work on your 64-bit x86 machine?
Hi @bamos , I got the same problem:
celeb-classifier.nn4.v1.pkl cifar10-test.t7 cifar10torchsmall.zip cifar10-train.t7 nn2.def.lua nn4.def.lua nn4.v1.ascii.t7 nn4.v1.t7
-bash-4.1# th
th> require 'nn'
{..........}
[0.0143s]
th> require 'dpnn'
true
[0.0113s]
th> net = torch.load('nn4.v1.t7')
/usr/local/share/lua/5.1/torch/File.lua:241: Failed to load function from bytecode: (binary): cannot load incompatible bytecode
stack traceback:
[C]: in function 'error'
/usr/local/share/lua/5.1/torch/File.lua:241: in function 'readObject'
/usr/local/share/lua/5.1/torch/File.lua:294: in function 'readObject'
/usr/local/share/lua/5.1/torch/File.lua:278: in function 'readObject'
/usr/local/share/lua/5.1/torch/File.lua:294: in function 'readObject'
/usr/local/share/lua/5.1/torch/File.lua:294: in function 'readObject'
/usr/local/share/lua/5.1/torch/File.lua:278: in function 'readObject'
/usr/local/share/lua/5.1/torch/File.lua:325: in function 'load'
[string "net = torch.load('nn4.v1.t7')"]:1: in main chunk
[C]: in function 'xpcall'
/usr/local/share/lua/5.1/trepl/init.lua:668: in function 'repl'
/usr/local/lib/luarocks/rocks/trepl/scm-1/bin/th:199: in main chunk
[C]: at 0x004051e0
[0.0005s]
th> net = torch.load('nn4.v1.ascii.t7', 'ascii')
/usr/local/share/lua/5.1/torch/File.lua:241: Failed to load function from bytecode: (binary): cannot load incompatible bytecode
stack traceback:
[C]: in function 'error'
/usr/local/share/lua/5.1/torch/File.lua:241: in function 'readObject'
/usr/local/share/lua/5.1/torch/File.lua:294: in function 'readObject'
/usr/local/share/lua/5.1/torch/File.lua:278: in function 'readObject'
/usr/local/share/lua/5.1/torch/File.lua:294: in function 'readObject'
/usr/local/share/lua/5.1/torch/File.lua:294: in function 'readObject'
/usr/local/share/lua/5.1/torch/File.lua:278: in function 'readObject'
/usr/local/share/lua/5.1/torch/File.lua:325: in function 'load'
[string "net = torch.load('nn4.v1.ascii.t7', 'ascii')"]:1: in main chunk
[C]: in function 'xpcall'
/usr/local/share/lua/5.1/trepl/init.lua:668: in function 'repl'
/usr/local/lib/luarocks/rocks/trepl/scm-1/bin/th:199: in main chunk
[C]: at 0x004051e0
[0.0053s]
th> net = torch.load('cifar10-train.t7')
[0.0134s]
th>
As you can see, I tried loading the model provided at this link: https://groups.google.com/forum/#!msg/torch7/zNNdXATZxlA/z5A2HocVCgAJ and it loads just fine: net = torch.load('cifar10-train.t7')
But loading nn4.v1.t7 did not succeed with either net = torch.load('nn4.v1.t7') or net = torch.load('nn4.v1.ascii.t7', 'ascii').
I ran an md5sum check:
md5sum models/{dlib/*.dat,openface/*.{pkl,t7}}
73fde5e05226548677a050913eed4e04 models/dlib/shape_predictor_68_face_landmarks.dat
c0675d57dc976df601b085f4af67ecb9 models/openface/celeb-classifier.nn4.v1.pkl
735723e2c9cc4eefc00a7df34c9a4d3b models/openface/nn4.v1.ascii.t7
a59a5ec1938370cd401b257619848960 models/openface/nn4.v1.t7
I'm on x86_64 GNU/Linux. What seems to be the problem?
Ilya
It seems to be a problem with the Lua and LuaJIT versions:
-bash-4.1# lua -v
Lua 5.1.4 Copyright (C) 1994-2008 Lua.org, PUC-Rio
-bash-4.1# luajit -v
LuaJIT 2.0.4 -- Copyright (C) 2005-2015 Mike Pall. http://luajit.org/
These are the versions I use. Which version was the model compiled with?
Ilya
These are the versions I use. Which version was the model compiled with?
lua: Not installed on my system
luajit: 2.1.0-alpha (from Torch)
I installed LuaJIT 2.1.0-beta1, and now this command runs without errors: net = torch.load('nn4.v1.t7')
Download it from: https://github.com/torch/luajit-rocks
Make sure to pass the option "-DWITH_LUAJIT21=ON" to cmake:
git clone https://github.com/torch/luajit-rocks.git
cd luajit-rocks
mkdir build
cd build
cmake .. -DWITH_LUAJIT21=ON
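Since the fix above came down to the LuaJIT version, a quick check of the `luajit -v` banner may save some guessing. The sketch below assumes Python 3 and only parses a banner string; the helper names are made up for illustration, and the second banner is an assumed format for a 2.1 build rather than output copied from this thread.

```python
import re


def parse_luajit_version(banner):
    """Extract (major, minor) from `luajit -v` output, or None if it does not match."""
    match = re.search(r"LuaJIT (\d+)\.(\d+)", banner)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def is_at_least_2_1(banner):
    """True when the banner reports LuaJIT 2.1 or newer."""
    version = parse_luajit_version(banner)
    return version is not None and version >= (2, 1)


# Banner reported earlier in this thread: too old for the binary model.
print(is_at_least_2_1(
    "LuaJIT 2.0.4 -- Copyright (C) 2005-2015 Mike Pall. http://luajit.org/"
))  # False

# Assumed banner format for a 2.1 build.
print(is_at_least_2_1("LuaJIT 2.1.0-beta1"))  # True
```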
Hi @bamos, I'm trying OpenFace on a 32-bit ARM platform and changed the model loading to net = torch.load('nn4.v1.ascii.t7', 'ascii'). Strangely, when I run the compare demo script I get the following error message:
Error getting result from Torch subprocess. Line read:
Exception:
could not convert string to float:
stdout:
stderr:
The same code runs fine on an x86_64 platform, and the ASCII version should be platform independent. Could you give me a hint about this issue on the 32-bit ARM platform? Thanks.
Hi @lijian8,
stdout:
stderr:
Are these both empty? I would expect more content.
The same code runs fine on an x86_64 platform, and the ASCII version should be platform independent. Could you give me a hint about this issue on the 32-bit ARM platform? Thanks.
I don't have any experience executing on 32-bit ARM. Maybe the Torch community will be able to help if we can find a more informative error message.
-Brandon.
Hi @bamos, yes, these are empty. I'll try running the subprocess directly in Torch to see if I can catch something.
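To get a more informative message out of the Torch subprocess, it can help to run the same command by hand and keep stdout and stderr separate. The following is a rough sketch, assuming Python 3.3 or newer (the thread itself uses Python 2.7); run_and_capture is a made-up helper, and the demo call uses the Python interpreter as a stand-in for th so that it runs anywhere.

```python
import subprocess
import sys


def run_and_capture(cmd, timeout=120):
    """Run cmd to completion and return (returncode, stdout, stderr) as text."""
    proc = subprocess.Popen(
        cmd,
        stdin=subprocess.DEVNULL,   # give the child an immediate EOF on stdin
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        universal_newlines=True,
    )
    try:
        out, err = proc.communicate(timeout=timeout)
    except subprocess.TimeoutExpired:
        # The child is still running (for example, waiting for work): stop it
        # and collect whatever it printed so far.
        proc.kill()
        out, err = proc.communicate()
    return proc.returncode, out, err


if __name__ == "__main__":
    # Stand-in command that fails the way a dying subprocess would.
    code, out, err = run_and_capture(
        [sys.executable, "-c", "import sys; sys.stderr.write('boom'); sys.exit(3)"]
    )
    print("exit code:", code)
    print("stdout:", repr(out))
    print("stderr:", repr(err))
```

To diagnose OpenFace, the stand-in command would be replaced by the list that torch_neural_net.py builds, that is /usr/bin/env, th, the path to openface_server.lua, -model, the model path, -imgDim and 96.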
I had a very similar issue on a Jetson TK1 board. Here is a solution from another project that might help:
git clone https://github.com/mvitez/torch7.git mvittorch7
cd mvittorch7
luarocks make rocks/torch-scm-1.rockspec
diff --git a/eval.lua b/eval.lua
index 1814180..8cad5ba 100644
--- a/eval.lua
+++ b/eval.lua
@@ -65,8 +65,21 @@ end
-------------------------------------------------------------------------------
-- Load the model checkpoint to evaluate
-------------------------------------------------------------------------------
+local function load(filename)
+ local mode = 'binary'
+ local referenced = true
+ local file = torch.DiskFile(filename, 'r')
+ file[mode](file)
+ file:referenced(referenced)
+ file:longSize(8)
+ file:littleEndianEncoding()
+ local object = file:readObject()
+ file:close()
+ return object
+end
+
assert(string.len(opt.model) > 0, 'must provide a model')
-local checkpoint = torch.load(opt.model)
+local checkpoint = load(opt.model)
-- override and collect parameters
if string.len(opt.input_h5) == 0 then opt.input_h5 = checkpoint.opt.input_h5 end
if string.len(opt.input_json) == 0 then opt.input_json = checkpoint.opt.input_json end
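The patch above forces 8-byte longs and little-endian encoding when reading, which suggests the binary file was written on a machine with 64-bit longs. The effect of reading such data with the wrong long size can be illustrated with Python's struct module. This is a toy sketch only and does not reproduce the actual .t7 layout.

```python
import struct

# Two integers written the way a 64-bit little-endian machine stores C longs.
data = struct.pack("<2q", 3, 7)

# Reading them back as 8-byte longs recovers the original values.
as_64bit = struct.unpack("<2q", data)

# A reader that assumes 4-byte longs consumes only half of each value,
# so the second number it sees is the upper half of the first one.
as_32bit = struct.unpack("<2i", data[:8])

print(as_64bit)  # (3, 7)
print(as_32bit)  # (3, 0)
```

Once one field is misread like this, every later field is read from the wrong offset, which is consistent with the "unknown object" and "table index is nil" failures reported in this thread.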
I had the same issue. It was fixed by the comment from SyRenity:
git clone https://github.com/mvitez/torch7.git mvittorch7
cd mvittorch7
luarocks make rocks/torch-scm-1.rockspec
@jacklanchantin glad it helped :)
For information, https://github.com/torch/torch7/pull/476 was merged into master some time ago, so all the changes in @mvitez's branch have been integrated into Torch.
Thanks @bamos @SyRenity @jacklanchantin, this issue should be fixed by following the instructions from @SyRenity.
Great info, thanks all!
@SyRenity I am also working on a TK1 but still get an error when I load the binary model for OpenFace. As mentioned above, this issue should be fixed. Do you have any clue about my error? Thanks:
net = torch.load('/home/ubuntu/Downloads/face/openface/models/openface/nn4.small2.v1.t7')
/home/ubuntu/torch/install/share/lua/5.1/torch/File.lua:370: table index is nil
stack traceback:
/home/ubuntu/torch/install/share/lua/5.1/torch/File.lua:370: in function 'readObject'
/home/ubuntu/torch/install/share/lua/5.1/nn/Module.lua:158: in function 'read'
/home/ubuntu/torch/install/share/lua/5.1/torch/File.lua:351: in function 'readObject'
/home/ubuntu/torch/install/share/lua/5.1/torch/File.lua:409: in function 'load'
[string "net = torch.load('/home/ubuntu/Downloads/face..."]:1: in main chunk
[C]: in function 'xpcall'
/home/ubuntu/torch/install/share/lua/5.1/trepl/init.lua:669: in function 'repl'
...untu/torch/install/lib/luarocks/rocks/trepl/scm-1/bin/th:199: in main chunk
[C]: at 0x0000cff9
Hi @ChrisYang - can you try using our ascii model from http://openface-models.storage.cmusatyalab.org/nn4.small2.v1.ascii.t7.xz? Unxz it and then use ascii mode in torch.load.
@bamos thanks for your prompt reply. Though I haven't found your ASCII file, I managed to save an ASCII version on an x86 machine and now I can load it on the TK1. However, I face some new issues. It runs OK in CPU mode but very slowly on the TK1. When I tried to call net:forward in CUDA mode I got the CUDA runtime error 'too many resources requested for launch at xxx'. Do you have any clue how to solve this?
@shimen Hi, I have the same problem as you, "File.lua failed to load function from bytecode binary string: not a precompiled chunk". I also updated my LuaJIT version to 2.1 beta, but it still fails, and I don't know what to do now. Could anyone help? Thanks.
@apeterswu Hi, I'm not sure what the problem is. Since OpenFace version 0.2 I have not had to use this command.
root@tegra-ubuntu:~/openface/openface# ./demos/compare.py images/examples/{lennon,clapton}
/home/ubuntu/torch/install/bin/luajit: /home/ubuntu/torch/install/share/lua/5.1/torch/File.lua:370: table index is nil
stack traceback:
/home/ubuntu/torch/install/share/lua/5.1/torch/File.lua:370: in function 'readObject'
/home/ubuntu/torch/install/share/lua/5.1/nn/Module.lua:158: in function 'read'
/home/ubuntu/torch/install/share/lua/5.1/torch/File.lua:351: in function 'readObject'
/home/ubuntu/torch/install/share/lua/5.1/torch/File.lua:409: in function 'load'
...lib/python2.7/dist-packages/openface/openface_server.lua:46: in main chunk
[C]: in function 'dofile'
...untu/torch/install/lib/luarocks/rocks/trepl/scm-1/bin/th:145: in main chunk
[C]: at 0x0000cff9
Traceback (most recent call last):
File "./demos/compare.py", line 101, in <module>
OpenFace: openface_server.lua subprocess has died.
Is the Torch command th on your PATH? Check with which th.
If th is on your PATH, try running ./util/profile-network.lua to see if Torch can correctly load and run the network.
stdout:
Has anyone else encountered this problem?
my email is 329410527@qq.com
Thank you very much.
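The first check suggested by the error message above, whether th is on the PATH, can also be done from Python, which is useful because the PATH seen by a Python process may differ from the one in an interactive shell. This is a small sketch assuming Python 3.3 or newer, where shutil.which is available; find_th is a made-up helper name.

```python
import shutil


def find_th(path=None):
    """Return the full path of the `th` executable, or None if it is not found.

    `path` is an optional search path string; by default the PATH
    environment variable of the current process is used.
    """
    return shutil.which("th", path=path)


if __name__ == "__main__":
    location = find_th()
    print(location if location else "th not found on PATH")
```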
Some users following this issue may also be interested in helping improve dlib and its face detector's speed on ARM by adding NEON instructions. Contact @davisking if interested. Here is his comment from another thread:
NEON instructions are similar enough in overall structure that you should be able to implement alternative versions of the simd classes in dlib (e.g. https://github.com/davisking/dlib/blob/master/dlib/simd/simd8f.h). All the simd usage is through these classes, so if there were NEON versions of them then things would be much faster on ARM. I've had this on my todo list for a long time but haven't gotten around to it yet. You should give it a go :)
Hey @bamos,
I've been trying to use the Docker container on Ubuntu 14.04 on 64-bit x86. I have switched to the ASCII model and I'm getting the same error as weiqifa0 above. I'm not sure where to go from here other than doing a fresh manual install of OpenFace, which I want to avoid. Any suggestions would be great!
Exception in thread frame_process_thread_0:
Traceback (most recent call last):
  File "/usr/lib/python2.7/threading.py", line 810, in __bootstrap_inner
    self.run()
  File "/usr/lib/python2.7/threading.py", line 763, in run
    self.__target(*self.__args, **self.__kwargs)
  File "/host/system/SurveillanceSystem.py", line 534, in process_frame
    predictions, alignedFace = self.recogniser.make_prediction(personimg,face_bb)
  File "/host/system/FaceRecogniser.py", line 111, in make_prediction
    persondict = self.recognize_face(alignedFace)
  File "/host/system/FaceRecogniser.py", line 121, in recognize_face
    if self.getRep(img) is None:
  File "/host/system/FaceRecogniser.py", line 145, in getRep
    rep = self.net.forward(alignedFace) # Gets embedding - 128 measurements
  File "/usr/local/lib/python2.7/dist-packages/openface/torch_neural_net.py", line 156, in forward
    rep = self.forwardPath(t)
  File "/usr/local/lib/python2.7/dist-packages/openface/torch_neural_net.py", line 113, in forwardPath
    """.format(self.cmd, self.p.stdout.read()))
Exception:
OpenFace: openface_server.lua subprocess has died.
Is the Torch command th on your PATH? Check with which th.
If th is on your PATH, try running ./util/profile-network.lua to see if Torch can correctly load and run the network.
Diagnostic information:
cmd: ['/usr/bin/env', 'th', '/usr/local/lib/python2.7/dist-packages/openface/openface_server.lua', '-model', '/host/system/../models/openface/nn4.small2.v1.ascii.t7', '-imgDim', '96']
Don't worry, I just tested with Docker in an Ubuntu VM and it worked perfectly :) Not sure what the issue was.
Hey @bamos, as you suggested I saved the model in ASCII format, and these commands work perfectly:
$ th
th> require 'nn'
th> require 'dpnn'
th> net = torch.load('nn4.v1.ascii.t7', 'ascii')
But when I try this command:
./demos/classifier.py infer ./generated-embeddings/classifier.pkl your_test_image.jpg
this is the error I get:
/home/pi/torch/install/share/lua/5.1/torch/File.lua:375: unknown object
stack traceback:
[C]: in function 'error'
/home/pi/torch/install/share/lua/5.1/torch/File.lua:375: in function 'readObject'
/home/pi/torch/install/share/lua/5.1/torch/File.lua:409: in function 'load'
./batch-represent/main.lua:33: in main chunk
[C]: in function 'dofile'
...e/pi/torch/install/lib/luarocks/rocks/trepl/scm-1/bin/th:150: in main chunk
[C]: at 0x00014fa8
Had the same issue on Ubuntu 16.04 with torch7. The ASCII loading method worked with the ASCII model from the download link provided above. I just had to modify ./batch-represent/opt.lua and main.lua, which is where the model gets loaded in the classification example on the OpenFace website. However, running the ./demos/compare.py example, which uses the OpenFace Python API, still hits the same error. If the cmd in torch_neural_net.py could accept an ascii option, that might be a way around it:
self.cmd = ['/usr/bin/env', 'th', os.path.join(myDir, 'openface_server.lua'), '-model', model, '-imgDim', str(imgDim)]
Update: I also modified torch_neural_net.py and openface_server.lua to include the ascii argument, and it does indeed work as well.
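The modification described above was not posted, so the following is only a guess at its shape. It is a sketch assuming Python 3: the '-ascii' flag and the build_server_cmd helper are hypothetical, the '.ascii.t7' suffix test is an assumed naming convention, and openface_server.lua would need a matching option that passes 'ascii' to torch.load.

```python
import os


def build_server_cmd(server_dir, model, img_dim, ascii_model=None):
    """Build the argument list for the Torch subprocess.

    The '-ascii' flag is hypothetical: openface_server.lua would need a
    matching option that passes 'ascii' to torch.load.
    """
    if ascii_model is None:
        # Assumed naming convention: ASCII models end in '.ascii.t7'.
        ascii_model = model.endswith(".ascii.t7")
    cmd = ["/usr/bin/env", "th",
           os.path.join(server_dir, "openface_server.lua"),
           "-model", model, "-imgDim", str(img_dim)]
    if ascii_model:
        cmd.append("-ascii")
    return cmd


# The directory below is a placeholder, not a path taken from this thread.
print(build_server_cmd("/opt/openface", "nn4.small2.v1.ascii.t7", 96))
```

Inferring the mode from the file name keeps existing callers unchanged, while the explicit ascii_model argument allows overriding it for files that do not follow the naming convention.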
is it CNN
real time and web based...
had some issue
./demos/compare.py images/examples/{lennon,clapton}
<openface.torch_neural_net.TorchNeuralNet instance at 0x7fbfe5a0c320>
/home/sct/torch/install/bin/lua: .../sct/torch/install/share/lua/5.1/luarocks/loader.lua:117: error loading module 'treplutils' from file '/home/sct/torch/install/lib/lua/5.1/treplutils.so':
dynamic libraries not enabled; check your Lua installation
stack traceback:
[C]: in function 'a_loader'
.../sct/torch/install/share/lua/5.1/luarocks/loader.lua:117: in function <.../sct/torch/install/share/lua/5.1/luarocks/loader.lua:114>
(tail call): ?
[C]: in function 'require'
/home/sct/torch/install/share/lua/5.1/trepl/init.lua:40: in main chunk
[C]: in function 'require'
.../torch/install/lib/luarocks/rocks/trepl/scm-1/bin/th:104: in main chunk
Traceback (most recent call last):
File "./demos/compare.py", line 102, in <module>
OpenFace: openface_server.lua subprocess has died.
Is the Torch command th on your PATH? Check with which th.
If th is on your PATH, try running ./util/profile-network.lua to see if Torch can correctly load and run the network.
If this gives illegal instruction errors, see the section on this in our FAQ at http://cmusatyalab.github.io/openface/faq/
In Docker, use a Bash login shell or source /root/torch/install/bin/torch-activate for the Torch environment.
See this GitHub issue if you are running on a non-64-bit machine: https://github.com/cmusatyalab/openface/issues/42
Advanced Users: If you think this problem is caused by
running Lua as a subprocess, Vitalius Parubochyi has created
a version of this that uses https://github.com/imodpasteur/lutorpy.
This file is available at
Please post further issues to our mailing list at https://groups.google.com/forum/#!forum/cmu-openface
Diagnostic information:
cmd: ['/usr/bin/env', 'th', '/home/sct/miniconda3/envs/openface01/lib/python2.7/site-packages/openface/openface_server.lua', '-model', '/home/sct/CV/openface/demos/../models/openface/nn4.small2.v1.t7', '-imgDim', '96']
============
stdout:
From @ananghudaya in #26: