Problem with torch load pre-trained model

SHENG-KAI-HUANG commented 5 years ago

Hi, I was trying to use the pre-trained model downloaded from this repository, but I ran into the following problem:

==> loading model from pretained weights from file: ./pre-trained/siam_hybridnet_fullsized.t7
Warning: Failed to load function from bytecode: binary string: not a precompiled chunk
Warning: Failed to load function from bytecode: [string ""]:1: unexpected symbol near char(4)
/home/mark/torch/install/bin/lua: /home/mark/torch/install/share/lua/5.2/torch/File.lua:375: unknown object
stack traceback:
    [C]: in function 'error'
    /home/mark/torch/install/share/lua/5.2/torch/File.lua:375: in function 'readObject'
    /home/mark/torch/install/share/lua/5.2/torch/File.lua:307: in function 'readObject'
    /home/mark/torch/install/share/lua/5.2/torch/File.lua:369: in function 'readObject'
    /home/mark/torch/install/share/lua/5.2/nn/Module.lua:192: in function 'read'
    /home/mark/torch/install/share/lua/5.2/torch/File.lua:351: in function 'readObject'
    /home/mark/torch/install/share/lua/5.2/torch/File.lua:369: in function 'readObject'
    /home/mark/torch/install/share/lua/5.2/torch/File.lua:369: in function 'readObject'
    /home/mark/torch/install/share/lua/5.2/nn/Module.lua:192: in function 'read'
    /home/mark/torch/install/share/lua/5.2/torch/File.lua:351: in function 'readObject'
    ...
    ...k/torch/install/share/lua/5.2/cunn/DataParallelTable.lua:398: in function 'read'
    /home/mark/torch/install/share/lua/5.2/torch/File.lua:351: in function 'readObject'
    /home/mark/torch/install/share/lua/5.2/torch/File.lua:409: in function 'load'
    /usr/relativeCameraPose-master/gpu_util.lua:54: in function 'loadDataParallel'
    /usr/relativeCameraPose-master/model.lua:71: in main chunk
    [C]: in function 'dofile'
    /home/mark/torch/install/share/lua/5.2/paths/init.lua:84: in function 'dofile'
    main.lua:29: in main chunk
    [C]: in function 'dofile'
    ...mark/torch/install/lib/luarocks/rocks/trepl/scm-1/bin/th:150: in main chunk
    [C]: in ?

Here is the pre-trained model's MD5 hash (generated with the md5sum command):
bdf13b947817bd7d3244309b2cda811d ./pre-trained/siam_hybridnet_fullsized.t7
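For readers who want to repeat this check without the md5sum command, a minimal Python sketch follows. The helper name `md5_of_file` and the chunk size are illustrative choices and are not part of the repository.

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # Read fixed-size chunks so a large .t7 file need not fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

Calling `md5_of_file("./pre-trained/siam_hybridnet_fullsized.t7")` and comparing the result with the digest quoted above should show whether the download is intact.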

Is the file corrupted, or is something else wrong? Could anyone help me?

SHENG-KAI-HUANG commented 5 years ago

By the way, I also tried loading the model in 'ascii' mode, but I got a different error message:

/home/mark/torch/install/bin/lua: /home/mark/torch/install/share/lua/5.2/torch/File.lua:259: read error: read 0 blocks instead of 1 at /home/mark/torch/pkg/torch/lib/TH/THDiskFile.c:352
stack traceback:
    [C]: in function 'readInt'
    /home/mark/torch/install/share/lua/5.2/torch/File.lua:259: in function 'readObject'
    /home/mark/torch/install/share/lua/5.2/torch/File.lua:409: in function 'load'
    test.lua:4: in main chunk
    [C]: in function 'dofile'
    ...mark/torch/install/lib/luarocks/rocks/trepl/scm-1/bin/th:150: in main chunk
    [C]: in ?

imelekhov commented 5 years ago

Hi there, thank you for your interest in our work. The MD5 sum is correct. What versions of CUDA and cuDNN do you have? I installed Torch and all the packages (nn, cunn, inn, cudnn) from scratch (with CUDA v9.2 and cuDNN 5.1), and I could at least load the model.

SHENG-KAI-HUANG commented 5 years ago

@imelekhov thank you for your answer. I am using CUDA 8.0 and cuDNN 6.0.

I have tried training the model and created some snapshots, and I can load the .t7 files that I created myself. According to the Torch7 website, files saved in binary format are platform-dependent, while the ASCII format is platform-independent. So differences in settings (or package versions) between your environment and mine may be causing this error. I therefore think an ASCII-format pre-trained model might solve it. Would you mind converting the pre-trained model to ASCII format?
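The platform dependence mentioned here can be illustrated with a small Python sketch. It does not reproduce Torch's serializer. It only shows, under the assumption that a binary writer stores integers in its native width and byte order, how the same value can produce different bytes on different platforms while a decimal text encoding stays the same. The function names are hypothetical.

```python
import struct


def encode_binary(value, long_size=8, little_endian=True):
    """Pack an integer as a native binary writer might (illustrative only)."""
    order = "<" if little_endian else ">"
    code = {4: "i", 8: "q"}[long_size]  # 4-byte or 8-byte signed integer
    return struct.pack(order + code, value)


def encode_ascii(value):
    """Write an integer as decimal text, independent of word size or byte order."""
    return (str(value) + "\n").encode("ascii")


# The same value yields different bytes depending on the writer's platform...
print(encode_binary(4, long_size=8))
print(encode_binary(4, long_size=4))
print(encode_binary(4, long_size=4, little_endian=False))
# ...while the text form is always identical.
print(encode_ascii(4))
```

A reader that expects 4-byte integers would consume an 8-byte value as two separate numbers, which is one way a binary file written elsewhere can fail to parse.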

imelekhov commented 5 years ago

I see. Sure, no problem. I have converted the original weights to ASCII format and put an archive here. The MD5 sum of the file inside is afcb6f1be9caf4a23d94b399fddfeb3d. Let me know if something goes wrong.

SHENG-KAI-HUANG commented 5 years ago

Well, there is still a problem. The error message when loading the ASCII model is:

Warning: Failed to load function from bytecode: (binary): cannot load incompatible bytecode
Warning: Failed to load function from bytecode: [string "2..."]:1: unexpected symbol near '2'
luajit: /home/mark/torch/install/share/lua/5.1/torch/File.lua:259: read error: read 0 blocks instead of 1 at /home/mark/torch/pkg/torch/lib/TH/THDiskFile.c:352

I am using Ubuntu 16.04 with Lua 5.1 now. I'm not sure whether the Lua version matters, but it looks like some symbol (or string?) in the ASCII file can't be recognized on my machine. I will find some time to install CUDA 9.2 and cuDNN 5.1 and try again, and I will let you know the result as soon as possible.
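On the question of whether the Lua version matters: the "Failed to load function from bytecode" warnings may indicate that the file contains functions dumped by a different Lua implementation than the one reading it, since Lua bytecode is generally not portable between Lua versions or between standard Lua and LuaJIT. As a hedged sketch, the Python helper below classifies a dumped chunk by its header bytes. The function name is hypothetical, and locating such chunks inside a .t7 file is not covered.

```python
def classify_lua_chunk(chunk):
    """Guess which Lua implementation produced a dumped chunk from its header."""
    if chunk.startswith(b"\x1bLJ"):
        return "LuaJIT bytecode"
    if chunk.startswith(b"\x1bLua") and len(chunk) > 4:
        # The byte after the signature encodes the version, e.g. 0x51 for Lua 5.1.
        version = chunk[4]
        return "PUC Lua %d.%d bytecode" % (version >> 4, version & 0x0F)
    return "not recognized as Lua bytecode (possibly source text)"


print(classify_lua_chunk(b"\x1bLua\x52\x00"))
print(classify_lua_chunk(b"\x1bLJ\x01"))
print(classify_lua_chunk(b"return 1"))
```

If the header reports a different implementation from the interpreter in use, the embedded functions cannot be restored there, although this alone does not explain the read error that follows the warnings.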

By the way, would you mind sharing the landmarks dataset that you used for training and validation in the paper? I have looked at the original dataset, but I don't know how to use it as you describe in the paper.

AaltoVision / relativeCameraPose

Problem with torch load pre-trained model #4