abhshkdz / neural-vqa

:grey_question: Visual Question Answering in Torch
https://arxiv.org/abs/1505.02074

Error while evaluating through pretrained checkpoint #14

Open aekanshkansal1 opened 7 years ago

aekanshkansal1 commented 7 years ago

I am trying to get results using the pretrained CPU checkpoint. My command is:

th predict.lua -checkpoint_file checkpoints/vqa_epoch23.26_0.4610_cpu.t7 -input_image_path data/train2014/COCO_train2014_000000405541.jpg -question 'What is the cat on?' -gpuid -1

The error given is:

Loading data files...
/home/aekansh/torch/install/bin/lua: ...e/aekansh/torch/install/share/lua/5.1/torch/File.lua:259: read error: read 0 blocks instead of 1 at /home/aekansh/torch/pkg/torch/lib/TH/THDiskFile.c:349
stack traceback:
    [C]: in function 'readInt'
    ...e/aekansh/torch/install/share/lua/5.1/torch/File.lua:259: in function 'readObject'
    ...e/aekansh/torch/install/share/lua/5.1/torch/File.lua:368: in function 'readObject'
    ...e/aekansh/torch/install/share/lua/5.1/torch/File.lua:369: in function 'readObject'
    ...e/aekansh/torch/install/share/lua/5.1/torch/File.lua:369: in function 'readObject'
    ...e/aekansh/torch/install/share/lua/5.1/torch/File.lua:409: in function 'load'
    ./utils/DataLoader.lua:47: in function 'create'
    predict.lua:59: in main chunk
    [C]: in function 'dofile'
    .../torch/install/lib/luarocks/rocks/trepl/scm-1/bin/th:150: in main chunk

Even if I do not pass the -gpuid parameter, I get the same error.

abhshkdz commented 7 years ago

Do you have data.t7, answers_vocab.t7, and questions_vocab.t7 in the data/ folder and the model checkpoint in the checkpoints/ folder? (Download links given here).
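One quick way to answer this from the shell is to check that each file exists and is non-empty. The sketch below is only an illustration: check_files is a made-up helper name, and the paths are the ones mentioned in this thread, assumed to be relative to the repository root. An empty or partially downloaded file is one possible (unconfirmed) cause of a read error.

```shell
# check_files: report whether each given file exists and is non-empty.
# Returns non-zero if any file is missing or empty.
# (Hypothetical helper, not part of neural-vqa.)
check_files() {
  rc=0
  for f in "$@"; do
    if [ ! -e "$f" ]; then
      echo "MISSING $f"
      rc=1
    elif [ ! -s "$f" ]; then
      echo "EMPTY $f"
      rc=1
    else
      echo "OK $f ($(wc -c < "$f" | tr -d ' ') bytes)"
    fi
  done
  return $rc
}

# Run from the repository root; the paths are the ones named in this thread.
check_files data/data.t7 data/answers_vocab.t7 data/questions_vocab.t7 \
  checkpoints/vqa_epoch23.26_0.4610_cpu.t7 \
  || echo "At least one file is missing or empty."
```

Comparing the reported byte counts against the sizes shown on the download page would also reveal a truncated download.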

aekanshkansal1 commented 7 years ago

Yes, I have all of these in the correct folders.

abhshkdz commented 7 years ago

That's odd. It seems like a path issue. Line 47 is local data = torch.load(tensor_file). If you start th and run torch.load('data/data.t7'), do you get the same error?

aekanshkansal1 commented 7 years ago

If I start th and run torch.load('data/data.t7'), I get this error:

...e/aekansh/torch/install/share/lua/5.1/torch/File.lua:259: read error: read 0 blocks instead of 1 at /home/aekansh/torch/pkg/torch/lib/TH/THDiskFile.c:349
stack traceback:
    ...e/aekansh/torch/install/share/lua/5.1/trepl/init.lua:506: in function <...e/aekansh/torch/install/share/lua/5.1/trepl/init.lua:499>
    [C]: in function 'readInt'
    ...e/aekansh/torch/install/share/lua/5.1/torch/File.lua:259: in function 'readObject'
    ...e/aekansh/torch/install/share/lua/5.1/torch/File.lua:368: in function 'readObject'
    ...e/aekansh/torch/install/share/lua/5.1/torch/File.lua:369: in function 'readObject'
    ...e/aekansh/torch/install/share/lua/5.1/torch/File.lua:369: in function 'readObject'
    ...e/aekansh/torch/install/share/lua/5.1/torch/File.lua:409: in function 'load'
    [string "_RESULT={torch.load('data/data.t7')}"]:1: in main chunk
    [C]: in function 'xpcall'
    ...e/aekansh/torch/install/share/lua/5.1/trepl/init.lua:661: in function 'repl'
    .../torch/install/lib/luarocks/rocks/trepl/scm-1/bin/th:204: in main chunk

abhshkdz commented 7 years ago

Thanks. Same error. How about torch.load('data/data.t7', 'binary')?

aekanshkansal1 commented 7 years ago

Still the same error.

aekanshkansal1 commented 7 years ago

...e/aekansh/torch/install/share/lua/5.1/torch/File.lua:259: read error: read 0 blocks instead of 1 at /home/aekansh/torch/pkg/torch/lib/TH/THDiskFile.c:349
stack traceback:
    ...e/aekansh/torch/install/share/lua/5.1/trepl/init.lua:506: in function <...e/aekansh/torch/install/share/lua/5.1/trepl/init.lua:499>
    [C]: in function 'readInt'
    ...e/aekansh/torch/install/share/lua/5.1/torch/File.lua:259: in function 'readObject'
    ...e/aekansh/torch/install/share/lua/5.1/torch/File.lua:368: in function 'readObject'
    ...e/aekansh/torch/install/share/lua/5.1/torch/File.lua:369: in function 'readObject'
    ...e/aekansh/torch/install/share/lua/5.1/torch/File.lua:369: in function 'readObject'
    ...e/aekansh/torch/install/share/lua/5.1/torch/File.lua:409: in function 'load'
    [string "_RESULT={torch.load('data/data.t7','binary'..."]:1: in main chunk
    [C]: in function 'xpcall'
    ...e/aekansh/torch/install/share/lua/5.1/trepl/init.lua:661: in function 'repl'
    .../torch/install/lib/luarocks/rocks/trepl/scm-1/bin/th:204: in main chunk

abhshkdz commented 7 years ago

Thanks. Seems like an architecture issue. Could you try downloading data.ascii.t7 from here, and then try torch.load('data/data.ascii.t7', 'ascii')?

aekanshkansal1 commented 7 years ago

After using load('data/data.ascii.t7', 'ascii'), the output is:

...e/aekansh/torch/install/share/lua/5.1/torch/File.lua:375: unknown object
stack traceback:
    ...e/aekansh/torch/install/share/lua/5.1/trepl/init.lua:506: in function <...e/aekansh/torch/install/share/lua/5.1/trepl/init.lua:499>
    [C]: in function 'error'
    ...e/aekansh/torch/install/share/lua/5.1/torch/File.lua:375: in function 'readObject'
    ...e/aekansh/torch/install/share/lua/5.1/torch/File.lua:409: in function 'load'
    [string "_RESULT={torch.load('data/data.ascii.t7')}"]:1: in main chunk
    [C]: in function 'xpcall'
    ...e/aekansh/torch/install/share/lua/5.1/trepl/init.lua:661: in function 'repl'
    .../torch/install/lib/luarocks/rocks/trepl/scm-1/bin/th:204: in main chunk
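
Note that the chunk name in this traceback reads torch.load('data/data.ascii.t7'), with no format argument, so the file may have been read in the default binary mode rather than as ascii. To double-check which encoding a .t7 file uses, a rough shell heuristic is to look for non-text bytes at the start of the file. This is a sketch under assumptions: guess_t7_format is a hypothetical helper, and both the 64-byte window and the rule "no non-printable bytes means ascii" are guesses made for illustration, not anything Torch defines.

```shell
# guess_t7_format: print "ascii" or "binary" for a .t7 file, based on whether
# its first 64 bytes contain anything other than printable text or whitespace.
# (Hypothetical helper; a heuristic only.)
guess_t7_format() {
  f=$1
  if [ ! -s "$f" ]; then
    echo "missing-or-empty"
    return 1
  fi
  # Delete printable and whitespace bytes, then count what is left.
  odd=$(head -c 64 "$f" | LC_ALL=C tr -d '[:print:][:space:]' | wc -c | tr -d ' ')
  if [ "$odd" -eq 0 ]; then
    echo "ascii"
  else
    echo "binary"
  fi
}

# Path taken from this thread, relative to the repository root.
guess_t7_format data/data.ascii.t7 || true
```

If this prints ascii, it is worth retrying in th with the format passed explicitly, as suggested above: torch.load('data/data.ascii.t7', 'ascii').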

abhshkdz commented 7 years ago

Strange! The documentation suggests ascii should be readable. Not sure what the issue is. You might want to try asking on the Torch group: https://groups.google.com/forum/#!forum/torch7.

aekanshkansal1 commented 7 years ago

OK thanks

yadavankit commented 7 years ago

@abhshkdz I do not get this error when I start th and run torch.load('data/data.t7'). Instead, I get output like this (showing just one entry):

121512 :
        {
          answer : 997
          image_id : 552610
          question : ShortTensor - size: 23
        }

I guess this works fine. But when I run predict.lua as

th predict.lua -checkpoint_file checkpoints/vqa_epoch23.26_0.4610_cpu.t7 -input_image_path data/train2014/COCO_train2014_000000405543.jpg -question 'What is in the plate'

I get the following stack trace:

Loading data files...
loading checkpoint from checkpoints/vqa_epoch23.26_0.4610_cpu.t7
Warning: Failed to load function from bytecode: (binary): cannot load incompatible bytecode
Warning: Failed to load function from bytecode: [string "..."]:1: unexpected symbol near 'char(8)'
/Users/WARL0CK/torch/install/bin/luajit: /Users/WARL0CK/torch/install/share/lua/5.1/torch/File.lua:375: unknown object
stack traceback:
    [C]: in function 'error'
    /Users/WARL0CK/torch/install/share/lua/5.1/torch/File.lua:375: in function 'readObject'
    /Users/WARL0CK/torch/install/share/lua/5.1/torch/File.lua:307: in function 'readObject'
    /Users/WARL0CK/torch/install/share/lua/5.1/torch/File.lua:369: in function 'readObject'
    /Users/WARL0CK/torch/install/share/lua/5.1/nn/Module.lua:192: in function 'read'
    /Users/WARL0CK/torch/install/share/lua/5.1/torch/File.lua:351: in function 'readObject'
    /Users/WARL0CK/torch/install/share/lua/5.1/torch/File.lua:369: in function 'readObject'
    /Users/WARL0CK/torch/install/share/lua/5.1/torch/File.lua:369: in function 'readObject'
    /Users/WARL0CK/torch/install/share/lua/5.1/torch/File.lua:353: in function 'readObject'
    /Users/WARL0CK/torch/install/share/lua/5.1/torch/File.lua:369: in function 'readObject'
    /Users/WARL0CK/torch/install/share/lua/5.1/torch/File.lua:369: in function 'readObject'
    ...
    /Users/WARL0CK/torch/install/share/lua/5.1/torch/File.lua:369: in function 'readObject'
    .../WARL0CK/torch/install/share/lua/5.1/nngraph/gmodule.lua:495: in function 'read'
    /Users/WARL0CK/torch/install/share/lua/5.1/torch/File.lua:351: in function 'readObject'
    /Users/WARL0CK/torch/install/share/lua/5.1/torch/File.lua:369: in function 'readObject'
    /Users/WARL0CK/torch/install/share/lua/5.1/torch/File.lua:369: in function 'readObject'
    /Users/WARL0CK/torch/install/share/lua/5.1/torch/File.lua:409: in function 'load'
    predict.lua:64: in main chunk
    [C]: in function 'dofile'
    ...L0CK/torch/install/lib/luarocks/rocks/trepl/scm-1/bin/th:150: in main chunk
    [C]: at 0x01045bb350

I guess what @aekanshkansal1 is facing is an ARM architecture issue, but ascii should still work.

abhshkdz commented 7 years ago

Hey @yadavankit, data.t7 looks fine. From the error, it looks like you're using luajit, whereas the checkpoint was created using lua5.1 (ref).
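
The first line of a th error names the interpreter binary that raised it (.../install/bin/lua in the first report in this thread, .../install/bin/luajit in the one above), so a saved log already shows which Lua is in use. Below is a small sketch for pulling that name out of a log. interpreter_from_log is a hypothetical helper, and the install/bin/ pattern is assumed from the paths that appear in this thread.

```shell
# interpreter_from_log: read a th error log on stdin and print the name of the
# interpreter binary mentioned first (for example "lua" or "luajit").
# (Hypothetical helper; assumes the binary lives under .../install/bin/.)
interpreter_from_log() {
  grep -o 'install/bin/[A-Za-z0-9._-]*:' \
    | head -n 1 \
    | sed -e 's|install/bin/||' -e 's|:$||'
}

# Demo with the error line quoted in this thread.
interpreter_from_log <<'EOF'
/Users/WARL0CK/torch/install/bin/luajit: /Users/WARL0CK/torch/install/share/lua/5.1/torch/File.lua:375: unknown object
EOF
```

If it prints luajit, reinstalling Torch against Lua 5.1 is the suggested fix. As far as I know the Torch distro supports this via TORCH_LUA_VERSION=LUA51 ./install.sh, but treat that as an assumption and check the distro README first.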

yadavankit commented 7 years ago

@abhshkdz thanks 👍 will clean and install 5.1 right away

yadavankit commented 7 years ago

Now I am getting an error on evaluating. I already have the latest protobuf (3.2.0) installed:

[libprotobuf INFO google/protobuf/io/coded_stream.cc:610] Reading dangerously large protocol message.  If the message turns out to be larger than 1073741824 bytes, parsing will be halted for security reasons.  To increase the limit (or to disable these warnings), see CodedInputStream::SetTotalBytesLimit() in google/protobuf/io/coded_stream.h.
[libprotobuf WARNING google/protobuf/io/coded_stream.cc:81] The total number of bytes read was 574671192
Successfully loaded models/VGG_ILSVRC_19_layers.caffemodel
lua(31231,0x7fffcec693c0) malloc: *** error for object 0x7fbb84e274e0: incorrect checksum for freed object - object was probably modified after being freed.
*** set a breakpoint in malloc_error_break to debug
[1]    31231 abort      th predict.lua -checkpoint_file checkpoints/vqa_epoch23.26_0.4610_cpu.t7

abhshkdz commented 7 years ago

Memory issues?

yadavankit commented 7 years ago

Should I try increasing the limit via CodedInputStream::SetTotalBytesLimit() in google/protobuf/io/coded_stream.h? What would you recommend?

abhshkdz commented 7 years ago

I don't think it's that. The number of bytes read (574671192) is lower than the limit (1073741824). Does this error show up in every run? Could you monitor system memory and check whether usage is hitting the limit? You could also comment out everything after the loadcaffe.load(...) line and gradually uncomment things to pinpoint what's causing the issue.
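
To see whether the crash happens on every run or only on some, one option is to repeat the command and count the non-zero exits. This is a minimal sketch: count_failures is a hypothetical helper, and it only looks at exit status, so it cannot tell an abort from a bus error.

```shell
# count_failures: run a command N times and print how many runs exited
# with a non-zero status. Output of the command itself is discarded.
# (Hypothetical helper, not part of neural-vqa.)
count_failures() {
  n=$1
  shift
  fails=0
  i=0
  while [ "$i" -lt "$n" ]; do
    "$@" > /dev/null 2>&1 || fails=$((fails + 1))
    i=$((i + 1))
  done
  echo "$fails"
}

# Example, using the command from this thread (run from the repository root):
#   count_failures 5 th predict.lua \
#     -checkpoint_file checkpoints/vqa_epoch23.26_0.4610_cpu.t7 \
#     -input_image_path data/train2014/COCO_train2014_000000405543.jpg \
#     -question 'What is in the plate'
```

A count equal to the number of runs points at a deterministic failure, while a smaller count suggests something intermittent, such as memory corruption.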

yadavankit commented 7 years ago

No. Every now and then this error occurs too:

Loading data files...
loading checkpoint from checkpoints/vqa_epoch23.26_0.4610_cpu.t7
[libprotobuf INFO google/protobuf/io/coded_stream.cc:610] Reading dangerously large protocol message.  If the message turns out to be larger than 1073741824 bytes, parsing will be halted for security reasons.  To increase the limit (or to disable these warnings), see CodedInputStream::SetTotalBytesLimit() in google/protobuf/io/coded_stream.h.
[libprotobuf WARNING google/protobuf/io/coded_stream.cc:81] The total number of bytes read was 574671192
Successfully loaded models/VGG_ILSVRC_19_layers.caffemodel
[1]    33360 bus error  th predict.lua -checkpoint_file checkpoints/vqa_epoch23.26_0.4610_cpu.t7

yadavankit commented 7 years ago

I don't think system memory is the problem here (screenshot attached, 2017-03-30, 6:42 PM).

yadavankit commented 7 years ago

I am now getting this too:

lua(33505,0x7fffcec693c0) malloc: *** error for object 0x7faf1dee6210: incorrect checksum for freed object - object was probably modified after being freed.
*** set a breakpoint in malloc_error_break to debug
[1]    33505 abort      th predict.lua -checkpoint_file checkpoints/vqa_epoch23.26_0.4610_cpu.t7

zhimeng9 commented 7 years ago

Hi @yadavankit, I get the same error. How did you fix it? Thank you.

Loading data files...
loading checkpoint from checkpoints/vqa_epoch23.26_0.4610_cpu.t7
Warning: Failed to load function from bytecode: (binary): cannot load incompatible bytecode
Warning: Failed to load function from bytecode: [string "..."]:1: unexpected symbol near 'char(8)'
/Users/WARL0CK/torch/install/bin/luajit: /Users/WARL0CK/torch/install/share/lua/5.1/torch/File.lua:375: unknown object
stack traceback:

yadavankit commented 7 years ago

@zhimeng9 which version of Lua are you using?