Test accuracy with coco dataset

geekvc commented 7 years ago

Hi farrajota, I trained and tested fastrcnn with the voc dataset and everything went well. I also trained fastrcnn on the coco dataset with no errors, but when I tested the accuracy of the trained model, an error occurred:

$ th test.lua
==> (1/5) Load options
==> (2/5) Load dataset data loader
==> (3/5) Load roi proposals data
==> (4/5) Load model: /home/wangty/geekvc/fastrcnn-example-torch/data/exp/coco/frcnn_vgg16_coco/model_final.t7
==> (5/5) Test Fast-RCNN model
666666
444444
cococo
111111
/home/wangty/torch/install/bin/luajit: invalid arguments: IntTensor FloatTensor IntTensor
expected arguments: [*IntTensor*] IntTensor [int] IntTensor IntTensor
stack traceback:
        [C]: at 0x7f8ea94f0e10
        [C]: in function 'addcmul'
        ...angty/torch/install/share/lua/5.1/fastrcnn/utils/box.lua:141: in function 'convertFrom'
        ...y/torch/install/share/lua/5.1/fastrcnn/ImageDetector.lua:85: in function 'detect'
        ...e/wangty/torch/install/share/lua/5.1/fastrcnn/Tester.lua:130: in function 'testOne'
        ...e/wangty/torch/install/share/lua/5.1/fastrcnn/Tester.lua:212: in function 'test'
        /home/wangty/torch/install/share/lua/5.1/fastrcnn/test.lua:34: in function 'test'
        test.lua:73: in main chunk
        [C]: in function 'dofile'
        ...ngty/torch/install/lib/luarocks/rocks/trepl/scm-1/bin/th:150: in main chunk
        [C]: at 0x00406620

I used -netType vgg16.

farrajota commented 7 years ago

Hi,

I've pushed a fix for this issue: https://github.com/farrajota/fast-rcnn-torch/commit/0a286bce56f36d0200147f503b9108df42bd55dc. It was due to a tensor type mismatch in the fastrcnn package. It should work properly now. Run git pull and luarocks make rocks/* in the fastrcnn repo to get the fix.

geekvc commented 7 years ago

Thank you very much. After pulling the fastrcnn repo, it works properly now! The test only gives the final accuracy of each category and the mAP. I would like to know whether it is possible to save the results for each image, such as the coordinates of the bounding boxes and the category of each bounding box. That way we could selectively visualize some detection results.

farrajota commented 7 years ago

The demo.lua file does some interactive visualizations on random images but it doesn't store the results to disk. You can use the demo code and save the scores and boxes to a file. Basically you just have to do something like torch.save(filename, {scores, bboxes}) after this line.

geekvc commented 7 years ago

That's great, thank you very much!

geekvc commented 7 years ago

I retrained the vgg16 network on the coco dataset to get the final model, then tested its detection accuracy on the coco dataset using the coco test mode. When the test finished, it seems that the final result with the coco evaluation metric was not given. Is it because I don't have enough memory?

test: 40500/40504 dev: 1, forward time: 0.857, select time: 0.758s, nms time: 0.758s, total time: 1.636s
test: 40501/40504 dev: 1, forward time: 0.498, select time: 0.677s, nms time: 0.677s, total time: 1.199s
test: 40502/40504 dev: 1, forward time: 0.477, select time: 0.638s, nms time: 0.638s, total time: 1.157s
test: 40503/40504 dev: 1, forward time: 0.801, select time: 0.660s, nms time: 0.660s, total time: 1.492s
test: 40504/40504 dev: 1, forward time: 0.877, select time: 1.365s, nms time: 1.365s, total time: 2.265s

*********************************************
***   COCO evaluation metric
*********************************************

Loading files to calculate sizes...
Total boxes: 4065662
Loading files to create giant tensor...
Converting data tensor to table format (to save as a .json file)...
/home/wangty/torch/install/bin/luajit: not enough memory===>.]  ETA: 252ms | Step: 0ms

Thank you very much.

farrajota commented 7 years ago

I'll look into it. It looks like luajit is hitting its memory limit (2 GB) when creating the .json file. If so, I'll create a workaround for it.

It takes some time to complete the testing script on the coco dataset, so when it's done I'll check whether this is really the issue and post a fix.

geekvc commented 7 years ago

OK, testing all 40504 images is quite time-consuming, and that's why I want to make a small coco dataset, such as coco5K, to cut hours of testing from the experiment. Thank you very much for your consideration.

farrajota commented 7 years ago

I've committed some fixes to solve this issue. This took me a while to fix because the coco dataset has a lot of images and I didn't have enough RAM in my machine to store the processed data, so I had to improvise. That said, it should now work without running into luajit's memory limit, which was the problem when saving the results to a .json file.

Moreover, I've added a new flag named frcnn_test_use_cache which stores results on disk, thus reducing memory usage when testing the dataset. By default results are kept in RAM (faster), but you can set the flag to 'true' to cache the results on disk along the way (slower).

To get these fixes you'll need to do the following:

Get the latest commits by doing a git pull in this repo;
Re-install the fastrcnn package by doing a git pull and luarocks make rocks/*;
Update the dbcollection install by using either pip install dbcollection or conda install -c farrajota dbcollection.

geekvc commented 7 years ago

It is very kind of you! I will try it immediately. Thank you very much!

geekvc commented 7 years ago

I updated this repo and the fastrcnn package, then re-installed the fastrcnn and dbcollection packages. I ran test.lua with the following config:

    opt.expID = 'frcnn_vgg16_coco'
    opt.dataset = 'coco'
    opt.GPU = 2
    opt.netType = 'vgg16'
    opt.frcnn_test_mode = 'coco'

With the flag frcnn_test_use_cache set to true, an error occurred. I tested several times; it seems like there is a type conversion error in the fastrcnn package.

==> (1/5) Load options
==> (2/5) Load dataset data loader
==> (3/5) Load roi proposals data
==> (4/5) Load model: /home/wangty/geekvc/fastrcnn-example-torch/data/exp/coco/frcnn_vgg16_coco/model_final.t7
==> (5/5) Test Fast-RCNN model

Saving temporary files to: /mnt/geekvc/fastrcnn-example-torch/Tester_Eval
/home/wangty/torch/install/bin/luajit: ...e/wangty/torch/install/share/lua/5.1/fastrcnn/Tester.lua:72: bad argument #2 to 'getFilename' (string expected, got table)
stack traceback:
        [C]: in function 'getFilename'
        ...e/wangty/torch/install/share/lua/5.1/fastrcnn/Tester.lua:72: in function 'getImage'
        ...e/wangty/torch/install/share/lua/5.1/fastrcnn/Tester.lua:114: in function 'testOne'
        ...e/wangty/torch/install/share/lua/5.1/fastrcnn/Tester.lua:236: in function 'test_use_cache'
        ...e/wangty/torch/install/share/lua/5.1/fastrcnn/Tester.lua:278: in function 'test'
        /home/wangty/torch/install/share/lua/5.1/fastrcnn/test.lua:34: in function 'test'
        test.lua:73: in main chunk
        [C]: in function 'dofile'
        ...ngty/torch/install/lib/luarocks/rocks/trepl/scm-1/bin/th:150: in main chunk
        [C]: at 0x00406620

I tested with anaconda python 3 and python 2.7.

farrajota commented 7 years ago

I'm not able to reproduce this issue, although I've pushed a commit to fix a small issue regarding data fetching, so it's best to do a git pull on this repo. Also, I don't recommend setting the GPU id to anything but GPU=1, because it has been an issue in torch (at least for me) for a long time. It's best to select which GPUs you want to use for running the script by setting the CUDA_VISIBLE_DEVICES environment variable like this:

# select the first and second GPUs (device ids 0 and 1)
CUDA_VISIBLE_DEVICES=0,1 th test.lua

or, if you want to select different GPUs from your cluster in a particular order:

# select the third and first GPUs, in that order (device ids 2 and 0)
CUDA_VISIBLE_DEVICES=2,0 th test.lua

This is the recommended way to select GPUs for your script.
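
To make the renumbering concrete, here is a small shell sketch. The helper name show_gpu_mapping is made up for illustration and is not part of any package. It relies on two facts: CUDA exposes only the devices listed in CUDA_VISIBLE_DEVICES, renumbered from 0 in the order given, and cutorch counts devices from 1.

```shell
#!/bin/sh
# Hypothetical helper (illustration only): print how each physical GPU id in a
# CUDA_VISIBLE_DEVICES list is renumbered inside the process.
show_gpu_mapping() {
    list=$1
    index=0
    old_ifs=$IFS
    IFS=,                      # split the list on commas
    for physical_id in $list; do
        echo "physical GPU $physical_id -> CUDA device $index -> cutorch device $((index + 1))"
        index=$((index + 1))
    done
    IFS=$old_ifs               # restore the original field separator
}

show_gpu_mapping "2,0"
# physical GPU 2 -> CUDA device 0 -> cutorch device 1
# physical GPU 0 -> CUDA device 1 -> cutorch device 2
```

So with a list of 2,0, the first device the script sees is physical GPU 2. Note that the list must not contain spaces, otherwise the shell treats the text after the space as the command to run.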

geekvc commented 7 years ago

Oh, using the CUDA_VISIBLE_DEVICES flag to select the GPU device is a good idea! When I test with the voc2007 dataset and voc model, this error still occurs. Is there something wrong with my fastrcnn package installation?

$ CUDA_VISIBLE_DEVICES=1 th test.lua
==> (1/5) Load options
==> (2/5) Load dataset data loader
==> (3/5) Load roi proposals data
==> (4/5) Load model: /home/wangty/geekvc/fastrcnn-example-torch/data/exp/pascal_voc_2007/vgg16/model_final.t7
==> (5/5) Test Fast-RCNN model
/home/wangty/torch/install/bin/luajit: ...e/wangty/torch/install/share/lua/5.1/fastrcnn/Tester.lua:72: bad argument #2 to 'getFilename' (string expected, got table)
stack traceback:
        [C]: in function 'getFilename'
        ...e/wangty/torch/install/share/lua/5.1/fastrcnn/Tester.lua:72: in function 'getImage'
        ...e/wangty/torch/install/share/lua/5.1/fastrcnn/Tester.lua:114: in function 'testOne'
        ...e/wangty/torch/install/share/lua/5.1/fastrcnn/Tester.lua:208: in function 'test_no_cache'
        ...e/wangty/torch/install/share/lua/5.1/fastrcnn/Tester.lua:280: in function 'test'
        /home/wangty/torch/install/share/lua/5.1/fastrcnn/test.lua:34: in function 'test'
        test.lua:58: in main chunk
        [C]: in function 'dofile'
        ...ngty/torch/install/lib/luarocks/rocks/trepl/scm-1/bin/th:150: in main chunk
        [C]: at 0x00406620

farrajota commented 7 years ago

I've built my fastrcnn and dbcollection Lua packages and I did not get any issues. I believe the issue in your case is that the dbcollection installation for Lua is an older version; for the new code to work properly you need to install the newest version. Just to be sure, do the following:

Uninstall fastrcnn and dbcollection from luarocks:

luarocks remove fastrcnn
luarocks remove dbcollection

Clone the respective repos from GitHub (you should delete the old ones just to be safe):

git clone https://github.com/farrajota/fast-rcnn-torch
git clone https://github.com/dbcollection/dbcollection-torch7

Install the packages:

(cd fast-rcnn-torch && luarocks make rocks/*)

(cd dbcollection-torch7 && luarocks make)

Do a git pull in the fastrcnn-example-torch dir to have the latest changes to the code.

After these steps try to see if you still get the same error as before.
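
The steps above can be collected into one script. This is only a sketch under some assumptions that are not from the thread: both repos are cloned into the current directory, fastrcnn-example-torch also lives there, and DRY_RUN and run are hypothetical helpers added for this example. By default it only prints the commands; set DRY_RUN=0 to actually execute them.

```shell
#!/bin/sh
# Sketch of the clean reinstall described above (assumptions: all repos live in
# the current directory; DRY_RUN and run() are hypothetical, for illustration).
DRY_RUN=${DRY_RUN:-1}

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"            # dry run: only show what would be executed
    else
        "$@"
    fi
}

reinstall_packages() {
    # 1. Remove the installed rocks.
    run luarocks remove fastrcnn
    run luarocks remove dbcollection

    # 2. Fetch fresh copies of both repositories.
    run git clone https://github.com/farrajota/fast-rcnn-torch
    run git clone https://github.com/dbcollection/dbcollection-torch7

    # 3. Build and install each package from inside its checkout.
    #    sh -c keeps each cd local, so the next step starts from the same directory.
    run sh -c 'cd fast-rcnn-torch && luarocks make rocks/*'
    run sh -c 'cd dbcollection-torch7 && luarocks make'

    # 4. Update the example code itself.
    run sh -c 'cd fastrcnn-example-torch && git pull'
}

reinstall_packages
```

Running it once in dry-run mode first is a cheap way to check the commands before anything is removed. The script does not stop on errors, so check the output of each step when running it for real.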

geekvc commented 7 years ago

OK, I did as you told me, and testing on both the voc and coco datasets now goes well. That's great! Thank you very much!

farrajota / fastrcnn-example-torch

Test accuracy with coco dataset #5