farrajota / fastrcnn-example-torch

Example code on how to use the fastrcnn package for torch7
MIT License

Train with coco dataset, KeyError: 'coco' #3

Open geekvc opened 7 years ago

geekvc commented 7 years ago

I trained and tested with the default AlexNet and the voc2007 dataset, and everything went well. Then I changed options.lua to use the coco dataset and set netType to vgg19, and some errors occurred; the dbcollection package may be the cause.

$ th train.lua
==> (1/5) Load options
==> (2/5) Load dataset data loader
==> (3/5) Load roi proposals data
Processing COCO train RoI proposals...
 [======================================== 82783/82783 ================================>]  Tot: 14m37s | Step: 10ms
Save COCO train RoI proposals to cache: /home/wangty/geekvc/fastrcnn-example-torch/data/cache/coco_proposals_train.t7
Processing COCO val RoI proposals...
 [======================================== 40504/40504 ================================>]  Tot: 7m12s | Step: 10ms
Save COCO val RoI proposals to cache: /home/wangty/geekvc/fastrcnn-example-torch/data/cache/coco_proposals_val.t7
==> (4/5) Setup model:
==> (5/5) Train Fast-RCNN model

==> Download coco data to disk...
Traceback (most recent call last):
  File "/home/wangty/.pyenv/versions/anaconda3-4.1.0/lib/python3.5/site-packages/dbcollection/datasets/funs.py", line 32, in fetch_dataset_constructor
    return datasets[name]
KeyError: 'coco'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "/home/wangty/.pyenv/versions/anaconda3-4.1.0/lib/python3.5/site-packages/dbcollection/manager.py", line 69, in download
    keywords = dataset.download(name, data_dir_, cache_save_path, extract_data, verbose)
  File "/home/wangty/.pyenv/versions/anaconda3-4.1.0/lib/python3.5/site-packages/dbcollection/datasets/funs.py", line 124, in download
    dataset_loader = setup_dataset_constructor(name, data_dir, cache_dir, extract_data, verbose)
  File "/home/wangty/.pyenv/versions/anaconda3-4.1.0/lib/python3.5/site-packages/dbcollection/datasets/funs.py", line 69, in setup_dataset_constructor
    constructor = fetch_dataset_constructor(name)
  File "/home/wangty/.pyenv/versions/anaconda3-4.1.0/lib/python3.5/site-packages/dbcollection/datasets/funs.py", line 34, in fetch_dataset_constructor
    raise KeyError('Undefined dataset name: {}'.format(name))
KeyError: 'Undefined dataset name: coco'
/home/wangty/torch/install/bin/luajit: ...gty/torch/install/share/lua/5.1/dbcollection/manager.lua:60: attempt to index a nil value
stack traceback:
        ...gty/torch/install/share/lua/5.1/dbcollection/manager.lua:60: in function 'exists_task'
        ...gty/torch/install/share/lua/5.1/dbcollection/manager.lua:147: in function 'load'
        /mnt/geekvc/fastrcnn-example-torch/data.lua:18: in function 'get_db_loader'
        /mnt/geekvc/fastrcnn-example-torch/data.lua:164: in function 'fetch_loader_dataset'
        /mnt/geekvc/fastrcnn-example-torch/data.lua:309: in function 'data_gen'
        /home/wangty/torch/install/share/lua/5.1/fastrcnn/train.lua:47: in function 'train'
        train.lua:66: in main chunk
        [C]: in function 'dofile'
        ...ngty/torch/install/lib/luarocks/rocks/trepl/scm-1/bin/th:150: in main chunk
        [C]: at 0x00406620

I also tried the script scripts/train_test_vgg16_coco.lua, and a similar error occurred.

$ th scripts/train_test_vgg16_coco.lua
Input options: -frcnn_hflip 0.5 -snapshot 10 -frcnn_rois_per_img 128 -nThreads 4 -optMethod sgd -netType vgg16 -trainIters 5000 -nGPU 1 -frcnn_test_max_size 1000 -frcnn_test_nms_thresh 0.3 -frcnn_test_scales 600 -frcnn_scales 600 -frcnn_roi_augment_offset 0.3 -frcnn_bg_thresh_lo 0.1 -frcnn_test_mode coco -dataset coco -frcnn_max_size 1000 -schedule {{40,1e-3,5e-4},{10,1e-4,5e-4}} -frcnn_imgs_per_batch 2 -frcnn_bg_thresh_hi 0.5 -expID frcnn_vgg16_coco -clear_buffers true -frcnn_fg_fraction 0.25 -frcnn_fg_thresh 0.5 -frcnn_bg_fraction 1 -testInter false
==> (1/5) Load options
==> (2/5) Load dataset data loader
==> (3/5) Load roi proposals data
==> (4/5) Setup model:
==> (5/5) Train Fast-RCNN model

==> Download coco data to disk...
Traceback (most recent call last):
  File "/home/wangty/.pyenv/versions/anaconda3-4.1.0/lib/python3.5/site-packages/dbcollection/datasets/funs.py", line 32, in fetch_dataset_constructor
    return datasets[name]
KeyError: 'coco'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "/home/wangty/.pyenv/versions/anaconda3-4.1.0/lib/python3.5/site-packages/dbcollection/manager.py", line 69, in download
    keywords = dataset.download(name, data_dir_, cache_save_path, extract_data, verbose)
  File "/home/wangty/.pyenv/versions/anaconda3-4.1.0/lib/python3.5/site-packages/dbcollection/datasets/funs.py", line 124, in download
    dataset_loader = setup_dataset_constructor(name, data_dir, cache_dir, extract_data, verbose)
  File "/home/wangty/.pyenv/versions/anaconda3-4.1.0/lib/python3.5/site-packages/dbcollection/datasets/funs.py", line 69, in setup_dataset_constructor
    constructor = fetch_dataset_constructor(name)
  File "/home/wangty/.pyenv/versions/anaconda3-4.1.0/lib/python3.5/site-packages/dbcollection/datasets/funs.py", line 34, in fetch_dataset_constructor
    raise KeyError('Undefined dataset name: {}'.format(name))
KeyError: 'Undefined dataset name: coco'
/home/wangty/torch/install/bin/luajit: ...gty/torch/install/share/lua/5.1/dbcollection/manager.lua:60: attempt to index a nil value
stack traceback:
        ...gty/torch/install/share/lua/5.1/dbcollection/manager.lua:60: in function 'exists_task'
        ...gty/torch/install/share/lua/5.1/dbcollection/manager.lua:147: in function 'load'
        /mnt/geekvc/fastrcnn-example-torch/data.lua:18: in function 'get_db_loader'
        /mnt/geekvc/fastrcnn-example-torch/data.lua:164: in function 'fetch_loader_dataset'
        /mnt/geekvc/fastrcnn-example-torch/data.lua:309: in function 'data_gen'
        /home/wangty/torch/install/share/lua/5.1/fastrcnn/train.lua:47: in function 'train'
        train.lua:66: in main chunk
        [C]: in function 'dofile'
        ...ngty/torch/install/lib/luarocks/rocks/trepl/scm-1/bin/th:150: in main chunk
        [C]: at 0x00406620
==> (1/5) Load options
==> (2/5) Load dataset data loader
==> (3/5) Load roi proposals data
==> (4/5) Load model: /home/wangty/geekvc/fastrcnn-example-torch/data/exp/coco/vgg16_coco/model_final.t7
/home/wangty/torch/install/bin/luajit: cannot open </home/wangty/geekvc/fastrcnn-example-torch/data/exp/coco/vgg16_coco/model_final.t7> in mode r  at /home/wangty/torch/pkg/torch/lib/TH/THDiskFile.c:670
stack traceback:
        [C]: at 0x7fe6fb4ad330
        [C]: in function 'DiskFile'
        /home/wangty/torch/install/share/lua/5.1/torch/File.lua:405: in function 'load'
        test.lua:49: in main chunk
        [C]: in function 'dofile'
        ...ngty/torch/install/lib/luarocks/rocks/trepl/scm-1/bin/th:150: in main chunk
        [C]: at 0x00406620

Thank you in advance.

farrajota commented 7 years ago

Yeah, it's an issue in the dbcollection package. The download procedure seems to be breaking, so let me take a look at it.

geekvc commented 7 years ago

OK, thank you.

farrajota commented 7 years ago

So, after closer inspection, I think the problem is the version of dbcollection. If you did not install dbcollection from source, you might have an outdated version. With that in mind, I've pushed a new version, 0.1.4, to PyPI, so you can update the package via pip install dbcollection.

I've also pushed a newer version to GitHub, and when the travis/appveyor tests finish I'll report here that it has been uploaded to PyPI. I'll also upload it to conda in case you installed it from there.

Meanwhile, you can use pip to install version 0.1.4, which should work fine, or install it from source.

edit: version 0.1.5 should fix the issue when using Python 2.7.

geekvc commented 7 years ago

Thank you for your prompt reply. I pulled the dbcollection repo and installed it, and the download now works.

$ git pull
remote: Counting objects: 22, done.
remote: Compressing objects: 100% (9/9), done.
remote: Total 22 (delta 13), reused 20 (delta 11), pack-reused 0
Unpacking objects: 100% (22/22), done.
From https://github.com/farrajota/dbcollection
   6ee029c..54229c3  master     -> origin/master
 * [new tag]         0.1.5      -> 0.1.5
Updating 6ee029c..54229c3
Fast-forward
 dbcollection/_version.py         | 2 +-
 dbcollection/datasets/dbclass.py | 2 +-
 dbcollection/manager.py          | 6 +++++-
 dbcollection/utils/file_load.py  | 5 ++++-
 4 files changed, 11 insertions(+), 4 deletions(-)

After running python setup.py install, it starts downloading the coco dataset:

$ th train.lua
==> (1/5) Load options
==> (2/5) Load dataset data loader
==> (3/5) Load roi proposals data
==> (4/5) Setup model:
==> (5/5) Train Fast-RCNN model
==> Download coco data to disk...
Download url (1/9): http://msvocds.blob.core.windows.net/coco2014/train2014.zip
  2% (12 of 596.6871500015259) |#                                                                                      | Elapsed Time: 0:00:17 ETA: 0:10:42

I also tested training AlexNet in the Python 2.7 environment, and it passed 👍

geekvc commented 7 years ago
==> Download coco data to disk...
Download url (1/9): http://msvocds.blob.core.windows.net/coco2014/train2014.zip
  2% (12 of 596.6871500015259) |#                                                                                      | Elapsed Time: 0:00:17 ETA: 0:10:42

The download is only about 596 MB, but the actual size of train2014.zip is about 13 GB (I checked with a download tool). So after the download finishes, an error occurs because the archive is incomplete:

Download url (1/9): http://msvocds.blob.core.windows.net/coco2014/train2014.zip
 99% (596 of 596.6871500015259) |##################################################################################### | Elapsed Time: 0:08:46 ETA: 0:00:00patool: Extracting /home/wangty/dbcollection/coco/data/train2014.zip ...
Traceback (most recent call last):
  File "/home/wangty/.pyenv/versions/anaconda3-4.1.0/lib/python3.5/site-packages/patool-1.12-py3.5.egg/patoolib/programs/py_zipfile.py", line 42, in extract_zip
  File "/home/wangty/.pyenv/versions/anaconda3-4.1.0/lib/python3.5/zipfile.py", line 1026, in __init__
    self._RealGetContents()
  File "/home/wangty/.pyenv/versions/anaconda3-4.1.0/lib/python3.5/zipfile.py", line 1093, in _RealGetContents
    raise BadZipFile("File is not a zip file")
zipfile.BadZipFile: File is not a zip file

I have already downloaded train2014, and I wanted to make a soft link to it inside the dbcollection/coco/data directory, but the data loader does not seem to see the soft link and keeps downloading the train2014.zip file. So I re-zipped my already-extracted train2014 folder into a train2014.zip file (and did the same for val2014, test2014, and test2015), then moved them into the dbcollection/coco/data directory. I think that if the extracted folder already exists in the directory, it would be better not to check for the *.zip file; that way we could use soft links to save disk space.
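The skip-if-extracted idea could look like the sketch below. This is a hypothetical helper, not dbcollection's actual download logic; the function name and arguments are my own. Since os.path.isdir follows symbolic links, a soft link pointing at data stored elsewhere would also count as "already present":

```python
import os

def needs_download(data_dir, archive_name):
    """Return False when the extracted folder for an archive already exists.

    Hypothetical sketch: if data_dir already contains 'train2014/' (or a
    symlink pointing at it), there is no reason to re-fetch train2014.zip.
    """
    if archive_name.endswith(".zip"):
        folder = archive_name[:-len(".zip")]
    else:
        folder = archive_name
    extracted = os.path.join(data_dir, folder)
    # os.path.isdir follows symlinks, so a soft link to data elsewhere counts too
    return not os.path.isdir(extracted)
```

With this check, a user who has already extracted the data (or linked it in) would never trigger the zip download in the first place.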

farrajota commented 7 years ago

You can try removing the bad file and running the code again to see if it works. Or, better, manually download all the files of the coco dataset into a folder and link it to ~/dbcollection/coco/data.

Another thing you can do is download the files into a folder, launch a python/lua terminal, and do the following:

For Python:

import dbcollection as dbc
coco = dbc.load('coco', data_dir='path/to/folder/coco/')

For Lua/Torch7:

dbc = require 'dbcollection'
coco = dbc.load{name='coco', data_dir='path/to/folder/coco/'}

Then, when you launch the script to train on coco, it should automatically find where the data is stored on disk and proceed to extract/process the detection task metadata. This is a key feature of the dbcollection package: you only need to set up a dataset and its data files once (if the data folder on disk is provided by the user), and in future uses you won't need to say where the data is located. All this information is stored in the ~/dbcollection.json file in your home dir.
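Since the registration lives in a plain JSON file, you can inspect it directly to see which datasets are registered and where their data folders point. The helper below is just an illustration (the exact schema of ~/dbcollection.json is not documented here, so treat the structure of the output as an assumption):

```python
import json
import os

def show_registered_datasets(cache_path=None):
    """Pretty-print the dbcollection cache file for inspection.

    The cache is plain JSON; its exact schema is an assumption here.
    Printing it reveals which datasets are registered and the data
    directories associated with them.
    """
    if cache_path is None:
        cache_path = os.path.expanduser("~/dbcollection.json")
    with open(cache_path) as f:
        cache = json.load(f)
    print(json.dumps(cache, indent=2, sort_keys=True))
    return cache
```

This can be handy for debugging a case like the one above, where the loader keeps re-downloading data you have already placed on disk.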

Also, if you need to reset the data files for the coco dataset you can do it via the dbcollection API:

Python

import dbcollection as dbc
dbc.remove(name='coco', delete_data=True)

Lua/Torch7

dbc = require 'dbcollection'
dbc.remove{name='coco', delete_data=true}

PS: Nevertheless, I'll fix this issue with downloading files via the API, so that the user isn't hit by an error and forced to fix the problem manually.
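One possible shape for that fix (a generic sketch of mine, not dbcollection's actual code) is to validate the archive before handing it to the extractor, so a truncated download produces a clear error instead of a BadZipFile deep inside patool:

```python
import zipfile

def safe_extract(archive_path, dest_dir):
    """Extract a zip archive, but reject clearly incomplete or corrupt files.

    Generic sketch of a download guard; function name and error messages
    are illustrative, not part of the dbcollection API.
    """
    if not zipfile.is_zipfile(archive_path):
        raise ValueError(
            "%s is not a valid zip archive (incomplete download?)" % archive_path)
    with zipfile.ZipFile(archive_path) as zf:
        bad_member = zf.testzip()  # CRC-checks every member of the archive
        if bad_member is not None:
            raise ValueError(
                "corrupt member %s in %s" % (bad_member, archive_path))
        zf.extractall(dest_dir)
```

A caller that catches the ValueError could then delete the bad file and retry the download automatically, which matches the behavior described in this thread.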

geekvc commented 7 years ago

Wow, all the circumstances have already been taken into consideration. I did not yet understand the essence of dbcollection. I will read the dbcollection API carefully and learn more about it. 👍 Thank you for your thoughtful reply.

farrajota commented 7 years ago

I'm really happy that someone besides me is testing the API. But I must give you a warning: this code is still in its "alpha" stage (if you can call it that) regarding documentation, and it is missing unit tests for the core Python API. But it is usable, and some basic documentation is already provided in the DOCUMENTATION.md files for each currently available language (i.e., Python, Lua/Torch7, Matlab).

I'm continuing to improve the documentation/tests/API usage and to link my other projects with it, so there may be some bugs here and there (hopefully just a few more).

As always, feel free to ask for help if anything breaks and I'll gladly assist you.

geekvc commented 7 years ago

Thank you for your excellent work and your prompt replies to my every question. I will follow the API and test new features until it reaches its release stage.