geekvc opened this issue 7 years ago
Yeah, it's an issue in the dbcollection package. The downloading procedure seems to be breaking, so let me take a look at it.
Ok, thank you.
So, after close inspection, I think the problem is the version of dbcollection. If you have not installed dbcollection from source, you might have an outdated version. With that in mind, I've pushed a new version, 0.1.4, to PyPi, so you can update the package via `pip install dbcollection`.
I've also pushed a newer version to GitHub; once the travis/appveyor tests finish, I'll report here when it has been uploaded to PyPi. I'll also upload it to conda in case you installed it from there.
Meanwhile, you can use `pip` to install the 0.1.4 version, which should work fine, or install it from source.
edit: version 0.1.5 should have fixed the issue when using Python 2.7.
Thank you for your prompt reply. I pulled the dbcollection repo and installed it, and the download goes well.
$ git pull
remote: Counting objects: 22, done.
remote: Compressing objects: 100% (9/9), done.
remote: Total 22 (delta 13), reused 20 (delta 11), pack-reused 0
Unpacking objects: 100% (22/22), done.
From https://github.com/farrajota/dbcollection
6ee029c..54229c3 master -> origin/master
* [new tag] 0.1.5 -> 0.1.5
Updating 6ee029c..54229c3
Fast-forward
dbcollection/_version.py | 2 +-
dbcollection/datasets/dbclass.py | 2 +-
dbcollection/manager.py | 6 +++++-
dbcollection/utils/file_load.py | 5 ++++-
4 files changed, 11 insertions(+), 4 deletions(-)
After running `python setup.py install`, it is downloading the coco dataset:
$ th train.lua
==> (1/5) Load options
==> (2/5) Load dataset data loader
==> (3/5) Load roi proposals data
==> (4/5) Setup model:
==> (5/5) Train Fast-RCNN model
==> Download coco data to disk...
Download url (1/9): http://msvocds.blob.core.windows.net/coco2014/train2014.zip
2% (12 of 596.6871500015259) |# | Elapsed Time: 0:00:17 ETA: 0:10:42
I tested training AlexNet in the Python 2.7 environment, and it passed 👍
==> Download coco data to disk...
Download url (1/9): http://msvocds.blob.core.windows.net/coco2014/train2014.zip
2% (12 of 596.6871500015259) |# | Elapsed Time: 0:00:17 ETA: 0:10:42
The download is only about 596 MB; however, the true size of train2014.zip is about 13 GB (I used a download tool to see the true size). So after the download finishes, an error occurs because of the incomplete file:
Download url (1/9): http://msvocds.blob.core.windows.net/coco2014/train2014.zip
99% (596 of 596.6871500015259) |##################################################################################### | Elapsed Time: 0:08:46 ETA: 0:00:00
patool: Extracting /home/wangty/dbcollection/coco/data/train2014.zip ...
Traceback (most recent call last):
File "/home/wangty/.pyenv/versions/anaconda3-4.1.0/lib/python3.5/site-packages/patool-1.12-py3.5.egg/patoolib/programs/py_zipfile.py", line 42, in extract_zip
File "/home/wangty/.pyenv/versions/anaconda3-4.1.0/lib/python3.5/zipfile.py", line 1026, in __init__
self._RealGetContents()
File "/home/wangty/.pyenv/versions/anaconda3-4.1.0/lib/python3.5/zipfile.py", line 1093, in _RealGetContents
raise BadZipFile("File is not a zip file")
zipfile.BadZipFile: File is not a zip file
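For reference, here is a quick standard-library check (my own sketch, not dbcollection code) that confirms the file is truncated before patool even tries to extract it:
import zipfile

def is_complete_zip(path):
    # A truncated download has no valid central directory, which is
    # exactly the check that raises BadZipFile in the traceback above.
    if not zipfile.is_zipfile(path):
        return False
    with zipfile.ZipFile(path) as zf:
        # testzip() re-reads every member; it returns the name of the
        # first bad file, or None if the archive is intact.
        return zf.testzip() is None

print(is_complete_zip('/home/wangty/dbcollection/coco/data/train2014.zip'))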
I have already downloaded train2014, and I wanted to make a soft link in the `dbcollection/coco/data` directory; however, it seems the data loader does not see the soft link and continues to download the train2014.zip file.
I zipped the already-extracted train2014 files back into a train2014.zip file (and did the same for val2014, test2014 and test2015), then moved them to the `dbcollection/coco/data` directory.
I think that if the directory already contains the extracted files, it would be better not to check for the existence of the *.zip file; that way, we could use soft links to save disk space.
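Something along these lines would be enough (a hypothetical sketch of the check I mean, not dbcollection's actual code):
import os

def extracted_data_present(data_dir, name):
    # Hypothetical guard: if the extracted folder already exists, either
    # as a real directory or as a soft link pointing at one, skip both
    # the *.zip existence check and the download.
    path = os.path.join(data_dir, name)
    return os.path.isdir(path)  # isdir() follows symlinks

# e.g. do not download train2014.zip when this returns True:
extracted_data_present('/home/wangty/dbcollection/coco/data', 'train2014')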
You can try removing the bad file and running the code again to see if it works. Or, better yet, manually download all the files of the coco dataset into a folder and link it to `~/dbcollection/coco/data`.
Another thing you can do is download the files into a folder, launch a Python/Lua terminal, and run the following:
# Python
import dbcollection as dbc
coco = dbc.load('coco', data_dir='path/to/folder/coco/')
-- Lua/Torch7
dbc = require 'dbcollection'
coco = dbc.load{name='coco', data_dir='path/to/folder/coco/'}
Then, when you launch the script to train on coco, it should automatically find where the data is stored on disk and proceed to extract/process the detection task metadata. This is a key feature of the dbcollection package: you only need to set up the dataset + data files once (if the data folder on disk is provided by the user), and in future uses you won't need to say where the data is located. All of this information is stored in the `~/dbcollection.json` file in your home dir.
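For example, in a later session you can load the dataset by name only (a minimal sketch, assuming `load` falls back to the path cached in `~/dbcollection.json` when `data_dir` is omitted):
import dbcollection as dbc

# No data_dir here: the path registered during the one-time setup above
# is looked up in ~/dbcollection.json automatically.
coco = dbc.load('coco')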
Also, if you need to reset the data files for the coco dataset, you can do it via the dbcollection API:
# Python
import dbcollection as dbc
dbc.remove(name='coco', delete_data=True)
-- Lua/Torch7
dbc = require 'dbcollection'
dbc.remove{name='coco', delete_data=true}
PS: Nevertheless, I'll fix this issue with downloading files via the API so that the user isn't impacted by an error and forced to fix the problem manually.
Wow, all the circumstances have already been taken into consideration. I don't yet understand the essence of dbcollection, so I will read the dbcollection API carefully and learn more about it. 👍 Thank you for your thoughtful reply.
I'm really happy that someone besides me is testing the API. But I must give you a warning: this code is still in its "alpha" stage (if you can call it that) regarding documentation, and it lacks unit tests for the core Python API. But it is usable, and some basic documentation is already provided in the DOCUMENTATION.md files for each language currently available (i.e., Python, Lua/Torch7, Matlab).
I'm continuing to improve the documentation/tests/API usage and to link my other projects with it, so there may be some bugs here and there (hopefully just a few more).
As always, feel free to ask for help if anything breaks, and I'll gladly assist you.
Thank you for your excellent work and prompt replies to my every question. I will follow the API and test its new features until it reaches the release stage.
I trained and tested with the default AlexNet and the voc2007 dataset, and everything went well. Then I changed `options.lua` to the coco dataset and set `netType` to vgg19, and some errors occurred; maybe the dbcollection package caused the error.
I also used the script `scripts/train_test_vgg16_coco.lua`, and a similar error occurred.
Thank you in advance.