MhLiao / TextBoxes

TextBoxes: A Fast Text Detector with a Single Deep Neural Network
https://github.com/MhLiao/TextBoxes

Fine tuning on custom dataset: converting to LMDB #99

Open gaalejandre opened 6 years ago

gaalejandre commented 6 years ago

I have a custom dataset, a modified version of the ICDAR 2013 training set (plus a separate modified ICDAR 2013 set for testing), and I would like to fine-tune the network on it, but I am unsure how to start converting my images. Suppose the images are in data/ou_dataset. How should I set up my create_data.sh and create_list.sh?

I have read some of the issues posted here, and most point to this example: https://github.com/weiliu89/caffe/wiki/Train-SSD-on-custom-dataset Is that the right guide to follow? I am still unsure which scripts I should use to reproduce those steps.

Please help me. Thank you so much!

MhLiao commented 6 years ago

First convert the annotations of your dataset into XML files and create the train.txt/test.txt list files, as shown in Readme.md. Then change the corresponding save directories and dataset lists in create_data.sh and run it.
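As a rough illustration of the annotation step, the sketch below turns ICDAR 2013 style ground-truth lines into a VOC-style XML string. Several things here are assumptions rather than facts from this thread: the function names `parse_icdar13_line` and `boxes_to_voc_xml` are hypothetical, the ground-truth lines are assumed to start with four integers (left, top, right, bottom), and the element layout follows Pascal VOC, so it should be compared against the sample in Readme.md before use.

```python
import re
import xml.etree.ElementTree as ET


def parse_icdar13_line(line):
    """Return (xmin, ymin, xmax, ymax) from one ground-truth line.

    Assumes the line starts with four integers, separated by spaces or
    commas, followed by the quoted transcription.
    """
    numbers = re.findall(r"\d+", line)
    if len(numbers) < 4:
        raise ValueError("expected four coordinates in: %r" % line)
    return tuple(int(n) for n in numbers[:4])


def boxes_to_voc_xml(filename, width, height, boxes, label="text"):
    """Build a VOC-style annotation string for one image.

    boxes is a list of (xmin, ymin, xmax, ymax) tuples in pixels.
    The depth of 3 (colour image) and the 'difficult' field are assumptions
    borrowed from Pascal VOC.
    """
    root = ET.Element("annotation")
    ET.SubElement(root, "filename").text = filename
    size = ET.SubElement(root, "size")
    for tag, value in (("width", width), ("height", height), ("depth", 3)):
        ET.SubElement(size, tag).text = str(value)
    for xmin, ymin, xmax, ymax in boxes:
        obj = ET.SubElement(root, "object")
        ET.SubElement(obj, "name").text = label
        ET.SubElement(obj, "difficult").text = "0"
        bndbox = ET.SubElement(obj, "bndbox")
        for tag, value in (("xmin", xmin), ("ymin", ymin),
                           ("xmax", xmax), ("ymax", ymax)):
            ET.SubElement(bndbox, tag).text = str(value)
    return ET.tostring(root, encoding="unicode")
```

The label passed as `label` has to match an entry in the label map file used by create_data.sh, otherwise the label check during conversion is likely to fail.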

gaalejandre commented 6 years ago

Thank you so much! I have successfully converted the annotations to XML files, but there seems to be a problem with my create_data.sh: PyCharm flags the line mapfile="$root_dir/examples/TextBoxes/labelmap_voc.prototxt" as an error (underlined in red). When I run create_data.sh anyway, it fails with this error message:

Traceback (most recent call last):
  File "/usr/local/caffe-rc5/TextBoxes/scripts/create_annoset.py", line 7, in <module>
    from caffe.proto import caffe_pb2
ImportError: No module named caffe.proto

What could be the reason for this? Any help is much appreciated. Thank you!
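One way to narrow down this kind of ImportError is to check whether any directory on the interpreter's search path actually contains `caffe/proto/caffe_pb2.py`. The sketch below does that; the helper name `find_caffe_proto` is hypothetical, and it assumes the usual Caffe layout in which `caffe_pb2.py` is generated under `python/caffe/proto` when the Python bindings are built (for example with `make pycaffe`). It should be run with the same interpreter that runs create_annoset.py.

```python
from __future__ import print_function

import os
import sys


def find_caffe_proto(search_paths=None):
    """Return the search-path entries that contain caffe/proto/caffe_pb2.py."""
    if search_paths is None:
        search_paths = sys.path
    hits = []
    for entry in search_paths:
        # An empty entry in sys.path means the current directory.
        candidate = os.path.join(entry or ".", "caffe", "proto", "caffe_pb2.py")
        if os.path.isfile(candidate):
            hits.append(entry)
    return hits


if __name__ == "__main__":
    found = find_caffe_proto()
    if found:
        print("caffe/proto/caffe_pb2.py found under:", found[0])
    else:
        print("No search-path entry contains caffe/proto/caffe_pb2.py")
        print("PYTHONPATH =", os.environ.get("PYTHONPATH", "<unset>"))
```

If nothing is found, either PYTHONPATH points at the wrong `python` directory or the bindings in that directory have not been built yet.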

gaalejandre commented 6 years ago

I have found a fix. My Python path was wrong, so I edited my ~/.bashrc and replaced the old path:

export PYTHONPATH=/usr/local/caffe-rc5/TextBoxes/caffe/python:$PYTHONPATH

with this new path:

export PYTHONPATH=/usr/local/caffe-rc5/python:$PYTHONPATH

and everything went well. PyCharm still shows the error, but create_data.sh ran fine with no errors on the console. Given this output:

/usr/bin/env bash /usr/local/caffe-rc5/TextBoxes/data/ou_dataset/create_data.sh
I0731 17:13:31.114806 16647 convert_annoset.cpp:122] A total of 279 images.
I0731 17:13:31.196254 16647 db_lmdb.cpp:35] Opened lmdb /usr/local/caffe-rc5/TextBoxes/data/ou_dataset/../../data/ou_dataset/lmdb/ou_dataset_train_lmdb
I0731 17:13:47.473220 16647 convert_annoset.cpp:201] Processed 279 files.
/usr/local/caffe-rc5/TextBoxes/build/tools/convert_annoset --anno_type=detection --label_type=xml --label_map_file=/usr/local/caffe-rc5/TextBoxes/data/ou_dataset/../../data/ou_dataset//labelmap_voc.prototxt --check_label=True --min_dim=0 --max_dim=0 --resize_height=0 --resize_width=0 --backend=lmdb --shuffle=False --check_size=False --encode_type=jpg --encoded=True --gray=False /usr/local/caffe-rc5/TextBoxes/data/ou_dataset/../../data/ou_dataset/ /usr/local/caffe-rc5/TextBoxes/data/ou_dataset/../../data/ou_dataset/train.txt /usr/local/caffe-rc5/TextBoxes/data/ou_dataset/../../data/ou_dataset/lmdb/ou_dataset_train_lmdb
I0731 17:13:59.587523 16661 convert_annoset.cpp:122] A total of 233 images.
I0731 17:13:59.606801 16661 db_lmdb.cpp:35] Opened lmdb /usr/local/caffe-rc5/TextBoxes/data/ou_dataset/../../data/ou_dataset/lmdb/ou_dataset_test_lmdb
I0731 17:14:11.952129 16661 convert_annoset.cpp:201] Processed 233 files.
/usr/local/caffe-rc5/TextBoxes/build/tools/convert_annoset --anno_type=detection --label_type=xml --label_map_file=/usr/local/caffe-rc5/TextBoxes/data/ou_dataset/../../data/ou_dataset//labelmap_voc.prototxt --check_label=True --min_dim=0 --max_dim=0 --resize_height=0 --resize_width=0 --backend=lmdb --shuffle=False --check_size=False --encode_type=jpg --encoded=True --gray=False /usr/local/caffe-rc5/TextBoxes/data/ou_dataset/../../data/ou_dataset/ /usr/local/caffe-rc5/TextBoxes/data/ou_dataset/../../data/ou_dataset/test.txt /usr/local/caffe-rc5/TextBoxes/data/ou_dataset/../../data/ou_dataset/lmdb/ou_dataset_test_lmdb

Process finished with exit code 0

Am I on the right track?
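A quick sanity check, sketched here under assumptions, is to compare the counts that convert_annoset reports with the number of entries in train.txt and test.txt. The helper names `conversion_summary` and `count_list_entries` are hypothetical, and the regular expressions assume the exact log wording shown above ("A total of N images." and "Processed N files.").

```python
import re


def conversion_summary(log_text):
    """Return a list of (total, processed) pairs, one per conversion run."""
    totals = re.findall(r"A total of (\d+) images\.", log_text)
    processed = re.findall(r"Processed (\d+) files\.", log_text)
    return [(int(t), int(p)) for t, p in zip(totals, processed)]


def count_list_entries(list_path):
    """Count the non-empty lines in a train.txt/test.txt style list file."""
    with open(list_path) as handle:
        return sum(1 for line in handle if line.strip())
```

If each pair has equal values and those values match the line counts of the corresponding list files, every listed image was read and written to the LMDB. It does not verify that the bounding boxes themselves are correct.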

bhargavaurala commented 6 years ago

Hello @gaalejandre,

I used the following steps to create the LMDB for a custom dataset. First, make sure that your custom dataset is arranged exactly like the VOC folder structure. Pay attention to train.txt, test.txt, etc., and make sure the path prefixes are consistent with the VOC format. Then run create_data.sh. I recommend creating a separate folder for your dataset and modifying the .sh scripts accordingly. You can refer to my project for more details, specifically the data/AccessMath folder.
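As a sketch of the list-file step, the function below writes one line per image in the form "image_path annotation_path", with both paths relative to the dataset root, which is the layout used by the SSD VOC example as far as I can tell. The function name `write_list_file` and the directory names in the usage note are assumptions for illustration, not something taken from this repository.

```python
import os


def write_list_file(root_dir, image_dir, anno_dir, out_path, image_ext=".jpg"):
    """Write 'image_path annotation_path' lines relative to root_dir.

    Only images that have a matching .xml annotation are listed.
    Returns the names of the images that were skipped.
    """
    lines = []
    skipped = []
    for name in sorted(os.listdir(os.path.join(root_dir, image_dir))):
        stem, ext = os.path.splitext(name)
        if ext.lower() != image_ext:
            continue
        xml_name = stem + ".xml"
        if not os.path.isfile(os.path.join(root_dir, anno_dir, xml_name)):
            skipped.append(name)
            continue
        # Forward slashes keep the prefixes consistent across platforms.
        lines.append("%s/%s %s/%s" % (image_dir, name, anno_dir, xml_name))
    with open(out_path, "w") as handle:
        for line in lines:
            handle.write(line + "\n")
    return skipped
```

For a VOC-like layout this might be called as `write_list_file(root, "JPEGImages", "Annotations", "train.txt")`; any names it returns are images without annotations, which are worth fixing before running create_data.sh.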

Hope this helps, Cheers.

watermellon2018 commented 3 years ago

Hello @gaalejandre, could you please tell me how you resolved the ImportError: No module named caffe.proto problem? I wrote a script that calls create_annoset.py, but it throws this error. I also added this line to my .bashrc: export PYTHONPATH=/home/my_name/Desktop/Caffe/python:$PYTHONPATH but it did not help. Could you help me solve this problem, if you remember the solution? Thanks.