bennycheung / Food100_YOLO_Tools

Python tools and configuration files for Food100 dataset DarkNet YOLO training

This is the set of tools and configurations used by the YOLO Real-Time Food Detection article at http://bennycheung.github.io/yolo-for-real-time-food-detection

Food100 YOLO Training Tools

The following instructions concentrate on YOLO v2 setup and training. To get DarkNet YOLO training to work, we need to prepare the bounding box files, the training and validation list files, the class name file, the data location file, and the configuration file described below.

This GitHub repository contains my tools to help, check them out if in doubt! It also contains my configuration .cfg, class name .names, and data location .data files described later. They are designed or modified to work with DarkNet's requirements for bounding boxes and training data.

Object Bounding Box File

After downloading and unpacking the Food100 dataset UEC FOOD 100, it requires post-processing to produce bounding boxes that fit DarkNet's YOLO training requirements.

The Food100 class listing looks like this:

food_id name
1 rice
2 eels_on_rice
3 pilaf
4 chicken-n-egg_on_rice
5 pork_cutlet_on_rice
6 beef_curry
7 sushi
... up to 100 classes

Bounding Box Class Number

DarkNet YOLO (both v2 and v3) expects the class number to be 0-based, so use food_id - 1 to get the class number.

<object-class-id> <center-X> <center-Y> <width> <height>

Originally, I forgot to subtract 1 for the class id; that made me think the method was not working and I almost gave up! After re-aligning to 0-based class ids, the detection showed correct results.

Bounding Box Description File

DarkNet YOLO expects a bounding box .txt file for each .jpg image file, in the same directory and with the same name but with a .txt extension, containing the object class number and object coordinates for that image.

If we name our food100 image directory images, then DarkNet will automatically look for the corresponding .txt in the labels directory. For example, images/1/2.jpg will look for its corresponding label in labels/1/2.txt. I like this approach because we can keep the editable bbox in images/1/2.txt, while labels/1/2.txt holds the bbox format required for YOLO training, as laid out below.
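
For example, the files for a single image end up laid out like this:

images/1/2.jpg   - the image (food_id 1, image number 2)
images/1/2.txt   - the editable bbox file
labels/1/2.txt   - the YOLO-format bbox file used for training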

Bounding Box Coordinate in Image

For each food class directory, there is a bb_info.txt that contains the bounding boxes for all the image files. The original bbox is specified as:

<image-number> <top-left-X> <top-left-Y> <bottom-right-X> <bottom-right-Y>

However, YOLO expects each .jpg image file to have a corresponding bbox description .txt file. The bbox description file is specified as:

<object-class-id> <center-X> <center-Y> <width> <height>
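
As an illustration of the conversion (not the exact tool in this repository), here is a minimal Python sketch; the function name to_yolo_bbox is made up, and it assumes Pillow is available to read the image size, since the YOLO values are normalized to [0, 1] by the image width and height:

from PIL import Image

def to_yolo_bbox(image_path, food_id, x1, y1, x2, y2):
    # DarkNet YOLO wants <object-class-id> <center-X> <center-Y> <width> <height>,
    # with every value normalized to [0, 1] by the image width and height.
    img_w, img_h = Image.open(image_path).size
    class_id = food_id - 1                      # food_id is 1-based, YOLO classes are 0-based
    center_x = (x1 + x2) / 2.0 / img_w
    center_y = (y1 + y2) / 2.0 / img_h
    width = (x2 - x1) / float(img_w)
    height = (y2 - y1) / float(img_h)
    return "%d %.6f %.6f %.6f %.6f" % (class_id, center_x, center_y, width, height)

# Example: the bb_info.txt line "2 50 60 200 180" in images/1/ describes image 1/2.jpg.
# For a hypothetical 400x300 image this yields "0 0.312500 0.400000 0.375000 0.400000",
# which would be written to labels/1/2.txt.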

Data Location File

The data file food100.data tells DarkNet where to find the training and validation paths:

classes = 100
train = /Users/bcheung/dev/ML/darknet/data/food100/train.txt
valid = /Users/bcheung/dev/ML/darknet/data/food100/test.txt
names = /Users/bcheung/dev/ML/darknet/data/food100/food100.names
backup = backup

Obviously, change these paths to your specific data locations.

Training Dataset File

The train.txt file lists all the training images, one image per line. For example:

/Users/bcheung/dev/ML/darknet/data/food100/images/61/6164.jpg
/Users/bcheung/dev/ML/darknet/data/food100/images/61/6170.jpg
/Users/bcheung/dev/ML/darknet/data/food100/images/61/6158.jpg
...

No image in this list should come from the validation dataset.

Validating Dataset File

The test.txt file lists all the validation images, one image per line. For example:

/Users/bcheung/dev/ML/darknet/data/food100/images/61/6990.jpg
/Users/bcheung/dev/ML/darknet/data/food100/images/61/6149.jpg
/Users/bcheung/dev/ML/darknet/data/food100/images/61/6099.jpg
...

No image in this list should come from the training dataset.
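
As a minimal sketch of how the two list files could be generated (not the exact script in this repository; the 90/10 split ratio is just an example), in Python:

import glob
import os
import random

# Paths follow the examples above; adjust to your own data locations.
image_root = "/Users/bcheung/dev/ML/darknet/data/food100/images"
out_dir = "/Users/bcheung/dev/ML/darknet/data/food100"

# Collect every image, shuffle, and split into training and validation lists.
images = glob.glob(os.path.join(image_root, "*", "*.jpg"))
random.shuffle(images)
split = int(len(images) * 0.9)

with open(os.path.join(out_dir, "train.txt"), "w") as f:
    f.write("\n".join(images[:split]) + "\n")
with open(os.path.join(out_dir, "test.txt"), "w") as f:
    f.write("\n".join(images[split:]) + "\n")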

Class Name File

The class names are specified in food100.names. The class id is implied by the line order and is 0-based, i.e. rice is class 0. The file contains no class ids, just names:

rice
eels-on-rice
pilaf
chicken-n-egg-on-rice
pork-cutlet-on-rice
beef-curry
sushi
...
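
If you would rather generate food100.names than type it by hand, a minimal sketch could look like the following; it assumes the dataset ships a class listing file such as category.txt with one food_id and name per line after a header (check the actual file name and format in your copy), and it converts spaces and underscores to hyphens:

# Hypothetical input file name and format; adjust to the listing in your dataset.
with open("category.txt") as src, open("food100.names", "w") as dst:
    next(src)                                   # skip the "id name" header line, if present
    for line in src:
        food_id, name = line.strip().split(None, 1)
        dst.write(name.replace("_", "-").replace(" ", "-") + "\n")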

Configuration File

We need to create a configuration file yolov2-food100.cfg.

You can copy DarkNet's cfg/yolov2-voc.cfg and make the following modifications:

[net]
# Testing
# batch=1
# subdivisions=1
# Training
batch=64
subdivisions=8
...

The batch and subdivisions values control how many mini-batches each batch is split into. With batch=64 and subdivisions=8, each iteration loads 64 images and processes them in 8 mini-batches of 64/8 = 8 images each.

Those 8 mini-batches are repeated until the batch is completed, and a new iteration starts with 64 new images. When batching you are averaging over more images; the intent is not only to speed up the training process but also to generalize the training more. If your GPU has enough memory, you can reduce the subdivisions to load more images into the GPU at the same time.

Further edit the classes and filters specifications in the configuration file:

# number of filters calculated by (#-of-classes + 5)*5
# e.g. (100 + 5)*5 = 525
edit to line 237: filters=525
edit to line 244: classes=100
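
With the data, names, and configuration files in place, training can be launched with DarkNet's detector command. The paths below and the choice of darknet19_448.conv.23 (the usual YOLOv2 pretrained convolutional weights) are assumptions to adapt to your own setup:

./darknet detector train data/food100/food100.data cfg/yolov2-food100.cfg darknet19_448.conv.23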