AlexeyAB / darknet

YOLOv4 / Scaled-YOLOv4 / YOLO - Neural Networks for Object Detection (Windows and Linux version of Darknet )
http://pjreddie.com/darknet/

Yolo-v2 (VOC+COCO) #362

Open MyVanitar opened 6 years ago

MyVanitar commented 6 years ago

Hello,

When I checked the VOC evaluation results, there is a result for YOLOv2 (VOC+COCO) that reaches above 81 mAP. Where can we find some information about it?

2018-01-30_23-45-02

AlexeyAB commented 6 years ago

Hi,

All the available information is here: http://host.robots.ox.ac.uk:8080/leaderboard/displaylb.php?challengeid=11&compid=4#KEY_YOLOv2 (VOC + COCO)

YOLOv2 (VOC + COCO) YOLOv2 (VOC + COCO) University of Washington Joseph Redmon, Ali Farhadi We use a variety of tricks to increase the performance of YOLO including dimension cluster priors and multi-scale training. Details at https://pjreddie.com/yolo/ 2017-10-21 18:07:57

There are no additional scientific articles explaining how the accuracy was improved compared with the previous YOLOv2 result of 78.6 mAP: https://pjreddie.com/darknet/yolo/


I think there are several improvements, such as:


Joseph's latest article is not about YOLO; it is "IQA: Visual Question Answering in Interactive Environments": https://pjreddie.com/publications/

MyVanitar commented 6 years ago

Is there any method to increase the number of training images without annotating them by hand?

I know that Darknet adds some noise and color jitter and creates a vast number of training images in memory, but I want to know if there is code to create extra physical training images (by adding rotations, scales, brightness changes, ...) and update the annotations automatically (in case of rotation or scale changes).

With this method we could make many hundreds of images from, for example, just 100 images.

phongnhhn92 commented 6 years ago

@VanitarNordic my friend, you can try shifting the image left, right, up or down by a few pixels and easily change the bounding box coordinates accordingly. Other techniques such as rotation and scaling will massively change the appearance of the object inside the image; there is no way to guess the new box by interpolating from the original one. I had this problem before and had to do it manually.
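The pixel-shift idea can be sketched in a few lines (an illustrative example, not code from this thread; it assumes numpy HxWxC images and YOLO-style normalized boxes [class, x, y, w, h]):

```python
import numpy as np

def shift_image(im, bboxes, dx_px, dy_px):
    """Shift an HxWxC image by (dx_px, dy_px) pixels, padding with zeros,
    and shift the normalized YOLO boxes [class, x, y, w, h] to match."""
    h, w = im.shape[:2]
    out = np.zeros_like(im)
    # destination/source windows cover positive or negative shifts
    out[max(dy_px, 0):h + min(dy_px, 0), max(dx_px, 0):w + min(dx_px, 0)] = \
        im[max(-dy_px, 0):h + min(-dy_px, 0), max(-dx_px, 0):w + min(-dx_px, 0)]
    shifted = []
    for cls, x, y, bw, bh in bboxes:
        nx, ny = x + dx_px / w, y + dy_px / h
        # drop boxes whose center was shifted out of the image
        if 0.0 <= nx <= 1.0 and 0.0 <= ny <= 1.0:
            shifted.append([cls, nx, ny, bw, bh])
    return out, shifted
```

Shifting by only a few pixels keeps the object's appearance intact, which is why this is the easiest augmentation to do offline.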

TheMikeyR commented 6 years ago

@VanitarNordic I've tried to increase my dataset 3x by rotating the images (90, 180 and 270 degrees) together with the annotations. If you are familiar with Python you can use this function to do it. Input is the image, a list with all bounding boxes for the image, and which method to apply (I use this variable so I can easily spawn 3 parallel processes using the multiprocessing library and have everything process faster). I don't know how much better the training gets from feeding these rotated images; if you figure it out, feel free to share!

import numpy as np

def rotate_image(im, bboxes, method=None):
    """Rotate an HxWxC image and its YOLO boxes [class, x, y, w, h] (normalized)."""
    if not method:
        return im, bboxes
    # Turn 180 degrees: mirror both axes, so x -> 1-x and y -> 1-y
    if method == 1:
        for element in bboxes:
            element[1] = 1. - element[1]
            element[2] = 1. - element[2]
        im = np.array(im[::-1, ::-1, :])
        return im, bboxes
    # Turn 90 degrees CCW: (x, y) -> (y, 1-x), and width/height swap
    elif method == 2:
        for element in bboxes:
            element[1], element[2] = element[2], 1. - element[1]
            element[3], element[4] = element[4], element[3]
        im = np.rot90(im)
        return im, bboxes
    # Turn 270 degrees CCW: (x, y) -> (1-y, x), and width/height swap
    elif method == 3:
        for element in bboxes:
            element[1], element[2] = 1. - element[2], element[1]
            element[3], element[4] = element[4], element[3]
        im = np.rot90(im, 3)
        return im, bboxes
MyVanitar commented 6 years ago

@TheMikeyR

Thank you very much. Actually, I was thinking of writing a Python script to crop the bounding box(es), rotate/scale them inside the image, and save the result as a new image.

TheMikeyR commented 6 years ago

@VanitarNordic seems like a great idea. There are many different libraries for image augmentation, e.g. imgaug, but as you mentioned these do not handle the detections, since they are meant for networks like Faster R-CNN where you train with the cropped object instead, whereas YOLO trains with the box location and the full image. A YOLO alternative would be nice; I don't have time to look into it now, but maybe in the future. If you make something, feel free to share :+1:

AlexeyAB commented 6 years ago

@VanitarNordic @phongnhhn92 @TheMikeyR

Someone offered a solution to this problem. I did not try it, but you can test it: https://groups.google.com/forum/#!searchin/darknet/rotation%7Csort:date/darknet/DPxhZcC0x2k/NBvD06urAwAJ He said:

I trained this on COCO with/without a pre-trained model yesterday, but the final weights are still pretty bad for detection, although the weights trained without the pre-trained model are better.


He hardcoded a rotation of 90 degrees, but I changed this code to a random rotation in the range (-angle, +angle) that is set in the cfg-file:

  1. You should set angle to a value from 0 to 180: https://github.com/AlexeyAB/darknet/blob/51d99f5903719da27344d0a29b091d3b035953cb/cfg/yolo-voc.2.0.cfg#L9

  2. You should change this line: https://github.com/AlexeyAB/darknet/blob/51d99f5903719da27344d0a29b091d3b035953cb/src/data.c#L744 to this: *a.d = load_data_detection(a.n, a.paths, a.m, a.w, a.h, a.num_boxes, a.classes, a.jitter, a.hue, a.saturation, a.exposure, a.small_object, a.angle);

  3. You should change this line: https://github.com/AlexeyAB/darknet/blob/51d99f5903719da27344d0a29b091d3b035953cb/src/data.h#L86 to this: data load_data_detection(int n, char **paths, int m, int w, int h, int boxes, int classes, float jitter, float hue, float saturation, float exposure, int small_object, float angle);

  4. You should change these 2 functions:

to these 3 functions:

/***** rotate truth boxes by a given angle (radians) *****/
void fill_truth_detection(char *path, int num_boxes, float *truth, int classes, int flip, float dx, float dy, float sx, float sy, image im, float rad)
{   
    float tx, ty;
    char labelpath[4096];
    find_replace(path, "images", "labels", labelpath);
    find_replace(labelpath, "JPEGImages", "labels", labelpath);

    find_replace(labelpath, "raw", "labels", labelpath);
    find_replace(labelpath, ".jpg", ".txt", labelpath);
    find_replace(labelpath, ".png", ".txt", labelpath);
    find_replace(labelpath, ".JPG", ".txt", labelpath);
    find_replace(labelpath, ".JPEG", ".txt", labelpath);
    int count = 0;
    float x,y,w,h;
    int id;
    int i;
    //float rad = TWO_PI/4;
    box_label *boxes = read_boxes(labelpath, &count);
    randomize_boxes(boxes, count);

    if(count > num_boxes) count = num_boxes;
    for (i = 0; i < count; ++i) {
        // rotate the box center around the image center by rad
        tx = boxes[i].x * im.w;
        ty = boxes[i].y * im.h;
        x = (cos(rad)*(tx - im.w/2) - sin(rad)*(ty - im.h/2) + im.w/2)/im.w;
        y = (sin(rad)*(tx - im.w/2) + cos(rad)*(ty - im.h/2) + im.h/2)/im.h;
        boxes[i].x = x;
        boxes[i].y = y;

        // keep w and h unchanged: this is exact for 0/180-degree turns and a
        // reasonable approximation for small random angles (the unconditional
        // w/h swap of the 90-degree original is wrong for arbitrary angles)
        w = boxes[i].w;
        h = boxes[i].h;

        boxes[i].left   = x - w/2;
        boxes[i].right  = x + w/2;
        boxes[i].top    = y - h/2;
        boxes[i].bottom = y + h/2;

    }
    correct_boxes(boxes, count, dx, dy, sx, sy, flip);

    for (i = 0; i < count; ++i) {
        x =  boxes[i].x;
        y =  boxes[i].y;
        w =  boxes[i].w;
        h =  boxes[i].h;
        id = boxes[i].id;

        if ((w < .001 || h < .001)) continue;

        truth[i*5+0] = x;
        truth[i*5+1] = y;
        truth[i*5+2] = w;
        truth[i*5+3] = h;
        truth[i*5+4] = id;
        // debug output: draw the rotated box and dump the image
        draw_box_width(im, boxes[i].x*im.w - boxes[i].w*im.w/2, boxes[i].y*im.h - boxes[i].h*im.h/2, boxes[i].x*im.w + boxes[i].w*im.w/2, boxes[i].y*im.h + boxes[i].h*im.h/2, 4, 0.1, 0.4, 0.6);
        save_image(im, "draw");

    }
    free(boxes);
}

/***** load rotated original images******/
data load_data_detection(int n, char **paths, int m, int w, int h, int boxes, int classes, float jitter, float hue, float saturation, float exposure, int small_object, float angle)
{
    char **random_paths = get_random_paths(paths, n, m);
    int i;
    data d = {0};
    d.shallow = 0;

    d.X.rows = n;
    d.X.vals = calloc(d.X.rows, sizeof(float*));
    d.X.cols = h*w*3;

    d.y = make_matrix(n, 5*boxes);
    for(i = 0; i < n; ++i){
        image orig0 = load_image_color(random_paths[i], 0, 0);
        float random_angle = rand_uniform(-angle, angle);   // random angle in [-angle, +angle]
        float random_angle_rad = TWO_PI*random_angle/360.0; //  degree to radian
        image orig = rotate_image_r(orig0, random_angle_rad);
        image sized = make_image(w, h, orig.c);
        fill_image(sized, .5);

        float dw = jitter * orig.w;
        float dh = jitter * orig.h;

        float new_ar = (orig.w + rand_uniform(-dw, dw)) / (orig.h + rand_uniform(-dh, dh));
        float scale = rand_uniform(.25, 2);

        float nw, nh;

        if(new_ar < 1){
            nh = scale * h;
            nw = nh * new_ar;
        } else {
            nw = scale * w;
            nh = nw / new_ar;
        }

        float dx = rand_uniform(0, w - nw);
        float dy = rand_uniform(0, h - nh);

        place_image(orig, nw, nh, dx, dy, sized);

        random_distort_image(sized, hue, saturation, exposure);
        int flip = rand()%2;
        if(flip) flip_image(sized);
        d.X.vals[i] = sized.data;

        //fill_truth_detection(random_paths[i], boxes, d.y.vals[i], classes, flip, -dx/w, -dy/h, nw/w, nh/h,sized);
        fill_truth_detection(random_paths[i], boxes, d.y.vals[i], classes, flip, -dx/w, -dy/h, nw/w, nh/h,orig, random_angle_rad);

        free_image(orig);
        free_image(orig0);
    }
    free(random_paths);
    return d;
}

// rotate image around its center by a given angle (in radians);
// inverse mapping avoids the holes left by forward mapping
image rotate_image_r(image im, float rad)
{
    int x, y, c;
    float cx = im.w/2.;
    float cy = im.h/2.;
    image rot = make_image(im.w, im.h, im.c);
    for(c = 0; c < im.c; ++c){
        for(y = 0; y < im.h; ++y){
            for(x = 0; x < im.w; ++x){
                // for each destination pixel, sample the source location
                float rx = cos(rad)*(x-cx) + sin(rad)*(y-cy) + cx;
                float ry = -sin(rad)*(x-cx) + cos(rad)*(y-cy) + cy;
                float val = bilinear_interpolate(im, rx, ry, c);
                set_pixel(rot, x, y, c, val);
            }
        }
    }
    return rot;
}
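For reference, step 1 above is just the angle line in the [net] section of the cfg file (30 here is an illustrative value; anything from 0 to 180 can be used):

```
[net]
...
angle=30
```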
MyVanitar commented 6 years ago

@AlexeyAB

Hi Alex, thank you for sharing the code.

As I see, the code modifies Darknet's training path and creates the augmented images in memory. I'll test it, but a question pops up here: is it better to create this variety of images on the hard disk, or to generate them on the fly in memory?

AlexeyAB commented 6 years ago

@VanitarNordic Hi, I think the result should be the same.

MyVanitar commented 6 years ago

@AlexeyAB

Another question remains.

If we rotate or scale just the bounding boxes, then the empty space left by the rotation or scaling will be filled with black by default, I think. Won't the model then "think" that the remaining background pieces inside the bounding boxes are part of the object?

I think this could happen, and we should rotate or scale the whole image, not just the bounding boxes. What do you think?

That might be the reason why he got bad results with his code.

AlexeyAB commented 6 years ago

@VanitarNordic In this code the whole image is rotated around its center (not only the bounding boxes), and the bounding boxes are then rotated in the same way. He said that he got bad results before this fix: https://groups.google.com/forum/#!msg/darknet/DPxhZcC0x2k/ZUHiKDL_AwAJ

And yes, even if we rotate the whole image, the corners of the image will be black. But the neural network encounters black edges even without rotation, for example when padding is used - black (value 0) at the edges: https://github.com/AlexeyAB/darknet/blob/51d99f5903719da27344d0a29b091d3b035953cb/src/im2col.c#L6-L10 And as we see, training works successfully with padding.
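The black-corner point can be verified with a small, self-contained numpy sketch (illustrative only, not darknet code): rotating a constant image by 45 degrees via inverse mapping leaves zero-valued corners, just as zero padding puts zeros at the border.

```python
import numpy as np

def rotate_nn(im, deg):
    """Nearest-neighbour rotation of a 2-D array about its center;
    destination pixels with no source fall back to 0 (black)."""
    h, w = im.shape
    rad = np.deg2rad(deg)
    cy, cx = (h - 1) / 2.0, (w - 1) / 2.0
    out = np.zeros_like(im)
    for y in range(h):
        for x in range(w):
            # inverse mapping: which source pixel lands at (x, y)?
            sx =  np.cos(rad) * (x - cx) + np.sin(rad) * (y - cy) + cx
            sy = -np.sin(rad) * (x - cx) + np.cos(rad) * (y - cy) + cy
            si, sj = int(round(sy)), int(round(sx))
            if 0 <= si < h and 0 <= sj < w:
                out[y, x] = im[si, sj]
    return out

im = np.ones((8, 8))          # a constant, all-white "image"
rot = rotate_nn(im, 45)
print(rot[0, 0])              # 0.0 -> black corner after rotation
print(np.pad(im, 1)[0, 0])    # 0.0 -> zero padding produces the same value
```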

MyVanitar commented 6 years ago

@AlexeyAB

Yes, when we pad, rotate or scale the whole image, there is no problem. My comment was about doing these operations on the bounding boxes only.

Thanks.

TheMikeyR commented 6 years ago

Thanks for the share, detailed instructions and your time @AlexeyAB :+1:

MyVanitar commented 6 years ago

@AlexeyAB

I faced two errors when I was trying to compile with the new code, at these lines:

image orig = rotate_image_r(orig0, random_angle_rad); and image rotate_image_r(image im, float rad)

2018-02-12_22-13-02

MyVanitar commented 6 years ago

@AlexeyAB

I moved image rotate_image_r(image im, float rad) above the load_data_detection function and commented out the place_image(orig, nw, nh, dx, dy, sized); call to remove the errors.

The code compiled and I started training. However, the function does not read the angle value from the cfg file; it is always zero (you can verify this with a printf test). Therefore I defined the range from -180 to 180 inside the function itself.

The results got worse. Most likely there is a bug in the code. I'll instead try writing a Python script that rotates the real images on the hard disk, and keep the Darknet code intact.
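Such an offline script could look roughly like this (a sketch under stated assumptions: numpy images, YOLO .txt labels with normalized [class, x, y, w, h] rows, and rotations restricted to 90-degree multiples so the boxes stay axis-aligned; the file-handling part is left as comments since the paths are hypothetical):

```python
import numpy as np

def rotate_yolo_sample(im, labels, k):
    """Rotate an image and its YOLO labels [class, x, y, w, h] by k*90 degrees CCW."""
    out = np.rot90(im, k).copy()
    new_labels = []
    for cls, x, y, w, h in labels:
        for _ in range(k % 4):
            # one 90-degree CCW step: (x, y) -> (y, 1 - x), width/height swap
            x, y = y, 1.0 - x
            w, h = h, w
        new_labels.append([cls, x, y, w, h])
    return out, new_labels

# offline usage sketch (paths are hypothetical):
# im = cv2.imread("data/img001.jpg")
# labels = [[float(v) for v in line.split()] for line in open("data/img001.txt")]
# for k in (1, 2, 3):
#     rim, rlab = rotate_yolo_sample(im, labels, k)
#     cv2.imwrite("data/img001_rot%d.jpg" % (90 * k), rim)
#     # ...write rlab back out in the same "class x y w h" format
```

Writing the rotated copies to disk this way leaves the Darknet training code untouched, as intended above.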