fizyr / keras-retinanet

Keras implementation of RetinaNet object detection.
Apache License 2.0
4.37k stars, 1.96k forks

Poor validation: is the CSV generator working correctly? #500

Closed: metal3d closed this issue 6 years ago

metal3d commented 6 years ago

Hello, I tried to make a "licenceplate" detector with RetinaNet to get plate regions, and I annotated images like this (ignore the labels w and h: they are not width and height but the x2 and y2 pixel coordinates):

[image]

I then saved the annotations as CSV in this format:

licenceplates/040603/P1010001.jpg,209,214,429,257,licenseplate

Of course the CSV contains many other images; this line is just for checking!

And of course, the classes.csv file is:

licenseplate,1

Then:

import keras
from keras_retinanet import losses
from keras_retinanet.models import resnet
from keras_retinanet.preprocessing import csv_generator

batch_size = 16
image_side = 800  # passed as image_max_side; image_min_side keeps its default
train_generator = csv_generator.CSVGenerator(
    'out/train.csv',
    csv_class_file='out/classes.csv',
    base_dir='.',
    batch_size=batch_size,
    image_max_side=image_side)
validation_generator = csv_generator.CSVGenerator(
    'out/validation.csv',
    csv_class_file='out/classes.csv',
    base_dir='.',
    batch_size=batch_size,
    image_max_side=image_side)

epochs = 100
# train_ds/validation_ds were not shown; count images through the generators instead
ts = train_generator.size() // batch_size
vs = validation_generator.size() // batch_size
# two outputs, because licenseplate has id 1 in classes.csv
model = resnet.resnet_retinanet(2)
# model = vgg_retinanet(2)
model.compile(
    loss={
        'regression'    : losses.smooth_l1(),
        'classification': losses.focal()
    },
    optimizer=keras.optimizers.adam(lr=1e-5, clipnorm=0.001)
)
model.fit_generator(train_generator,
                    epochs=epochs,
                    verbose=1,
                    steps_per_epoch=ts,
                    validation_data=validation_generator,
                    validation_steps=vs)

After 100 epochs I get a validation loss of 0.4, but the boxes are wrong: it detects things other than license plates, such as sun reflections, lines on the road, and so on.

Examples:

[image]

[image]

So I wonder:

That should help me understand what's wrong :)

hgaiser commented 6 years ago

Your assumptions are all correct. You could try using debug.py to visualize your annotations and verify that they make sense. Also, what code did you use to visualize your annotations? And what is your training loss? A validation loss of 0.4 seems okay to me.
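Alongside debug.py, a quick programmatic check can catch the most common CSV mistakes before training. The sketch below is a hypothetical helper (check_annotations is not part of keras-retinanet); it assumes the row layout shown at the top of this thread, path,x1,y1,x2,y2,class_name, and reports rows whose corners are out of order or whose class is missing from classes.csv. It does not open the images, so it cannot catch boxes that fall outside the image.

```python
import csv
import io

def check_annotations(annotation_lines, class_names):
    """Return (line_number, message) pairs for suspicious annotation rows.

    Hypothetical helper, assuming rows of the form path,x1,y1,x2,y2,class_name.
    """
    problems = []
    for lineno, row in enumerate(csv.reader(annotation_lines), start=1):
        if len(row) != 6:
            problems.append((lineno, "expected 6 columns, got %d" % len(row)))
            continue
        _path, x1, y1, x2, y2, name = row
        # A row like "path,,,,," marks an image without any object.
        if not any((x1, y1, x2, y2, name)):
            continue
        try:
            x1, y1, x2, y2 = (int(v) for v in (x1, y1, x2, y2))
        except ValueError:
            problems.append((lineno, "non-integer coordinates"))
            continue
        # The last two numbers must be corners (x2, y2), not width and height.
        if x2 <= x1 or y2 <= y1:
            problems.append((lineno, "x2/y2 must be greater than x1/y1"))
        if name not in class_names:
            problems.append((lineno, "unknown class %r" % name))
    return problems

# The second row is a made-up example that stores width/height instead of x2/y2.
rows = io.StringIO(
    "licenceplates/040603/P1010001.jpg,209,214,429,257,licenseplate\n"
    "licenceplates/040603/example.jpg,209,214,220,43,licenseplate\n"
)
print(check_annotations(rows, {"licenseplate"}))
# → [(2, 'x2/y2 must be greater than x1/y1')]
```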

Additionally, we use 0-based labels. Right now you're training a network for two labels, one of which is unnamed. While this is probably not really an issue, it's better to label license plate as 0.
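To see why an id of 1 yields a two-class network, here is a minimal sketch that parses a classes.csv (rows of class_name,id), derives the number of outputs as the highest id plus one, and builds the inverse id-to-name mapping used for captions at prediction time. The function name load_classes is hypothetical, and the "max id + 1" rule is an assumption consistent with the 0-based labels described above; the files are inlined for brevity.

```python
import csv
import io

def load_classes(class_lines):
    """Parse class_name,id rows into ({name: id}, {id: name}, num_classes)."""
    name_to_id = {}
    for row in csv.reader(class_lines):
        if not row:
            continue  # skip blank lines
        name, class_id = row[0], int(row[1])
        if name in name_to_id:
            raise ValueError("duplicate class name: %r" % name)
        name_to_id[name] = class_id
    id_to_name = {class_id: name for name, class_id in name_to_id.items()}
    # With 0-based labels the classification head needs max id + 1 outputs.
    num_classes = max(name_to_id.values()) + 1
    return name_to_id, id_to_name, num_classes

# Original classes.csv: id 1 implies two outputs, one of them unnamed.
print(load_classes(io.StringIO("licenseplate,1\n"))[2])  # → 2
# 0-based: a single output for license plates.
_, labels_to_names, n = load_classes(io.StringIO("licenseplate,0\n"))
print(n, labels_to_names)  # → 1 {0: 'licenseplate'}
```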

metal3d commented 6 years ago

Hi, thanks for your answer. I'm using the code from the example notebook to visualize predictions. And yes, I had understood that an "unknown" class is used to classify "other things". I will retrain with "0" for the licenceplate class.

Here is the prediction function, taken from the example notebook:

import glob
import time

import cv2
import matplotlib.pyplot as plt
import numpy as np
from keras_retinanet import models
from keras_retinanet.utils.colors import label_color
from keras_retinanet.utils.image import read_image_bgr, preprocess_image, resize_image
from keras_retinanet.utils.visualization import draw_box, draw_caption

# inference model produced by convert_model.py
model = models.load_model('./licence-inf.h5', backbone_name='resnet50')
labels_to_names = {0: 'nothing', 1: 'licenceplate'}

def pred(model, fname):
    # load image
    image = read_image_bgr(fname)

    # copy to draw on (matplotlib expects RGB)
    draw = image.copy()
    draw = cv2.cvtColor(draw, cv2.COLOR_BGR2RGB)

    # preprocess image for network
    image = preprocess_image(image)
    # resize_image uses its defaults here (min_side=800, max_side=1333),
    # not the image_max_side=800 given to the training generator
    image, scale = resize_image(image)

    # process image
    start = time.time()
    boxes, scores, labels = model.predict_on_batch(np.expand_dims(image, axis=0))
    print("processing time: ", time.time() - start)

    # correct for image scale
    boxes /= scale

    # visualize detections
    for box, score, label in zip(boxes[0], scores[0], labels[0]):
        # scores are sorted so we can break
        if score < 0.4:
            break

        color = label_color(label)

        b = box.astype(int)
        draw_box(draw, b, color=color)

        caption = "{} {:.3f}".format(labels_to_names[label], score)
        draw_caption(draw, b, caption)

    plt.figure(figsize=(15, 15))
    plt.axis('off')
    plt.imshow(draw)
    plt.show()

for f in glob.glob('./out/licenceplates/280503/*.jpg')[:10]:
    pred(model, f)

I trained the model as explained and used keras_retinanet/bin/convert_model.py to convert the trained model to an inference model.

I really don't understand why the inference gives absolutely nothing good. What is weird is that the detected regions seem to have the right aspect ratio, but they are not placed correctly.

I will take a look at debug.py; maybe it will help.

metal3d commented 6 years ago

OK, I checked with debug.py and the annotations seem fine.

I changed the train/validation ratio and trained for more epochs. I also set the class index for licenceplate to "0" and changed the dictionary to map the correct label.
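Changing the train/validation ratio is easiest when the split is done per image rather than per CSV row, so that all boxes of one image land in the same set. The sketch below is one way to do it with only the standard library; split_by_image, the default 0.2 fraction and the fixed seed are illustrative choices, not something taken from this thread.

```python
import csv
import io
import random

def split_by_image(annotation_lines, val_fraction=0.2, seed=0):
    """Split annotation rows into (train_rows, val_rows), keeping each image whole."""
    by_image = {}  # image path -> list of its rows, in file order
    for row in csv.reader(annotation_lines):
        if row:
            by_image.setdefault(row[0], []).append(row)
    paths = list(by_image)
    random.Random(seed).shuffle(paths)  # fixed seed gives a reproducible split
    val_paths = set(paths[:int(round(len(paths) * val_fraction))])
    train_rows, val_rows = [], []
    for path, rows in by_image.items():
        (val_rows if path in val_paths else train_rows).extend(rows)
    return train_rows, val_rows

# Hypothetical annotations: a.jpg has two plates, which must stay together.
sample = io.StringIO(
    "a.jpg,10,10,50,30,licenseplate\n"
    "a.jpg,60,10,90,30,licenseplate\n"
    "b.jpg,5,5,40,20,licenseplate\n"
)
train_rows, val_rows = split_by_image(sample, val_fraction=0.5)
```

Each list can then be written back with csv.writer(f).writerows(rows) to files such as out/train.csv and out/validation.csv.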

I now have

Then I tried some images from the validation set and got the following results (no license plate, yet it detects other things as license plates...)

[image]

[image]

Sometimes it's almost ok...

[image]

Maybe I don't have enough training data...

Thanks a lot (I can close the issue).

metal3d commented 6 years ago

Wow! I think I found the problem! In the prediction function I hadn't set the "max_side" option in the resize_image call. I set it to the same value used for the CSV generator, and now:

[image]

[image]

Maybe a note in the documentation would help!

hgaiser commented 6 years ago

Yep, I think that was the problem. I'm assuming most of your license plates are more or less equally big, so only those specific parts of the network are trained to detect license plates. In essence, it is learning to detect license plates of a certain size. Without the proper resizing during inference, you won't get a good result (I'm actually quite surprised it detected anything at all). I'll close this issue then.
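The size mismatch can be made concrete. The helper below reproduces, as far as I understand it, the scale rule behind keras-retinanet's resize_image: scale the shortest side to min_side unless that pushes the longest side past max_side, in which case scale the longest side to max_side. Treat it as a sketch of that rule, not the library's code; the 640x480 image size is hypothetical.

```python
def compute_resize_scale(rows, cols, min_side=800, max_side=1333):
    """Scale factor applied to an image with the given number of rows and cols."""
    # First try to bring the shortest side to min_side...
    scale = min_side / min(rows, cols)
    # ...but never let the longest side exceed max_side.
    if max(rows, cols) * scale > max_side:
        scale = max_side / max(rows, cols)
    return scale

rows, cols = 480, 640  # hypothetical 640x480 photo
# Training: CSVGenerator(image_max_side=800), image_min_side left at its default.
train_scale = compute_resize_scale(rows, cols, max_side=800)
# Inference as in the original pred(): resize_image(image) with its defaults.
infer_scale = compute_resize_scale(rows, cols)
print(train_scale, infer_scale)
```

Here train_scale is 1.25 while infer_scale is about 1.67, so every plate appears about a third larger at inference than anything the network saw during training. Calling resize_image(image, max_side=800) in the prediction function, matching image_max_side, makes the two scales agree.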

JinwenJay commented 6 years ago

I think I have the same problem, and I also thought it was caused by a lack of training data. Could you tell me how much training data you used?

metal3d commented 6 years ago

I used 2000 images, but the problem is now resolved: I needed to resize the images the same way as in training before running inference.

nuwanjkit commented 6 years ago

@metal3d Hi, can you briefly explain how you prepared the custom dataset? I'm struggling to prepare my own CSV dataset. It would be a big help :-)

metal3d commented 6 years ago

@nuwanjkit I'm building a web app named "imannotate" that prepares annotations. It outputs a basic CSV containing image, label and coordinates (https://github.com/smileinnovation/imannotate, devel branch).

Afterwards, I just download each image, convert the relative coordinates to pixel coordinates, and use the code I gave in my first comment.
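Converting relative coordinates to the pixel corners expected in the training CSV only needs the image size. The sketch assumes the relative values are fractions of the image width and height given as x1, y1, x2, y2 corners; imannotate's actual export layout may differ, and to_pixel_row is a hypothetical name. It rounds to integers and clamps each value to the image bounds.

```python
import csv
import io

def to_pixel_row(path, rel_box, label, width, height):
    """Turn a relative (x1, y1, x2, y2) box into a path,x1,y1,x2,y2,label row.

    Assumes rel_box values are fractions in [0, 1] of the image width/height.
    """
    x1, y1, x2, y2 = rel_box

    def clamp(value, upper):
        # Round to the nearest pixel and keep it inside [0, upper].
        return max(0, min(int(round(value)), upper))

    return [path,
            clamp(x1 * width, width), clamp(y1 * height, height),
            clamp(x2 * width, width), clamp(y2 * height, height),
            label]

out = io.StringIO()
row = to_pixel_row("img.jpg", (0.25, 0.5, 0.75, 0.6), "licenseplate", 640, 480)
csv.writer(out, lineterminator="\n").writerow(row)
print(out.getvalue())  # → img.jpg,160,240,480,288,licenseplate
```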

When I want to make predictions, I use the code from my second comment, passing a "max_side" argument to the "resize_image" call that matches the "max_side" I used during training.

I'm writing a Medium article about using the imannotate tool, and maybe another one about processing the CSV, training, and predicting on images ;)

nuwanjkit commented 6 years ago

@metal3d Thanks a lot, sir. I'll try that. If you could write an article briefly describing every step, it would be much appreciated. Thanks, have a great day :-)

nuwanjkit commented 6 years ago

@metal3d I trained the model successfully by running train.py, and prediction runs without errors, but it doesn't return any results. My output is below. Any idea? :-)

processing time: 10.187751770019531
labels: [[-1 -1 -1 ... -1 -1 -1]]
scores: [[-1. -1. -1. ... -1. -1. -1.]]
(every entry in both arrays is -1)
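An output made entirely of -1 usually means that no detection survived filtering. As far as I know, keras-retinanet's filter_detections pads boxes, scores and labels to a fixed number of detections (300 by default) with -1. The sketch below, assuming NumPy arrays shaped like the printed output (batch dimension first), strips that padding and applies a score threshold; valid_detections and the 0.5 threshold are hypothetical choices.

```python
import numpy as np

def valid_detections(boxes, scores, labels, threshold=0.5):
    """Drop -1 padding and low-score entries for the first image in the batch."""
    keep = (labels[0] >= 0) & (scores[0] >= threshold)
    return boxes[0][keep], scores[0][keep], labels[0][keep]

# Hypothetical model output for one image: one real detection, then padding.
boxes = np.full((1, 300, 4), -1.0)
scores = np.full((1, 300), -1.0)
labels = np.full((1, 300), -1, dtype=np.int32)
boxes[0, 0] = [209, 214, 429, 257]
scores[0, 0] = 0.9
labels[0, 0] = 0

b, s, l = valid_detections(boxes, scores, labels)
print(len(s))  # → 1
```

If this returns zero detections for every image even with a low threshold, the problem most likely lies upstream (training data, class ids or preprocessing) rather than in the drawing code.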

hgaiser commented 6 years ago

I think Slack would be better suited for these discussions, if you don't mind :)