fizyr / keras-retinanet

Keras implementation of RetinaNet object detection.
Apache License 2.0

Evaluation produces mAP of 0.0 when using backbone Resnet50 for the Stanford Drone Dataset #1351

Closed KulunuGeeganage closed 4 years ago

KulunuGeeganage commented 4 years ago

Dear All, @jpxrc @prickly-u @kolyadin @hgaiser

Thank you for the awesome package keras-retinanet.

Here are the details of my training and evaluation.

Data set (Stanford Drone Dataset) - https://cvgl.stanford.edu/projects/uav_data/

The annotations and training data set were taken from here - https://drive.google.com/drive/u/0/folders/1QpE_iRDq1hUzYNBXSBSnmfe6SgTYE3J4

Backbone - Default Resnet50

This is my command for training - python train.py --weights resnet50_coco_best_v2.1.0.h5 --steps 400 --image-min-side=224 --image-max-side=224 --batch-size=2 --epochs 20 --snapshot-path snapshots --tensorboard-dir tensorboard csv train_annotations.csv labels.csv

This is my command for evaluation - python evaluate.py --save-path --image-min-side=224 --image-max-side=224 csv train_annotations.csv labels.csv \test1cA.h5

Results

While training

400/400 [==============================] - 580s 1s/step - loss: 1.4676e-08 - regression_loss: 0.0000e+00 - classification_loss: 1.4676e-08
Running network: 100% (2587 of 2587)
Parsing annotations: 100% (2587 of 2587)
11749 instances of class Biker with average precision: 0.0000
889 instances of class Car with average precision: 0.0000
101 instances of class Bus with average precision: 0.0000
266 instances of class Cart with average precision: 0.0000
485 instances of class Skater with average precision: 0.0000
22188 instances of class Pedestrian with average precision: 0.0000
mAP: 0.0000

After running evaluate.py

Running network: 100% (2587 of 2587)
Parsing annotations: 100% (2587 of 2587)
11749 instances of class Biker with average precision: 0.0000
889 instances of class Car with average precision: 0.0000
101 instances of class Bus with average precision: 0.0000
266 instances of class Cart with average precision: 0.0000
485 instances of class Skater with average precision: 0.0000
22188 instances of class Pedestrian with average precision: 0.0000
Inference time for 2587 images: 0.2283
mAP using the weighted average of precisions among classes: 0.0000
mAP: 0.0000

Could you please tell me the reason for this and how to fix it? I tried all the methods mentioned in the threads below, but I'm still getting 0.0 for mAP.

Threads:
https://github.com/fizyr/keras-retinanet/issues/647
https://github.com/fizyr/keras-retinanet/issues/1055#issuecomment-523873503

I would be really thankful if you could kindly give me a solution for this as soon as possible.

Regards, Kulunu.

foghegehog commented 4 years ago

I'm not an author of the package, but I suspect that --image-min-side=224 --image-max-side=224 may be too low a resolution for the Stanford Drone Dataset, as the objects there are already very small. Have you tried larger sizes (the default 1333x800 to start with)?
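For example, keeping everything else from your training command and only raising the sizes (the batch size will probably have to drop to 1 to fit in memory), something like: python train.py --weights resnet50_coco_best_v2.1.0.h5 --steps 400 --image-min-side=800 --image-max-side=1333 --batch-size=1 --epochs 20 --snapshot-path snapshots --tensorboard-dir tensorboard csv train_annotations.csv labels.csv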

pleaseRedo commented 4 years ago

Same issue here. I'm using a 224 size as well, and will scale up the size today to see whether the input size is the problem. I also noticed we both got a ridiculously small loss.

(I changed so much of the source code, including the resnet50 architecture, that I really don't know the root of this issue.)

jackie-angelswing commented 4 years ago

@pleaseRedo Does enlarging the image-size simply help with the detection?

pleaseRedo commented 4 years ago

> @pleaseRedo Does enlarging the image-size simply help with the detection?

It would help, but it depends on what type of data you are dealing with. I'm trying to detect tiny objects (polyps), and a larger img_size benefits a lot: mAP went from 0.87 to 0.94.

jackie-angelswing commented 4 years ago

@pleaseRedo Thank you for the quick reply. I am only detecting a small target (25x25) within a large image (5500x3000). Should I try anchor optimization? I'm getting mAP 0.000 at the moment.

pleaseRedo commented 4 years ago

> @pleaseRedo Thank you for the quick reply. I am only detecting a small target (25x25) within a large image (5500x3000). Should I try anchor optimization? I'm getting mAP 0.000 at the moment.

You should; the default anchor config does not expect your input to be that large.
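If you'd rather not touch the source: as far as I know, train.py and evaluate.py in this repo accept a --config file whose [anchor_parameters] section overrides the default anchors. A rough sketch; the values below are illustrative and untuned, and should really come from anchor optimisation on your own data:

```ini
[anchor_parameters]
; strides are fixed by the FPN pyramid levels; sizes/scales made
; smaller here so the finest level can match tiny targets
sizes   = 16 32 64 128 256
strides = 8 16 32 64 128
ratios  = 0.5 1 2
scales  = 1 1.25 1.6
```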

KulunuGeeganage commented 4 years ago

Hi @pleaseRedo

Thank you for sharing your experience and observations.

Are you trying the same data set (Stanford Drone Dataset)? Did you find any solution for this?

I can't run the training process with large resolutions like 1333x800. Once I set the image min and max side above 224, training stops with a memory-exhausted error. I think my GPU can't handle such large resolutions. Do you know any solution for this?

So now I'm trying the Swimming Pool and Car Detection data set, which has an image resolution of 224 x 224 - https://www.kaggle.com/kbhartiya83/swimming-pool-and-car-detection

Now my results are like this:

2357 instances of class 1 (Car) with average precision: 0.0000
678 instances of class 2 (Swimming Pool) with average precision: 0.7443
Inference time for 749 images: 0.2592
mAP using the weighted average of precisions among classes: 0.1663
mAP: 0.3721
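(For what it's worth, the two figures are consistent with each other: the plain mAP is the unweighted mean over the two classes, (0.0000 + 0.7443) / 2 ≈ 0.3721, while the weighted version scales each AP by its instance count, (2357 × 0.0000 + 678 × 0.7443) / (2357 + 678) ≈ 0.1663, so the many zero-AP Car instances drag the weighted figure down.)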

But it can't calculate the average precision for class 1 (Car). Why is that? When I run this evaluation I get saved images like these:

[attached evaluation images: 18, 26]

Why can't it show a value for class 1, although it detects objects properly?

Dear @prickly-u, do you have any idea about this?

Regards, Kulunu.

pleaseRedo commented 4 years ago

Hi @KulunuGeeganage, sorry, I'm working on my own datasets. If your GPU cannot hold a large image even after you lower the batch size, one thing I would do is go to the resnet50 source code and halve the final output dimension to 1024 (if you believe large image resolution offers more benefit).

> Why can't it show a value for class 1, although it detects objects properly?

From your result, I don't see any detections for class 1. Those green boxes are the ground-truth labels, right?

From my experience, class 1 requires smaller anchor sizes for the regression to work. You'd better do the anchor optimisation next and see if that solves the issue.
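To see why, here is a rough sanity check one could run; a sketch, assuming the anchors_for_shape helper in keras_retinanet.utils.anchors (present in recent versions of this repo), with an illustrative image shape:

```python
# Sketch: inspect the smallest anchors the default configuration generates
# and compare them against a small target after image rescaling.
import numpy as np
from keras_retinanet.utils.anchors import anchors_for_shape

# Default anchor parameters; returns an (N, 4) array of boxes as x1, y1, x2, y2.
anchors = anchors_for_shape((800, 1333))
widths  = anchors[:, 2] - anchors[:, 0]
heights = anchors[:, 3] - anchors[:, 1]
print("smallest anchor side: %.1f px" % min(widths.min(), heights.min()))

# A 5500x3000 image rescaled to max side 1333 is scaled by ~0.24, so a
# 25x25 target shrinks to roughly 6 px -- far below the smallest default
# anchor, so no anchor matches it and its AP collapses to 0.
```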

KulunuGeeganage commented 4 years ago

Hi @pleaseRedo

Thank you for the quick reply. Sorry, I'm new to retinanet; could you please clarify the following for me?

What did you mean by the resnet50 source code? Is that resnet.py in the keras-retinanet models directory? Can you guide me on how to halve the output dimension to 1024?

To get smaller anchor sizes, is it correct to change only the following lines in anchors.py?

```python
AnchorParameters.default = AnchorParameters(
    sizes   = [32, 64, 128, 256, 512],
    strides = [8, 16, 32, 64, 128],
    ratios  = np.array([1], keras.backend.floatx()),
```

Is that the only change I should make before training? Or should I also make it when I run the evaluation?

Regards, Kulunu.

pleaseRedo commented 4 years ago

To change the architecture: use print(os.path.abspath(inspect.getfile(keras_resnet))) to find the keras_resnet source location. It points into an egg file where the actual resnet is implemented; make a copy of the egg as a backup, because you're changing source code now. Locate the file blocks/_2d.py. Inside the function bottleneck_2d, which defines the residual blocks for resnet50, find this line (I don't know the exact line number because I changed this file a lot):

```python
y = keras.layers.Conv2D(filters * 4, (1, 1), use_bias=False, name="res{}{}_branch2c".format(stage_char, block_char), **parameters)(y)
```

You will find two of these. Above this line, add this condition:

```python
if stage_char == '5':
    y = keras.layers.Conv2D(filters * 2, (1, 1), use_bias=False, name="res{}{}_branch2c".format(stage_char, block_char), **parameters)(y)
else:
    y = keras.layers.Conv2D(filters * 4, (1, 1), use_bias=False, name="res{}{}_branch2c".format(stage_char, block_char), **parameters)(y)
```

The only change is that filters * 4 becomes filters * 2, so the last blocks and the identity shortcut don't map to 2048 dimensions but to 1024. Better take a look at the resnet50 architecture first to understand what I'm saying: https://datascience.stackexchange.com/questions/33022/how-to-interpert-resnet50-layer-types/47489

There is an anchor optimisation section in this repo; check that out. It's pretty simple.
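One caveat if you go the --config route instead of editing anchors.py: as far as I can tell, the same config file has to be passed at evaluation time too, otherwise the default anchors are rebuilt and the boxes won't line up. For example, with a hypothetical anchors.ini: python train.py --config anchors.ini ... csv train_annotations.csv labels.csv, and later python evaluate.py --config anchors.ini csv train_annotations.csv labels.csv snapshots/your_model.h5 (the file names here are placeholders).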


kunnareekr commented 4 years ago

I can't help with an answer, but I want to share yesterday's experiment, where AP and mAP were 0.00. In my case the cause may have been too small a number of images and too small a step size.

stale[bot] commented 4 years ago

This issue has been automatically marked as stale due to the lack of recent activity. It will be closed if no further activity occurs. Thank you for your contributions.