Open jpxrc opened 6 years ago
Densenet is a community contribution and I have never really used it. If you find out what the problem is then a PR will be welcome :)
I have the same problem: I changed the backbone to ShuffleNet, the loss is decreasing, but the mAP is always zero. However, when I trained the model with a resnet50 backbone, everything was okay. I still have not found the problem. Can anyone give me some advice?
Do you evaluate during training, or after training using the evaluate tool? For densenet and mobilenet I noticed there is a bug when preprocessing the images in the evaluate tool. Those backbones require a mode='tf' in the preprocess_image call, which isn't there currently.
Yeah, I evaluate the mAP during training. The dataset I used is Pascal VOC 2007. And I did not use a pretrained model, because I could not find pretrained ImageNet weights for ShuffleNet-V2.
@songhuizhong I'm not sure what you mean by ShuffleNet-V2; we don't have that backbone in our repository.
It is a model I built myself. Here is the paper introducing ShuffleNet-V2: https://arxiv.org/abs/1807.11164
In that case I can't help. I only have experience with ResNet backbones. If you find out the solution to this then a PR is welcome.
Well, I found the problem; it was my fault. The reason the mAP was 0 is that the feature maps fed into the feature pyramid network were in the wrong order.
This is the wrong one: layer_names = ['1x1conv5_out', 'stage4/block1/relu_1x1conv_1', 'stage3/block1/relu_1x1conv_1']
And this is the correct one: layer_names = ['stage3/block1/relu_1x1conv_1', 'stage4/block1/relu_1x1conv_1', '1x1conv5_out']
But I could only train this model to a best mAP of 44.59% on the VOC 2007 test set. The training set was VOC 2012 trainval. The input size was changed to 512x512 because I wanted to use a larger batch size of 28, and I trained the model without ImageNet-pretrained weights.
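The fix above boils down to ordering the backbone outputs fine-to-coarse. A small illustrative sketch (the strides assigned to these custom ShuffleNet-V2 layer names are my assumption, not taken from the thread): the FPN expects the backbone outputs as [C3, C4, C5], i.e. sorted by increasing stride.

```python
# Assumed strides for the custom ShuffleNet-V2 layers named above.
strides = {
    'stage3/block1/relu_1x1conv_1': 8,   # C3
    'stage4/block1/relu_1x1conv_1': 16,  # C4
    '1x1conv5_out': 32,                  # C5
}

def fpn_order(layer_names):
    """Sort layer names by stride so they line up with [C3, C4, C5]."""
    return sorted(layer_names, key=lambda name: strides[name])

wrong = ['1x1conv5_out',
         'stage4/block1/relu_1x1conv_1',
         'stage3/block1/relu_1x1conv_1']
print(fpn_order(wrong))
# ['stage3/block1/relu_1x1conv_1', 'stage4/block1/relu_1x1conv_1', '1x1conv5_out']
```

Passing the reversed list silently attaches the coarsest feature map where the finest is expected, which explains training "working" while detection quality collapses.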
Alright, I'll assume this issue resolved then.
Actually, the original issue was something else entirely (DenseNet), so I'm reopening.
I ran into the same issue using resnet50 as the backbone (training and using evaluation.py). I fixed it by specifying image-min-side and image-max-side; otherwise the default values of these parsed args don't match my image dimensions (256x256).
Above all, thanks for your awesome work @hgaiser. I met the same problem as @yyannikb. With 'densenet121' as the backbone, I got massive detection results and all the box scores are 1. Yes, that is the only score value. Consequently, it leads to a zero mAP.
My project uses mammography from the DDSM dataset. For training, I set image_max_side=666 and image_min_side=400, start from the pretrained model in the keras repo, and use a learning rate of 0.003. Has anyone resolved this problem? I would appreciate it if you could share your experience. Thanks a lot.
When I used mobilenet224 to train, I got the same issue. Has anyone resolved this problem? I would appreciate it if you could share your experience. Thanks a lot.
Do you evaluate during training, or after training using the evaluate tool? For densenet and mobilenet I noticed there is a bug when preprocessing the images in the evaluate tool. Those backbones require a mode='tf' in the preprocess_image call, which isn't there currently.
This was the solution for mobilenet. You have to hack the evaluate_coco method in coco_eval.py.
Change this line: image = generator.preprocess_image(image)
To this: image = generator.preprocess_image(image, mode='tf')
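For context on what mode='tf' changes: keras-style preprocessing has a 'caffe' mode (RGB to BGR plus ImageNet mean subtraction) and a 'tf' mode (scale pixels to [-1, 1]), and the densenet/mobilenet ImageNet weights were trained with the 'tf' convention. A minimal reimplementation for illustration only, not the library's exact code:

```python
import numpy as np

def preprocess_image(x, mode='caffe'):
    """Minimal sketch of the two keras-style preprocessing modes."""
    x = x.astype(np.float32)
    if mode == 'tf':
        # Scale pixels from [0, 255] to [-1, 1].
        x /= 127.5
        x -= 1.0
    else:
        # 'caffe': convert RGB -> BGR, then subtract the ImageNet channel means.
        x = x[..., ::-1]
        x -= np.array([103.939, 116.779, 123.68], dtype=np.float32)
    return x

img = np.array([[[0, 128, 255]]], dtype=np.uint8)
print(preprocess_image(img, mode='tf'))  # values in [-1, 1]
```

Feeding a 'tf'-trained backbone 'caffe'-preprocessed images shifts every input far outside the distribution it was trained on, which is consistent with the degenerate scores people report here.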
Hi, I also got the same problem when I tried to run inference on a model with a densenet121 backbone. Does someone already have an idea how to solve it? Thanks a lot.
It's the same issue as with mobilenet; just change the same place as @ozyilmaz commented.
@ozyilmaz when I do this, it throws an error that preprocess_image got an unexpected keyword 'mode'. Is it only me or is there an obvious step I am missing?
@tonmoyborah , it is hard to guess but it seems like the generator object does not have the correct "preprocess_image" method.
The same happened to me when using mobilenetv1/v2. However, when I set a fixed 800x800 input size for training and evaluation, convergence works great. If I change it to any other resolution, like 801x801, the model does not converge. If I set the mobilenet input to inputs = keras.layers.Input(shape=(size, size, 3)) (and not (None, None, 3)), then the model works as expected.
Could anyone explain this strange behaviour?
@tonmoyborah @ozyilmaz Hey, did you solve the error? I am trying to run RetinaNet with a Mobilenet224_1.0 backbone and I get an mAP of 0. When I change the line image = generator.preprocess_image(image) in eval.py -> _get_detections to image = generator.preprocess_image(image, mode='tf'), as mentioned by @ozyilmaz, I get the same "unexpected keyword mode" error.
when I train normally I get 0 mAP as shown below. Can anyone help me on this?
10000/10000 [==============================] - 3158s 316ms/step - loss: 5.4151 - regression_loss: 2.5757 - classification_loss: 2.8393 - val_loss: 5.3902 - val_regression_loss: 2.5642 - val_classification_loss: 2.8260
Running network: 100% (12704 of 12704) | Elapsed Time: 0:16:48
Parsing annotations: 100% (12704 of 12704) | Elapsed Time: 0:00:00
6066 instances of class M with average precision: 0.0000
8803 instances of class W with average precision: 0.0000
mAP: 0.0000
Please use the keras-retinanet slack channel for usage questions, or read the readme to find out possible issues.
I have the same issue. Backbone densenet201, with weights downloaded from the keras GitHub. Training with freeze-backbone and a custom CSV that worked well with every TensorFlow object detection model. Batch 16, dataset of 27,000 images, one single class.
Up to epoch 3 (81,000 iters), retinanet-evaluate produces NO predicted bounding box on any of the 3000 evaluation images! Can someone help, please.
Just to add, MobileNet224_2 also does not produce any mAP at all. Very tiring... :( ...
Steve
For mobilenet, I saw keras-retinanet used for vehicle detection: https://github.com/yangliupku/retinanet_detection Can someone merge it? @hgaiser?
Dear all, I came upon the same issue with DenseNet-121. While training, mAP is estimated correctly, but when using retinanet-evaluate it is just 0. I know this is a community contribution; I would like to help resolve this, but I am just a beginner. So, has anyone got a way around this? I am using my own CSV dataset for training.
The reply below is still valid:
Densenet is a community contribution and I have never really used it. If you find out what the problem is then a PR will be welcome :)
You could use resnet50, that should work.
Yes, I already use ResNet-50 as the backbone and wanted to make a comparison with DenseNet-121.
Hi, I've looked into this problem in detail and it seems like the problem lies in the model itself. Any backbone other than ResNet-50 predicts box coordinates as -1, labels as -1 and scores as -1. In other words, the model cannot predict anything at all. But since the training phase goes smoothly, I suspect that the conversion of a trained model to an inference model is bugged. Please take a look at the inference conversion code.
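For what it's worth, all -1 outputs do not necessarily mean the conversion crashed: the inference model pads each image's detections up to a fixed max_detections and fills the empty slots with -1, so an image where no box clears the score threshold comes back as all -1. A reimplementation of that padding for illustration only (the real logic lives in the filter-detections layer):

```python
import numpy as np

def pad_detections(boxes, scores, labels, max_detections=300):
    """Pad per-image detections to a fixed size, filling empty slots with -1."""
    pad = max_detections - scores.shape[0]
    boxes = np.concatenate([boxes, -np.ones((pad, 4))], axis=0)
    scores = np.concatenate([scores, -np.ones((pad,))], axis=0)
    labels = np.concatenate([labels, -np.ones((pad,))], axis=0)
    return boxes, scores, labels

# Zero surviving detections -> everything comes back as -1, as reported above.
boxes, scores, labels = pad_detections(
    np.zeros((0, 4)), np.zeros((0,)), np.zeros((0,)), max_detections=5)
print(scores)  # [-1. -1. -1. -1. -1.]
```

So "all -1" is the signature of zero confident detections, which points back at a score/preprocessing mismatch rather than at the conversion step itself.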
Hi @Uttaran-IITH, please see this, where, with the guidance of @hgaiser and @ikerodl96, I found a way around this issue.
@mariaculman18, Thank you for the suggestion. Unfortunately, the solution works only for Densenet but not for Resnet101 or Resnet152. In the case of Densenet121, the mAP is very low even on the training data. I assume that the model you have used in the code is the inference model.
python3 ./keras-retinanet/keras_retinanet/bin/evaluate.py --backbone=resnet101 csv train.csv class.csv retinanet_resnet101_inference.h5
@Uttaran-IITH yes, I only tried the solution for DenseNet-121. I can't give you any suggestions for other backbones, sorry :(
In my case, I got a mAP of 96% with ResNet-50 and 90% with DenseNet-121, on the training data.
Hi, I use resnet101, but the loss always stays around 1. How can I reduce it? I only have one class.
retinanet-evaluate --convert-model ./model/resnet50_csv_100.h5 csv ./train.csv ./class.csv
Using TensorFlow backend.
usage: retinanet-evaluate [-h] [--convert-model] [--backbone BACKBONE] [--gpu GPU] [--score-threshold SCORE_THRESHOLD] [--iou-threshold IOU_THRESHOLD] [--max-detections MAX_DETECTIONS] [--save-path SAVE_PATH] [--image-min-side IMAGE_MIN_SIDE] [--image-max-side IMAGE_MAX_SIDE] [--config CONFIG] {coco,pascal,csv} ... model
retinanet-evaluate: error: argument dataset_type: invalid choice: './model/resnet50_csv_100.h5' (choose from 'coco', 'pascal', 'csv')
Why this error? Can you help me?
I guess it is caused by this:
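Judging from the usage string in the error output, the positional model argument goes last, after the dataset arguments, not before them. A likely-correct invocation with the same paths (a sketch based on that usage string, untested here):

```shell
retinanet-evaluate --convert-model csv ./train.csv ./class.csv ./model/resnet50_csv_100.h5
```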
@MAGI003769 Hello, I met the same problem as you. With 'densenet201' as the backbone, I got strange detection results and all the scores are 1. Has your problem been solved? Thanks a lot.
Has your problem been solved?
I would also like to know if this problem still persists.
I have solved this problem.
I'm also facing a very strange problem. I get non-zero mAP value when evaluating during training but when I use convert_model.py and evaluate.py, I get zero mAP values. I'm facing this issue with the efficientnet backbones.
I also encountered the same problem. I changed the backbone to detnet59, basically a modification of resnet50. I can see the loss decrease while training; however, the per-epoch evaluation on the test set is always 0. The resnet50 backbone works well. I was wondering if there is an error in the evaluation function.
@mariaculman18
Hi @Uttaran-IITH, please see this, where with the guidance of @hgaiser and @ikerodl96 I found a way around this issue.
Your solution helped me as well. Why not make a PR? (And use **common_args in all generators.)
Happy to know it worked for you :)
I guess the contributors are aware of the problem. It would be better if they made a PR; I don't know how to do that :/
Well, I dared to create the PR: https://github.com/fizyr/keras-retinanet/pull/1290. Let's see if it suits.
I have the same problem. Have you found a solution?
Use the solution here for Densenet.
@mngata Have you managed to fix this mAP 0.0 issue? I changed backbone to detnet59 and experienced same issue as well.
Thanks, it worked.
First and foremost, thank you for the awesome package! The dataset I am using consists of satellite images with 29 different classes. I have been able to train and evaluate a retinanet model on a subset of the 29 classes using the default 'resnet50' backbone.
However, when I switch to training and evaluating a model with a different backbone network such as 'densenet121', all of the mAP scores for each class are zero. I don't get any errors during training (I also use the random-transform flag for each epoch) or when converting the model (I supply the --backbone='densenet121' flag), and it converts successfully. I can also see the losses being optimized during training, so it's definitely detecting and classifying the objects in the images.
I even tried using the original resnet50 model, trained on a subset of classes, to see if it would pick up those classes on the full dataset with 29 classes, and it still produces zero output. I looked at the validation_annotations.csv file in both cases and the formatting is identical, so I don't think it has to do with the annotation files.
I have attached the validation_annotations.csv and classes.csv files (converted to .txt in order to attach them here): common_classes.txt, common_validation_annotations.txt
Any ideas what could be going on?
EDIT: I just did a comparison of a Resnet50 model and a Densenet121 model, both trained on the same dataset that I know for sure works, and the problem is definitely with the densenet121 implementation, because the Resnet50 model produces output during evaluation.