facebookresearch / Detectron

FAIR's research platform for object detection research, implementing popular algorithms like Mask R-CNN and RetinaNet.

Nan values for certain classes during testing #854

Closed ashnair1 closed 5 years ago

ashnair1 commented 5 years ago

I tried out the Mask R-CNN framework and observed pretty good results. But when I calculate the mAP score via the test_net.py script, I get a lot of nans. Would anyone happen to know why this is the case? The detections the model produces on the test images look pretty good, hence my confusion about this error.

INFO json_dataset_evaluator.py: 222: ~~~~ Mean and per-category AP @ IoU=[0.50,0.95] ~~~~
INFO json_dataset_evaluator.py: 223: 12.0
INFO json_dataset_evaluator.py: 231: 44.6
/usr/local/onnx/numpy/core/fromnumeric.py:2957: RuntimeWarning: Mean of empty slice.
  out=out, **kwargs)
/usr/local/onnx/numpy/core/_methods.py:80: RuntimeWarning: invalid value encountered in double_scalars
  ret = ret.dtype.type(ret / rcount)
INFO json_dataset_evaluator.py: 231: nan
INFO json_dataset_evaluator.py: 231: 64.1
INFO json_dataset_evaluator.py: 231: nan
INFO json_dataset_evaluator.py: 231: nan
INFO json_dataset_evaluator.py: 231: 36.2
INFO json_dataset_evaluator.py: 231: 0.0
INFO json_dataset_evaluator.py: 231: nan
INFO json_dataset_evaluator.py: 231: 6.0
INFO json_dataset_evaluator.py: 231: 1.4
INFO json_dataset_evaluator.py: 231: nan
INFO json_dataset_evaluator.py: 231: nan
INFO json_dataset_evaluator.py: 231: 19.8
INFO json_dataset_evaluator.py: 231: 3.2
INFO json_dataset_evaluator.py: 231: nan
INFO json_dataset_evaluator.py: 231: 5.9
INFO json_dataset_evaluator.py: 231: 4.6
INFO json_dataset_evaluator.py: 231: nan
INFO json_dataset_evaluator.py: 231: 5.1
INFO json_dataset_evaluator.py: 231: nan
INFO json_dataset_evaluator.py: 231: 28.6
INFO json_dataset_evaluator.py: 231: 0.0
INFO json_dataset_evaluator.py: 231: nan
INFO json_dataset_evaluator.py: 231: nan
INFO json_dataset_evaluator.py: 231: nan
INFO json_dataset_evaluator.py: 231: nan
INFO json_dataset_evaluator.py: 231: nan
INFO json_dataset_evaluator.py: 231: 32.0
....................................................................

INFO json_dataset_evaluator.py: 232: ~~~~ Summary metrics ~~~~
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.120
 Average Precision  (AP) @[ IoU=0.50      | area=   all | maxDets=100 ] = 0.180
 Average Precision  (AP) @[ IoU=0.75      | area=   all | maxDets=100 ] = 0.129
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.068
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.232
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.207
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=  1 ] = 0.059
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets= 10 ] = 0.135
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.166
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.082
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.318
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.246

To verify whether it was an issue with the code, I tried another dataset of mine that has only one class (excluding background). The evaluation works fine there, as can be seen below:

INFO json_dataset_evaluator.py: 222: ~~~~ Mean and per-category AP @ IoU=[0.50,0.95] ~~~~
INFO json_dataset_evaluator.py: 223: 38.6
INFO json_dataset_evaluator.py: 231: 38.6
INFO json_dataset_evaluator.py: 232: ~~~~ Summary metrics ~~~~
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.386
 Average Precision  (AP) @[ IoU=0.50      | area=   all | maxDets=100 ] = 0.647
 Average Precision  (AP) @[ IoU=0.75      | area=   all | maxDets=100 ] = 0.420
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.341
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.607
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.761
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=  1 ] = 0.011
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets= 10 ] = 0.092
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.428
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.381
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.652
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.799

The issue is that I can't pin down the cause of the problem. If it were due to the dataset, it would have shown up in the detection results on the test images. Training and testing the same code with another dataset suggests the problem is not in the code either. I would appreciate any insight into this.

System information

ApoorvaSuresh commented 5 years ago

Same issue here. :( Any updates?

ashnair1 commented 5 years ago

My problem was due to the complete absence of certain classes in my validation set. This probably caused a division by zero somewhere (maybe while calculating precision and recall, where both TP and FP were zero), resulting in nan for those classes. When I removed the classes that were absent from my validation set, it worked as expected.
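For context, this is roughly how I understand the nan appears: the COCO evaluator marks precision entries for categories with no ground truth as invalid, and when json_dataset_evaluator.py averages only the valid entries it is left with an empty slice. A minimal numpy sketch (not actual Detectron code):

import numpy as np

precisions = np.array([])   # no valid precision values for an absent class
ap = np.mean(precisions)    # RuntimeWarning: Mean of empty slice
print(ap)                   # nan, which is exactly what the log prints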

I'd suggest checking the class distribution in the training and validation sets just to rule out this possibility.
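If your annotations are in COCO format, a quick check of per-class annotation counts along these lines should flag the empty classes (the file path below is a placeholder for your own validation json):

import json
from collections import Counter

with open('annotations/instances_val.json') as f:  # placeholder path
    coco = json.load(f)

names = {c['id']: c['name'] for c in coco['categories']}
counts = Counter(a['category_id'] for a in coco['annotations'])

for cat_id, name in sorted(names.items()):
    # classes that print 0 here are the ones that evaluate to nan
    print('{}: {}'.format(name, counts.get(cat_id, 0)))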

ApoorvaSuresh commented 5 years ago

Ah, alright! Thank you so much.

I have another issue, and I'm very new to this: I train the weights, and when I use them for inference I get different outputs on different executions even though the input image and weights are the same, so I expect the same output every time. Do you have any idea why this could happen and which part I should try debugging?

ashnair1 commented 5 years ago

Your issue is most likely due to some layers being randomly initialised on each inference run, which can happen if the weights are not loaded properly. I had previously encountered a similar issue with a different model here. You will need to check whether the layer weights are loaded correctly and stay constant across two inference runs.
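Something along these lines could help with that check (just a rough sketch, assuming a Keras-style model that exposes get_weights() and load_weights(); build_model and weights_path stand in for however you construct and load your model):

import numpy as np

def weight_fingerprint(model):
    # cheap per-layer fingerprint: sum of absolute weight values
    return [float(np.abs(w).sum()) for w in model.get_weights()]

def compare_two_loads(build_model, weights_path):
    # build and load the model twice, then report layers whose weights differ;
    # any mismatch means that layer is being randomly initialised, not loaded
    fingerprints = []
    for _ in range(2):
        model = build_model()
        model.load_weights(weights_path, by_name=True)
        fingerprints.append(weight_fingerprint(model))
    return [i for i, (a, b) in enumerate(zip(*fingerprints)) if not np.isclose(a, b)]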

ApoorvaSuresh commented 5 years ago

Thank you, I will check that (I'm working with Matterport's Mask R-CNN implementation, actually).