PathologyDataScience / NuCLS

NuCLS: A scalable crowdsourcing, deep learning approach and dataset for nucleus classification, localization and segmentation
MIT License

How to interpret the test set evaluation metric values of NuCLS model #4

Closed: abdul2706 closed this issue 3 years ago

abdul2706 commented 3 years ago

Hi, I am a Computer Science student working on a research project related to medical images. I found your project very interesting, so I tried running your code on Google Colab. I was able to train the Mask R-CNN on fold_1 successfully, but I am unable to interpret the evaluation metric values that are generated for the test dataset during training.

In the image attached, I have highlighted two sets of columns. I wanted to ask the following questions:

  1. How should I interpret the mAP values (columns E, F, G) and the objectness mAP values (columns I, J, K)?
  2. Is a very large difference between the two types of mAP values expected?
  3. Which type of mAP value is reported in the published research paper?

[Image: evaluation-results-01]

kheffah commented 3 years ago

Hi @abdul2706, Thank you for your interest in our work and for trying out the code. The AP and mAP values that we report in the paper, and that you should be using, are the objectness AP values (highlighted in green in your spreadsheet). These measure detection alone, regardless of the nucleus class or segmentation. We argue in our preprint that detection, classification and segmentation are tasks with disparate clinical utility, so we recommend reporting them independently (as we do in the paper). Also note that because our truth contains a mixture of segmentations and bounding boxes, this is the only proper way to report accuracy. I can see that your model has reached a detection accuracy of 74.4%, which is pretty close to the number we got for this fold (75.3%).

Just for your information, the columns highlighted in yellow show a combined metric that is used by classic datasets like COCO, where an object is only considered to be detected if the nucleus segmentation significantly overlaps with a true nucleus segmentation of the same class. I'd recommend ignoring this metric for the reasons I described earlier.

Hope this answers your question. Cheers!
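
To make the distinction between the two column groups concrete, here is a minimal sketch (not the repository's code) that counts matched detections with and without the same-class requirement. It assumes bounding boxes in `(xmin, ymin, xmax, ymax)` form and an IoU threshold of 0.5; the helper names and the class labels are hypothetical, and the combined metric described above overlaps segmentations rather than boxes.

```python
def box_iou(a, b):
    """IoU of two boxes given as (xmin, ymin, xmax, ymax)."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def count_matches(preds, truths, iou_thr=0.5, require_same_class=False):
    """Greedy one-to-one matching of (box, label) predictions to truths.

    require_same_class=False mimics an objectness-style (detection-only) count;
    require_same_class=True mimics a class-aware count.
    """
    used = set()
    hits = 0
    for p_box, p_label in preds:
        best_j, best_iou = None, iou_thr
        for j, (t_box, t_label) in enumerate(truths):
            if j in used:
                continue
            if require_same_class and p_label != t_label:
                continue
            iou = box_iou(p_box, t_box)
            if iou >= best_iou:
                best_j, best_iou = j, iou
        if best_j is not None:
            used.add(best_j)
            hits += 1
    return hits


# Hypothetical example: both nuclei are localized, but one is misclassified.
truths = [((0, 0, 10, 10), "tumor"), ((20, 20, 30, 30), "lymphocyte")]
preds = [((1, 1, 10, 10), "tumor"), ((20, 20, 29, 30), "stromal")]

objectness_hits = count_matches(preds, truths)
class_aware_hits = count_matches(preds, truths, require_same_class=True)
print(objectness_hits, class_aware_hits)
```

In this toy case the misclassified nucleus still counts as detected under the class-agnostic rule but not under the class-aware one, which is one reason the two groups of values can differ widely.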

abdul2706 commented 3 years ago

@kheffah, Thanks very much for clarifying everything. 👍

kheffah commented 3 years ago

You're more than welcome :)

abdul2706 commented 3 years ago

Hi @kheffah,

I want to evaluate my custom model using the objectness mAP criterion rather than the default COCO mAP criterion. I tried to work out how the objectness evaluation is implemented from the code in this repo, but it seems to be spread across multiple files, which makes it difficult to reuse in my own implementation.

I would be very grateful if you could help me with any one of the following:

  1. Provide the objectness evaluation as a single function (or file).
  2. Or, guide me on how to compute it from the default COCO evaluator.
  3. Or, provide an algorithm for calculating it as an independent evaluation function.

Thanking you in advance.
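
For readers with the same request, a class-agnostic AP can be sketched as a single function. This is an illustrative sketch, not the evaluation code from this repo, and it is not confirmed to reproduce the numbers in the spreadsheet. It assumes boxes in `(xmin, ymin, xmax, ymax)` form, one confidence score per prediction, greedy matching in descending score order, and all-point interpolation of the precision-recall curve; the function names are hypothetical. With the default COCO evaluator, `pycocotools` exposes a `COCOeval.params.useCats` flag that can be set to 0 to ignore categories, though whether that matches the repo's objectness metric is an assumption.

```python
def box_iou(a, b):
    """IoU of two boxes given as (xmin, ymin, xmax, ymax)."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def objectness_ap(pred_boxes, pred_scores, true_boxes, iou_thr=0.5):
    """Class-agnostic average precision at a single IoU threshold."""
    if not true_boxes:
        return float("nan")  # AP is undefined without ground truth

    # Visit predictions from most to least confident.
    order = sorted(range(len(pred_boxes)), key=lambda i: -pred_scores[i])
    matched = [False] * len(true_boxes)
    tp_flags = []
    for i in order:
        best_j, best_iou = -1, iou_thr
        for j, t_box in enumerate(true_boxes):
            if matched[j]:
                continue  # each true nucleus can be claimed only once
            iou = box_iou(pred_boxes[i], t_box)
            if iou >= best_iou:
                best_j, best_iou = j, iou
        if best_j >= 0:
            matched[best_j] = True
        tp_flags.append(best_j >= 0)

    # Precision and recall after each prediction.
    recalls, precisions = [], []
    tp = 0
    for rank, hit in enumerate(tp_flags, start=1):
        tp += hit
        recalls.append(tp / len(true_boxes))
        precisions.append(tp / rank)

    # Precision envelope: make precision non-increasing as recall grows.
    for k in range(len(precisions) - 2, -1, -1):
        precisions[k] = max(precisions[k], precisions[k + 1])

    # Area under the interpolated precision-recall curve.
    ap, prev_recall = 0.0, 0.0
    for r, p in zip(recalls, precisions):
        ap += (r - prev_recall) * p
        prev_recall = r
    return ap


# Hypothetical example: two true nuclei, three predictions, one false positive.
true_boxes = [(0, 0, 10, 10), (20, 20, 30, 30)]
pred_boxes = [(0, 0, 10, 10), (50, 50, 60, 60), (20, 20, 30, 30)]
pred_scores = [0.9, 0.8, 0.7]
ap_50 = objectness_ap(pred_boxes, pred_scores, true_boxes, iou_thr=0.5)
print(round(ap_50, 4))
```

Averaging `objectness_ap` over several IoU thresholds (for example 0.50 to 0.95 in steps of 0.05) would give a COCO-style summary value. For a whole test set, predictions and truths from all images would need to be pooled with matching restricted to boxes from the same image, which this single-image sketch does not handle.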