facebookresearch / detectron2

Detectron2 is a platform for object detection, segmentation and other visual recognition tasks.
https://detectron2.readthedocs.io/en/latest/
Apache License 2.0

Simple detection evaluator #99

Closed: dolevr closed this issue 4 years ago

dolevr commented 4 years ago

❓ Questions and Help

General questions about detectron2.

Thanks for all the great work! I have my own custom detection dataset(s) with a train/validation split, and I would like to run periodic evaluation during training.

I set:

cfg.DATASETS.TEST = ("car_parts/valid",)
cfg.TEST.EVAL_PERIOD = 2000

If I understand correctly, I need to set MetadataCatalog.get(dataset_name).evaluator_type, but I am not sure what to use as the evaluator. I have my own get_json() method since my data is not in any of the usual formats. Is there a 'simple detection evaluator'?

ppwwyyxx commented 4 years ago

As https://detectron2.readthedocs.io/tutorials/datasets.html#metadata-for-datasets says, evaluator_type is only used by the builtin datasets; you should specify the evaluator to use in your training script.

There is currently no "simple" evaluator. So if your dataset is not in a standard format, we currently cannot evaluate it. It would be very nice to have one.
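
For a custom dataset, one common pattern is to override build_evaluator on a DefaultTrainer subclass; a minimal sketch (the Trainer name and the choice of COCOEvaluator are illustrative, assuming your dataset can be evaluated in COCO style):

import os
from detectron2.engine import DefaultTrainer
from detectron2.evaluation import COCOEvaluator

class Trainer(DefaultTrainer):
    @classmethod
    def build_evaluator(cls, cfg, dataset_name, output_folder=None):
        if output_folder is None:
            output_folder = os.path.join(cfg.OUTPUT_DIR, "inference")
        # Called for each dataset in cfg.DATASETS.TEST every
        # cfg.TEST.EVAL_PERIOD iterations during training.
        return COCOEvaluator(dataset_name, cfg, False, output_folder)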

dolevr commented 4 years ago

Thanks, will try to implement one if time permits.

botcs commented 4 years ago

There is currently no "simple" evaluator... It would be very nice to have one.

@ppwwyyxx Can you check out maskrcnn-benchmark:#1104 and maskrcnn-benchmark:#1096?

I would be happy to implement something similar here.

And yeah, thanks for the supercool repo :)

ppwwyyxx commented 4 years ago

https://github.com/facebookresearch/maskrcnn-benchmark/pull/1096 seems similar to what we want to do here. You're welcome to contribute something like it here!

I think it's possible to make the existing COCOEvaluator support a new dataset in its __init__ directly: if the dataset is not originally in COCO format, create the COCO object by getting the data from the DatasetRegistry and converting the dataset dicts to COCO-format json. Then add the json file path to the associated metadata so this conversion only happens once.
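
A rough sketch of that flow, using DatasetCatalog/MetadataCatalog as the registry (ensure_coco_json and to_coco_json are hypothetical names, not existing detectron2 functions):

from detectron2.data import DatasetCatalog, MetadataCatalog

def ensure_coco_json(dataset_name, output_json):
    metadata = MetadataCatalog.get(dataset_name)
    if not hasattr(metadata, "json_file"):
        # Standard detectron2 dataset dicts, regardless of original format.
        dataset_dicts = DatasetCatalog.get(dataset_name)
        to_coco_json(dataset_dicts, metadata, output_json)  # hypothetical converter
        # Cache the path in the metadata so the conversion happens only once.
        metadata.json_file = output_json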

botcs commented 4 years ago

@ppwwyyxx, if I understood correctly, the thing that is missing for a general "simple" evaluator is a DatasetRegistry-to-COCO-json converter, is that correct?

In the meantime I have started to modularize the mAP evaluation process to work with COCO-format json. My plan is to reproduce results from VOC, COCO and Cityscapes with the same toolkit: you can find it here. So far the precision-recall curve computation is ready.

As soon as I can validate the COCO scores with it, I'll make a PR.
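
For context, the format-agnostic core of that computation is small; a sketch, assuming detections have already been matched against ground truth (the is_tp flags and num_gt come from that matching step):

import numpy as np

def precision_recall_curve(scores, is_tp, num_gt):
    # Sort detections by descending confidence, then accumulate TP/FP counts.
    order = np.argsort(-np.asarray(scores, dtype=float))
    tp = np.cumsum(np.asarray(is_tp, dtype=bool)[order])
    fp = np.cumsum(~np.asarray(is_tp, dtype=bool)[order])
    precision = tp / np.maximum(tp + fp, 1)
    recall = tp / max(num_gt, 1)
    return precision, recall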

ppwwyyxx commented 4 years ago

If you have converted all results to COCO-json, why not just use cocoapi to evaluate it?
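
i.e., something along these lines (the json file names are placeholders):

from pycocotools.coco import COCO
from pycocotools.cocoeval import COCOeval

coco_gt = COCO("ground_truth_coco.json")      # COCO-format annotations
coco_dt = coco_gt.loadRes("detections.json")  # COCO-format detection results
coco_eval = COCOeval(coco_gt, coco_dt, iouType="bbox")
coco_eval.evaluate()
coco_eval.accumulate()
coco_eval.summarize()  # prints the standard COCO AP/AR table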

botcs commented 4 years ago

There are multiple reasons for having a generic set of utility functions for evaluating the same metric:

gessha commented 4 years ago

Just wanted to mention that if you follow the Colab example in this repo and implement your own get_balloon_dicts(), you have to cast your bbox array to Python ints instead of np.uint64, because json.dump cannot serialize NumPy ints (NumPy version 1.17.2).

What I did was generate the dicts for my dataset, go through all the annotations, and cast the bbox parameters to Python ints; then I was able to dump the whole dictionary.
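
A minimal version of that cast (get_balloon_dicts and the dict layout follow the balloon tutorial):

dataset_dicts = get_balloon_dicts("balloon/train")
for record in dataset_dicts:
    for ann in record["annotations"]:
        # json.dump chokes on np.uint64; plain Python ints serialize fine.
        ann["bbox"] = [int(x) for x in ann["bbox"]]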

ppwwyyxx commented 4 years ago

Makes sense. I think we should convert them to float before dumping the json.

ppwwyyxx commented 4 years ago

Done by #175.

bconsolvo-zvelo commented 4 years ago

@ppwwyyxx, if I understood correctly, the thing that is missing for a general "simple" evaluator is a DatasetRegistry-to-COCO-json converter, is that correct?

In the meantime I have started to modularize the mAP evaluation process to work with COCO-format json. My plan is to reproduce results from VOC, COCO and Cityscapes with the same toolkit: you can find it here. So far the precision-recall curve computation is ready.

As soon as I can validate the COCO scores with it, I'll make a PR.

I was hoping to use your evaluator to get a recall curve for my COCO dataset. I noticed, though, that it says "AUC evaluation done, precision recall computation is wrong", so I was a bit hesitant to try it. Any updates here? Thanks!

bokelai1989 commented 3 years ago

@ppwwyyxx I am getting good results from my Detectron2 model for instance segmentation, as the screenshot below shows; the AP on the test dataset is about 88%. The test data is not seen during training, and by running the command below I can get the summarized AP values. However, how can I obtain the AP on my training dataset, so I can check whether I am overfitting when tuning the model parameters? Is there a trainer parameter I can adjust to get this output during training, or can I just run the same inference on my training dataset as I do on the test dataset (see the sketch after the screenshot below)? Thanks!

evaluator = COCOEvaluator(test_instance, cfg, False, output_dir=cfg.OUTPUT_DIR)
val_loader = build_detection_test_loader(cfg, test_instance)
result = inference_on_dataset(trainer.model, val_loader, evaluator)

[image: COCO evaluation summary showing ~88% AP]
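
i.e., could I just do something like this? (A sketch; I'm assuming the training set is registered under cfg.DATASETS.TRAIN[0].)

train_name = cfg.DATASETS.TRAIN[0]
evaluator = COCOEvaluator(train_name, cfg, False, output_dir=cfg.OUTPUT_DIR)
train_loader = build_detection_test_loader(cfg, train_name)
# Same inference path as for the test set, just pointed at the training data.
train_result = inference_on_dataset(trainer.model, train_loader, evaluator)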

solarflarefx commented 2 years ago

@bokelai1989 Did you ever figure this one out?