Issue with evaluation when using a custom dataset

roboyul commented 9 months ago

Hi there, thank you so much for putting this repository together for this implementation, it's very interesting!

I'm working on implementing this with a custom COCO instances formatted dataset rather than the original COCO 2017 instances dataset. I did an initial test run using the original COCO dataset, and was able to see the validation segm AP results gradually begin to increase as expected in as little as 500 iterations with a batch size of 2 for a quick test:

python3 -W ignore train_net.py --config-file ./configs/coco/instance-segmentation/deit/maskformer2_deit_base_bs16_50ep.yaml --num-gpus 2 --num-machines 1 SSL.PERCENTAGE 100 SSL.TRAIN_SSL False OUTPUT_DIR ./output-teacher

My problems arise when I begin integrating my custom dataset. I am able to successfully register my training/test set using register_coco_instances from data.datasets > coco.py. I then update the configuration accordingly:

cfg.DATASETS.TRAIN = ("custom_train",)
cfg.DATASETS.TEST = ("custom_test",)

Inside the coco_unlabel folder, I create the symlinks for the images folder pointing to my training images folder and the symlink for the val2017 folder to my validation set as per the instructions. I point DETECTRON2_DATASETS to the location where coco_unlabel lives, and it appears to pick it up.

Up to here, everything works fine. The training job starts using:

python3 -W ignore train_net.py --config-file ./configs/coco/instance-segmentation/deit/maskformer2_deit_base_bs16_50ep.yaml --num-gpus 2 --num-machines 1 SSL.PERCENTAGE 100 SSL.TRAIN_SSL False OUTPUT_DIR ./output-teacher

When the training job attempts to do the first evaluation step (set to 500 for testing), an error shows explaining my test set doesn't appear to be registeredm even though it picked up the training set:

[03/02 22:16:41 d2.utils.events]:  eta: 2 days, 13:54:06  iter: 499  total_loss: 50.87  loss_ce: 0.1988  loss_mask: 1.255  loss_dice: 3.667  loss_ce_0: 1.005  loss_mask_0: 0.8838  loss_dice_0: 3.57  loss_ce_1: 0.1726  loss_mask_1: 1.169  loss_dice_1: 3.563  loss_ce_2: 0.1709  loss_mask_2: 1.215  loss_dice_2: 3.544  loss_ce_3: 0.1839  loss_mask_3: 1.165  loss_dice_3: 3.657  loss_ce_4: 0.1798  loss_mask_4: 1.212  loss_dice_4: 3.613  loss_ce_5: 0.2062  loss_mask_5: 1.233  loss_dice_5: 3.729  loss_ce_6: 0.2123  loss_mask_6: 1.267  loss_dice_6: 3.744  loss_ce_7: 0.2188  loss_mask_7: 1.259  loss_dice_7: 3.683  loss_ce_8: 0.1927  loss_mask_8: 1.263  loss_dice_8: 3.703    time: 0.6120  last_time: 0.6109  data_time: 0.0064  last_data_time: 0.0057   lr: 0.0001  max_mem: 10689M
Traceback (most recent call last):
  File "/home/b/.local/lib/python3.10/site-packages/detectron2/data/catalog.py", line 51, in get
    f = self[name]
  File "/usr/lib/python3.10/collections/__init__.py", line 1106, in __getitem__
    raise KeyError(key)
KeyError: 'custom_test'

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/home/b/GuidedDistillation/train_net.py", line 470, in <module>
    launch(
  File "/home/b/.local/lib/python3.10/site-packages/detectron2/engine/launch.py", line 84, in launch
    main_func(*args)
  File "/home/b/GuidedDistillation/train_net.py", line 464, in main
    return trainer.train()
  File "/home/b/GuidedDistillation/modules/defaults.py", line 566, in train
    super().train(self.start_iter, self.max_iter)
  File "/home/b/GuidedDistillation/modules/train_loop.py", line 165, in train
    self.after_step()
  File "/home/b/GuidedDistillation/modules/train_loop.py", line 199, in after_step
    h.after_step()
  File "/home/b/.local/lib/python3.10/site-packages/detectron2/engine/hooks.py", line 556, in after_step
    self._do_eval()
  File "/home/b/.local/lib/python3.10/site-packages/detectron2/engine/hooks.py", line 529, in _do_eval
    results = self._func()
  File "/home/b/GuidedDistillation/modules/defaults.py", line 525, in test_and_save_results
    self._last_eval_results = self.test(self.cfg, self.model)
  File "/home/b/GuidedDistillation/modules/defaults.py", line 691, in test
    evaluator = cls.build_evaluator(cfg, dataset_name)
  File "/home/b/GuidedDistillation/train_net.py", line 115, in build_evaluator
    evaluator_list.append(COCOEvaluator(dataset_name, output_dir=output_folder))
  File "/home/b/.local/lib/python3.10/site-packages/detectron2/evaluation/coco_evaluation.py", line 142, in __init__
    convert_to_coco_json(dataset_name, cache_path, allow_cached=allow_cached_coco)
  File "/home/b/.local/lib/python3.10/site-packages/detectron2/data/datasets/coco.py", line 511, in convert_to_coco_json
    coco_dict = convert_to_coco_dict(dataset_name)
  File "/home/b/.local/lib/python3.10/site-packages/detectron2/data/datasets/coco.py", line 354, in convert_to_coco_dict
    dataset_dicts = DatasetCatalog.get(dataset_name)
  File "/home/b/.local/lib/python3.10/site-packages/detectron2/data/catalog.py", line 53, in get
    raise KeyError(
KeyError: "Dataset 'custom_test' is not registered!

If I register the test set with detectron2.data.datasets instead of data.datasets, the evaluation works, but the AP values are always 0 no matter how long the job runs:

[03/02 22:23:40 d2.utils.events]:  eta: 2 days, 11:02:34  iter: 479  total_loss: 51.48  loss_ce: 0.2197  loss_mask: 0.9924  loss_dice: 3.769  loss_ce_0: 1.136  loss_mask_0: 0.8419  loss_dice_0: 3.583  loss_ce_1: 0.2115  loss_mask_1: 1.055  loss_dice_1: 3.583  loss_ce_2: 0.1991  loss_mask_2: 1.087  loss_dice_2: 3.628  loss_ce_3: 0.2439  loss_mask_3: 1.014  loss_dice_3: 3.63  loss_ce_4: 0.2733  loss_mask_4: 0.9731  loss_dice_4: 3.611  loss_ce_5: 0.2954  loss_mask_5: 0.9499  loss_dice_5: 3.639  loss_ce_6: 0.2749  loss_mask_6: 1.042  loss_dice_6: 3.646  loss_ce_7: 0.2482  loss_mask_7: 0.9416  loss_dice_7: 3.682  loss_ce_8: 0.2521  loss_mask_8: 1.016  loss_dice_8: 3.729    time: 0.5927  last_time: 0.5788  data_time: 0.0061  last_data_time: 0.0044   lr: 0.0001  max_mem: 10690M
[03/02 22:23:52 d2.data.dataset_mapper]: [DatasetMapper] Augmentations used in inference: [ResizeShortestEdge(short_edge_length=(800, 800), max_size=1333, sample_style='choice')]
[03/02 22:23:52 d2.data.common]: Serializing the dataset using: <class 'detectron2.data.common._TorchSerializedList'>
[03/02 22:23:52 d2.data.common]: Serializing 74 elements to byte tensors and concatenating them all ...
[03/02 22:23:52 d2.data.common]: Serialized dataset takes 0.05 MiB
[03/02 22:23:52 d2.evaluation.evaluator]: Start inference on 74 batches
[03/02 22:23:54 d2.evaluation.evaluator]: Inference done 11/74. Dataloading: 0.0010 s/iter. Inference: 0.1043 s/iter. Eval: 0.0543 s/iter. Total: 0.1596 s/iter. ETA=0:00:10
[03/02 22:23:59 d2.evaluation.evaluator]: Inference done 44/74. Dataloading: 0.0010 s/iter. Inference: 0.1034 s/iter. Eval: 0.0521 s/iter. Total: 0.1565 s/iter. ETA=0:00:04
[03/02 22:24:04 d2.evaluation.evaluator]: Total inference time: 0:00:11.024099 (0.159770 s / iter per device, on 1 devices)
[03/02 22:24:04 d2.evaluation.evaluator]: Total inference pure compute time: 0:00:07 (0.106315 s / iter per device, on 1 devices)
[03/02 22:24:04 d2.evaluation.coco_evaluation]: Preparing results for COCO format ...
[03/02 22:24:04 d2.evaluation.coco_evaluation]: Saving results to ./output-teacher/inference/coco_instances_results.json
[03/02 22:24:04 d2.evaluation.coco_evaluation]: Evaluating predictions with unofficial COCO API...
Loading and preparing results...
DONE (t=0.00s)
creating index...
index created!
[03/02 22:24:04 d2.evaluation.fast_eval_api]: Evaluate annotation type *bbox*
[03/02 22:24:04 d2.evaluation.fast_eval_api]: COCOeval_opt.evaluate() finished in 0.00 seconds.
[03/02 22:24:04 d2.evaluation.fast_eval_api]: Accumulating evaluation results...
[03/02 22:24:04 d2.evaluation.fast_eval_api]: COCOeval_opt.accumulate() finished in 0.00 seconds.
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.000
 Average Precision  (AP) @[ IoU=0.50      | area=   all | maxDets=100 ] = 0.000
 Average Precision  (AP) @[ IoU=0.75      | area=   all | maxDets=100 ] = 0.000
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.000
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.000
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.000
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=  1 ] = 0.000
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets= 10 ] = 0.000
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.000
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.000
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.000
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.000
[03/02 22:24:04 d2.evaluation.coco_evaluation]: Evaluation results for bbox: 
|  AP   |  AP50  |  AP75  |  APs  |  APm  |  APl  |
|:-----:|:------:|:------:|:-----:|:-----:|:-----:|
| 0.000 | 0.000  | 0.000  | 0.000 | 0.000 | 0.000 |
Loading and preparing results...
DONE (t=0.06s)
creating index...
index created!
[03/02 22:24:04 d2.evaluation.fast_eval_api]: Evaluate annotation type *segm*
[03/02 22:24:04 d2.evaluation.fast_eval_api]: COCOeval_opt.evaluate() finished in 0.01 seconds.
[03/02 22:24:04 d2.evaluation.fast_eval_api]: Accumulating evaluation results...
[03/02 22:24:04 d2.evaluation.fast_eval_api]: COCOeval_opt.accumulate() finished in 0.00 seconds.
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.000
 Average Precision  (AP) @[ IoU=0.50      | area=   all | maxDets=100 ] = 0.000
 Average Precision  (AP) @[ IoU=0.75      | area=   all | maxDets=100 ] = 0.000
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.000
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.000
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.000
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=  1 ] = 0.000
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets= 10 ] = 0.000
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.000
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.000
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.000
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.000
[03/02 22:24:04 d2.evaluation.coco_evaluation]: Evaluation results for segm: 
|  AP   |  AP50  |  AP75  |  APs  |  APm  |  APl  |
|:-----:|:------:|:------:|:-----:|:-----:|:-----:|
| 0.000 | 0.000  | 0.000  | 0.000 | 0.000 | 0.000 |
[03/02 22:24:04 d2.evaluation.testing]: copypaste: Task: bbox
[03/02 22:24:04 d2.evaluation.testing]: copypaste: AP,AP50,AP75,APs,APm,APl
[03/02 22:24:04 d2.evaluation.testing]: copypaste: 0.0000,0.0000,0.0000,0.0000,0.0000,0.0000
[03/02 22:24:04 d2.evaluation.testing]: copypaste: Task: segm
[03/02 22:24:04 d2.evaluation.testing]: copypaste: AP,AP50,AP75,APs,APm,APl
[03/02 22:24:04 d2.evaluation.testing]: copypaste: 0.0000,0.0000,0.0000,0.0000,0.0000,0.0000

Am I missing something here? I'm assuming its related to registering my datasets, as the original COCO dataset implementation from the guide appears to work. I've also made sure to update the NUM_CLASSES field across the config according to the classes available in my custom dataset. I've also tried the Dino/R50 bases as well with no luck. Thank you!

TariqBerrada commented 8 months ago

Hello,

Some points I would try to check :

Are your annotation masks registered in the correct format ?
Can you recover the correct values from the annotation masks you created ?
Does your custom dataset have the correct metadata associated ?https://github.com/facebookresearch/GuidedDistillation/blob/b48f2a81578a763a93464daf3b1b1b69137edfae/data/datasets/coco.py#L484
Are you doing validation w.r.t your validation set or the coco validation set ? https://github.com/facebookresearch/detectron2/blob/main/detectron2/evaluation/coco_evaluation.py

Hope this helps !

roboyul commented 8 months ago

Thank you for the prompt reply @TariqBerrada! Looks like it was related to your last two points. After reviewing and ensuring my metadata is properly configured for my custom set, the validation metrics are as expected.

As a follow up, is there an inference script example available, or should I follow standard M2F/D2 inference for these models?

Ignoring custom datasets for now, I trained the Teacher model as per the README on the COCO dataset and grabbed a checkpoint that evaled at around 20 AP. I wanted to run inference just to visually see how the Teacher model is performing on its predictions before I continue with second step.

I'm using Mask2Former's demo.py to test inference and added the add_ssl_config(cfg) line to properly load the output config from the teacher model:

python3 demo.py --config-file ./output-teacher/config.yaml --input test.jpg --opts MODEL.WEIGHTS ./output-teacher/model_best.pth TEST.DETECTIONS_PER_IMAGE 5

But the results are not what I would expect even though the validation results appear promising. Almost like it's inferencing against untrained weights or something? I would expect a roughly accurate mask around some of the objects and perhaps miscategorized like the top half of illustrations/coco_illustration.png, but it's quite worse than that (nearly full-image and very noisy masks, always completely different preds). Here's an example from COCO's test2017:

1 detection for testing: https://github.com/facebookresearch/GuidedDistillation/assets/161990216/7c57684c-e709-4c60-8c21-a276baa06a58 https://github.com/facebookresearch/GuidedDistillation/assets/161990216/65cf7ff9-5787-4f3a-9345-e6af38f77693 100 detections: https://github.com/facebookresearch/GuidedDistillation/assets/161990216/6b75e163-790b-4618-b2d2-646d3aa6b9ce

Am I misunderstanding how to test this Teacher model, or am I simply just loading the model incorrectly? Is the Teacher model not designed to output the instance predictions for this use? I'm rather new to this teacher/student system so thank you for your time and patience!

HGCSDN commented 7 months ago

I met the same problem. I finally solved it by commenting lines 406-409 in the file "modules/defaults.py" and changing the content in the line 413 from checkp to model. You can have a try.

facebookresearch / GuidedDistillation

Issue with evaluation when using a custom dataset #6