axondeepseg / axon-detection

Exploration of object detection applied to the AxonDeepSeg project

RetinaNet script: Validation data set error #14

Open MurielleMardenli200 opened 3 weeks ago

MurielleMardenli200 commented 3 weeks ago

In the RetinaNet script (see current PR), an error is thrown in the evaluate method when it runs on the validation set. The validation set comes from json_annotation_val.json, which is generated by preprocess_data_coco in preprocessing.py.

This is the error raised when running the train script:

    Validation run stopped due to: A prediction has class=53, but the dataset only has 2 classes and predicted class id should be in [0, 1]

hermancollin commented 3 weeks ago

This StackOverflow thread might provide some insight. Could you tell me the values of the following attributes: thing_classes and thing_dataset_id_to_contiguous_id? (I think they should be in the JSON file.) The latter is used by the evaluation method to find the number of classes (num_classes), and that variable is in turn responsible for the error.

Ref: https://detectron2.readthedocs.io/en/latest/_modules/detectron2/evaluation/coco_evaluation.html

    def _eval_predictions(self, predictions, img_ids=None):
        """
        Evaluate predictions. Fill self._results with the metrics of the tasks.
        """
        self._logger.info("Preparing results for COCO format ...")
        coco_results = list(itertools.chain(*[x["instances"] for x in predictions]))
        tasks = self._tasks or self._tasks_from_predictions(coco_results)

        # unmap the category ids for COCO
        if hasattr(self._metadata, "thing_dataset_id_to_contiguous_id"):
            dataset_id_to_contiguous_id = self._metadata.thing_dataset_id_to_contiguous_id
            all_contiguous_ids = list(dataset_id_to_contiguous_id.values())
            num_classes = len(all_contiguous_ids)
            assert min(all_contiguous_ids) == 0 and max(all_contiguous_ids) == num_classes - 1

            reverse_id_mapping = {v: k for k, v in dataset_id_to_contiguous_id.items()}
            for result in coco_results:
                category_id = result["category_id"]
                assert category_id < num_classes, (
                    f"A prediction has class={category_id}, "
                    f"but the dataset only has {num_classes} classes and "
                    f"predicted class id should be in [0, {num_classes - 1}]."
                )
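The failing assertion can be reproduced outside detectron2. Here is a minimal pure-Python sketch of the id-unmapping logic above, using the {1: 0, 2: 1} mapping from this dataset, which shows exactly why a prediction with class=53 trips it (the `unmap` helper is just for illustration, not a detectron2 function):

```python
# Contiguous-id mapping as reported for this dataset: COCO category ids
# 1 (myelin) and 2 (axon) map to contiguous ids 0 and 1.
dataset_id_to_contiguous_id = {1: 0, 2: 1}

all_contiguous_ids = list(dataset_id_to_contiguous_id.values())
num_classes = len(all_contiguous_ids)  # 2
assert min(all_contiguous_ids) == 0 and max(all_contiguous_ids) == num_classes - 1

# Reverse mapping used to convert predictions back to COCO category ids.
reverse_id_mapping = {v: k for k, v in dataset_id_to_contiguous_id.items()}  # {0: 1, 1: 2}

def unmap(category_id):
    # Mimics the assertion in COCOEvaluator._eval_predictions:
    # valid contiguous ids are in [0, num_classes - 1].
    assert category_id < num_classes, (
        f"A prediction has class={category_id}, but the dataset only has "
        f"{num_classes} classes"
    )
    return reverse_id_mapping[category_id]

print(unmap(1))  # 2 (contiguous id 1 -> COCO category id 2, i.e. axon)
```

So any predicted class id of 2 or more fails before the evaluator even reaches the COCO metrics.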

hermancollin commented 3 weeks ago

If the thing_classes attribute is absent, the solution might just be to set it like so:

MetadataCatalog.get("COCO_VAL_ANNOTATION").set(thing_classes=["axon"])

MurielleMardenli200 commented 2 weeks ago

It looks like thing_classes is already set to ["myelin", "axon"] and thing_dataset_id_to_contiguous_id already contains the IDs {1: 0, 2: 1}, and I'm unable to set them to a different value because of an error of this type:

    Attribute 'thing_dataset_id_to_contiguous_id' in the metadata of '../data-coco/annotations/json_annotation_val.json' cannot be set to a different value! {0: 0, 1: 1} != {1: 0, 2: 1}

According to this, num_classes should have a value of 2, but it doesn't.

hermancollin commented 2 weeks ago

Unfortunately, the only other mention of this issue I found was https://github.com/facebookresearch/detectron2/issues/5103, and they didn't get an answer from the devs.

MurielleMardenli200 commented 6 days ago

Just updating this here to document it: this week I debugged this problem and found that class=53 is actually the first element of the array in the pred_classes attribute of my model output. The array is supposed to contain only 0s, since there is a single class (axon). This is abnormal, because the pred_boxes attribute contains plausible values: in the validation visualization, the model is able to draw boxes around a good portion of the axons.

    {'instances':
      pred_boxes: Boxes(...)
      pred_classes: tensor([53, 59, 57, 55, 14, 53, 25, 58, 77, 47, 14, 77, 55, 56, 73, 62, 59, 59,
              77, 54, 47, 15, 72, 26, 45,  0, 47, 47, 14, 57, 14, 77, 47, 54, 14, 54])}
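A quick sanity check on those values (pure Python, with the tensor contents copied into a plain list) makes the scale of the problem concrete, and hints at one possible cause to rule out:

```python
# pred_classes values copied from the output above.
pred_classes = [53, 59, 57, 55, 14, 53, 25, 58, 77, 47, 14, 77, 55, 56, 73, 62,
                59, 59, 77, 54, 47, 15, 72, 26, 45, 0, 47, 47, 14, 57, 14, 77,
                47, 54, 14, 54]
num_classes = 2  # myelin, axon

invalid = [c for c in pred_classes if not 0 <= c < num_classes]
print(f"{len(invalid)}/{len(pred_classes)} out-of-range class ids")  # 35/36

# Every id is below 80, which would be consistent with a classification head
# still sized for COCO's 80 classes (e.g. cfg.MODEL.RETINANET.NUM_CLASSES left
# at its default) -- an assumption worth checking, not a confirmed diagnosis.
assert all(c < 80 for c in pred_classes)
```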

I've been looking into solving this so that I can properly use the COCOEvaluator class on the validation and test sets.

hermancollin commented 6 days ago

@MurielleMardenli200 if I understand correctly, in your dict, pred_boxes and pred_classes contain the same number of elements, and pred_classes contains a random integer for every box. Can you confirm this?

Why does the model predict these classes?

hermancollin commented 6 days ago

Also note that COCOEvaluator() takes an argument called max_dets_per_image, which limits the number of detections per image and defaults to 100. Maybe this is why your results looked like they were missing axons?
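For intuition, the cap behaves like keeping only the top-k detections per image, ranked by score. A pure-Python sketch (the `cap_detections` helper is illustrative, not the detectron2 implementation):

```python
def cap_detections(detections, max_dets_per_image=100):
    """Keep only the top-k detections for one image, ranked by score --
    a sketch of what the max_dets_per_image cap does."""
    return sorted(detections, key=lambda d: d["score"], reverse=True)[:max_dets_per_image]

dets = [{"score": i / 150} for i in range(150)]  # 150 fake detections
kept = cap_detections(dets)
print(len(kept))  # 100
```

With densely packed axons, a single image can easily contain more than 100 true objects, so raising this limit may be worthwhile.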

hermancollin commented 6 days ago

Btw which preprocessing script are you using? Is it this one? https://github.com/axondeepseg/axon-detection/blob/main/src/preprocessing.py

The COCO code there still considers both axon and myelin. If you fixed this on your side, could you make a PR to update the preprocessing script?
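For reference, dropping the myelin class from a COCO-format dict amounts to filtering the categories and annotations and remapping the remaining category id. A sketch under that assumption (not the actual preprocessing.py code):

```python
def keep_only_axon(coco):
    """Filter a COCO-format dict so only the 'axon' category remains,
    remapped to a single category id (1)."""
    axon_ids = {c["id"] for c in coco["categories"] if c["name"] == "axon"}
    return {
        **coco,
        "categories": [{"id": 1, "name": "axon"}],
        "annotations": [
            {**a, "category_id": 1}
            for a in coco["annotations"]
            if a["category_id"] in axon_ids
        ],
    }

sample = {
    "categories": [{"id": 1, "name": "myelin"}, {"id": 2, "name": "axon"}],
    "annotations": [
        {"id": 1, "category_id": 1, "bbox": [0, 0, 5, 5]},   # myelin, dropped
        {"id": 2, "category_id": 2, "bbox": [1, 1, 4, 4]},   # axon, kept
    ],
}
filtered = keep_only_axon(sample)
print(len(filtered["annotations"]))  # 1
```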

MurielleMardenli200 commented 6 days ago

> @MurielleMardenli200 if I understand correctly in your dict pred_boxes and pred_classes both contain the same number of elements, and pred_classes contains a random integer for every box, can you confirm this?
>
> Why does the model predict these classes?

1) Yes, exactly: pred_classes and pred_boxes have the same size. I'm not sure why it predicts them this way; the integers look just as random even when I change the hyperparameters. I'm going to debug further to find the error.

2) I was not using COCOEvaluator to visualize the predictions since it wasn't working because of this issue, but I'll try using it after I resolve it.

3) I made the preprocessing script fix in a separate branch (fix/preprocessing). Here is the Pull request. But I am using the same code in my branch (dev/retinaNet).

hermancollin commented 6 days ago

@MurielleMardenli200 in the meantime maybe you can "manually" set all classes to 1 and try COCOEvaluator with this?
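A sketch of that workaround with a plain list (in detectron2 itself this would mean overwriting outputs["instances"].pred_classes, e.g. with torch.full_like, before the results reach COCOEvaluator):

```python
raw_pred_classes = [53, 59, 57, 55, 14]  # example out-of-range predictions
# With a single foreground class, force every detection to class 1 -- the
# assumption being that axon is contiguous id 1, per the {1: 0, 2: 1} mapping
# and thing_classes = ["myelin", "axon"] reported earlier in this thread.
pred_classes = [1] * len(raw_pred_classes)
print(pred_classes)  # [1, 1, 1, 1, 1]
```

This won't fix the underlying head configuration, but it should at least let the evaluator run and produce box-localization metrics.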