MaskDino Fails to learn Precise Bounding Boxes on custom dataset but Dino does

FabianSchuetze commented 1 year ago

Thanks for the wonderful repo. It's a pleasure to work with it and to read the code.

When training MaskDino on a custom dataset, the bounding box predictions are not very good. Interestingly:

Dino learns good bounding boxes on the same dataset
The instance masks predicted by MaskDino are good too
MaskDino reaches a reasonable box AP at an IoU threshold of 0.50 (AP50), but box AP at 0.75 (AP75) and stricter thresholds is poor.
MaskDino errored at the end with a problem in the cost_matrix. See the logs for details.
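
For context on the AP50 versus AP75 gap: a prediction counts as a match only if its IoU with the ground-truth box reaches the threshold, so a box that is roughly right but slightly displaced can pass at 0.50 and fail at 0.75. The sketch below is a minimal illustration with made-up coordinates, not output from the attached logs.

```python
def box_iou(a, b):
    """IoU of two boxes given as (x1, y1, x2, y2) in pixels."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


# Hypothetical boxes: a 100x100 ground truth and a prediction of the same
# size displaced by 15 px to the right and 15 px upwards.
gt = (100.0, 100.0, 200.0, 200.0)
pred = (115.0, 85.0, 215.0, 185.0)

iou = box_iou(gt, pred)
print(f"IoU = {iou:.3f}")
print("match at IoU 0.50:", iou >= 0.50)
print("match at IoU 0.75:", iou >= 0.75)
```

With these numbers the IoU is about 0.566, so the box is a hit for AP50 and a miss for AP75.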

Does anybody have an idea what I could tune to generate good bb results?

Training Details: I have slightly modified the training process (see this branch https://github.com/FabianSchuetze/detrex/tree/my_changes). I added AMP (mixed-precision) training and some gradient checkpointing. I train on one GPU with a batch size of four for MaskDino (Dino works with a batch size of 8). The learning rate is decayed linearly.
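
As a rough illustration of what "decayed linearly" means here, the following sketch computes such a schedule. The base learning rate and step count are hypothetical placeholders, not values taken from the branch or the logs.

```python
def linear_decay_lr(step, total_steps, base_lr, end_lr=0.0):
    """Learning rate that falls linearly from base_lr to end_lr over total_steps."""
    frac = min(max(step / total_steps, 0.0), 1.0)  # clamp progress to [0, 1]
    return base_lr + (end_lr - base_lr) * frac


BASE_LR = 1e-4        # hypothetical value
TOTAL_STEPS = 90_000  # hypothetical value

for step in (0, 45_000, 90_000):
    print(step, linear_decay_lr(step, TOTAL_STEPS, BASE_LR))
```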

Data: The instances are very dense, similar to the "iscrowd" regions of COCO. There is only one class. I have adjusted the num_objects in the config files.

Logs: Logs of the training runs are attached below. There are three logs:

One for MaskDino with the original noise scale (0.4)
One with a noise scale of 1.0 (Dino uses this value)
One for Dino (noise scale of 1.0)

Hyperparameters: Comparing the parameters, the following differences seem notable:

MaskDino has a higher Hungarian class loss (5 vs 2)
MaskDino has 1/3 of the queries (300 vs 900)
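
One way to make such a comparison systematic is to dump the settings of both runs into flat dictionaries and print only the entries that differ. The sketch below assumes that has been done by hand; the key names are hypothetical and do not correspond to actual detrex config fields, while the values are the ones quoted in this thread.

```python
def diff_settings(a, b):
    """Return {key: (a_value, b_value)} for keys whose values differ or are missing."""
    missing = object()
    out = {}
    for key in sorted(set(a) | set(b)):
        va, vb = a.get(key, missing), b.get(key, missing)
        if va != vb:
            out[key] = (
                None if va is missing else va,
                None if vb is missing else vb,
            )
    return out


# Key names are hypothetical; values are those mentioned in the thread.
maskdino = {"num_queries": 300, "class_cost_weight": 5.0,
            "dn_noise_scale": 0.4, "batch_size": 4}
dino = {"num_queries": 900, "class_cost_weight": 2.0,
        "dn_noise_scale": 1.0, "batch_size": 8}

for key, (m, d) in diff_settings(maskdino, dino).items():
    print(f"{key}: MaskDino={m}  Dino={d}")
```

Changing one differing setting at a time toward the Dino value is a simple way to narrow down which of them matters for box quality.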

maskdino_0.4_noise_scale.txt maskdino_1.0_noise_scale.txt dino_log.txt

Does anybody have an idea how to debug the problem?

FabianSchuetze commented 1 year ago

To reproduce the results, I have used a public dataset with similar characteristics: the COB-3D dataset (see https://arxiv.org/abs/2210.07424). I have extracted RGB images, bounding boxes, and instance masks in the COCO format. The dataset is a bit small (~6k images) and can be downloaded here. The original data is here. Please note that the data is published under a CC non-commercial license; see https://github.com/wyndwarrior/autoregressive-bbox/blob/main/LICENSE .

An image of the MaskDino predictions next to the ground truth:

The logs for dino and mask dino are uploaded below. dino.txt maskDino.txt

Interestingly:

The bounding box mAP for Dino is much better than for MaskDino. The training is a bit short, but I noticed a similar difference after longer training.
However, when looking at the predictions, the visualized bounding boxes for Dino are not that much better. Both show somewhat low recall. I also uploaded the JSON predictions.
The results with standard Mask-RCNN heads are generally pretty good on this dataset. They have a good recall and good precision.
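
Since the masks look good while the boxes do not, one possible diagnostic is to derive a tight box from each predicted mask and compare it with the box the model predicted for the same instance. The sketch below is a pure-Python illustration on a tiny hand-written mask; the mask and the predicted box are made up, and this is not part of the detrex post-processing.

```python
def mask_to_xyxy(mask):
    """Tight box (x1, y1, x2, y2) around the nonzero pixels of a 2D mask.

    The mask is a list of rows; x2 and y2 are exclusive.
    Returns None for an empty mask.
    """
    rows = [y for y, row in enumerate(mask) if any(row)]
    if not rows:
        return None
    cols = [x for x in range(len(mask[0])) if any(row[x] for row in mask)]
    return (cols[0], rows[0], cols[-1] + 1, rows[-1] + 1)


# Hypothetical 5x6 mask and a hypothetical predicted box for the same object.
mask = [
    [0, 0, 0, 0, 0, 0],
    [0, 0, 1, 1, 0, 0],
    [0, 1, 1, 1, 1, 0],
    [0, 0, 1, 1, 0, 0],
    [0, 0, 0, 0, 0, 0],
]
predicted_box = (2, 0, 6, 3)

mask_box = mask_to_xyxy(mask)
deltas = tuple(p - m for p, m in zip(predicted_box, mask_box))
print("box from mask:", mask_box)
print("predicted box:", predicted_box)
print("difference (x1, y1, x2, y2):", deltas)
```

If the differences are consistently of the same sign across many instances, that points to a systematic error in how boxes are decoded or rescaled and not to a failure to learn them.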

HaoZhang534 commented 1 year ago

Hello, I notice that the boxes predicted by MaskDINO are all shifted slightly toward the upper right. I guess there may be a bug in the postprocessing code.
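
A shift like this can be quantified by averaging the difference between predicted and ground-truth box centers over matched pairs. The sketch below assumes the pairs have already been matched (for example by highest IoU) and uses made-up coordinates in (x1, y1, x2, y2) pixel format with the usual image convention that y grows downwards.

```python
def center(box):
    """Center (cx, cy) of a box given as (x1, y1, x2, y2)."""
    x1, y1, x2, y2 = box
    return ((x1 + x2) / 2.0, (y1 + y2) / 2.0)


def mean_center_offset(pairs):
    """Mean (dx, dy) of predicted minus ground-truth centers over matched pairs."""
    if not pairs:
        raise ValueError("need at least one (prediction, ground truth) pair")
    dx_sum = dy_sum = 0.0
    for pred, gt in pairs:
        (px, py), (gx, gy) = center(pred), center(gt)
        dx_sum += px - gx
        dy_sum += py - gy
    return dx_sum / len(pairs), dy_sum / len(pairs)


# Hypothetical matched (prediction, ground truth) pairs.
pairs = [
    ((112, 88, 212, 188), (100, 100, 200, 200)),
    ((58, 30, 98, 90), (50, 40, 90, 100)),
    ((309, 191, 369, 251), (300, 200, 360, 260)),
]

dx, dy = mean_center_offset(pairs)
print(f"mean dx = {dx:+.2f} px, mean dy = {dy:+.2f} px")
```

A positive mean dx together with a negative mean dy corresponds to a shift toward the upper right. An offset near zero with large scatter would instead suggest imprecise boxes without a systematic bias.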

HaoZhang534 commented 1 year ago

@FabianSchuetze On relatively small datasets, Mask-RCNN usually does well enough. MaskDINO and DINO are better suited to relatively large datasets such as COCO.

HaoZhang534 commented 1 year ago

@FabianSchuetze We fixed a bug in #249. Maybe you can run it again to see whether this solves your problem. Please also refer to the discussion in #247.

FabianSchuetze commented 1 year ago

Thank you so much, @HaoZhang534 ! I will train the model again tomorrow and report back.

FabianSchuetze commented 1 year ago

@HaoZhang534, I have worked with the new commits, but the bounding boxes are still shifted. I have commented again in #247.

Furthermore, I am still not getting very good results. Maybe the training does not really work with a batch size of just 4? I will try to train on MS COCO and see whether I can reproduce the original results. Could you attach a log of the original training run? That would be wonderful and would make a comparison easier.

IDEA-Research / detrex

MaskDino Fails to learn Precise Bounding Boxes on custom dataset but Dino does #242