Closed by weicheng113 1 year ago.
@HaoZhang534 Thanks a lot. I think I understand now.
By the way, I applied DINO to a custom dataset with only 2 classes and converted the annotations to COCO JSON format, but the evaluation metrics did not work. The output looked like the table below. The model did learn, since the loss went down as expected, and when I tested the trained model it made reasonably good predictions. Could you offer some advice on this? Thanks.
Average Precision (AP) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.000
Average Precision (AP) @[ IoU=0.50 | area= all | maxDets=100 ] = 0.000
Average Precision (AP) @[ IoU=0.75 | area= all | maxDets=100 ] = 0.000
Average Precision (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.000
Average Precision (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = -1.000
Average Precision (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = -1.000
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 1 ] = 0.000
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 10 ] = 0.000
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.000
Average Recall (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.000
Average Recall (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = -1.000
Average Recall (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = -1.000
Hi @weicheng113, can you provide more details, for example why you say the trained model predicts reasonably well when the metrics did not work?
@HaoZhang534 Thanks for your time and help.
I meant that the loss was decreasing as expected (it started at around 60 to 70). Here is a later loss output, for example:
loss=7.69, lr=0.0001, loss_class=0.00258, loss_bbox=0.0155, loss_giou=0.222...]
I also loaded the trained model and tested it on images, and it gave fairly good predictions on many of them. Therefore, I think the model was learning from the custom dataset.
I am not sure where I misconfigured the evaluator, since it depends on the COCO-format annotation file (self._coco_api = COCO(json_file)).
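The all-zero table above has a telling pattern: every area=medium and area=large row is -1, which pycocotools reports when no ground-truth box falls in that size range, yet the later torchmetrics results on this dataset report a map_large around 0.80. One plausible cause (an assumption, not confirmed in this thread) is that the custom JSON stores boxes in normalized or [x1, y1, x2, y2] form, or an area field that is not in pixels. COCO expects bbox as [x, y, width, height] in absolute pixels, and COCOeval buckets ground truth by each annotation's area field (small below 32², medium between 32² and 96², large above). The helpers below are hypothetical, a sketch for sanity-checking a converted annotation file:

```python
# Hypothetical helpers: convert an absolute-pixel [x1, y1, x2, y2] box into a
# COCO-style annotation dict and report which COCOeval area range it falls in.

SMALL_MAX = 32 ** 2   # COCO "small": area below 32*32 pixels
MEDIUM_MAX = 96 ** 2  # COCO "medium": area below 96*96 pixels, else "large"


def xyxy_to_coco_annotation(box_xyxy, ann_id, image_id, category_id):
    """Build a COCO annotation with bbox = [x, y, w, h] and area in pixels."""
    x1, y1, x2, y2 = box_xyxy
    w, h = x2 - x1, y2 - y1
    if w <= 0 or h <= 0:
        raise ValueError(f"degenerate box: {box_xyxy}")
    return {
        "id": ann_id,
        "image_id": image_id,        # must match an "id" in the "images" list
        "category_id": category_id,  # must match an "id" in the "categories" list
        "bbox": [x1, y1, w, h],
        "area": w * h,               # COCOeval reads this field for area ranges
        "iscrowd": 0,
    }


def area_bucket(area):
    """Name the (approximate) COCO area range an annotation's area falls into."""
    if area < SMALL_MAX:
        return "small"
    if area < MEDIUM_MAX:
        return "medium"
    return "large"


ann = xyxy_to_coco_annotation([10, 20, 210, 170], ann_id=1, image_id=7, category_id=1)
print(ann["bbox"], ann["area"], area_bucket(ann["area"]))  # [10, 20, 200, 150] 30000 large
# A box stored in normalized [0, 1] coordinates has a tiny area and always lands in "small":
print(area_bucket(0.4 * 0.4))  # small
```

Running a check like this over the validation JSON and comparing the bucket counts with what the images actually contain is a quick way to tell whether the annotation file or the model is at fault.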
By the way, I tried training the model on coco-minitrain and it gave correct metrics. I have not looked into the evaluator code, as it uses several classes (COCOEvaluator, pycocotools' COCOeval, etc.). Below is an example of the metrics output on the coco-minitrain data.
Average Precision (AP) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.496
Average Precision (AP) @[ IoU=0.50 | area= all | maxDets=100 ] = 0.683
Average Precision (AP) @[ IoU=0.75 | area= all | maxDets=100 ] = 0.541
Average Precision (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.308
Average Precision (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.530
Average Precision (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.677
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 1 ] = 0.385
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 10 ] = 0.630
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.708
Average Recall (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.510
Average Recall (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.755
Average Recall (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.886
Hi @weicheng113, I suggest you visualize the predictions on the validation set to see whether the problem is in the model or in the evaluator.
Hi @HaoZhang534, thanks a lot. I got it working by using MeanAveragePrecision from torchmetrics, which has a simpler interface. I will continue with training, and when I have more metrics, I will ask for your advice on fine-tuning.
@HaoZhang534, may I ask for your suggestions?
I have a dataset of about 1,100 examples, split 80% for training and 20% for validation. The dataset has only two categories, 0 and 1.
According to the histogram of the number of instances per image, most images have two objects, and no image has more than 4.
Category 1 has far fewer instances than category 0, which I think focal loss can handle.
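For reference, this is the standard sigmoid focal loss applied to each class logit, where p is the predicted probability and y ∈ {0, 1} the target for that class. The (1 − p_t)^γ factor down-weights well-classified (mostly easy background) predictions, while α_t only balances positive against negative targets, so on its own it does not re-weight category 1 against category 0. Typical DETR-family defaults are α = 0.25 and γ = 2.

```latex
p_t =
\begin{cases}
  p     & \text{if } y = 1 \\
  1 - p & \text{if } y = 0
\end{cases}
\qquad
\alpha_t =
\begin{cases}
  \alpha     & \text{if } y = 1 \\
  1 - \alpha & \text{if } y = 0
\end{cases}
\qquad
\mathrm{FL}(p_t) = -\alpha_t \, (1 - p_t)^{\gamma} \log(p_t)
```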
=========
I tried the following configuration:
num_queries: 40
num_dn_queries: 10 (with 2 instances in an image, this gives 5 DN groups, i.e. 5 * [2 * 2] queries)
num_select: 20 (select the top 20 most confident predictions for validation)
num_epoch: 50 (with StepLR(optimizer=optimizer, step_size=20))
train_batch_size: 2
gradient_accumulation_steps: 1 (single-GPU machine; a gradient descent step after every batch)
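The arithmetic behind two of these settings can be sketched as below. The DN layout follows the description above (an assumption about this refactored CDN code, not DINO's reference implementation): the number of groups is num_dn_queries divided by the number of ground-truth boxes, and each group holds one positive and one negative noised query per box. StepLR is PyTorch's scheduler with a default gamma of 0.1; with the base learning rate of 1e-4 seen in the earlier loss log, it predicts exactly the lr=1e-6 logged at epoch 49. Function names are hypothetical.

```python
def dn_layout(num_dn_queries, num_gt_boxes):
    """Return (num_groups, total_dn_queries) for contrastive denoising.

    Assumed scheme from the thread: groups = num_dn_queries // num_gt_boxes
    (at least 1), and each group has one positive plus one negative query per box.
    """
    if num_gt_boxes <= 0:
        raise ValueError("need at least one ground-truth box")
    num_groups = max(1, num_dn_queries // num_gt_boxes)
    total = num_groups * 2 * num_gt_boxes
    return num_groups, total


def step_lr(base_lr, epoch, step_size=20, gamma=0.1):
    """Learning rate under StepLR when the scheduler is stepped once per epoch."""
    return base_lr * gamma ** (epoch // step_size)


print(dn_layout(10, 2))   # (5, 20): 5 groups of [2 boxes * 2]
print(dn_layout(10, 4))   # (2, 16): fewer groups for images with more boxes
print(step_lr(1e-4, 49))  # about 1e-6, matching the epoch-49 log
```

Note that under this scheme an image with 4 boxes gets fewer DN groups than one with 2, so the denoising signal varies with crowding.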
After 50 epochs I got the results below, which are not as good as our YOLOv5 version:
Epoch 49: 100%|██████████| 912/912 [07:43<00:00, 1.97it/s, loss=5.56, v_num=46, lr=1e-6, loss_class=0.013, loss_bbox=0.0112, loss_giou=0.142, val_loss=2.130]
{'map': tensor(0.7773),
'map_50': tensor(0.9670),
'map_75': tensor(0.9138),
'map_large': tensor(0.8001),
'map_medium': tensor(0.4979),
'map_per_class': tensor(-1.),
'map_small': tensor(-1.),
'mar_1': tensor(0.5967),
'mar_10': tensor(0.8588),
'mar_100': tensor(0.8595),
'mar_100_per_class': tensor(-1.),
'mar_large': tensor(0.8821),
'mar_medium': tensor(0.5853),
'mar_small': tensor(-1.)}
=======
After looking at the instance statistics above, I am considering experiments with the following two configurations:
Configuration A:
num_queries: 60
num_dn_queries: 16 (with 2 instances in an image, this gives 8 DN groups, i.e. 8 * [2 * 2] queries)
num_select: 6 (instead of 4, to leave a bit of room)
num_epoch: 50 (with StepLR(optimizer=optimizer, step_size=20))
train_batch_size: 2
gradient_accumulation_steps: 1 (single-GPU machine; a gradient descent step after every batch)
Configuration B:
num_queries: 60
num_dn_queries: 16 (with 2 instances in an image, this gives 8 DN groups, i.e. 8 * [2 * 2] queries)
num_select: 6 (instead of 4, to leave a bit of room)
num_epoch: 50 (with StepLR(optimizer=optimizer, step_size=20))
train_batch_size: 2
gradient_accumulation_steps: 3 (single-GPU machine; a gradient descent step every 3 batches, i.e. an effective batch size of 6)
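With gradient_accumulation_steps: 3 and train_batch_size: 2, each optimizer step sees 6 images. Scaling each micro-batch loss by 1/3 before accumulating makes the summed gradient equal the gradient of the mean loss over all 6 images, provided the loss is a mean over equal-sized micro-batches. The pure-Python toy below (a hypothetical 1-D least-squares model, not DINO) checks that equivalence. In PyTorch the same pattern is (loss / accum_steps).backward() on every micro-batch, then optimizer.step() and optimizer.zero_grad() every accum_steps iterations. DETR-style losses are normalized by the number of boxes in each batch, so there the match is only approximate when micro-batches contain different numbers of boxes.

```python
def grad_mean_squared_error(w, batch):
    """d/dw of mean((w * x - y) ** 2) over a batch of (x, y) pairs."""
    return sum(2 * (w * x - y) * x for x, y in batch) / len(batch)


def accumulated_grad(w, data, micro_batch_size, accum_steps):
    """Sum of per-micro-batch gradients, each scaled by 1 / accum_steps."""
    total = 0.0
    for step in range(accum_steps):
        start = step * micro_batch_size
        micro_batch = data[start:start + micro_batch_size]
        total += grad_mean_squared_error(w, micro_batch) / accum_steps
    return total


data = [(1.0, 2.0), (2.0, 3.5), (3.0, 6.5), (4.0, 8.0), (5.0, 9.0), (6.0, 12.5)]
w = 0.5
full = grad_mean_squared_error(w, data)                               # one batch of 6
accum = accumulated_grad(w, data, micro_batch_size=2, accum_steps=3)  # 3 batches of 2
print(full, accum)  # equal up to floating-point rounding
```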
Any advice is highly appreciated.
Thanks, Cheng
Is the mAP 0.777% or 77.7%? Given that you have at most 4 instances per image, you can use a small num_queries such as 40. num_select can be set larger, for example 40. You can also load our COCO pre-trained model and fine-tune it on your dataset.
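As far as I can tell from the DINO repo's PostProcess, num_select is the k of a top-k taken over all (query, class) sigmoid scores rather than over queries, so with 2 classes a single query can produce two detections with different labels. The extra detections from one image all score lower than the ones already selected for it, so they mostly extend the ranking that AP is computed from and add recall at low confidence; because they can interleave with other images' detections, a larger num_select "usually" rather than always raises AP. The 0.3 confidence threshold discussed later in the thread is applied only when drawing boxes. The function name and logits below are hypothetical:

```python
import heapq
import math


def select_topk(logits, num_select):
    """Pick the num_select highest sigmoid scores over all (query, class) pairs.

    logits: list of per-query lists of class logits (num_queries x num_classes).
    Returns (score, query_index, class_index) tuples, highest score first.
    """
    candidates = (
        (1.0 / (1.0 + math.exp(-logit)), q, c)
        for q, row in enumerate(logits)
        for c, logit in enumerate(row)
    )
    return heapq.nlargest(num_select, candidates)


logits = [[2.0, 1.5], [-1.0, 0.5], [-3.0, -2.0]]  # 3 queries, 2 classes
for score, q, c in select_topk(logits, num_select=4):
    print(f"query {q} class {c} score {score:.3f}")  # query 0 appears twice
```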
@HaoZhang534 Thanks a lot. The mAP is 77.7%.
Could you explain why a larger num_select helps? I thought num_select should be close to the maximum number of instances.
Good suggestion. I will try the pre-trained model.
I found that the model did not do well on category 1 (which has fewer instances). It also produces overlapping predictions, i.e. two similar bounding boxes for the same object, which is a bit of a surprise, as CDN is meant to suppress duplicate predictions.
Thanks, Cheng
@weicheng113 A larger num_select usually leads to higher AP. For category 1, you may try some tricks to balance the proportions of the two categories. For example, you can add extra copies of images containing category 1 to the training data.
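One simple way to add copies of category-1 images is to repeat their indices in the training index list before building the DataLoader, for example by wrapping the result with torch.utils.data.Subset(train_dataset, indices). The helper and its repeat_factor knob are hypothetical; the validation split should stay untouched so that metrics remain comparable between runs.

```python
def oversample_indices(image_labels, rare_class=1, repeat_factor=2):
    """Return dataset indices where images containing rare_class appear
    repeat_factor times and all other images appear once.

    image_labels: list where element i is the list of class ids in image i.
    """
    indices = []
    for i, labels in enumerate(image_labels):
        copies = repeat_factor if rare_class in labels else 1
        indices.extend([i] * copies)
    return indices


image_labels = [[0, 0], [0, 1], [0, 0, 0], [1]]
print(oversample_indices(image_labels))  # [0, 1, 1, 2, 3, 3]
```

Because random augmentation is applied each time an image is loaded, the repeated copies are not pixel-identical during training.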
@HaoZhang534 Thanks for the explanation. For final inference, is there a rule for picking confidence_threshold? I saw 0.5 as the default in detrex and 0.3 in the DINO repo.
@weicheng113 Which confidence_threshold do you mean? We do not use a confidence threshold for evaluation; we use confidence_threshold=0.3 only for visualization.
Got it, thanks a lot @HaoZhang534.
@HaoZhang534 I realized I can't directly load a pretrained DINO model, because num_classes, num_queries and num_dn_queries are incompatible. I can load other weights, such as the transformer weights, but parameters like class_embed and label_enc need to be retrained.
I will try it to see whether loading a pretrained DINO model helps. Thanks.
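A common way to reuse the compatible part of a checkpoint is to drop every entry whose name is missing from the new model or whose shape differs, then load the rest with load_state_dict(..., strict=False), which leaves the skipped parameters (here, for example, class_embed and label_enc) at their fresh initialization. The helper below works on any mapping from names to tensor-like objects exposing .shape. The checkpoint layout in the usage comment (weights stored under a "model" key) is an assumption to verify against the actual file.

```python
def filter_compatible(pretrained_state, model_state):
    """Split a pretrained state dict into entries that fit the model and names to skip.

    Both arguments map parameter names to tensor-like objects exposing .shape.
    """
    compatible, skipped = {}, []
    for name, value in pretrained_state.items():
        target = model_state.get(name)
        if target is not None and tuple(value.shape) == tuple(target.shape):
            compatible[name] = value
        else:
            skipped.append(name)  # absent from the new model, or shape mismatch
    return compatible, skipped


# Hypothetical usage with PyTorch (the "model" key is an assumption):
#   checkpoint = torch.load("dino_swin_tiny_224_22kto1k_finetune_4scale_12ep.pth", map_location="cpu")
#   compatible, skipped = filter_compatible(checkpoint["model"], model.state_dict())
#   missing, unexpected = model.load_state_dict(compatible, strict=False)
#   print("re-initialized:", skipped)
```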
@HaoZhang534 Thanks a lot. Just to let you know, I have tried a pretrained DINO model, dino_swin_tiny_224_22kto1k_finetune_4scale_12ep.pth. Training progressed much faster over the first few epochs, and there is a slight improvement in the final mAP (with num_select=40).
I made a mistake when testing the model earlier: I did not load the correct trained model. The previous model actually performed quite well, as does the new model fine-tuned from the pretrained DINO checkpoint.
Epoch 49: 100%|██████████| 912/912 [07:48<00:00, 1.95it/s, loss=2.67, lr=1e-6, loss_class=0.00316, loss_bbox=0.00502, loss_giou=0.0413, val_loss=1.710]
{'map': tensor(0.7878),
'map_50': tensor(0.9703),
'map_75': tensor(0.9312),
'map_large': tensor(0.8038),
'map_medium': tensor(0.5119),
'map_per_class': tensor([0.8581, 0.7176]),
'map_small': tensor(-1.),
'mar_1': tensor(0.5966),
'mar_10': tensor(0.8496),
'mar_100': tensor(0.8532),
'mar_100_per_class': tensor([0.9112, 0.7952]),
'mar_large': tensor(0.8685),
'mar_medium': tensor(0.5519),
'mar_small': tensor(-1.)}
By the way, I rewrote and refactored the ContrastiveDeNoising part and moved it into DataCollatorForTraining. I feel most of the CDN work can be done in the data loader workers to improve GPU utilization. I have one concern about CDN: a generated negative noisy bounding box could be a valid positive box for another ground-truth box, although this should be rare.
Thanks, Cheng
@weicheng113 You are welcome. Your concern is reasonable; it really is a problem when objects are crowded. Some improvements could be made to address this, such as only using negative examples when objects are not crowded.
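A minimal sketch of the mitigation discussed here, assuming boxes in [x1, y1, x2, y2] format: after generating a negative noised box from one ground truth, measure its IoU against every other ground truth in the image and drop (or skip) it if it overlaps any of them strongly. The iou_threshold of 0.5 and the helper names are assumptions, and this check is not part of DINO's CDN as published.

```python
def iou(a, b):
    """Intersection over union of two [x1, y1, x2, y2] boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def negative_is_ambiguous(neg_box, source_idx, gt_boxes, iou_threshold=0.5):
    """True if a negative box noised from gt_boxes[source_idx] overlaps another GT."""
    return any(
        iou(neg_box, gt) >= iou_threshold
        for j, gt in enumerate(gt_boxes)
        if j != source_idx
    )


gt_boxes = [[0, 0, 10, 10], [12, 0, 22, 10]]  # two adjacent objects
neg = [11, 0, 21, 10]                         # noised from GT 0, lands on GT 1
print(iou(neg, gt_boxes[0]), negative_is_ambiguous(neg, 0, gt_boxes))  # 0.0 True
```

Since the collator already runs on CPU workers, a check like this would add little GPU cost, though it scales with the square of the number of boxes per image.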
Excuse me, can you tell me how to load the custom dataset?
I would be very grateful if you could.
Hello, I have the same problem (all-zero COCO metrics on a custom dataset). Can you give more detail about how you fixed it?
Dear Authors, thanks for sharing high performance models.
I am reading through the DINO model code and have some questions. Could you please help me?
The reason I am asking is that the background class should also need a label, as I can see in the DINO repo:
https://github.com/IDEA-Research/DINO/blob/66d7173cc4167934381a898b07c08507bdd96b63/models/dino/dino.py#L81
self.label_enc = nn.Embedding(dn_labelbook_size + 1, hidden_dim)
https://github.com/IDEA-Research/detrex/blob/697f5e9dafab6ea1769ec2ea1e0b65351273aa32/detrex/modeling/criterion/criterion.py#L122
If num_classes already includes the background class, then the +1 in this line is not needed (though cross-entropy loss is not used, so it does not matter)? https://github.com/IDEA-Research/detrex/blob/697f5e9dafab6ea1769ec2ea1e0b65351273aa32/detrex/modeling/criterion/criterion.py#L103
I was trying to apply the DINO model to my custom dataset. So far it trains, but the performance is not very good. I think I might misunderstand num_classes.
=======UPDATE=======
https://github.com/IDEA-Research/detrex/blob/697f5e9dafab6ea1769ec2ea1e0b65351273aa32/detrex/modeling/criterion/criterion.py#L141
I went through the code a second time. It looks like for focal loss, num_classes only needs to equal actual_num_classes (without +1 for the background class). For example, consider a dataset with 2 classes, 0 and 1. The logits for each prediction only need 2 numbers, e.g. [0.0145, 0.0111]. If a prediction is assigned the background class '2', its one-hot encoding is [0, 0, 1]. Cutting off the last element of the one-hot vector means [0.0145, 0.0111] is compared with [0, 0].
So with focal loss, we only need to set num_classes = actual_num_classes (without +1 for the background class) everywhere, including the following two locations:
https://github.com/IDEA-Research/detrex/blob/697f5e9dafab6ea1769ec2ea1e0b65351273aa32/projects/dino/modeling/dino.py#L92
https://github.com/IDEA-Research/detrex/blob/697f5e9dafab6ea1769ec2ea1e0b65351273aa32/projects/dino/modeling/dino.py#L101
The background class concept is confined to the SetCriterion class, where it is used to produce the one-hot encoding; the last element is then cut off, so background targets become all zeros.
https://github.com/IDEA-Research/detrex/blob/697f5e9dafab6ea1769ec2ea1e0b65351273aa32/detrex/modeling/criterion/criterion.py#L116
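The one-hot trick described above can be sketched in pure Python for illustration; the linked criterion does the equivalent with tensors. Unmatched queries get the label num_classes (the background index), each row is built num_classes + 1 wide, and dropping the last column turns background rows into all zeros, so the logits need only num_classes outputs. The function name is hypothetical.

```python
def focal_targets(labels, num_classes):
    """One-hot targets for sigmoid focal loss with the background column removed.

    labels: per-query class ids; unmatched queries use num_classes (background).
    Each row is built num_classes + 1 wide, then the last (background) entry is
    dropped, so background rows become all zeros.
    """
    rows = []
    for label in labels:
        if not 0 <= label <= num_classes:
            raise ValueError(f"label {label} outside [0, {num_classes}]")
        row = [0] * (num_classes + 1)
        row[label] = 1
        rows.append(row[:-1])
    return rows


print(focal_targets([0, 2, 1], num_classes=2))  # [[1, 0], [0, 0], [0, 1]]
```

With 2 classes, each prediction's 2 logits are compared against rows like these, which matches the [0.0145, 0.0111] versus [0, 0] example above.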
Is my understanding correct?
Thanks, Cheng