chenhaoxing / DiffusionInst

This repo contains the code for the paper "DiffusionInst: Diffusion Model for Instance Segmentation" (ICASSP'24).
Apache License 2.0

I got all mAP results zeros #13

Closed: selinkoles closed this issue 1 year ago

selinkoles commented 1 year ago

I used the following command to evaluate the diffinst.coco.res101 model with the pretrained weights torchvision-R-101.pkl:

python train_net.py --num-gpus 1 --config-file configs/diffinst.coco.res101.yaml --eval-only MODEL.WEIGHTS models/torchvision-R-101.pkl

The results I got are all zeros, as follows:

[03/31 23:56:44 d2.evaluation.coco_evaluation]: Preparing results for COCO format ...
[03/31 23:56:44 d2.evaluation.coco_evaluation]: Saving results to ./output/inference/coco_instances_results.json
[03/31 23:57:03 d2.evaluation.coco_evaluation]: Evaluating predictions with unofficial COCO API...
Loading and preparing results...
DONE (t=0.79s)
creating index...
index created!
[03/31 23:57:05 d2.evaluation.fast_eval_api]: Evaluate annotation type *bbox*
[03/31 23:57:19 d2.evaluation.fast_eval_api]: COCOeval_opt.evaluate() finished in 14.15 seconds.
[03/31 23:57:19 d2.evaluation.fast_eval_api]: Accumulating evaluation results...
[03/31 23:57:24 d2.evaluation.fast_eval_api]: COCOeval_opt.accumulate() finished in 4.71 seconds.
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.000
 Average Precision  (AP) @[ IoU=0.50      | area=   all | maxDets=100 ] = 0.000
 Average Precision  (AP) @[ IoU=0.75      | area=   all | maxDets=100 ] = 0.000
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.000
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.000
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.000
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=  1 ] = 0.000
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets= 10 ] = 0.000
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.000
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.000
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.000
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.001
[03/31 23:57:24 d2.evaluation.coco_evaluation]: Evaluation results for bbox: 
|  AP   |  AP50  |  AP75  |  APs  |  APm  |  APl  |
|:-----:|:------:|:------:|:-----:|:-----:|:-----:|
| 0.000 | 0.000  | 0.000  | 0.000 | 0.000 | 0.000 |
[03/31 23:57:24 d2.evaluation.coco_evaluation]: Per-category bbox AP: 
| category      | AP    | category     | AP    | category       | AP    |
|:--------------|:------|:-------------|:------|:---------------|:------|
| person        | 0.000 | bicycle      | 0.000 | car            | 0.000 |
| motorcycle    | 0.000 | airplane     | 0.000 | bus            | 0.000 |
| train         | 0.000 | truck        | 0.000 | boat           | 0.000 |
| traffic light | 0.000 | fire hydrant | 0.000 | stop sign      | 0.000 |
| parking meter | 0.000 | bench        | 0.000 | bird           | 0.000 |
| cat           | 0.000 | dog          | 0.000 | horse          | 0.000 |
| sheep         | 0.000 | cow          | 0.000 | elephant       | 0.000 |
| bear          | 0.000 | zebra        | 0.000 | giraffe        | 0.000 |
| backpack      | 0.000 | umbrella     | 0.000 | handbag        | 0.000 |
| tie           | 0.000 | suitcase     | 0.000 | frisbee        | 0.000 |
| skis          | 0.000 | snowboard    | 0.000 | sports ball    | 0.000 |
| kite          | 0.000 | baseball bat | 0.000 | baseball glove | 0.000 |
| skateboard    | 0.000 | surfboard    | 0.000 | tennis racket  | 0.000 |
| bottle        | 0.000 | wine glass   | 0.000 | cup            | 0.000 |
| fork          | 0.000 | knife        | 0.000 | spoon          | 0.000 |
| bowl          | 0.000 | banana       | 0.000 | apple          | 0.000 |
| sandwich      | 0.000 | orange       | 0.000 | broccoli       | 0.000 |
| carrot        | 0.000 | hot dog      | 0.000 | pizza          | 0.000 |
| donut         | 0.000 | cake         | 0.000 | chair          | 0.000 |
| couch         | 0.000 | potted plant | 0.000 | bed            | 0.000 |
| dining table  | 0.000 | toilet       | 0.000 | tv             | 0.000 |
| laptop        | 0.000 | mouse        | 0.000 | remote         | 0.000 |
| keyboard      | 0.000 | cell phone   | 0.000 | microwave      | 0.000 |
| oven          | 0.000 | toaster      | 0.000 | sink           | 0.000 |
| refrigerator  | 0.000 | book         | 0.000 | clock          | 0.000 |
| vase          | 0.000 | scissors     | 0.000 | teddy bear     | 0.000 |
| hair drier    | 0.000 | toothbrush   | 0.000 |                |       |
Loading and preparing results...
DONE (t=12.62s)
creating index...
index created!
[03/31 23:58:05 d2.evaluation.fast_eval_api]: Evaluate annotation type *segm*
[03/31 23:58:25 d2.evaluation.fast_eval_api]: COCOeval_opt.evaluate() finished in 19.72 seconds.
[03/31 23:58:25 d2.evaluation.fast_eval_api]: Accumulating evaluation results...
[03/31 23:58:30 d2.evaluation.fast_eval_api]: COCOeval_opt.accumulate() finished in 4.89 seconds.
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.000
 Average Precision  (AP) @[ IoU=0.50      | area=   all | maxDets=100 ] = 0.000
 Average Precision  (AP) @[ IoU=0.75      | area=   all | maxDets=100 ] = 0.000
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.000
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.000
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.000
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=  1 ] = 0.000
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets= 10 ] = 0.000
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.000
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.000
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.000
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.000
[03/31 23:58:31 d2.evaluation.coco_evaluation]: Evaluation results for segm: 
|  AP   |  AP50  |  AP75  |  APs  |  APm  |  APl  |
|:-----:|:------:|:------:|:-----:|:-----:|:-----:|
| 0.000 | 0.000  | 0.000  | 0.000 | 0.000 | 0.000 |
[03/31 23:58:31 d2.evaluation.coco_evaluation]: Per-category segm AP: 
| category      | AP    | category     | AP    | category       | AP    |
|:--------------|:------|:-------------|:------|:---------------|:------|
| person        | 0.000 | bicycle      | 0.000 | car            | 0.000 |
| motorcycle    | 0.000 | airplane     | 0.000 | bus            | 0.000 |
| train         | 0.000 | truck        | 0.000 | boat           | 0.000 |
| traffic light | 0.000 | fire hydrant | 0.000 | stop sign      | 0.000 |
| parking meter | 0.000 | bench        | 0.000 | bird           | 0.000 |
| cat           | 0.000 | dog          | 0.000 | horse          | 0.000 |
| sheep         | 0.000 | cow          | 0.000 | elephant       | 0.000 |
| bear          | 0.000 | zebra        | 0.000 | giraffe        | 0.000 |
| backpack      | 0.000 | umbrella     | 0.000 | handbag        | 0.000 |
| tie           | 0.000 | suitcase     | 0.000 | frisbee        | 0.000 |
| skis          | 0.000 | snowboard    | 0.000 | sports ball    | 0.000 |
| kite          | 0.000 | baseball bat | 0.000 | baseball glove | 0.000 |
| skateboard    | 0.000 | surfboard    | 0.000 | tennis racket  | 0.000 |
| bottle        | 0.000 | wine glass   | 0.000 | cup            | 0.000 |
| fork          | 0.000 | knife        | 0.000 | spoon          | 0.000 |
| bowl          | 0.000 | banana       | 0.000 | apple          | 0.000 |
| sandwich      | 0.000 | orange       | 0.000 | broccoli       | 0.000 |
| carrot        | 0.000 | hot dog      | 0.000 | pizza          | 0.000 |
| donut         | 0.000 | cake         | 0.000 | chair          | 0.000 |
| couch         | 0.000 | potted plant | 0.000 | bed            | 0.000 |
| dining table  | 0.000 | toilet       | 0.000 | tv             | 0.000 |
| laptop        | 0.000 | mouse        | 0.000 | remote         | 0.000 |
| keyboard      | 0.000 | cell phone   | 0.000 | microwave      | 0.000 |
| oven          | 0.000 | toaster      | 0.000 | sink           | 0.000 |
| refrigerator  | 0.000 | book         | 0.000 | clock          | 0.000 |
| vase          | 0.000 | scissors     | 0.000 | teddy bear     | 0.000 |
| hair drier    | 0.000 | toothbrush   | 0.000 |                |       |
[03/31 23:58:34 d2.engine.defaults]: Evaluation results for coco_2017_val in csv format:
[03/31 23:58:34 d2.evaluation.testing]: copypaste: Task: bbox
[03/31 23:58:34 d2.evaluation.testing]: copypaste: AP,AP50,AP75,APs,APm,APl
[03/31 23:58:34 d2.evaluation.testing]: copypaste: 0.0000,0.0001,0.0000,0.0000,0.0000,0.0000
[03/31 23:58:34 d2.evaluation.testing]: copypaste: Task: segm
[03/31 23:58:34 d2.evaluation.testing]: copypaste: AP,AP50,AP75,APs,APm,APl
[03/31 23:58:34 d2.evaluation.testing]: copypaste: 0.0000,0.0000,0.0000,0.0000,0.0000,0.0000

I also got a few warnings while running the above command:

WARNING [03/31 23:17:36 fvcore.common.checkpoint]: Some model parameters or buffers are not found in the checkpoint:
alphas_cumprod
alphas_cumprod_prev
backbone.fpn_lateral2.{bias, weight}
backbone.fpn_lateral3.{bias, weight}
backbone.fpn_lateral4.{bias, weight}
backbone.fpn_lateral5.{bias, weight}
backbone.fpn_output2.{bias, weight}
backbone.fpn_output3.{bias, weight}
backbone.fpn_output4.{bias, weight}
backbone.fpn_output5.{bias, weight}
betas
head.head_series.0.bboxes_delta.{bias, weight}
head.head_series.0.block_time_mlp.1.{bias, weight}
head.head_series.0.class_logits.{bias, weight}
head.head_series.0.cls_module.0.weight
head.head_series.0.cls_module.1.{bias, weight}
head.head_series.0.controller.{bias, weight}
head.head_series.0.inst_interact.dynamic_layer.{bias, weight}
head.head_series.0.inst_interact.norm1.{bias, weight}
head.head_series.0.inst_interact.norm2.{bias, weight}
head.head_series.0.inst_interact.norm3.{bias, weight}
head.head_series.0.inst_interact.out_layer.{bias, weight}
head.head_series.0.linear1.{bias, weight}
head.head_series.0.linear2.{bias, weight}
head.head_series.0.norm1.{bias, weight}
head.head_series.0.norm2.{bias, weight}
head.head_series.0.norm3.{bias, weight}
head.head_series.0.reg_module.0.weight
head.head_series.0.reg_module.1.{bias, weight}
head.head_series.0.reg_module.3.weight
head.head_series.0.reg_module.4.{bias, weight}
head.head_series.0.reg_module.6.weight
head.head_series.0.reg_module.7.{bias, weight}
head.head_series.0.self_attn.out_proj.{bias, weight}
head.head_series.0.self_attn.{in_proj_bias, in_proj_weight}
head.head_series.1.bboxes_delta.{bias, weight}
head.head_series.1.block_time_mlp.1.{bias, weight}
head.head_series.1.class_logits.{bias, weight}
head.head_series.1.cls_module.0.weight
head.head_series.1.cls_module.1.{bias, weight}
head.head_series.1.controller.{bias, weight}
head.head_series.1.inst_interact.dynamic_layer.{bias, weight}
head.head_series.1.inst_interact.norm1.{bias, weight}
head.head_series.1.inst_interact.norm2.{bias, weight}
head.head_series.1.inst_interact.norm3.{bias, weight}
head.head_series.1.inst_interact.out_layer.{bias, weight}
head.head_series.1.linear1.{bias, weight}
head.head_series.1.linear2.{bias, weight}
head.head_series.1.norm1.{bias, weight}
head.head_series.1.norm2.{bias, weight}
head.head_series.1.norm3.{bias, weight}
head.head_series.1.reg_module.0.weight
head.head_series.1.reg_module.1.{bias, weight}
head.head_series.1.reg_module.3.weight
head.head_series.1.reg_module.4.{bias, weight}
head.head_series.1.reg_module.6.weight
head.head_series.1.reg_module.7.{bias, weight}
head.head_series.1.self_attn.out_proj.{bias, weight}
head.head_series.1.self_attn.{in_proj_bias, in_proj_weight}
head.head_series.2.bboxes_delta.{bias, weight}
head.head_series.2.block_time_mlp.1.{bias, weight}
head.head_series.2.class_logits.{bias, weight}
head.head_series.2.cls_module.0.weight
head.head_series.2.cls_module.1.{bias, weight}
head.head_series.2.controller.{bias, weight}
head.head_series.2.inst_interact.dynamic_layer.{bias, weight}
head.head_series.2.inst_interact.norm1.{bias, weight}
head.head_series.2.inst_interact.norm2.{bias, weight}
head.head_series.2.inst_interact.norm3.{bias, weight}
head.head_series.2.inst_interact.out_layer.{bias, weight}
head.head_series.2.linear1.{bias, weight}
head.head_series.2.linear2.{bias, weight}
head.head_series.2.norm1.{bias, weight}
head.head_series.2.norm2.{bias, weight}
head.head_series.2.norm3.{bias, weight}
head.head_series.2.reg_module.0.weight
head.head_series.2.reg_module.1.{bias, weight}
head.head_series.2.reg_module.3.weight
head.head_series.2.reg_module.4.{bias, weight}
head.head_series.2.reg_module.6.weight
head.head_series.2.reg_module.7.{bias, weight}
head.head_series.2.self_attn.out_proj.{bias, weight}
head.head_series.2.self_attn.{in_proj_bias, in_proj_weight}
head.head_series.3.bboxes_delta.{bias, weight}
head.head_series.3.block_time_mlp.1.{bias, weight}
head.head_series.3.class_logits.{bias, weight}
head.head_series.3.cls_module.0.weight
head.head_series.3.cls_module.1.{bias, weight}
head.head_series.3.controller.{bias, weight}
head.head_series.3.inst_interact.dynamic_layer.{bias, weight}
head.head_series.3.inst_interact.norm1.{bias, weight}
head.head_series.3.inst_interact.norm2.{bias, weight}
head.head_series.3.inst_interact.norm3.{bias, weight}
head.head_series.3.inst_interact.out_layer.{bias, weight}
head.head_series.3.linear1.{bias, weight}
head.head_series.3.linear2.{bias, weight}
head.head_series.3.norm1.{bias, weight}
head.head_series.3.norm2.{bias, weight}
head.head_series.3.norm3.{bias, weight}
head.head_series.3.reg_module.0.weight
head.head_series.3.reg_module.1.{bias, weight}
head.head_series.3.reg_module.3.weight
head.head_series.3.reg_module.4.{bias, weight}
head.head_series.3.reg_module.6.weight
head.head_series.3.reg_module.7.{bias, weight}
head.head_series.3.self_attn.out_proj.{bias, weight}
head.head_series.3.self_attn.{in_proj_bias, in_proj_weight}
head.head_series.4.bboxes_delta.{bias, weight}
head.head_series.4.block_time_mlp.1.{bias, weight}
head.head_series.4.class_logits.{bias, weight}
head.head_series.4.cls_module.0.weight
head.head_series.4.cls_module.1.{bias, weight}
head.head_series.4.controller.{bias, weight}
head.head_series.4.inst_interact.dynamic_layer.{bias, weight}
head.head_series.4.inst_interact.norm1.{bias, weight}
head.head_series.4.inst_interact.norm2.{bias, weight}
head.head_series.4.inst_interact.norm3.{bias, weight}
head.head_series.4.inst_interact.out_layer.{bias, weight}
head.head_series.4.linear1.{bias, weight}
head.head_series.4.linear2.{bias, weight}
head.head_series.4.norm1.{bias, weight}
head.head_series.4.norm2.{bias, weight}
head.head_series.4.norm3.{bias, weight}
head.head_series.4.reg_module.0.weight
head.head_series.4.reg_module.1.{bias, weight}
head.head_series.4.reg_module.3.weight
head.head_series.4.reg_module.4.{bias, weight}
head.head_series.4.reg_module.6.weight
head.head_series.4.reg_module.7.{bias, weight}
head.head_series.4.self_attn.out_proj.{bias, weight}
head.head_series.4.self_attn.{in_proj_bias, in_proj_weight}
head.head_series.5.bboxes_delta.{bias, weight}
head.head_series.5.block_time_mlp.1.{bias, weight}
head.head_series.5.class_logits.{bias, weight}
head.head_series.5.cls_module.0.weight
head.head_series.5.cls_module.1.{bias, weight}
head.head_series.5.controller.{bias, weight}
head.head_series.5.inst_interact.dynamic_layer.{bias, weight}
head.head_series.5.inst_interact.norm1.{bias, weight}
head.head_series.5.inst_interact.norm2.{bias, weight}
head.head_series.5.inst_interact.norm3.{bias, weight}
head.head_series.5.inst_interact.out_layer.{bias, weight}
head.head_series.5.linear1.{bias, weight}
head.head_series.5.linear2.{bias, weight}
head.head_series.5.norm1.{bias, weight}
head.head_series.5.norm2.{bias, weight}
head.head_series.5.norm3.{bias, weight}
head.head_series.5.reg_module.0.weight
head.head_series.5.reg_module.1.{bias, weight}
head.head_series.5.reg_module.3.weight
head.head_series.5.reg_module.4.{bias, weight}
head.head_series.5.reg_module.6.weight
head.head_series.5.reg_module.7.{bias, weight}
head.head_series.5.self_attn.out_proj.{bias, weight}
head.head_series.5.self_attn.{in_proj_bias, in_proj_weight}
head.mask_head.0.0.weight
head.mask_head.0.1.{bias, running_mean, running_var, weight}
head.mask_head.1.0.weight
head.mask_head.1.1.{bias, running_mean, running_var, weight}
head.mask_head.2.0.weight
head.mask_head.2.1.{bias, running_mean, running_var, weight}
head.mask_head.3.0.weight
head.mask_head.3.1.{bias, running_mean, running_var, weight}
head.mask_head.4.{bias, weight}
head.mask_refine.0.0.weight
head.mask_refine.0.1.{bias, running_mean, running_var, weight}
head.mask_refine.1.0.weight
head.mask_refine.1.1.{bias, running_mean, running_var, weight}
head.mask_refine.2.0.weight
head.mask_refine.2.1.{bias, running_mean, running_var, weight}
head.time_mlp.1.{bias, weight}
head.time_mlp.3.{bias, weight}
log_one_minus_alphas_cumprod
posterior_log_variance_clipped
posterior_mean_coef1
posterior_mean_coef2
posterior_variance
sqrt_alphas_cumprod
sqrt_one_minus_alphas_cumprod
sqrt_recip_alphas_cumprod
sqrt_recipm1_alphas_cumprod
WARNING [03/31 23:17:36 fvcore.common.checkpoint]: The checkpoint state_dict contains keys that are not used by the model:
  stem.fc.{bias, weight}
[03/31 23:17:36 d2.data.datasets.coco]: Loaded 5000 images in COCO format from datasets/coco/annotations/instances_val2017.json
[03/31 23:17:36 d2.data.dataset_mapper]: [DatasetMapper] Augmentations used in inference: [ResizeShortestEdge(short_edge_length=(800, 800), max_size=1333, sample_style='choice')]
[03/31 23:17:36 d2.data.common]: Serializing 5000 elements to byte tensors and concatenating them all ...
[03/31 23:17:36 d2.data.common]: Serialized dataset takes 19.10 MiB
/cta/capps/detectron2/0.6/lib/python3.10/site-packages/torch/utils/data/dataloader.py:554: UserWarning: This DataLoader will create 4 worker processes in total. Our suggested max number of worker in current system is 2, which is smaller than what this DataLoader is going to create. Please be aware that excessive worker creation might get DataLoader running slow or even freeze, lower the worker number to avoid potential slowness/freeze if necessary.
  warnings.warn(_create_warning_msg(
WARNING [03/31 23:17:37 d2.evaluation.coco_evaluation]: COCO Evaluator instantiated using config, this is deprecated behavior. Please pass in explicit arguments instead.
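
A "not found in the checkpoint" list as long as the one above is easier to read when grouped by top-level module. Below is a minimal Python sketch, assuming the key names have been copied out of the log as plain strings. The helper name `group_missing_keys` and the `<buffers>` label are made up for illustration and are not part of detectron2 or fvcore.

```python
from collections import Counter


def group_missing_keys(lines):
    """Count missing-key log entries per top-level module.

    The top-level module is the text before the first dot; entries
    without a dot (e.g. "betas") are counted under "<buffers>".
    """
    counts = Counter()
    for line in lines:
        key = line.strip()
        if not key:
            continue  # skip blank lines from a copy-paste
        top = key.split(".", 1)[0] if "." in key else "<buffers>"
        counts[top] += 1
    return counts


# A few entries copied from the warning above (not the full list).
sample = [
    "alphas_cumprod",
    "backbone.fpn_lateral2.{bias, weight}",
    "backbone.fpn_output2.{bias, weight}",
    "betas",
    "head.head_series.0.class_logits.{bias, weight}",
    "head.mask_head.0.0.weight",
    "head.time_mlp.1.{bias, weight}",
]
summary = group_missing_keys(sample)
print(dict(summary))
```

If every `head.*` entry of the model shows up in such a summary, the checkpoint most likely carried no trained head weights at all, and those layers were left at their random initialization.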
zhangxgu commented 1 year ago

Hi, if you want to evaluate DiffusionInst, you should use the trained weights. See "Trained Models": "We now provide trained models for ResNet-50 and ResNet-101. https://pan.baidu.com/s/1KEdjNY3CSXWp0VFwkhRKYg, pwd: jhbv."

The pretrained weights you used were obtained by training a classifier on ImageNet. They contain only the backbone weights and are meant just for initialization. Thus, you cannot evaluate DiffusionInst with only the pretrained weights.
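
One quick way to tell the two kinds of checkpoint apart is to look at the parameter names they store. The sketch below is a hedged illustration rather than the repo's own tooling: it assumes a trained checkpoint stores head parameters under names starting with `head.` (the prefix the missing-key warning above uses for the model), and the function name `has_head_weights` and the sample name lists are hypothetical.

```python
def has_head_weights(param_names, head_prefix="head."):
    """Return True if any parameter name belongs to the detection/mask head."""
    return any(name.startswith(head_prefix) for name in param_names)


# Names in the style of an ImageNet classification backbone (illustrative only).
backbone_only = [
    "stem.conv1.weight",
    "res2.0.conv1.weight",
    "stem.fc.weight",
    "stem.fc.bias",
]

# The same backbone names plus head names in the style of the warning (illustrative only).
trained = backbone_only[:2] + ["head.time_mlp.1.weight", "head.mask_head.4.bias"]

print(has_head_weights(backbone_only))  # False
print(has_head_weights(trained))        # True
```

A checkpoint for which this returns False can initialize the backbone for training, but it gives the head nothing to evaluate with.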

selinkoles commented 1 year ago

Hi, I got it, thank you. I have a couple of questions:
1- The link you provided has ResNet-50 and ResNet-101, but no Swin-Base. Do you not have it yet?
2- These weights are just for COCO, right? You don't have the pretrained weights for LVIS?

zhangxgu commented 1 year ago

Hi, it's a long story. @selinkoles We have all the trained weights on both COCO and LVIS with Swin-B and Swin-L. However, they are saved in the Ant Group workspaces. At first we released only the code, with the permission of Ant Group. The weights were not included in the open-source project and cannot be uploaded right away. The ResNet-50 and ResNet-101 weights are on my own PC, so I could upload them to Baidu. Now that many people are asking us for the trained weights, we are going through the open-source process again for these weights....

selinkoles commented 1 year ago

I see. However, since I am in Turkey, I cannot access Baidu; I believe it is blocked here. Could you please help me access the weights?

zhangxgu commented 1 year ago

@selinkoles I can send them to you by email. Can you give me your email address?