Hello Manjuphoenix,

The reason is that when you use DefaultPredictor from Detectron2, it does not correctly build or load the models. In AT or PT, both the teacher and student models are built. See here:
https://github.com/Shengcao-Cao/CMT/blob/2965f3c977413e5b942aa4590838d781135d1ed7/CMT_AT/train_net.py#L45
However, DefaultPredictor only builds and loads one model. Therefore, if you run the code above, you may see errors like this:
Some model parameters or buffers are not found in the checkpoint:
D_img.classifier.{bias, weight}
D_img.conv1.{bias, weight}
D_img.conv2.{bias, weight}
D_img.conv3.{bias, weight}
...
The checkpoint state_dict contains keys that are not used by the model:
modelTeacher.backbone.vgg0.0.{bias, weight}
modelTeacher.backbone.vgg0.1.{bias, num_batches_tracked, running_mean, running_var, weight}
modelTeacher.backbone.vgg0.3.{bias, weight}
modelTeacher.backbone.vgg0.4.{bias, num_batches_tracked, running_mean, running_var, weight}
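For context, the 'modelTeacher.' prefixes in the unused keys come from the training-time wrapper that holds both models in one module. Below is a sketch of that wrapper (an assumption on my part, modeled on the Adaptive Teacher-style EnsembleTSModel; see the linked train_net.py for the actual code):

import torch.nn as nn

# Sketch (assumption): training saves a single ensemble module containing
# both models, so every checkpoint key is prefixed with 'modelTeacher.'
# or 'modelStudent.'.
class EnsembleTSModel(nn.Module):
    def __init__(self, modelTeacher, modelStudent):
        super().__init__()
        self.modelTeacher = modelTeacher
        self.modelStudent = modelStudent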
To fix this, you need to convert the checkpoint keys and reload the weights after building the predictor:
# Load teacher model weights from the checkpoint
import torch

ckpt = torch.load(cfg.MODEL.WEIGHTS)['model']

# Keep only the teacher's parameters and strip the 'modelTeacher.' prefix
# so the keys match the single model built by DefaultPredictor.
state_dict = {}
for key, value in ckpt.items():
    if key.startswith('modelTeacher.'):
        key = key.replace('modelTeacher.', '')
        state_dict[key] = value

predictor.model.load_state_dict(state_dict)
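If you want to verify the conversion worked, a small optional check: load_state_dict called with strict=False returns the mismatched keys instead of raising, so you can inspect them directly:

# Optional sanity check: report missing/unexpected keys after conversion.
result = predictor.model.load_state_dict(state_dict, strict=False)
print("Missing keys:", result.missing_keys)
print("Unexpected keys:", result.unexpected_keys)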
I was training the same model on a different dataset. Training proceeded normally until the pseudo-label loss was computed, after which the loss, previously around 0.2, increased to 1.8-2.0. Do you know why this is happening?
And for the same custom-dataset model, inference gave the following error after applying the corrections above:
File "inference.py", line 50, in
Hello, I think unsupervised domain adaptation can be unstable in general. You may try changing the hyper-parameters a bit (e.g., scale down your learning rate). As for the error you showed, the likely reason is an incompatible number of classes. Please double-check that the number of classes in your customized dataset matches the number of classes in the model configuration.
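For example, a quick way to compare the two counts (a small sketch; it assumes your training dataset is registered and listed in cfg.DATASETS.TRAIN):

# Sketch: compare the registered dataset's class list with the model config.
from detectron2.data import MetadataCatalog

meta = MetadataCatalog.get(cfg.DATASETS.TRAIN[0])
print("Dataset classes:", len(meta.thing_classes))
print("Config classes:", cfg.MODEL.ROI_HEADS.NUM_CLASSES)
# These two numbers must match, otherwise the ROI-head weights
# will not load or score correctly.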
Hi @Shengcao-Cao, as you rightly said, it was the number of classes that was causing the error. Thank you for the response!
I'm using the code below for inference on the Cityscapes training data to check the model's performance, but the model does not detect any objects regardless of the score threshold (0.25, 0.5, 0.75):
import detectron2
from detectron2.utils.logger import setup_logger
setup_logger()

import numpy as np
import os, json, cv2, random

from detectron2 import model_zoo
from detectron2.engine import DefaultPredictor
from detectron2.config import get_cfg
from detectron2.utils.visualizer import Visualizer
from detectron2.data import MetadataCatalog, DatasetCatalog

from adapteacher import add_ateacher_config
from adapteacher.modeling.meta_arch.rcnn import DAobjTwoStagePseudoLabGeneralizedRCNN
from adapteacher.modeling.meta_arch.vgg import build_vgg_backbone
from adapteacher.modeling.proposal_generator.rpn import PseudoLabRPN
from adapteacher.modeling.roi_heads.roi_heads import StandardROIHeadsPseudoLab

im = cv2.imread("bochum.jpg")

cfg = get_cfg()
add_ateacher_config(cfg)
cfg.merge_from_file(model_zoo.get_config_file("faster_rcnn_VGG_cross_city.yaml"))
cfg.MODEL.ROI_HEADS.SCORE_THRESH_TEST = 0.25  # set threshold for this model
cfg.MODEL.WEIGHTS = "./city_atcmt.pth"
predictor = DefaultPredictor(cfg)
outputs = predictor(im)

print(outputs["instances"].pred_classes)
print(outputs["instances"].pred_boxes)

v = Visualizer(im[:, :, ::-1], MetadataCatalog.get(cfg.DATASETS.TRAIN[0]), scale=1.2)
out = v.draw_instance_predictions(outputs["instances"].to("cpu"))
cv2.imwrite("test1_op.jpg", out.get_image()[:, :, ::-1])
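Note that this script loads city_atcmt.pth directly through DefaultPredictor without the key conversion discussed earlier in this thread, so the teacher weights may never actually be loaded. A minimal sketch of re-applying that fix after building the predictor:

# Sketch: apply the teacher-key conversion from earlier in this thread
# after DefaultPredictor is built, then re-run inference.
import torch

ckpt = torch.load(cfg.MODEL.WEIGHTS)['model']
state_dict = {key.replace('modelTeacher.', ''): value
              for key, value in ckpt.items()
              if key.startswith('modelTeacher.')}
predictor.model.load_state_dict(state_dict)
outputs = predictor(im)  # inference with the corrected weights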