flyinglynx / Bilinear-Matching-Network

Official implementation for CVPR 2022 paper "Represent, Compare, and Learn: A Similarity-Aware Framework for Class-Agnostic Counting".
MIT License

CARPK Dataset Setting #10

Closed Brian96086 closed 2 years ago

Brian96086 commented 2 years ago

Hi, I'm a summer research intern at the Institute of Information Science, Academia Sinica, and my group has been trying to replicate the BMNet performance on the CARPK dataset. We would like to ask about the specific settings and data preprocessing for CARPK that yield the performance reported in the paper.

In particular, how do you set the kernel size for every car in the CARPK dataset, and how do you handle the scale embedding? We ask because we zero-pad every CARPK image from (width, height) = (720, 1280) to (736, 1280) in order to match the FSC147 preprocessing (zero-pad images so the dimensions are multiples of 32). The scale embedding formula is kept the same. Yet, using the BMNet model (not the plus version), we got a validation performance of MAE = 46.48, MSE = 61.75, which is a large gap from the results in the paper (MAE 14.61, MSE 24.60).
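The padding step described above (zero-pad so both dimensions become multiples of 32) can be sketched as follows; `pad_to_multiple` is a hypothetical helper name, not code from the repo:

```python
import numpy as np

def pad_to_multiple(img: np.ndarray, multiple: int = 32) -> np.ndarray:
    """Zero-pad an H x W x C image so H and W become multiples of `multiple`."""
    h, w = img.shape[:2]
    pad_h = (multiple - h % multiple) % multiple
    pad_w = (multiple - w % multiple) % multiple
    return np.pad(img, ((0, pad_h), (0, pad_w), (0, 0)), mode="constant")

# A 720 x 1280 CARPK image becomes 736 x 1280, as described above.
img = np.ones((720, 1280, 3), dtype=np.float32)
print(pad_to_multiple(img).shape)  # (736, 1280, 3)
```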

Are there any guidelines or code on how you process the dataset? Thanks for your help!

flyinglynx commented 2 years ago

Hi, thanks for your interest in our work! You can e-mail me (min_shi@hust.edu.cn), and I will reply with the exemplars and the dataloader code we use for the CARPK dataset.

For the kernel size setting, do you mean the pooling size for exemplars or the Gaussian kernel size for model finetuning? For the pooling size, we retain the pooling operation in BMNet: each exemplar is resized to 128 * 128 to extract a feature map, which is then pooled into a 1-D vector. For finetuning, the Gaussian variance (sigma) for each car is set to 24. Because the exemplars are not cropped from the images, we exclude the scale embedding vector in cross-dataset experiments. Specifically, we train the models on FSC147 without the scale embedding and without the car category, and then test these models on the CARPK dataset.
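The ground-truth construction mentioned above (a 2-D Gaussian with sigma = 24 per car) can be sketched in numpy; `density_map` is a hypothetical helper, not code from the repo, and the Gaussian is normalized so each annotated object contributes exactly 1 to the total count:

```python
import numpy as np

def density_map(points, h, w, sigma=24.0):
    """Build a ground-truth density map by placing a 2-D Gaussian
    (sigma=24, as used for CARPK finetuning) at every annotated car centre."""
    yy, xx = np.mgrid[0:h, 0:w].astype(np.float64)
    dm = np.zeros((h, w), dtype=np.float64)
    for (px, py) in points:
        g = np.exp(-((xx - px) ** 2 + (yy - py) ** 2) / (2 * sigma ** 2))
        g /= g.sum()  # each object sums to 1, so dm.sum() == object count
        dm += g
    return dm
```

With this normalization, integrating the predicted density map directly yields the car count, which is what the MAE/MSE metrics compare against.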

For data processing, the same resizing strategy is used, i.e., constraining the short side to be longer than 384 and the long side to be shorter than 2048. I think that's very close to your preprocessing.
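The resizing constraint above can be sketched as a single scale-factor computation; `resize_factor` is a hypothetical helper illustrating the rule, with the long-side cap taking precedence when the two constraints conflict:

```python
def resize_factor(h: int, w: int, min_short: int = 384, max_long: int = 2048) -> float:
    """Compute one scale factor so the short side is >= min_short
    while the long side stays <= max_long (long-side cap wins)."""
    short, long_ = min(h, w), max(h, w)
    scale = 1.0
    if short < min_short:
        scale = min_short / short
    if long_ * scale > max_long:
        scale = max_long / long_
    return scale

# A 720 x 1280 CARPK image already satisfies both constraints.
print(resize_factor(720, 1280))  # 1.0
```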

flyinglynx commented 2 years ago

> Are there any guidelines or code on how you process the dataset? Thanks for your help!

code and exemplars.zip
Your institution's e-mail server keeps rejecting my e-mail, so I have uploaded the dataloader code and exemplars here.

Future-Outlier commented 2 years ago

I am Brian's co-worker. Thank you very much!

donggoing commented 11 months ago

@flyinglynx Thanks for your great work, but I would like to ask how I can reproduce the reported result on CARPK with the dataloader code and exemplars. I got a validation performance of MAE 80.06, MSE 86.97 with the following config and the pretrained parameters (from the repo).

@Brian96086 @Future-Outlier Can you reproduce the validation result?

DIR: 
  dataset: "/xxx/CARPK/CARPK_devkit/data/"
  exp: "bmnet+compare"
  snapshot: "exps"

DATASET:
  name: "CARPK_fixep"
  list_train: "/xxx/train.json"
  list_val: "/xxx/test.json"
  exemplar_number: 3
  downsampling_rate: 1

MODEL:
  backbone: "resnet50"
  epf_extractor: "direct_pooling"
  fix_bn: True
  ep_scale_embedding: True
  ep_scale_number: 20
  use_bias: True
  refiner: "self_similarity_module"
  matcher: "dynamic_similarity_matcher"
  counter: "density_x16"
  backbone_layer: "layer3"
  hidden_dim: 256
  refiner_layers: 1
  matcher_layers: 1
  refiner_proj_dim: 32
  matcher_proj_dim: 256
  dynamic_proj_dim: 128
  counter_dim: 257
  repeat_times: 1
  pretrain: True

TRAIN:
  resume: "model_ckpt.pth"
  counting_loss: "l2loss"
  contrast_loss: "info_nce"
  contrast_weight: 5e-6
  optimizer: "AdamW"
  device: "cuda:0"
  batch_size: 8
  epochs: 300
  lr_backbone: 1e-5
  lr: 1e-5
  lr_drop: 300 # We do not modify learning rate.
  momentum: 0.95
  weight_decay: 5e-4
  clip_max_norm: 0.1
  num_workers: 1
  seed: 430
  shots: 12

VAL:
  resume: "./model_best.pth"
  evaluate_only: True
  use_split: "val"
  visualization: False