The test result from uploaded FSC147.pth does not match the result of the paper

Verg-Avesta / CounTR

CounTR: Transformer-based Generalised Visual Counting

https://verg-avesta.github.io/CounTR_Webpage/

MIT License


The test result from uploaded FSC147.pth does not match the result of the paper #23

Jyerim closed this issue 1 year ago.

Jyerim commented 1 year ago

The test result from uploaded FSC147.pth does not match the result of the paper.

The inference result on the FSC147 test set is MAE 15.71, RMSE 104.99. I used the FSC147 fine-tuned weights that you uploaded and linked in the documentation.

However, the zero-shot result is close to the one reported in the paper.

Few-shot (MAE / RMSE): 15.71 / 104.99
Zero-shot (MAE / RMSE): 14.70 / 106.87
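For reference, MAE and RMSE over a test set are computed from per-image counts (for a density-map model such as CounTR, the count is the sum of the predicted density map). The sketch below is a minimal illustration, not the repository's evaluation code; the function name `counting_errors` and the toy numbers are hypothetical. It shows why RMSE reacts so strongly to a few badly miscounted images: each error is squared before averaging.

```python
import math
from typing import Sequence, Tuple


def counting_errors(pred_counts: Sequence[float], gt_counts: Sequence[float]) -> Tuple[float, float]:
    """Hypothetical helper: return (MAE, RMSE) between per-image predicted and ground-truth counts."""
    if len(pred_counts) != len(gt_counts) or len(gt_counts) == 0:
        raise ValueError("expected two non-empty sequences of equal length")
    errors = [p - g for p, g in zip(pred_counts, gt_counts)]
    mae = sum(abs(e) for e in errors) / len(errors)             # mean absolute error
    rmse = math.sqrt(sum(e * e for e in errors) / len(errors))  # root mean squared error
    return mae, rmse


# Toy data: nine nearly correct images plus one badly undercounted dense image.
pred = [10, 20, 30, 40, 50, 60, 70, 80, 90, 400]
gt = [11, 19, 31, 39, 51, 59, 71, 79, 91, 1000]
mae, rmse = counting_errors(pred, gt)
print(f"MAE {mae:.2f}, RMSE {rmse:.2f}")  # MAE 60.90, RMSE 189.74
```

In this toy case a single image accounts for 600 of the 609 total absolute error, and its squared error dominates RMSE even more, which is why a systematic bug affecting dense images can move RMSE far more than MAE.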

I ran the evaluation code with torch 1.10 and tried both timm 0.3.2 and timm 0.4.5.

Is there any way to fix this issue?

Jyerim commented 1 year ago

I found the cause: there was a minor mistake in the code.

In `FSC_test_cross(few_shot).py`:

```python
rects = list()
for bbox in bboxes:
    # bbox[0] and bbox[2] are opposite corners of the exemplar box
    x1 = int(bbox[0][0] * scale_factor_W)
    y1 = int(bbox[0][1] * scale_factor_H)
    x2 = int(bbox[2][0] * scale_factor_W)  # was bbox[1][0]
    y2 = int(bbox[2][1] * scale_factor_H)  # was bbox[1][1]
    rects.append([y1, x1, y2, x2])
```
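To see why the corner index matters, here is a small stand-alone reproduction. It assumes each exemplar box is stored as four corner points in the order top-left, bottom-left, bottom-right, top-right, so that `bbox[2]` is the corner opposite `bbox[0]` (consistent with the fix above, but verify against the annotation file). Under that layout `bbox[1]` has the same x coordinate as `bbox[0]`, so the unfixed code yields zero-width exemplars. The helper `boxes_to_rects` and its `corner` parameter are hypothetical.

```python
from typing import List, Sequence

Point = Sequence[float]


def boxes_to_rects(bboxes: Sequence[Sequence[Point]], scale_factor_W: float,
                   scale_factor_H: float, corner: int = 2) -> List[List[int]]:
    """Hypothetical helper: convert 4-corner boxes to scaled [y1, x1, y2, x2] rects.

    `corner` picks the point used as the far corner: 2 is the fixed behaviour,
    1 reproduces the original bug (assuming TL, BL, BR, TR corner order).
    """
    rects = []
    for bbox in bboxes:
        x1 = int(bbox[0][0] * scale_factor_W)
        y1 = int(bbox[0][1] * scale_factor_H)
        x2 = int(bbox[corner][0] * scale_factor_W)
        y2 = int(bbox[corner][1] * scale_factor_H)
        rects.append([y1, x1, y2, x2])
    return rects


# Toy exemplar spanning x = 10..50 and y = 20..60, corners in the assumed order.
box = [[10, 20], [10, 60], [50, 60], [50, 20]]
print(boxes_to_rects([box], 1.0, 1.0, corner=2))  # [[20, 10, 60, 50]]
print(boxes_to_rects([box], 1.0, 1.0, corner=1))  # [[20, 10, 60, 10]], zero width
```

Under this assumed layout only `x2` is actually wrong in the original code; `bbox[1][1]` already equals the bottom y coordinate, so changing it to `bbox[2][1]` is harmless but not strictly necessary.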

GioFic95 commented 1 year ago

Hi @Jyerim, sorry for introducing this mistake.

Can you share your new results? My experiments with the original test code, without this error, still differ considerably from the results reported in the paper, and I'd like to know whether this is due to a different PyTorch/timm version.

I tested the few-shot counting task with the uploaded weights FSC147.pth:

- With the test script published by @Verg-Avesta, I obtain MAE: 14.97 and RMSE: 106.50;
- With my script including your fix, I get MAE: 13.00 and RMSE: 105.75;
- The results in the paper are MAE: 11.95 and RMSE: 91.23.

You can check the difference between the two scripts here.

Verg-Avesta commented 1 year ago

I don't think a different PyTorch/timm version would change the result this much. Maybe something is wrong with the 0-shot and 3-shot settings? Or maybe you didn't use test-time normalisation? In issue #7, he used the checkpoint I uploaded and got MAE: 12.44, RMSE: 89.86, so there might be something wrong with the test script you used.
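Test-time normalisation, as we understand it from the CounTR paper, guards against the model counting object sub-parts: each exemplar box should contain roughly one object, so if the average density mass inside the exemplar boxes is well above 1, the whole density map is divided by that average before summing. The sketch below assumes a 2-D NumPy density map, exemplar boxes in the `[y1, x1, y2, x2]` order used earlier in this thread, and a threshold of 1.8; the function name `apply_tt_norm` is hypothetical, and both the threshold and the exact procedure should be checked against the repository's test script rather than taken from here.

```python
import numpy as np
from typing import Sequence, Tuple


def apply_tt_norm(density: np.ndarray, rects: Sequence[Sequence[int]],
                  threshold: float = 1.8) -> Tuple[np.ndarray, float]:
    """Hypothetical helper: return (possibly rescaled density map, predicted count).

    `rects` holds exemplar boxes as [y1, x1, y2, x2] in density-map pixels;
    `threshold` (1.8) is an assumed value, not read from the repository.
    """
    if len(rects) == 0:
        # Zero-shot: there are no exemplars to normalise against.
        return density, float(density.sum())
    # Density mass inside each exemplar box; ideally about one object per box.
    masses = [float(density[y1:y2, x1:x2].sum()) for y1, x1, y2, x2 in rects]
    avg_mass = sum(masses) / len(masses)
    if avg_mass > threshold:
        # The model seems to count sub-parts, so rescale so each exemplar sums to ~1.
        density = density / avg_mass
    return density, float(density.sum())


# Uniform toy map with total mass 10; a 5x5 exemplar box holds mass 2.5.
dm = np.full((10, 10), 0.1)
_, count = apply_tt_norm(dm, [[0, 0, 5, 5]])
print(round(count, 6))  # 4.0
```

Skipping this step (or passing boxes with the wrong corners, as in the bug above) changes the exemplar masses and therefore whether and how much each prediction is rescaled, which is one plausible way two test scripts using the same checkpoint can disagree.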

GioFic95 commented 1 year ago

Thank you @Verg-Avesta, I managed to obtain the results you mentioned. I'll open a new pull request with the fix in a few minutes.