The provided models generate lower scores than the paper reported

medical-girl commented 1 year ago

Thanks for your nice work. May I confirm one thing? Using your features and pre-trained models (epoch=120), the scores I obtain are lower than those in your BMVC paper on all three datasets. For instance, on gtea the edit score and F1@10 only reach 84.0 and 88.9, lower than the 84.6 and 90.1 in your paper. The same holds for the other two datasets, e.g. 50salads gives edit=75.7 and F1@10=83.4.

pangzhan27 commented 1 year ago

Hi, same issue here. Using your uploaded models for evaluation, I get noticeably lower performance. Here are the results I get on each dataset.

[50salads] --- edit: 75.73, f1_10: 83.43, f1_25: 80.73, f1_50: 74.57, f_acc: 85.03
[gtea] --- edit: 84.04, f1_10: 88.69, f1_25: 87.76, f1_50: 79.02, f_acc: 79.98
[breakfast] --- edit: 73.5, f1_10: 74.04, f1_25: 68.69, f1_50: 55.02, f_acc: 72.44

Below is what is reported in your paper.

[50salads] --- edit: 79.6, f1_10: 85.1, f1_25: 83.4, f1_50: 76.0, f_acc: 85.6
[gtea] --- edit: 84.6, f1_10: 90.1, f1_25: 88.8, f1_50: 79.2, f_acc: 79.7
[breakfast] --- edit: 75.0, f1_10: 76.0, f1_25: 70.6, f1_50: 57.4, f_acc: 73.5
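
To make the gap easier to read, the difference between the two listings above can be tabulated per metric. This is only a small sketch: the numbers are copied from this comment, while the helper name `score_gaps` and the dictionary layout are illustrative assumptions, not part of the ASFormer code.

```python
# Scores copied from the two listings in this comment.
reproduced = {
    "50salads": {"edit": 75.73, "f1_10": 83.43, "f1_25": 80.73, "f1_50": 74.57, "f_acc": 85.03},
    "gtea": {"edit": 84.04, "f1_10": 88.69, "f1_25": 87.76, "f1_50": 79.02, "f_acc": 79.98},
    "breakfast": {"edit": 73.5, "f1_10": 74.04, "f1_25": 68.69, "f1_50": 55.02, "f_acc": 72.44},
}
reported = {
    "50salads": {"edit": 79.6, "f1_10": 85.1, "f1_25": 83.4, "f1_50": 76.0, "f_acc": 85.6},
    "gtea": {"edit": 84.6, "f1_10": 90.1, "f1_25": 88.8, "f1_50": 79.2, "f_acc": 79.7},
    "breakfast": {"edit": 75.0, "f1_10": 76.0, "f1_25": 70.6, "f1_50": 57.4, "f_acc": 73.5},
}


def score_gaps(reproduced, reported):
    """Return reproduced minus reported for every dataset and metric.

    Hypothetical helper for illustration; negative values mean the
    reproduced score is below the number in the paper.
    """
    return {
        dataset: {
            metric: round(reproduced[dataset][metric] - paper_value, 2)
            for metric, paper_value in metrics.items()
        }
        for dataset, metrics in reported.items()
    }


gaps = score_gaps(reproduced, reported)
for dataset, metrics in gaps.items():
    line = ", ".join(f"{metric}: {gap:+.2f}" for metric, gap in metrics.items())
    print(f"[{dataset}] --- {line}")
```

On these numbers the largest gap is the 50salads edit score, while gtea f_acc is the only metric where the reproduced value is slightly above the paper.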

Any ideas why this happens?

Many thanks!

ChinaYi commented 1 year ago

The results we reported in the paper are the average over all splits.

pangzhan27 commented 1 year ago

Thank you for your reply. However, the results I showed are also averaged over all splits.
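
For clarity, this is the kind of averaging meant here: each metric is computed per split and then averaged with equal weight. The sketch below is an assumption about the procedure, not the repository's evaluation script; `average_over_splits` is a hypothetical helper and the per-split values are placeholders, not real results.

```python
from statistics import mean


def average_over_splits(per_split):
    """Average each metric over a list of per-split score dicts.

    Hypothetical helper: every split contributes equally, regardless
    of how many videos it contains.
    """
    metrics = per_split[0].keys()
    return {metric: mean(split[metric] for split in per_split) for metric in metrics}


# Placeholder values for illustration only; these are NOT real results.
example_splits = [
    {"edit": 80.0, "f1_10": 86.0},
    {"edit": 82.0, "f1_10": 88.0},
    {"edit": 84.0, "f1_10": 90.0},
    {"edit": 86.0, "f1_10": 92.0},
]

averaged = average_over_splits(example_splits)
print(averaged)
```

If the paper's numbers were instead computed by pooling all videos before scoring, the two procedures could give slightly different values, so it may be worth confirming which one was used.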

I am not sure whether the PyTorch version is the cause. The model I get has 1134476 parameters.
If I remember correctly, this number is inconsistent with what others have reported (for example, in the closed issues).
