Issue with Evaluation Metrics

I'm having trouble reproducing the reported results with the provided checkpoint: the evaluation metrics I get are noticeably lower than those in the original paper. Is something wrong with my setup, or is there an inconsistency between the implementation and the paper?

Metric	1	2	3	4	5
our metric	0.9235	0.73	0.57425	0.40825	0.29225
original metric in paper (100% data)	0.968	0.893	0.815	0.727	0.644
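
To quantify the discrepancy, here is a minimal sketch that computes the per-column gap between the two rows of the table. The numbers are copied from the table; the helper name `gaps` is just for illustration and is not part of the repository.

```python
# Values copied from the table above (columns 1 to 5).
ours = [0.9235, 0.73, 0.57425, 0.40825, 0.29225]
paper = [0.968, 0.893, 0.815, 0.727, 0.644]


def gaps(ours, paper):
    """Return a list of (absolute gap, relative gap) tuples, one per column.

    The relative gap is expressed as a fraction of the paper's value.
    """
    return [(p - o, (p - o) / p) for o, p in zip(ours, paper)]


for col, (abs_gap, rel_gap) in enumerate(gaps(ours, paper), start=1):
    print(f"column {col}: gap = {abs_gap:.4f} ({rel_gap:.1%} below the paper)")
```

The gap is not constant: it widens steadily from column 1 to column 5.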

bytedance / GR-MG