ch3cook-fdu / Vote2Cap-DETR

[CVPR 2023] Vote2Cap-DETR and [T-PAMI 2024] Vote2Cap-DETR++; A set-to-set perspective towards 3D Dense Captioning; State-of-the-Art 3D Dense Captioning methods
MIT License

The performance gap between pretrained models and paper #14

Open gujiaqivadin opened 3 months ago

gujiaqivadin commented 3 months ago

Hello, @ch3cook-fdu!

Thanks for sharing your work on indoor 3D dense captioning. I have recently been training Vote2Cap-DETR(++) with different configs, and I noticed a slight performance gap between the metrics of my model / the pretrained model from this repo and the table results in the paper.

Take scst_Vote2Cap_DETRv2_RGB_NORMAL with the SCST setting as an example:

My results:

```
----------------------Evaluation-----------------------
INFO: iou@0.5 matched proposals: [1543 / 2068]
[BLEU-1]  Mean: 0.6721, Max: 1.0000, Min: 0.0000
[BLEU-2]  Mean: 0.5761, Max: 1.0000, Min: 0.0000
[BLEU-3]  Mean: 0.4759, Max: 1.0000, Min: 0.0000
[BLEU-4]  Mean: 0.3892, Max: 1.0000, Min: 0.0000
[CIDEr]   Mean: 0.7539, Max: 6.2306, Min: 0.0000
[ROUGE-L] Mean: 0.5473, Max: 0.9474, Min: 0.1015
[METEOR]  Mean: 0.2638, Max: 0.5982, Min: 0.0448
```

Pretrained model results:

```
----------------------Evaluation-----------------------
INFO: iou@0.5 matched proposals: [1548 / 2068]
[BLEU-1]  Mean: 0.6729, Max: 1.0000, Min: 0.0000
[BLEU-2]  Mean: 0.5787, Max: 1.0000, Min: 0.0000
[BLEU-3]  Mean: 0.4783, Max: 1.0000, Min: 0.0000
[BLEU-4]  Mean: 0.3916, Max: 1.0000, Min: 0.0000
[CIDEr]   Mean: 0.7636, Max: 6.3784, Min: 0.0000
[ROUGE-L] Mean: 0.5496, Max: 1.0000, Min: 0.1015
[METEOR]  Mean: 0.2641, Max: 1.0000, Min: 0.0448
```

And the paper results: [screenshot of the results table from the paper]

A performance gap of about 1%~2.5% exists across all the different configs and settings, and I am wondering how to account for it.

Thanks, Jiaqi

ch3cook-fdu commented 3 months ago

Due to the randomness in data pre-processing (point down-sampling), the performance on your local machine might be slightly different from the metrics we achieved.
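To illustrate where this variance comes from, here is a minimal sketch of random point down-sampling (this is not the repo's exact pre-processing code; `downsample_points` and `num_points` are illustrative, assuming the pipeline subsamples each scene to a fixed point count):

```python
import numpy as np

def downsample_points(points, num_points=40000, seed=None):
    """Randomly subsample a point cloud to a fixed size.

    Without a fixed seed, each run (and each machine) sees a
    different subset of points, so detection and captioning
    metrics can drift by a small margin between runs.
    """
    rng = np.random.default_rng(seed)
    # Sample with replacement only if the scene has too few points.
    replace = points.shape[0] < num_points
    idx = rng.choice(points.shape[0], num_points, replace=replace)
    return points[idx]

# Two unseeded calls yield different subsets -> slightly different metrics.
scene = np.random.rand(120000, 6)  # xyz + rgb, illustrative
a = downsample_points(scene)
b = downsample_points(scene)
print(np.array_equal(a, b))  # almost surely False
```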

We encourage you to train the whole model from the very beginning (i.e. pre-train on detection -> then train for dense captioning) and see whether the results align.

For more details on the randomness analysis, please refer to https://github.com/ch3cook-fdu/Vote2Cap-DETR/issues/12.

jkstyle2 commented 3 months ago

@ch3cook-fdu hello, I'm also wondering why the result shown in the paper differs from the officially reported results on the Scan2Cap benchmark, as shown below. Could you tell me what the difference is between the paper's evaluation and the benchmark evaluation? [screenshot of the Scan2Cap benchmark leaderboard]

ch3cook-fdu commented 3 months ago

The results reported in the paper are m@kIoU evaluated on the ScanRefer validation set, while the official benchmark reports results on the test set. The metrics are also different.

Please refer to https://kaldir.vc.in.tum.de/scanrefer_benchmark/documentation and Equation 1 of the UniT3D paper for more details.
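For reference, a minimal sketch of the m@kIoU aggregation commonly used in 3D dense captioning papers (names are illustrative; see the links above for the benchmark's exact formulation): each ground-truth object contributes its captioning score only if its matched proposal reaches the IoU threshold k, and the average runs over all ground-truth objects, not just the matched ones.

```python
from typing import List

def m_at_k_iou(scores: List[float], ious: List[float], k: float = 0.5) -> float:
    """m@kIoU for 3D dense captioning (illustrative sketch).

    scores: captioning metric (e.g. CIDEr) of the proposal matched
            to each ground-truth object.
    ious:   IoU between that matched proposal and the ground-truth box.
    A ground truth whose match falls below the threshold k contributes 0.
    """
    assert len(scores) == len(ious)
    total = sum(s if iou >= k else 0.0 for s, iou in zip(scores, ious))
    return total / len(scores)

# Illustrative: 3 of 4 ground-truth objects matched at IoU >= 0.5,
# so one score is zeroed out before averaging.
print(m_at_k_iou([0.9, 0.7, 0.8, 0.5], [0.8, 0.6, 0.3, 0.55], k=0.5))
```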

1301358882 commented 1 month ago

Hello, do I need to find the best result in the log file myself? The last evaluation displayed at the end of the run is not necessarily the best result. Thank you!