SHI-Labs / OneFormer

OneFormer: One Transformer to Rule Universal Image Segmentation, arXiv 2022 / CVPR 2023
https://praeclarumjj3.github.io/oneformer
MIT License

Cannot reproduce the PQ on ADE20K by training from scratch? #14

Closed wjgaas closed 1 year ago

wjgaas commented 1 year ago

I cannot get PQ 48 on the ADE20K dataset with the Swin-L backbone; I only get PQ 46. How do I get the result you report in the paper?

praeclarumjj3 commented 1 year ago

Hi @wjgaas, thanks for your interest in our work. How many times did you train? You may need to retrain to accommodate the variance in performance. We trained all our models thrice and reported the best results.

wjgaas commented 1 year ago

> Hi @wjgaas, thanks for your interest in our work. How many times did you train? You may need to retrain to accommodate the variance in performance. We trained all our models thrice and reported the best results.

Thanks for your reply. I trained thrice; the results are 46.31, 46.27, and 46.52 PQ on the ADE20K dataset with the Swin-L backbone, using the config you provided in the README (https://github.com/SHI-Labs/OneFormer/blob/main/configs/ade20k/swin/oneformer_swin_large_bs16_160k.yaml).
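For reference, the spread across the three runs quoted above can be checked directly; a minimal Python sketch using only those numbers:

```python
import statistics

# PQ from the three ADE20K Swin-L runs reported above
runs = [46.31, 46.27, 46.52]

mean_pq = statistics.mean(runs)
std_pq = statistics.stdev(runs)

print(f"mean PQ: {mean_pq:.2f}")                      # ~46.37
print(f"run-to-run std dev: {std_pq:.2f}")            # ~0.13
print(f"gap to the ~48 PQ in the paper: {48 - mean_pq:.2f}")  # ~1.63
```

The run-to-run spread (about 0.13 PQ) is far smaller than the roughly 1.6-PQ gap to the number mentioned above, which suggests a systematic difference rather than seed noise.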

praeclarumjj3 commented 1 year ago

Hi @wjgaas, could you also share the scores for AP and mIoU metrics? Also, to be sure, you are using the same environment as suggested in the installation instructions, right?

praeclarumjj3 commented 1 year ago

Hi @wjgaas, were you able to resolve this issue? If not, could you share your training logs? Also, did you try training while initializing from the https://github.com/SwinTransformer/storage/releases/download/v1.0.0/swin_large_patch4_window12_384_22kto1k.pth weights?
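For anyone following along, a minimal sketch of how one might sanity-check that checkpoint before pointing the training config at it (the local filename and the "model" wrapper key are assumptions, not confirmed in this thread):

```python
import torch

# Hypothetical local path to the Swin-L checkpoint linked above
ckpt_path = "swin_large_patch4_window12_384_22kto1k.pth"

# Load on CPU; official Swin checkpoints usually wrap the weights in a "model"
# key, so fall back to the raw dict in case this one does not.
ckpt = torch.load(ckpt_path, map_location="cpu")
state_dict = ckpt.get("model", ckpt) if isinstance(ckpt, dict) else ckpt

print(f"number of tensors: {len(state_dict)}")
for name in list(state_dict)[:5]:
    print(name, tuple(state_dict[name].shape))
```

With detectron2-style configs, MODEL.WEIGHTS would then point at this file so the backbone starts from the ImageNet-22K-pretrained weights rather than random initialization.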

praeclarumjj3 commented 1 year ago

Closing this due to inactivity. Feel free to re-open if you still face issues.

achen46 commented 1 year ago

> Hi @wjgaas, thanks for your interest in our work. How many times did you train? You may need to retrain to accommodate the variance in performance. We trained all our models thrice and reported the best results.
>
> Thanks for your reply. I trained thrice; the results are 46.31, 46.27, and 46.52 PQ on the ADE20K dataset with the Swin-L backbone, using the config you provided in the README (https://github.com/SHI-Labs/OneFormer/blob/main/configs/ade20k/swin/oneformer_swin_large_bs16_160k.yaml).

I also experience the same issue.

achen46 commented 1 year ago

In general, results should not vary that much across repeated runs (if you run it 10 times, the variance should be reasonable), or even with slightly different dependencies.

I see your paper was accepted to CVPR, and congrats on that, but this is a very serious issue and I hope the authors address it.

A good first step is to publish all the logs.

praeclarumjj3 commented 1 year ago

Hi @achen46, thank you for your interest in our work.

Please share your logs and exact details of your environment (GPU architecture and model, CUDA toolkit version, and the PyTorch, Torchvision, Detectron2, and NATTEN versions plus their compiled CUDA versions) so we can help you. That is the first piece of information any issue on an open-source repository requires. Simply stating that "it does not work even when exactly following the instructions" does not help.
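As a starting point, a short script along these lines captures most of the requested version information (treat it as a sketch; it assumes each package exposes a __version__ attribute):

```python
import importlib

import torch

# Core PyTorch / CUDA / GPU details
print("PyTorch:", torch.__version__, "| CUDA (compiled):", torch.version.cuda)
if torch.cuda.is_available():
    print("GPU:", torch.cuda.get_device_name(0))

# Library versions requested above; any of these may be missing in a broken env
for pkg in ("torchvision", "detectron2", "natten"):
    try:
        mod = importlib.import_module(pkg)
        print(pkg, getattr(mod, "__version__", "unknown"))
    except ImportError:
        print(pkg, "not installed")
```

If detectron2 is installed, its collect_env utility (detectron2.utils.collect_env) can print a fuller report, including the compiled CUDA versions.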

We ran an experiment with a fresh clone of the same code (this GitHub repo) that you are having issues with, and we got the following numbers: PQ: 50.5, AP: 36.2, mIoU (s.s./m.s.): 56.6/57.6 (trained yesterday, on 03/05/2023). These results are better than the numbers reported in our CVPR paper, PQ: 49.8, AP: 35.9, mIoU (s.s./m.s.): 57.0/57.7 (trained 7 months ago, on 08/14/2022), where we only ran three times and reported the best number.

You can find the WandB logs for the original and reproduced runs here: WandB logs. We also share the training log with step-wise loss values and the environment setup details for your reference, to help with your experiments.