Unable to reproduce results on Cityscapes

MICV-yonsei / EAGLE

[CVPR 2024 Highlight✨] Official Pytorch Code for EAGLE: Eigen Aggregation Learning for Object-Centric Unsupervised Semantic Segmentation

MIT License


Unable to reproduce results on Cityscapes #7

Open benearnthof opened 1 week ago

benearnthof commented 1 week ago

After running the preprocessing and training scripts as provided in this repository, I was unable to replicate the results of EAGLE on the Cityscapes dataset. I trained with the configs presented here and adjusted the training hyperparameters to match those stored in the Cityscapes checkpoint the authors provided through Google Drive. Even after 25,000 training steps on Cityscapes, cluster test accuracy only reaches 67%. I have attached an output plot to this issue. Could you provide insight into how to replicate the results? I've noticed that the SOTA checkpoint contains additional clustering parameters that are not used in the train config files here. Do you perform an additional post-processing step?

This is what my results look like after 25k training steps on Cityscapes.

kochanha commented 1 week ago

Hi, thanks for your interest in our work. For the Cityscapes dataset, we trained on a single GPU and used the weights from around step 2.9K as our final checkpoint. Also, the picture you attached shows the result before applying the CRF. You can obtain the results after CRF processing via eval_segmentation.py. Note that there is a significant performance difference between what you see in wandb (before applying CRF) and the results after applying CRF. Here are my wandb results for reference.

benearnthof commented 1 week ago

That must be what I have been missing, but I had assumed CRF would not make that much of a difference. I'll report back after I run the eval script. Thank you very much for the swift reply!

benearnthof commented 1 week ago

After running the evaluation on my own Cityscapes checkpoint, I obtain the following metrics:

{
'final/linear/mIoU': 32.06715285778046, 
'final/linear/Accuracy': 90.96953272819519, 
'assignments': [9, 8, 4, 6, 14, 5, 7, 11, 3, 18, 16, 26, 20, 12, 22, 23, 0, 1, 24, 10, 19, 25, 13, 15, 21, 2, 17], 
'final/cluster/mIoU': 15.385963022708893,
'final/cluster/Accuracy': 73.27690720558167
}

It seems the CRF post-processing does a lot of the heavy lifting. I'll rerun training with 3000 steps as you recommended and report back. Thanks a lot for the help!
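For readers trying to interpret the metrics above: the `assignments` list is a permutation of 0 to 26, which is consistent with a one-to-one mapping from the 27 predicted clusters to the 27 Cityscapes classes. The sketch below shows how cluster accuracy and mIoU could be derived from a confusion matrix and such a mapping. It is not taken from the EAGLE code. The function name `cluster_metrics`, the matrix layout, and the direction of the mapping (cluster index to class index) are assumptions made for illustration.

```python
def cluster_metrics(confusion, assignments):
    """Compute (mIoU, accuracy) in percent after remapping clusters to classes.

    confusion[c][k]: pixel count with ground-truth class c and predicted cluster k
                     (assumed layout, not confirmed by the repository).
    assignments[k]:  class index that cluster k is mapped to (assumed direction).
    """
    n = len(confusion)
    if sorted(assignments) != list(range(n)):
        raise ValueError("assignments must be a permutation of 0..n-1")

    # Move each cluster column to the column of its assigned class.
    remapped = [[0] * n for _ in range(n)]
    for c in range(n):
        for k in range(n):
            remapped[c][assignments[k]] += confusion[c][k]

    total = sum(sum(row) for row in remapped)
    correct = sum(remapped[i][i] for i in range(n))

    ious = []
    for i in range(n):
        tp = remapped[i][i]
        fn = sum(remapped[i]) - tp                           # class i pixels predicted as something else
        fp = sum(remapped[r][i] for r in range(n)) - tp      # other pixels predicted as class i
        denom = tp + fp + fn
        if denom > 0:                                        # skip classes that never occur
            ious.append(tp / denom)

    miou = 100.0 * sum(ious) / len(ious)
    accuracy = 100.0 * correct / total
    return miou, accuracy


# Toy two-class example with made-up counts: cluster 0 maps to class 1 and vice versa.
toy_confusion = [[0, 8], [6, 2]]
toy_miou, toy_acc = cluster_metrics(toy_confusion, [1, 0])
print(toy_miou, toy_acc)
```

Because accuracy is dominated by frequent classes while mIoU weights every class equally, a model can score well on accuracy and still have a low mIoU when rare classes are segmented poorly.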

benearnthof commented 1 week ago

I've evaluated 5 other runs, each trained for 5000 steps. For each run I picked the highest-performing checkpoint and ran it through the evaluation script. (Each training run saved checkpoints every 10 steps, as was suggested to me in another issue.) The mean cluster accuracy for Cityscapes is 70.4; the maximum accuracy I obtained after CRF evaluation was 74.1. That is a lot better than my previous results but still quite far from the performance reported in the paper. Did you do any additional hyperparameter tuning?
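For anyone repeating this comparison, the selection procedure described above (best checkpoint per run, then mean and maximum across runs) can be sketched as follows. The helper `summarize_runs`, the run names, and all numbers in the example are hypothetical placeholders, not the values from my runs.

```python
from statistics import mean


def summarize_runs(runs):
    """Pick the best checkpoint per run, then aggregate across runs.

    runs: dict mapping a run name to a list of (step, cluster_accuracy) tuples.
    Returns (best_per_run, mean_of_best, max_of_best).
    """
    # Best checkpoint of each run, chosen by its accuracy value.
    best_per_run = {
        name: max(checkpoints, key=lambda ckpt: ckpt[1])
        for name, checkpoints in runs.items()
    }
    best_scores = [acc for _, acc in best_per_run.values()]
    return best_per_run, mean(best_scores), max(best_scores)


# Placeholder data for illustration only.
example_runs = {
    "run_a": [(2900, 60.0), (3000, 62.0)],
    "run_b": [(2900, 70.0), (3000, 66.0)],
}
best, mean_acc, max_acc = summarize_runs(example_runs)
print(best, mean_acc, max_acc)
```

Selecting the checkpoint by the same metric that is later reported tends to give an optimistic estimate, so numbers obtained this way are best read as an upper bound for a given run.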