dvlab-research / PFENet

PFENet: Prior Guided Feature Enrichment Network for Few-shot Segmentation (TPAMI).

Performance inconsistency between the paper and reproduction. #43

Closed zhijiew closed 3 years ago

zhijiew commented 3 years ago

Thank you for your great work. I learned a lot from your paper.

I tested the pre-trained models you provided (ResNet50-based, Pascal VOC, 1-shot), and I get slightly better performance than what you reported in your paper.


| Method | Split0 | Split1 | Split2 | Split3 | Mean |
| --- | --- | --- | --- | --- | --- |
| Reproduce | 61.8 | 69.9 | 56.3 | 56.6 | 61.2 |
| Paper | 61.7 | 69.5 | 55.4 | 56.3 | 60.8 |

Is this performance fluctuation within the normal range? I used the same code and settings as in your GitHub repo.

I also trained a baseline (ResNet50-based, Pascal VOC, 5-shot) myself using your configs.

| Method | Split0 | Split1 | Split2 | Split3 | Mean |
| --- | --- | --- | --- | --- | --- |
| Reproduce | 64.7 | 71.5 | 55.5 | 60.6 | 63.1 |
| Paper | 63.1 | 70.7 | 55.8 | 57.9 | 61.9 |

The fluctuation is larger here: about 1.2 points on the mean, and 2.7 points on Split3.
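For reference, the per-split gaps in the two tables above can be tallied with a few lines of Python. This is only a minimal sketch: the scores are copied from the tables, and the names `results` and `split_gaps` are illustrative, not part of the PFENet code.

```python
# Per-split scores copied from the two tables in this comment.
results = {
    "1-shot": {
        "reproduce": [61.8, 69.9, 56.3, 56.6],
        "paper": [61.7, 69.5, 55.4, 56.3],
    },
    "5-shot": {
        "reproduce": [64.7, 71.5, 55.5, 60.6],
        "paper": [63.1, 70.7, 55.8, 57.9],
    },
}


def split_gaps(reproduce, paper):
    """Return (per-split gaps, mean gap), each rounded to one decimal.

    A gap is reproduce - paper, so a positive value means the
    reproduction scored higher than the paper on that split.
    """
    raw = [r - p for r, p in zip(reproduce, paper)]
    gaps = [round(g, 1) for g in raw]
    mean_gap = round(sum(raw) / len(raw), 1)
    return gaps, mean_gap


for setting, scores in results.items():
    gaps, mean_gap = split_gaps(scores["reproduce"], scores["paper"])
    print(f"{setting}: per-split gaps {gaps}, mean gap {mean_gap}")
```

The mean gap comes out to 0.4 for 1-shot and 1.2 for 5-shot, with the 5-shot gap driven mostly by Split0 and Split3.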

tianzhuotao commented 3 years ago

@zhijiew Hi, thanks for your attention.

This repo is a cleaned-up reimplementation: it removes some redundant parameter definitions and functions from the original code we used to obtain the results in the paper. I don't know why this repo sometimes achieves better performance than the reported numbers; differences in the running environment might account for the fluctuation.

You can find more details in this issue: https://github.com/dvlab-research/PFENet/issues/6.

The 1-shot performance variance of your reproduction is acceptable, according to the issue above.

As for the 5-shot results: in our paper, we took the model trained in the 1-shot setting and tested it directly with 5 support samples. This might not be the optimal setup, which could explain why you get a much better 5-shot result than our reported one.

Thank you.

zhijiew commented 3 years ago

Got it! Thank you very much for your reply!