Closed lingorX closed 3 years ago
Hi,
The framework of MAMP is inspired by MAST, which uses a generative task to train the model, but our method differs from MAST in the mask propagation module and the size-aware image feature alignment method. Moreover, MAST cannot converge well under PyTorch 3.9, and we fixed this problem.
For a fair comparison, we use the same evaluation method and code as MAST (https://github.com/zlai0/MAST) on DAVIS-17. So I think you should raise the issue on that repo, since MAST is the current benchmark we compare against. We will re-evaluate our method if they confirm your concern.
Thank you.
Hi, in the README of MAST they specify evaluating performance with the official code, so there is no need to open an issue there. Also, I'm curious which part makes MAST fail to converge on an RTX 3090, which requires PyTorch 1.8+.
Hi,
I am in the hospital for a few days, and it is not convenient for me to access my server. In that case, could you verify the results of MAST and post the official results you obtained on DAVIS here? Thanks.
When I verified MAST, I got the same results on DAVIS as in the paper using their own inference code, so I reused that code to evaluate our method. If MAST gets 65.5 with both its own inference code and the official code you ran, but MAMP does not, we will re-check the inference code once I can use the server. We will make the comparison as fair as possible.
Hi, I am sorry to bother you when you are sick ...
I have re-evaluated the checkpoint provided by MAST's repo.
Inference code of MAST:
[2021-08-09 21:09:14 benchmark.py:28] INFO datapath: DAVIS
[2021-08-09 21:09:14 benchmark.py:28] INFO ref: 0
[2021-08-09 21:09:14 benchmark.py:28] INFO resume: checkpoint.pt
[2021-08-09 21:09:14 benchmark.py:28] INFO savepath: momend
[2021-08-09 21:09:14 benchmark.py:28] INFO training: False
[2021-08-09 21:09:14 benchmark.py:38] INFO Number of model parameters: 5291648
[2021-08-09 21:09:14 benchmark.py:42] INFO => loading checkpoint 'checkpoint.pt'
[2021-08-09 21:09:17 benchmark.py:48] INFO => loaded checkpoint 'checkpoint.pt'
[2021-08-09 21:09:17 benchmark.py:72] INFO Start testing.
[2021-08-09 21:10:22 benchmark.py:165] INFO [0/30] Js: (0.620). Fs: (0.698).
[2021-08-09 21:10:54 benchmark.py:165] INFO [1/30] Js: (0.700). Fs: (0.768).
[2021-08-09 21:12:03 benchmark.py:165] INFO [2/30] Js: (0.557). Fs: (0.706).
[2021-08-09 21:12:59 benchmark.py:165] INFO [3/30] Js: (0.578). Fs: (0.694).
[2021-08-09 21:14:01 benchmark.py:165] INFO [4/30] Js: (0.591). Fs: (0.713).
[2021-08-09 21:14:52 benchmark.py:165] INFO [5/30] Js: (0.614). Fs: (0.711).
[2021-08-09 21:15:14 benchmark.py:165] INFO [6/30] Js: (0.627). Fs: (0.716).
[2021-08-09 21:16:28 benchmark.py:165] INFO [7/30] Js: (0.664). Fs: (0.739).
[2021-08-09 21:17:29 benchmark.py:165] INFO [8/30] Js: (0.647). Fs: (0.720).
[2021-08-09 21:18:08 benchmark.py:165] INFO [9/30] Js: (0.656). Fs: (0.726).
[2021-08-09 21:19:13 benchmark.py:165] INFO [10/30] Js: (0.655). Fs: (0.722).
[2021-08-09 21:19:45 benchmark.py:165] INFO [11/30] Js: (0.656). Fs: (0.721).
[2021-08-09 21:20:15 benchmark.py:165] INFO [12/30] Js: (0.647). Fs: (0.708).
[2021-08-09 21:21:18 benchmark.py:165] INFO [13/30] Js: (0.658). Fs: (0.713).
[2021-08-09 21:23:03 benchmark.py:165] INFO [14/30] Js: (0.685). Fs: (0.730).
[2021-08-09 21:23:42 benchmark.py:165] INFO [15/30] Js: (0.689). Fs: (0.740).
[2021-08-09 21:25:04 benchmark.py:165] INFO [16/30] Js: (0.679). Fs: (0.721).
[2021-08-09 21:25:27 benchmark.py:165] INFO [17/30] Js: (0.682). Fs: (0.724).
[2021-08-09 21:26:14 benchmark.py:165] INFO [18/30] Js: (0.665). Fs: (0.711).
[2021-08-09 21:27:14 benchmark.py:165] INFO [19/30] Js: (0.653). Fs: (0.691).
[2021-08-09 21:27:44 benchmark.py:165] INFO [20/30] Js: (0.656). Fs: (0.696).
[2021-08-09 21:28:31 benchmark.py:165] INFO [21/30] Js: (0.656). Fs: (0.695).
[2021-08-09 21:29:37 benchmark.py:165] INFO [22/30] Js: (0.648). Fs: (0.693).
[2021-08-09 21:30:06 benchmark.py:165] INFO [23/30] Js: (0.642). Fs: (0.688).
[2021-08-09 21:31:27 benchmark.py:165] INFO [24/30] Js: (0.630). Fs: (0.681).
[2021-08-09 21:32:38 benchmark.py:165] INFO [25/30] Js: (0.630). Fs: (0.680).
[2021-08-09 21:33:58 benchmark.py:165] INFO [26/30] Js: (0.641). Fs: (0.689).
[2021-08-09 21:34:30 benchmark.py:165] INFO [27/30] Js: (0.642). Fs: (0.688).
[2021-08-09 21:35:29 benchmark.py:165] INFO [28/30] Js: (0.639). Fs: (0.685).
[2021-08-09 21:37:13 benchmark.py:165] INFO [29/30] Js: (0.640). Fs: (0.687).
[2021-08-09 21:37:13 benchmark.py:60] INFO full testing time = 0.47 Hours
Official code:
Evaluating sequences for the semi-supervised task...
100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 30/30 [01:35<00:00, 3.17s/it]
Global results saved in ../MAST/momend/benchmark/global_results-val.csv
Per-sequence results saved in ../MAST/momend/benchmark/per-sequence_results-val.csv
--------------------------- Global results for val ---------------------------
J&F-Mean J-Mean J-Recall J-Decay F-Mean F-Recall F-Decay
0.657526 0.63622 0.745465 0.25774 0.678831 0.785443 0.29228
---------- Per sequence results for val ----------
Sequence J-Mean F-Mean
bike-packing_1 0.480788 0.628748
bike-packing_2 0.761695 0.770537
blackswan_1 0.921722 0.963251
bmx-trees_1 0.190787 0.465673
bmx-trees_2 0.595965 0.810131
breakdance_1 0.664590 0.645952
camel_1 0.655145 0.805669
car-roundabout_1 0.775848 0.702136
car-shadow_1 0.831097 0.798872
cows_1 0.893553 0.876756
dance-twirl_1 0.509063 0.568632
dog_1 0.779912 0.807127
dogs-jump_1 0.578673 0.669788
dogs-jump_2 0.584970 0.585592
dogs-jump_3 0.789689 0.858745
drift-chicane_1 0.675905 0.692483
drift-straight_1 0.456805 0.416689
goat_1 0.796213 0.773255
gold-fish_1 0.702340 0.691760
gold-fish_2 0.683785 0.733790
gold-fish_3 0.802630 0.835651
gold-fish_4 0.829149 0.866384
gold-fish_5 0.865846 0.825944
horsejump-high_1 0.767299 0.855796
horsejump-high_2 0.755430 0.947680
india_1 0.621502 0.576030
india_2 0.586131 0.568298
india_3 0.509617 0.497995
judo_1 0.818597 0.841820
judo_2 0.760728 0.777259
kite-surf_1 0.307821 0.298465
kite-surf_2 0.339554 0.494010
kite-surf_3 0.638690 0.842919
lab-coat_1 0.000000 0.000000
lab-coat_2 0.000000 0.000000
lab-coat_3 0.876274 0.760130
lab-coat_4 0.916133 0.878763
lab-coat_5 0.890503 0.862876
libby_1 0.815322 0.946394
loading_1 0.894641 0.813107
loading_2 0.482934 0.581714
loading_3 0.626556 0.637130
mbike-trick_1 0.502991 0.749047
mbike-trick_2 0.516736 0.601366
motocross-jump_1 0.392565 0.479469
motocross-jump_2 0.493480 0.521268
paragliding-launch_1 0.770134 0.823473
paragliding-launch_2 0.659266 0.883304
paragliding-launch_3 0.018357 0.069409
parkour_1 0.652936 0.668643
pigs_1 0.794340 0.771685
pigs_2 0.707950 0.847957
pigs_3 0.879149 0.828674
scooter-black_1 0.658347 0.716700
scooter-black_2 0.646861 0.601611
shooting_1 0.260947 0.258821
shooting_2 0.640829 0.533962
shooting_3 0.812100 0.917409
soapbox_1 0.741423 0.767111
soapbox_2 0.692901 0.763180
soapbox_3 0.534220 0.631662
Total time:95.29174566268921
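For reference, the global J&F-Mean printed by the official DAVIS evaluation is just the arithmetic mean of J-Mean and F-Mean, each averaged over the per-object rows above. A quick sanity check on the numbers from this run (a minimal sketch; the values are copied from the global results table above):

```python
# Sanity-check the global scores from the official DAVIS evaluation log above.
# J-Mean and F-Mean are copied from the global results table; J&F-Mean is
# simply their arithmetic mean.
j_mean = 0.63622
f_mean = 0.678831

jf_mean = (j_mean + f_mean) / 2
print(f"J&F-Mean = {jf_mean:.4f}")  # ~0.6575, matching the reported 0.657526
```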
The results are slightly higher than those reported in the paper.
Thanks for pointing this out. I am not sure whether it was caused by a different environment, but I will look into the problem later.
Hi,
I have evaluated MAMP with the official code and obtained a J&F of 69.7477. I will revise the results in the paper after I return home.
Evaluating sequences for the semi-supervised task...
100%|███████████████████████████████████████████| 30/30 [01:04<00:00, 2.15s/it]
Global results saved in /home/bomiao/Documents/paper_code/MAMP/ckpt/Test_MAMP_mamp704/DAVIS/global_results-val.csv
Per-sequence results saved in /home/bomiao/Documents/paper_code/MAMP/ckpt/Test_MAMP_mamp704/DAVIS/per-sequence_results-val.csv
--------------------------- Global results for val ---------------------------
J&F-Mean J-Mean J-Recall J-Decay F-Mean F-Recall F-Decay
0.697477 0.682889 0.81606 0.19502 0.712064 0.83835 0.221988
---------- Per sequence results for val ----------
Sequence J-Mean F-Mean
bike-packing_1 0.575248 0.730252
bike-packing_2 0.793335 0.812726
blackswan_1 0.951745 0.985871
bmx-trees_1 0.157140 0.491566
bmx-trees_2 0.599169 0.789950
breakdance_1 0.709860 0.685751
camel_1 0.709066 0.803754
car-roundabout_1 0.745022 0.641493
car-shadow_1 0.729525 0.661500
cows_1 0.828064 0.842474
dance-twirl_1 0.642147 0.644053
dog_1 0.850639 0.852469
dogs-jump_1 0.742803 0.772127
dogs-jump_2 0.854538 0.878550
dogs-jump_3 0.902825 0.947254
drift-chicane_1 0.842132 0.911296
drift-straight_1 0.679926 0.476871
goat_1 0.714302 0.628545
gold-fish_1 0.732893 0.689928
gold-fish_2 0.716694 0.764010
gold-fish_3 0.779482 0.783914
gold-fish_4 0.824331 0.859665
gold-fish_5 0.872862 0.844701
horsejump-high_1 0.772836 0.866133
horsejump-high_2 0.728486 0.915676
india_1 0.754207 0.710280
india_2 0.708244 0.704677
india_3 0.681828 0.687948
judo_1 0.861113 0.897679
judo_2 0.811408 0.830144
kite-surf_1 0.336730 0.327505
kite-surf_2 0.270528 0.291336
kite-surf_3 0.754776 0.914193
lab-coat_1 0.007778 0.020690
lab-coat_2 0.000000 0.000000
lab-coat_3 0.948303 0.902281
lab-coat_4 0.922776 0.821939
lab-coat_5 0.909824 0.887201
libby_1 0.758741 0.892876
loading_1 0.882622 0.817000
loading_2 0.492643 0.604770
loading_3 0.650231 0.705151
mbike-trick_1 0.679923 0.766727
mbike-trick_2 0.753820 0.813562
motocross-jump_1 0.464892 0.533696
motocross-jump_2 0.640413 0.649051
paragliding-launch_1 0.833832 0.878521
paragliding-launch_2 0.738824 0.918454
paragliding-launch_3 0.008771 0.033827
parkour_1 0.763694 0.771678
pigs_1 0.824465 0.789646
pigs_2 0.655776 0.770395
pigs_3 0.940607 0.904794
scooter-black_1 0.668694 0.680960
scooter-black_2 0.756713 0.686259
shooting_1 0.245600 0.304995
shooting_2 0.638795 0.514793
shooting_3 0.764462 0.841682
soapbox_1 0.774250 0.792905
soapbox_2 0.742217 0.836193
soapbox_3 0.553686 0.651563
Total time:64.5169906616211
Hi, first thank you for making this work public.
When running the test script on DAVIS as you described,
I got mean J&F = 70.4, exactly as reported in the paper.
However, I found that your code is based on MAST, and the output of the testing script is a rough video-wise estimate of performance, not the true score averaged across objects. After evaluating the generated masks with the official repo, I only get mean J&F = 69.0.
So I think there may be a mistake in the evaluation step.
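The discrepancy described above can arise purely from how per-object scores are aggregated: averaging per video first (as a MAST-style test script does) weights every video equally, while the official DAVIS evaluation averages over all object tracks directly. A toy sketch (all scores hypothetical) showing the two aggregations can disagree:

```python
from statistics import mean

# Hypothetical per-object J&F scores, grouped by video (toy numbers).
scores = {
    "video_a": [0.90],              # one object
    "video_b": [0.40, 0.50, 0.60],  # three objects
}

# Video-wise: average objects within each video, then average over videos.
video_wise = mean(mean(objs) for objs in scores.values())

# Object-wise (official DAVIS style): average over all object tracks.
object_wise = mean(s for objs in scores.values() for s in objs)

print(f"video-wise:  {video_wise:.3f}")   # -> 0.700
print(f"object-wise: {object_wise:.3f}")  # -> 0.600
```

Videos with many objects (e.g. gold-fish with five tracks) pull the object-wise mean more strongly than the video-wise one, which is one plausible source of the 70.4 vs 69.0 gap.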