Closed lingorX closed 3 years ago
Hi,
The framework of MAMP is inspired by MAST, which uses a generative task to train the model, but our method differs from MAST in the mask propagation module and the size-aware image feature alignment method. Moreover, MAST cannot converge well under PyTorch 3.9, and we fixed this problem.
For a fair comparison, we use the same evaluation method and code as MAST (https://github.com/zlai0/MAST) on DAVIS-17. So I think you should raise the issue on that repo, since MAST is the current benchmark we compare against. We will re-evaluate our method if they confirm your concern.
Thank you.
Hi, in the README of MAST they specify evaluating performance with the official code, so there is no need to open an issue there. Also, I'm curious which part makes MAST fail to converge on an RTX 3090, which requires PyTorch 1.8+.
Hi,
I am in the hospital for a few days, and it is not convenient for me to access my server. In that case, could you verify the results of MAST and post the official results you obtained on DAVIS here? Thanks.
When I verified MAST, I got the same results on DAVIS as in the paper using their own inference code, so I reused that code to evaluate our method. If MAST gets 65.5 with both its own inference code and the official code you ran, but MAMP does not, we will re-check the inference code once I can use the server. We will make the comparison as fair as possible.
Hi, I am sorry to bother you when you are sick ...
I have re-evaluated the checkpoint provided by MAST's repo.
Inference code of MAST:
[2021-08-09 21:09:14 benchmark.py:28] INFO datapath: DAVIS
[2021-08-09 21:09:14 benchmark.py:28] INFO ref: 0
[2021-08-09 21:09:14 benchmark.py:28] INFO resume: checkpoint.pt
[2021-08-09 21:09:14 benchmark.py:28] INFO savepath: momend
[2021-08-09 21:09:14 benchmark.py:28] INFO training: False
[2021-08-09 21:09:14 benchmark.py:38] INFO Number of model parameters: 5291648
[2021-08-09 21:09:14 benchmark.py:42] INFO => loading checkpoint 'checkpoint.pt'
[2021-08-09 21:09:17 benchmark.py:48] INFO => loaded checkpoint 'checkpoint.pt'
[2021-08-09 21:09:17 benchmark.py:72] INFO Start testing.
[2021-08-09 21:10:22 benchmark.py:165] INFO [0/30] Js: (0.620). Fs: (0.698).
[2021-08-09 21:10:54 benchmark.py:165] INFO [1/30] Js: (0.700). Fs: (0.768).
[2021-08-09 21:12:03 benchmark.py:165] INFO [2/30] Js: (0.557). Fs: (0.706).
[2021-08-09 21:12:59 benchmark.py:165] INFO [3/30] Js: (0.578). Fs: (0.694).
[2021-08-09 21:14:01 benchmark.py:165] INFO [4/30] Js: (0.591). Fs: (0.713).
[2021-08-09 21:14:52 benchmark.py:165] INFO [5/30] Js: (0.614). Fs: (0.711).
[2021-08-09 21:15:14 benchmark.py:165] INFO [6/30] Js: (0.627). Fs: (0.716).
[2021-08-09 21:16:28 benchmark.py:165] INFO [7/30] Js: (0.664). Fs: (0.739).
[2021-08-09 21:17:29 benchmark.py:165] INFO [8/30] Js: (0.647). Fs: (0.720).
[2021-08-09 21:18:08 benchmark.py:165] INFO [9/30] Js: (0.656). Fs: (0.726).
[2021-08-09 21:19:13 benchmark.py:165] INFO [10/30] Js: (0.655). Fs: (0.722).
[2021-08-09 21:19:45 benchmark.py:165] INFO [11/30] Js: (0.656). Fs: (0.721).
[2021-08-09 21:20:15 benchmark.py:165] INFO [12/30] Js: (0.647). Fs: (0.708).
[2021-08-09 21:21:18 benchmark.py:165] INFO [13/30] Js: (0.658). Fs: (0.713).
[2021-08-09 21:23:03 benchmark.py:165] INFO [14/30] Js: (0.685). Fs: (0.730).
[2021-08-09 21:23:42 benchmark.py:165] INFO [15/30] Js: (0.689). Fs: (0.740).
[2021-08-09 21:25:04 benchmark.py:165] INFO [16/30] Js: (0.679). Fs: (0.721).
[2021-08-09 21:25:27 benchmark.py:165] INFO [17/30] Js: (0.682). Fs: (0.724).
[2021-08-09 21:26:14 benchmark.py:165] INFO [18/30] Js: (0.665). Fs: (0.711).
[2021-08-09 21:27:14 benchmark.py:165] INFO [19/30] Js: (0.653). Fs: (0.691).
[2021-08-09 21:27:44 benchmark.py:165] INFO [20/30] Js: (0.656). Fs: (0.696).
[2021-08-09 21:28:31 benchmark.py:165] INFO [21/30] Js: (0.656). Fs: (0.695).
[2021-08-09 21:29:37 benchmark.py:165] INFO [22/30] Js: (0.648). Fs: (0.693).
[2021-08-09 21:30:06 benchmark.py:165] INFO [23/30] Js: (0.642). Fs: (0.688).
[2021-08-09 21:31:27 benchmark.py:165] INFO [24/30] Js: (0.630). Fs: (0.681).
[2021-08-09 21:32:38 benchmark.py:165] INFO [25/30] Js: (0.630). Fs: (0.680).
[2021-08-09 21:33:58 benchmark.py:165] INFO [26/30] Js: (0.641). Fs: (0.689).
[2021-08-09 21:34:30 benchmark.py:165] INFO [27/30] Js: (0.642). Fs: (0.688).
[2021-08-09 21:35:29 benchmark.py:165] INFO [28/30] Js: (0.639). Fs: (0.685).
[2021-08-09 21:37:13 benchmark.py:165] INFO [29/30] Js: (0.640). Fs: (0.687).
[2021-08-09 21:37:13 benchmark.py:60] INFO full testing time = 0.47 Hours
Official code:
Evaluating sequences for the semi-supervised task...
100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 30/30 [01:35<00:00, 3.17s/it]
Global results saved in ../MAST/momend/benchmark/global_results-val.csv
Per-sequence results saved in ../MAST/momend/benchmark/per-sequence_results-val.csv
--------------------------- Global results for val ---------------------------
J&F-Mean J-Mean J-Recall J-Decay F-Mean F-Recall F-Decay
0.657526 0.63622 0.745465 0.25774 0.678831 0.785443 0.29228
---------- Per sequence results for val ----------
Sequence J-Mean F-Mean
bike-packing_1 0.480788 0.628748
bike-packing_2 0.761695 0.770537
blackswan_1 0.921722 0.963251
bmx-trees_1 0.190787 0.465673
bmx-trees_2 0.595965 0.810131
breakdance_1 0.664590 0.645952
camel_1 0.655145 0.805669
car-roundabout_1 0.775848 0.702136
car-shadow_1 0.831097 0.798872
cows_1 0.893553 0.876756
dance-twirl_1 0.509063 0.568632
dog_1 0.779912 0.807127
dogs-jump_1 0.578673 0.669788
dogs-jump_2 0.584970 0.585592
dogs-jump_3 0.789689 0.858745
drift-chicane_1 0.675905 0.692483
drift-straight_1 0.456805 0.416689
goat_1 0.796213 0.773255
gold-fish_1 0.702340 0.691760
gold-fish_2 0.683785 0.733790
gold-fish_3 0.802630 0.835651
gold-fish_4 0.829149 0.866384
gold-fish_5 0.865846 0.825944
horsejump-high_1 0.767299 0.855796
horsejump-high_2 0.755430 0.947680
india_1 0.621502 0.576030
india_2 0.586131 0.568298
india_3 0.509617 0.497995
judo_1 0.818597 0.841820
judo_2 0.760728 0.777259
kite-surf_1 0.307821 0.298465
kite-surf_2 0.339554 0.494010
kite-surf_3 0.638690 0.842919
lab-coat_1 0.000000 0.000000
lab-coat_2 0.000000 0.000000
lab-coat_3 0.876274 0.760130
lab-coat_4 0.916133 0.878763
lab-coat_5 0.890503 0.862876
libby_1 0.815322 0.946394
loading_1 0.894641 0.813107
loading_2 0.482934 0.581714
loading_3 0.626556 0.637130
mbike-trick_1 0.502991 0.749047
mbike-trick_2 0.516736 0.601366
motocross-jump_1 0.392565 0.479469
motocross-jump_2 0.493480 0.521268
paragliding-launch_1 0.770134 0.823473
paragliding-launch_2 0.659266 0.883304
paragliding-launch_3 0.018357 0.069409
parkour_1 0.652936 0.668643
pigs_1 0.794340 0.771685
pigs_2 0.707950 0.847957
pigs_3 0.879149 0.828674
scooter-black_1 0.658347 0.716700
scooter-black_2 0.646861 0.601611
shooting_1 0.260947 0.258821
shooting_2 0.640829 0.533962
shooting_3 0.812100 0.917409
soapbox_1 0.741423 0.767111
soapbox_2 0.692901 0.763180
soapbox_3 0.534220 0.631662
Total time:95.29174566268921
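For reference, the global J&F-Mean printed by the official DAVIS evaluation is just the arithmetic mean of J-Mean and F-Mean, each averaged over the per-object rows above. A quick sanity check on the numbers from this run (a minimal sketch; the values are copied from the global results table above):

```python
# Sanity-check the global scores from the official DAVIS evaluation log above.
# J-Mean and F-Mean are copied from the global results table; J&F-Mean is
# simply their arithmetic mean.
j_mean = 0.63622
f_mean = 0.678831

jf_mean = (j_mean + f_mean) / 2
print(f"J&F-Mean = {jf_mean:.4f}")  # ~0.6575, matching the reported 0.657526
```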
The results are slightly higher than those reported in the paper.
Thanks for pointing this out. I am not sure whether it was caused by a different environment, but I will look into the problem later.
Hi,
I have evaluated MAMP with the official code and obtained a J&F of 69.7477. I will revise the results in the paper after I return home.
Evaluating sequences for the semi-supervised task...
100%|███████████████████████████████████████████| 30/30 [01:04<00:00, 2.15s/it]
Global results saved in /home/bomiao/Documents/paper_code/MAMP/ckpt/Test_MAMP_mamp704/DAVIS/global_results-val.csv
Per-sequence results saved in /home/bomiao/Documents/paper_code/MAMP/ckpt/Test_MAMP_mamp704/DAVIS/per-sequence_results-val.csv
--------------------------- Global results for val ---------------------------
J&F-Mean J-Mean J-Recall J-Decay F-Mean F-Recall F-Decay
0.697477 0.682889 0.81606 0.19502 0.712064 0.83835 0.221988
---------- Per sequence results for val ----------
Sequence J-Mean F-Mean
bike-packing_1 0.575248 0.730252
bike-packing_2 0.793335 0.812726
blackswan_1 0.951745 0.985871
bmx-trees_1 0.157140 0.491566
bmx-trees_2 0.599169 0.789950
breakdance_1 0.709860 0.685751
camel_1 0.709066 0.803754
car-roundabout_1 0.745022 0.641493
car-shadow_1 0.729525 0.661500
cows_1 0.828064 0.842474
dance-twirl_1 0.642147 0.644053
dog_1 0.850639 0.852469
dogs-jump_1 0.742803 0.772127
dogs-jump_2 0.854538 0.878550
dogs-jump_3 0.902825 0.947254
drift-chicane_1 0.842132 0.911296
drift-straight_1 0.679926 0.476871
goat_1 0.714302 0.628545
gold-fish_1 0.732893 0.689928
gold-fish_2 0.716694 0.764010
gold-fish_3 0.779482 0.783914
gold-fish_4 0.824331 0.859665
gold-fish_5 0.872862 0.844701
horsejump-high_1 0.772836 0.866133
horsejump-high_2 0.728486 0.915676
india_1 0.754207 0.710280
india_2 0.708244 0.704677
india_3 0.681828 0.687948
judo_1 0.861113 0.897679
judo_2 0.811408 0.830144
kite-surf_1 0.336730 0.327505
kite-surf_2 0.270528 0.291336
kite-surf_3 0.754776 0.914193
lab-coat_1 0.007778 0.020690
lab-coat_2 0.000000 0.000000
lab-coat_3 0.948303 0.902281
lab-coat_4 0.922776 0.821939
lab-coat_5 0.909824 0.887201
libby_1 0.758741 0.892876
loading_1 0.882622 0.817000
loading_2 0.492643 0.604770
loading_3 0.650231 0.705151
mbike-trick_1 0.679923 0.766727
mbike-trick_2 0.753820 0.813562
motocross-jump_1 0.464892 0.533696
motocross-jump_2 0.640413 0.649051
paragliding-launch_1 0.833832 0.878521
paragliding-launch_2 0.738824 0.918454
paragliding-launch_3 0.008771 0.033827
parkour_1 0.763694 0.771678
pigs_1 0.824465 0.789646
pigs_2 0.655776 0.770395
pigs_3 0.940607 0.904794
scooter-black_1 0.668694 0.680960
scooter-black_2 0.756713 0.686259
shooting_1 0.245600 0.304995
shooting_2 0.638795 0.514793
shooting_3 0.764462 0.841682
soapbox_1 0.774250 0.792905
soapbox_2 0.742217 0.836193
soapbox_3 0.553686 0.651563
Total time:64.5169906616211
Hi, first thank you for making this work public.
When running the test script on DAVIS as you described,
I got mean J&F = 70.4, exactly as reported in the paper.
However, I found that your code is based on MAST, and the output of the testing script is a rough video-wise estimate of performance, not the true score averaged across objects. After evaluating the generated masks with the official repo, I only get mean J&F = 69.0.
So I think there may be a mistake in the evaluation step.
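The discrepancy described above can arise purely from how per-object scores are aggregated: averaging per video first (as a MAST-style test script does) weights every video equally, while the official DAVIS evaluation averages over all object tracks directly. A toy sketch (all scores hypothetical) showing the two aggregations can disagree:

```python
from statistics import mean

# Hypothetical per-object J&F scores, grouped by video (toy numbers).
scores = {
    "video_a": [0.90],              # one object
    "video_b": [0.40, 0.50, 0.60],  # three objects
}

# Video-wise: average objects within each video, then average over videos.
video_wise = mean(mean(objs) for objs in scores.values())

# Object-wise (official DAVIS style): average over all object tracks.
object_wise = mean(s for objs in scores.values() for s in objs)

print(f"video-wise:  {video_wise:.3f}")   # -> 0.700
print(f"object-wise: {object_wise:.3f}")  # -> 0.600
```

Videos with many objects (e.g. gold-fish with five tracks) pull the object-wise mean more strongly than the video-wise one, which is one plausible source of the 70.4 vs 69.0 gap.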