Closed lorenmt closed 3 years ago
Hi @lorenmt,
The sequence of commands is provided in https://github.com/ajabri/videowalk#davis. After you run test.py, you need to run the following commands, the latter of which calls the official davis2017-evaluation repository:
# Convert
python eval/convert_davis.py --in_folder /save/path/ --out_folder /converted/path --dataset /davis/path/
# Compute metrics
python /path/to/davis2017-evaluation/evaluation_method.py \
--task semi-supervised --results_path /converted/path --set val \
--davis_path /path/to/davis/
Is this what you did to obtain your results above?
Also, IIRC, the davis-2017 and davis2017-evaluation repositories expect the inference output file names to be indexed differently (0-indexed vs. 1-indexed). So, if you use davis-2017, you should change line 63 in my convert_davis.py script from 'j' to 'j+1'.
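For reference, the off-by-one fix boils down to shifting the frame index used in the output filename. A minimal sketch (the function name and filename pattern are my own, not the exact code in convert_davis.py):

```python
import os

def mask_filename(out_folder, video, j, one_indexed=False):
    """Build the output path for frame j of a video.

    davis2017-evaluation expects 0-indexed names, while davis-2017
    expects 1-indexed ones (hence the j + 1 option).
    """
    idx = j + 1 if one_indexed else j
    return os.path.join(out_folder, video, '{:05d}.png'.format(idx))
```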
Hi Allan,
Thanks for your quick reply. I converted the raw results with my own code below:
import glob
import os

import numpy as np
from PIL import Image

# Collect the names of the validation videos
file_list = os.path.join('dataset/DAVIS', 'ImageSets', '2017/val.txt')
videos = []
with open(file_list, "r") as frame_set:
    for f in frame_set:
        videos.append(f.rstrip('\n'))

palette = np.loadtxt('palette.txt', dtype=np.uint8).reshape(-1, 3)
for i in range(30):
    a = glob.glob('dataset/davis_corr/{}_*_mask.png'.format(i))
    create_folder('dataset/davis_videowalk/{}'.format(videos[i]))
    for k in range(len(a)):
        im = Image.open('dataset/davis_corr/{}_{}_mask.png'.format(i, k))
        im = np.array(im)
        # Map each unique RGB colour in this frame to an integer object index
        label = np.unique(im.reshape(-1, 3), axis=0)
        im_ = np.zeros((im.shape[0], im.shape[1]), dtype=np.uint8)
        for kk in range(len(label)):
            mask = (label[kk] == im).sum(-1) == 3
            im_[mask] = kk
        # Save as a palettised PNG in the DAVIS layout
        im = Image.fromarray(im_)
        im.putpalette(palette.ravel())
        im.save('dataset/davis_videowalk/{}/{:05d}.png'.format(videos[i], k))
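If it helps debugging, a small sanity check on the converted masks can catch palette and indexing problems early. A sketch, assuming the masks were saved as mode-'P' PNGs as above (the helper name is mine):

```python
import numpy as np
from PIL import Image

def check_mask(path, expected_max_index):
    """Verify a converted mask is a palettised image with plausible indices."""
    im = Image.open(path)
    # The DAVIS tooling expects single-channel palettised PNGs, not RGB
    assert im.mode == 'P', 'mask should be a palettised (mode P) image'
    arr = np.array(im)
    assert arr.max() <= expected_max_index, 'unexpected object index in ' + path
    return np.unique(arr)
```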
That reorganizes the raw results into the DAVIS format. Here is what I obtained, which you can download here: https://www.dropbox.com/sh/1cr85dyxeeptk0k/AACBoXYIo2noUMFWBHZ8HD0-a?dl=0
Finally, I ran python evaluation_method.py --task semi-supervised --results_path ../../dataset/davis_reco which produced the performance I provided in the first comment.
The visual results I obtained do indeed look worse than the ones you put in the video. I would be really grateful if you could look further into this and check whether the generated results are OK.
Best,
Additional note: I am quite confident the evaluation script itself is correct, since I used the same script to evaluate STM and reproduced its reported performance.
Further note: I am sorry, I found the lab-coat index is wrong (after 8x down-sampling, two objects disappear completely). I will fix the issue, rerun the script, and update the results here.
Hello, here are the updated results:
--------------------------- Global results for val ---------------------------
J&F-Mean J-Mean J-Recall J-Decay F-Mean F-Recall F-Decay
0.657716 0.62911 0.735837 0.223777 0.686321 0.812783 0.269499
---------- Per sequence results for val ----------
Sequence J-Mean F-Mean
bike-packing_1 0.496049 0.711096
bike-packing_2 0.685996 0.752332
blackswan_1 0.934492 0.973339
bmx-trees_1 0.301675 0.770057
bmx-trees_2 0.644392 0.845591
breakdance_1 0.666383 0.676260
camel_1 0.747073 0.855923
car-roundabout_1 0.852337 0.714172
car-shadow_1 0.807822 0.778809
cows_1 0.920527 0.956957
dance-twirl_1 0.549648 0.593753
dog_1 0.851405 0.867017
dogs-jump_1 0.302670 0.435166
dogs-jump_2 0.536664 0.599638
dogs-jump_3 0.788082 0.822245
drift-chicane_1 0.729466 0.786235
drift-straight_1 0.526541 0.528944
goat_1 0.800556 0.734920
gold-fish_1 0.721810 0.717445
gold-fish_2 0.659471 0.700005
gold-fish_3 0.820182 0.845394
gold-fish_4 0.848312 0.915238
gold-fish_5 0.879084 0.878996
horsejump-high_1 0.773536 0.888244
horsejump-high_2 0.723407 0.944909
india_1 0.631993 0.592968
india_2 0.567645 0.560544
india_3 0.629983 0.627841
judo_1 0.760509 0.765048
judo_2 0.749010 0.756075
kite-surf_1 0.270090 0.267305
kite-surf_2 0.004306 0.062131
kite-surf_3 0.093566 0.127047
lab-coat_1 0.000000 0.000000
lab-coat_2 0.000000 0.000000
lab-coat_3 0.932124 0.895322
lab-coat_4 0.914726 0.837048
lab-coat_5 0.866172 0.835881
libby_1 0.803691 0.920149
loading_1 0.900133 0.875399
loading_2 0.383891 0.567959
loading_3 0.682442 0.716217
mbike-trick_1 0.571612 0.743456
mbike-trick_2 0.639744 0.669962
motocross-jump_1 0.340788 0.395740
motocross-jump_2 0.519756 0.554731
paragliding-launch_1 0.819913 0.923513
paragliding-launch_2 0.645564 0.885479
paragliding-launch_3 0.034370 0.137811
parkour_1 0.805982 0.893970
pigs_1 0.812613 0.764461
pigs_2 0.617975 0.750136
pigs_3 0.906452 0.882834
scooter-black_1 0.389385 0.669319
scooter-black_2 0.722495 0.675855
shooting_1 0.270579 0.454346
shooting_2 0.747166 0.661882
shooting_3 0.753406 0.872043
soapbox_1 0.785921 0.778360
soapbox_2 0.647941 0.710407
soapbox_3 0.586195 0.741657
Now the results look similar to your reported ones, about 2 points lower in J&F-Mean. Is this pre-trained performance without online adaptation? And with online adaptation, should we expect to reach 67 J&F-Mean? Thanks!
No, the reported result (67.6 J&F-Mean) is without online adaptation. So there still seems to be a gap...
I am not sure where it is coming from, but I will have a chance to investigate in the next few days; the only difference seems to be using your raw conversion code vs. the code I provided in convert_davis.py.
Hi Allan,
I think I found the mistake. Again it comes from the object index error: when some frames predict only a subset of the objects, the indices are not mapped correctly. As a sanity check I re-evaluated with your convert_davis.py code and got 67.4 J&F-Mean, which is within a reasonable range of uncertainty. Really sorry for my mistake, and thanks for your time and replies.
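For anyone hitting the same issue: mapping colours with a per-frame np.unique assigns indices by sorted colour order within each frame, so when an object is absent the later objects' indices all shift. A sketch of a fix that freezes the colour-to-index table from the annotated first frame (helper names are mine, not from convert_davis.py):

```python
import numpy as np

def build_color_table(first_frame_rgb):
    """Freeze the colour -> object-index mapping from the annotated first frame."""
    colors = np.unique(first_frame_rgb.reshape(-1, 3), axis=0)
    return {tuple(c): idx for idx, c in enumerate(colors)}

def rgb_to_indices(frame_rgb, color_table):
    """Map an RGB mask to integer indices using the fixed table,
    so indices stay stable even when some objects are absent."""
    out = np.zeros(frame_rgb.shape[:2], dtype=np.uint8)
    for color, idx in color_table.items():
        match = (frame_rgb == np.array(color, dtype=frame_rgb.dtype)).all(-1)
        out[match] = idx
    return out
```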
Hi,
I recently ran your pre-trained model on DAVIS 2017 with the exact same command you listed in the README:
python test.py --filelist /path/to/davis/vallist.txt \
  --model-type scratch --resume ../pretrained.pth --save-path /save/path \
  --topk 10 --videoLen 20 --radius 12 --temperature 0.05 --cropSize -1
However, the final performance based on the official DAVIS evaluation script is not as good as the one claimed in the paper: I got around 61 J&F-Mean. Specifically, the detailed performance is listed below:
I am wondering whether this is the expected performance without test-time adaptation? Or could you list a detailed step-by-step procedure so we can reproduce the results more easily?
Thanks.