ZrrSkywalker / Personalize-SAM

Personalize Segment Anything Model (SAM) with 1 shot in 10 seconds

Reproducing DAVIS 2017 Validation Results #31

[Open] m43 opened this issue 1 year ago

m43 commented 1 year ago

Thanks for the nice work! I am having trouble reproducing the 71.9 mean $\mathcal{J}\&\mathcal{F}$ result reported for PerSAM-F on the semi-supervised video object segmentation task on the DAVIS 2017 validation subset in Table 2. What hyperparameters should be used? What per-scene results should be expected?
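
For context, under the standard DAVIS protocol $\mathcal{J}$ is region similarity (mask IoU) and $\mathcal{F}$ is contour accuracy (a boundary F-measure); the headline score averages their per-object means:

$$\mathcal{J}\&\mathcal{F} = \tfrac{1}{2}\left(\mathcal{J}_{\text{mean}} + \mathcal{F}_{\text{mean}}\right)$$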

m43 commented 1 year ago
The best result I got is 59.7, with topk=2, a fine-tuning learning rate of 4e-3, and 2000 fine-tuning epochs. This gives, for example, the predicted masks (three prediction visualizations were attached: predictions-only_0, predictions-only_1, predictions-only_2) and the per-object results below (rounded to three decimals).

| Sequence | J-Mean | F-Mean |
| --- | --- | --- |
| bike-packing_1 | 0.640 | 0.624 |
| bike-packing_2 | 0.867 | 0.850 |
| blackswan_1 | 0.952 | 0.970 |
| bmx-trees_1 | 0.204 | 0.466 |
| bmx-trees_2 | 0.766 | 0.894 |
| breakdance_1 | 0.915 | 0.943 |
| camel_1 | 0.978 | 0.991 |
| car-roundabout_1 | 0.950 | 0.940 |
| car-shadow_1 | 0.928 | 0.957 |
| cows_1 | 0.962 | 0.973 |
| dance-twirl_1 | 0.890 | 0.892 |
| dog_1 | 0.964 | 0.986 |
| dogs-jump_1 | 0.330 | 0.508 |
| dogs-jump_2 | 0.222 | 0.241 |
| dogs-jump_3 | 0.953 | 0.989 |
| drift-chicane_1 | 0.847 | 0.906 |
| drift-straight_1 | 0.759 | 0.770 |
| goat_1 | 0.925 | 0.955 |
| gold-fish_1 | 0.583 | 0.585 |
| gold-fish_2 | 0.439 | 0.479 |
| gold-fish_3 | 0.455 | 0.465 |
| gold-fish_4 | 0.819 | 0.879 |
| gold-fish_5 | 0.728 | 0.676 |
| horsejump-high_1 | 0.784 | 0.885 |
| horsejump-high_2 | 0.828 | 0.922 |
| india_1 | 0.476 | 0.496 |
| india_2 | 0.075 | 0.112 |
| india_3 | 0.164 | 0.230 |
| judo_1 | 0.707 | 0.813 |
| judo_2 | 0.261 | 0.318 |
| kite-surf_1 | 0.076 | 0.247 |
| kite-surf_2 | 0.262 | 0.444 |
| kite-surf_3 | 0.730 | 0.926 |
| lab-coat_1 | 0.021 | 0.238 |
| lab-coat_2 | 0.000 | 0.000 |
| lab-coat_3 | 0.729 | 0.665 |
| lab-coat_4 | 0.542 | 0.558 |
| lab-coat_5 | 0.116 | 0.180 |
| libby_1 | 0.907 | 0.968 |
| loading_1 | 0.720 | 0.733 |
| loading_2 | 0.196 | 0.271 |
| loading_3 | 0.067 | 0.098 |
| mbike-trick_1 | 0.742 | 0.808 |
| mbike-trick_2 | 0.616 | 0.678 |
| motocross-jump_1 | 0.774 | 0.797 |
| motocross-jump_2 | 0.704 | 0.614 |
| paragliding-launch_1 | 0.459 | 0.590 |
| paragliding-launch_2 | 0.401 | 0.658 |
| paragliding-launch_3 | 0.087 | 0.304 |
| parkour_1 | 0.930 | 0.947 |
| pigs_1 | 0.502 | 0.670 |
| pigs_2 | 0.404 | 0.609 |
| pigs_3 | 0.884 | 0.884 |
| scooter-black_1 | 0.066 | 0.081 |
| scooter-black_2 | 0.397 | 0.429 |
| shooting_1 | 0.633 | 0.637 |
| shooting_2 | 0.684 | 0.685 |
| shooting_3 | 0.893 | 0.969 |
| soapbox_1 | 0.564 | 0.629 |
| soapbox_2 | 0.092 | 0.121 |
| soapbox_3 | 0.062 | 0.082 |
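
For anyone comparing hyperparameters: below is a minimal sketch of the scale-aware fine-tuning that the learning rate and epoch count control. It is written from scratch for illustration, not taken from this repository; the dummy tensors and the 3-way softmax weighting (the paper describes fine-tuning just 2 parameters over SAM's three mask scales) are assumptions.

```python
# Minimal sketch of PerSAM-F-style scale-aware fine-tuning, for illustration only;
# tensor names, the dummy data, and the 3-way softmax weighting are assumptions,
# not the repository's code.
import torch
import torch.nn.functional as F

# Stand-ins for SAM's three-scale mask logits on the reference frame and the
# one-shot ground-truth mask -- replace with real model outputs.
masks_3scale = torch.randn(3, 256, 256)
gt_mask = (torch.rand(256, 256) > 0.5).float()

w = torch.nn.Parameter(torch.zeros(3))   # learnable fusion weights; SAM itself stays frozen
opt = torch.optim.AdamW([w], lr=4e-4)    # lr suggested in the paper; 4e-3 was also tried above

for _ in range(800):                     # epochs suggested in the paper; 2000 also tried above
    fused = (w.softmax(0)[:, None, None] * masks_3scale).sum(0)  # weighted sum over 3 scales
    loss = F.binary_cross_entropy_with_logits(fused, gt_mask)
    opt.zero_grad()
    loss.backward()
    opt.step()
```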
m43 commented 1 year ago
However, I do get a close number when evaluating on the DAVIS 2016 (not 2017) validation subset with the hyperparameters suggested in the paper (topk=2, lr=4e-4, epochs=800). I wonder whether this is a coincidence:

| Method | JF_mean | J_mean | J_recall | J_decay | F_mean | F_recall | F_decay |
| --- | --- | --- | --- | --- | --- | --- | --- |
| eval_D16_val | 0.712 | 0.701 | 0.767 | 0.086 | 0.723 | 0.758 | 0.077 |
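
For cross-checking the summary numbers: JF_mean is simply the average of J_mean and F_mean ($(0.701 + 0.723)/2 = 0.712$), and each of those is in turn the mean over per-object rows like the table in the previous comment. A minimal sketch, assuming the per-object results sit in a DataFrame with the same columns (only two example rows shown; the full DAVIS 2017 table above has 61 object rows):

```python
import pandas as pd

# Two example rows copied from the per-object table above; the full table has 61 rows.
df = pd.DataFrame({
    "Sequence": ["bike-packing_1", "bike-packing_2"],
    "J-Mean": [0.640, 0.867],
    "F-Mean": [0.624, 0.850],
})

# Aggregate exactly as the DAVIS summary does: mean over objects, then average J and F.
j, f = df["J-Mean"].mean(), df["F-Mean"].mean()
print(f"J_mean: {100 * j:.1f}  F_mean: {100 * f:.1f}  JF_mean: {100 * (j + f) / 2:.1f}")
```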