ZrrSkywalker / Personalize-SAM

Personalize Segment Anything Model (SAM) with 1 shot in 10 seconds

Reproducing DAVIS 2017 Validation Results #31

[Open] m43 opened this issue 1 year ago

m43 commented 1 year ago

Thanks for the nice work! I am having trouble reproducing the 71.9 mean $\mathcal{J}\&\mathcal{F}$ result reported for PerSAM-F on the semi-supervised video object segmentation task on the DAVIS 2017 validation subset in Table 2. What hyperparameters should be used? What per-scene results should be expected?
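
For context, under the standard DAVIS protocol $\mathcal{J}$ is region similarity (mask IoU) and $\mathcal{F}$ is contour accuracy (a boundary F-measure); the headline score averages their per-object means:

$$\mathcal{J}\&\mathcal{F} = \tfrac{1}{2}\left(\mathcal{J}_{\text{mean}} + \mathcal{F}_{\text{mean}}\right)$$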

m43 commented 1 year ago
The best result I got is 59.7, with topk=2, a fine-tuning learning rate of 4e-3, and 2000 fine-tuning epochs. This gives, for example, the predicted masks (three prediction visualizations were attached: predictions-only_0, predictions-only_1, predictions-only_2) and the per-object results below (rounded to three decimals).

| Sequence | J-Mean | F-Mean |
| --- | --- | --- |
| bike-packing_1 | 0.640 | 0.624 |
| bike-packing_2 | 0.867 | 0.850 |
| blackswan_1 | 0.952 | 0.970 |
| bmx-trees_1 | 0.204 | 0.466 |
| bmx-trees_2 | 0.766 | 0.894 |
| breakdance_1 | 0.915 | 0.943 |
| camel_1 | 0.978 | 0.991 |
| car-roundabout_1 | 0.950 | 0.940 |
| car-shadow_1 | 0.928 | 0.957 |
| cows_1 | 0.962 | 0.973 |
| dance-twirl_1 | 0.890 | 0.892 |
| dog_1 | 0.964 | 0.986 |
| dogs-jump_1 | 0.330 | 0.508 |
| dogs-jump_2 | 0.222 | 0.241 |
| dogs-jump_3 | 0.953 | 0.989 |
| drift-chicane_1 | 0.847 | 0.906 |
| drift-straight_1 | 0.759 | 0.770 |
| goat_1 | 0.925 | 0.955 |
| gold-fish_1 | 0.583 | 0.585 |
| gold-fish_2 | 0.439 | 0.479 |
| gold-fish_3 | 0.455 | 0.465 |
| gold-fish_4 | 0.819 | 0.879 |
| gold-fish_5 | 0.728 | 0.676 |
| horsejump-high_1 | 0.784 | 0.885 |
| horsejump-high_2 | 0.828 | 0.922 |
| india_1 | 0.476 | 0.496 |
| india_2 | 0.075 | 0.112 |
| india_3 | 0.164 | 0.230 |
| judo_1 | 0.707 | 0.813 |
| judo_2 | 0.261 | 0.318 |
| kite-surf_1 | 0.076 | 0.247 |
| kite-surf_2 | 0.262 | 0.444 |
| kite-surf_3 | 0.730 | 0.926 |
| lab-coat_1 | 0.021 | 0.238 |
| lab-coat_2 | 0.000 | 0.000 |
| lab-coat_3 | 0.729 | 0.665 |
| lab-coat_4 | 0.542 | 0.558 |
| lab-coat_5 | 0.116 | 0.180 |
| libby_1 | 0.907 | 0.968 |
| loading_1 | 0.720 | 0.733 |
| loading_2 | 0.196 | 0.271 |
| loading_3 | 0.067 | 0.098 |
| mbike-trick_1 | 0.742 | 0.808 |
| mbike-trick_2 | 0.616 | 0.678 |
| motocross-jump_1 | 0.774 | 0.797 |
| motocross-jump_2 | 0.704 | 0.614 |
| paragliding-launch_1 | 0.459 | 0.590 |
| paragliding-launch_2 | 0.401 | 0.658 |
| paragliding-launch_3 | 0.087 | 0.304 |
| parkour_1 | 0.930 | 0.947 |
| pigs_1 | 0.502 | 0.670 |
| pigs_2 | 0.404 | 0.609 |
| pigs_3 | 0.884 | 0.884 |
| scooter-black_1 | 0.066 | 0.081 |
| scooter-black_2 | 0.397 | 0.429 |
| shooting_1 | 0.633 | 0.637 |
| shooting_2 | 0.684 | 0.685 |
| shooting_3 | 0.893 | 0.969 |
| soapbox_1 | 0.564 | 0.629 |
| soapbox_2 | 0.092 | 0.121 |
| soapbox_3 | 0.062 | 0.082 |
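
For anyone comparing hyperparameters: below is a minimal sketch of the scale-aware fine-tuning that the learning rate and epoch count control. It is written from scratch for illustration, not taken from this repository; the dummy tensors and the 3-way softmax weighting (the paper describes fine-tuning just 2 parameters over SAM's three mask scales) are assumptions.

```python
# Minimal sketch of PerSAM-F-style scale-aware fine-tuning, for illustration only;
# tensor names, the dummy data, and the 3-way softmax weighting are assumptions,
# not the repository's code.
import torch
import torch.nn.functional as F

# Stand-ins for SAM's three-scale mask logits on the reference frame and the
# one-shot ground-truth mask -- replace with real model outputs.
masks_3scale = torch.randn(3, 256, 256)
gt_mask = (torch.rand(256, 256) > 0.5).float()

w = torch.nn.Parameter(torch.zeros(3))   # learnable fusion weights; SAM itself stays frozen
opt = torch.optim.AdamW([w], lr=4e-4)    # lr suggested in the paper; 4e-3 was also tried above

for _ in range(800):                     # epochs suggested in the paper; 2000 also tried above
    fused = (w.softmax(0)[:, None, None] * masks_3scale).sum(0)  # weighted sum over 3 scales
    loss = F.binary_cross_entropy_with_logits(fused, gt_mask)
    opt.zero_grad()
    loss.backward()
    opt.step()
```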
m43 commented 1 year ago
However, I do get a close number when evaluating on the DAVIS 2016 (not 2017) validation subset with the hyperparameters suggested in the paper (topk=2, lr=4e-4, epochs=800). I wonder whether this is a coincidence:

| Method | JF_mean | J_mean | J_recall | J_decay | F_mean | F_recall | F_decay |
| --- | --- | --- | --- | --- | --- | --- | --- |
| eval_D16_val | 0.712 | 0.701 | 0.767 | 0.086 | 0.723 | 0.758 | 0.077 |
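
For cross-checking the summary numbers: JF_mean is simply the average of J_mean and F_mean ($(0.701 + 0.723)/2 = 0.712$), and each of those is in turn the mean over per-object rows like the table in the previous comment. A minimal sketch, assuming the per-object results sit in a DataFrame with the same columns (only two example rows shown; the full DAVIS 2017 table above has 61 object rows):

```python
import pandas as pd

# Two example rows copied from the per-object table above; the full table has 61 rows.
df = pd.DataFrame({
    "Sequence": ["bike-packing_1", "bike-packing_2"],
    "J-Mean": [0.640, 0.867],
    "F-Mean": [0.624, 0.850],
})

# Aggregate exactly as the DAVIS summary does: mean over objects, then average J and F.
j, f = df["J-Mean"].mean(), df["F-Mean"].mean()
print(f"J_mean: {100 * j:.1f}  F_mean: {100 * f:.1f}  JF_mean: {100 * (j + f) / 2:.1f}")
```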