Closed ypflll closed 9 months ago
I did more testing. It seems that the label ID definition is different from willprice/KINETICS_LABELS.md. For example, I tested on five videos which are labeled as 'zumba' (ID 399), and the model predictions are: 381 381 381 310 381. ID 381 is "washing hair".
Yes, these class label definitions are different. I believe these are the correct definitions for the probe:
K400_CLASS_TEMPLATES = [
'0 weaving_basket',
'1 playing_drums',
'2 catching_or_throwing_softball',
'3 riding_unicycle',
'4 robot_dancing',
'5 eating_cake',
'6 cleaning_toilet',
'7 biking_through_snow',
'8 bee_keeping',
'9 playing_keyboard',
'10 skiing_slalom',
'11 balloon_blowing',
'12 feeding_birds',
'13 trimming_or_shaving_beard',
'14 playing_trombone',
'15 parasailing',
'16 exercising_with_an_exercise_ball',
'17 massaging_feet',
'18 bending_back',
'19 smoking_hookah',
'20 salsa_dancing',
'21 hopscotch',
'22 windsurfing',
'23 testifying',
'24 washing_feet',
'25 playing_clarinet',
'26 golf_putting',
'27 washing_hair',
'28 swimming_breast_stroke',
'29 pushing_car',
'30 riding_mountain_bike',
'31 playing_chess',
'32 vault',
'33 cooking_on_campfire',
'34 catching_fish',
'35 fixing_hair',
'36 texting',
'37 skipping_rope',
'38 changing_oil',
'39 brushing_teeth',
'40 pushing_cart',
'41 eating_watermelon',
'42 kicking_field_goal',
'43 playing_poker',
'44 training_dog',
'45 making_jewelry',
'46 springboard_diving',
'47 playing_bass_guitar',
'48 cutting_nails',
'49 making_bed',
'50 driving_car',
'51 catching_or_throwing_frisbee',
'52 petting_cat',
'53 cleaning_pool',
'54 tossing_salad',
'55 sled_dog_racing',
'56 cleaning_gutters',
'57 slapping',
'58 swing_dancing',
'59 making_a_sandwich',
'60 taking_a_shower',
'61 cleaning_shoes',
'62 digging',
'63 eating_carrots',
'64 hitting_baseball',
'65 using_computer',
'66 playing_didgeridoo',
'67 surfing_water',
'68 headbutting',
'69 getting_a_tattoo',
'70 juggling_fire',
'71 tobogganing',
'72 playing_saxophone',
'73 beatboxing',
'74 tickling',
'75 shredding_paper',
'76 drop_kicking',
'77 riding_a_bike',
'78 triple_jump',
'79 cheerleading',
'80 eating_spaghetti',
'81 mopping_floor',
'82 scuba_diving',
'83 capoeira',
'84 swimming_butterfly_stroke',
'85 using_remote_controller_(not_gaming)',
'86 throwing_ball',
'87 riding_mule',
'88 feeding_fish',
'89 dying_hair',
'90 grooming_dog',
'91 kissing',
'92 snowboarding',
'93 hurling_(sport)',
'94 juggling_balls',
'95 hula_hooping',
'96 snorkeling',
'97 playing_squash_or_racquetball',
'98 filling_eyebrows',
'99 arranging_flowers',
'100 sanding_floor',
'101 playing_cello',
'102 sweeping_floor',
'103 waiting_in_line',
'104 feeding_goats',
'105 dribbling_basketball',
'106 tying_tie',
'107 assembling_computer',
'108 headbanging',
'109 doing_laundry',
'110 snowmobiling',
'111 hugging',
'112 running_on_treadmill',
'113 tasting_beer',
'114 spraying',
'115 playing_harmonica',
'116 petting_animal_(not_cat)',
'117 slacklining',
'118 pumping_fist',
'119 watering_plants',
'120 push_up',
'121 massaging_legs',
'122 making_pizza',
'123 cleaning_floor',
'124 marching',
'125 peeling_potatoes',
'126 unloading_truck',
'127 climbing_a_rope',
'128 mowing_lawn',
'129 climbing_tree',
'130 counting_money',
'131 busking',
'132 making_sushi',
'133 eating_chips',
'134 making_a_cake',
'135 playing_trumpet',
'136 rock_scissors_paper',
'137 flying_kite',
'138 giving_or_receiving_award',
'139 high_kick',
'140 cracking_neck',
'141 waxing_chest',
'142 ice_skating',
'143 singing',
'144 doing_nails',
'145 bowling',
'146 faceplanting',
'147 skiing_crosscountry',
'148 cutting_watermelon',
'149 playing_recorder',
'150 cleaning_windows',
'151 answering_questions',
'152 stretching_leg',
'153 shearing_sheep',
'154 breading_or_breadcrumbing',
'155 massaging_back',
'156 planting_trees',
'157 bartending',
'158 stomping_grapes',
'159 gargling',
'160 folding_napkins',
'161 breakdancing',
'162 bench_pressing',
'163 situp',
'164 celebrating',
'165 playing_cricket',
'166 auctioning',
'167 squat',
'168 bandaging',
'169 writing',
'170 dancing_macarena',
'171 dining',
'172 playing_volleyball',
'173 archery',
'174 hoverboarding',
'175 ice_fishing',
'176 bending_metal',
'177 playing_paintball',
'178 parkour',
'179 tasting_food',
'180 swinging_legs',
'181 riding_scooter',
'182 canoeing_or_kayaking',
'183 welding',
'184 applying_cream',
'185 yoga',
'186 throwing_axe',
'187 eating_burger',
'188 frying_vegetables',
'189 playing_ice_hockey',
'190 opening_bottle',
'191 skateboarding',
'192 dancing_charleston',
'193 cartwheeling',
'194 ironing',
'195 deadlifting',
'196 blowing_out_candles',
'197 playing_cymbals',
'198 abseiling',
'199 bookbinding',
'200 throwing_discus',
'201 sticking_tongue_out',
'202 water_sliding',
'203 eating_ice_cream',
'204 grinding_meat',
'205 blasting_sand',
'206 making_snowman',
'207 making_tea',
'208 finger_snapping',
'209 wrestling',
'210 snowkiting',
'211 rock_climbing',
'212 dunking_basketball',
'213 tapping_pen',
'214 shaking_head',
'215 peeling_apples',
'216 holding_snake',
'217 playing_bagpipes',
'218 eating_doughnuts',
'219 smoking',
'220 washing_hands',
'221 curling_hair',
'222 shoveling_snow',
'223 playing_organ',
'224 waxing_eyebrows',
'225 checking_tires',
'226 bouncing_on_trampoline',
'227 clapping',
'228 chopping_wood',
'229 tying_knot_(not_on_a_tie)',
'230 surfing_crowd',
'231 tying_bow_tie',
'232 sharpening_knives',
'233 tapping_guitar',
'234 driving_tractor',
'235 playing_kickball',
'236 strumming_guitar',
'237 riding_camel',
'238 kicking_soccer_ball',
'239 playing_cards',
'240 blowing_nose',
'241 juggling_soccer_ball',
'242 presenting_weather_forecast',
'243 whistling',
'244 punching_person_(boxing)',
'245 braiding_hair',
'246 dancing_gangnam_style',
'247 clay_pottery_making',
'248 baking_cookies',
'249 pull_ups',
'250 building_shed',
'251 moving_furniture',
'252 playing_monopoly',
'253 drinking_shots',
'254 egg_hunting',
'255 jumpstyle_dancing',
'256 contact_juggling',
'257 milking_cow',
'258 barbequing',
'259 tai_chi',
'260 building_cabinet',
'261 playing_xylophone',
'262 blowing_glass',
'263 climbing_ladder',
'264 drumming_fingers',
'265 paragliding',
'266 shooting_goal_(soccer)',
'267 changing_wheel',
'268 brush_painting',
'269 playing_tennis',
'270 arm_wrestling',
'271 using_segway',
'272 decorating_the_christmas_tree',
'273 sign_language_interpreting',
'274 roller_skating',
'275 playing_basketball',
'276 news_anchoring',
'277 cooking_sausages',
'278 cutting_pineapple',
'279 pumping_gas',
'280 pushing_wheelchair',
'281 extinguishing_fire',
'282 water_skiing',
'283 bobsledding',
'284 sneezing',
'285 lunge',
'286 walking_the_dog',
'287 swimming_backstroke',
'288 shaving_legs',
'289 shining_shoes',
'290 tossing_coin',
'291 sniffing',
'292 hurdling',
'293 setting_table',
'294 jogging',
'295 swinging_on_something',
'296 javelin_throw',
'297 high_jump',
'298 golf_chipping',
'299 reading_newspaper',
'300 somersaulting',
'301 tap_dancing',
'302 unboxing',
'303 flipping_pancake',
'304 sailing',
'305 doing_aerobics',
'306 playing_flute',
'307 belly_dancing',
'308 dodgeball',
'309 laughing',
'310 krumping',
'311 skydiving',
'312 playing_guitar',
'313 sharpening_pencil',
'314 wrapping_present',
'315 carving_pumpkin',
'316 clean_and_jerk',
'317 side_kick',
'318 hammer_throw',
'319 golf_driving',
'320 folding_clothes',
'321 crawling_baby',
'322 passing_American_football_(not_in_game)',
'323 bungee_jumping',
'324 riding_mechanical_bull',
'325 air_drumming',
'326 reading_book',
'327 massaging_persons_head',
'328 drinking_beer',
'329 scrambling_eggs',
'330 folding_paper',
'331 playing_controller',
'332 hockey_stop',
'333 getting_a_haircut',
'334 riding_elephant',
'335 front_raises',
'336 pole_vault',
'337 crossing_river',
'338 picking_fruit',
'339 blowing_leaves',
'340 gymnastics_tumbling',
'341 shuffling_cards',
'342 eating_hotdog',
'343 crying',
'344 jetskiing',
'345 diving_cliff',
'346 laying_bricks',
'347 ski_jumping',
'348 drinking',
'349 riding_or_walking_with_horse',
'350 passing_American_football_(in_game)',
'351 skiing_(not_slalom_or_crosscountry)',
'352 playing_badminton',
'353 trimming_trees',
'354 exercising_arm',
'355 yawning',
'356 cooking_egg',
'357 kitesurfing',
'358 washing_dishes',
'359 shot_put',
'360 garbage_collecting',
'361 grooming_horse',
'362 playing_harp',
'363 jumping_into_pool',
'364 drawing',
'365 dancing_ballet',
'366 shaving_head',
'367 opening_present',
'368 catching_or_throwing_baseball',
'369 recording_music',
'370 spray_painting',
'371 knitting',
'372 stretching_arm',
'373 snatch_weight_lifting',
'374 carrying_baby',
'375 playing_ukulele',
'376 punching_bag',
'377 shooting_basketball',
'378 spinning_poi',
'379 waxing_legs',
'380 long_jump',
'381 zumba',
'382 playing_piano',
'383 playing_accordion',
'384 shaking_hands',
'385 applauding',
'386 motorcycling',
'387 disc_golfing',
'388 baby_waking_up',
'389 trapezing',
'390 plastering',
'391 cooking_chicken',
'392 tango_dancing',
'393 brushing_hair',
'394 waxing_back',
'395 playing_violin',
'396 ripping_paper',
'397 country_line_dancing',
'398 sword_fighting',
'399 ice_climbing',
]
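As a minimal sketch of how these definitions could be used, the snippet below parses `'<id> <name>'` strings into a lookup table and decodes the zumba predictions reported above. The helper name `parse_templates` and the four-entry `excerpt` list are illustrative assumptions; in practice the full `K400_CLASS_TEMPLATES` list would be passed instead.

```python
def parse_templates(templates):
    """Turn strings like '381 zumba' into a dict {381: 'zumba'}."""
    id_to_name = {}
    for entry in templates:
        # Split only on the first space; class names themselves contain no spaces here.
        idx, name = entry.split(" ", 1)
        id_to_name[int(idx)] = name
    return id_to_name


# Excerpt copied from the list above (assumption: the full list is used in practice).
excerpt = [
    '198 abseiling',
    '211 rock_climbing',
    '310 krumping',
    '381 zumba',
]

ID_TO_NAME = parse_templates(excerpt)

# Predictions reported for five videos labeled 'zumba'.
predictions = [381, 381, 381, 310, 381]
decoded = [ID_TO_NAME[p] for p in predictions]
print(decoded)
```

Under this mapping, four of the five zumba clips decode to `zumba` and one to `krumping`.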
Thank you, MidoAssran. The encoder dicts should be updated to include label references, as I had the same issue with mismatched labelling. Inference is working perfectly now.
@ypflll How did you modify "jepa/evals/video_classification_frozen/eval.py"? I used the same weights but got incorrect predictions.
@uniquezhengjie I'll share a script later
Thank you, looking forward to your reply
pred.zip @uniquezhengjie
I tested using the script you provided and obtained the following results:
Model & config:
Encoder: vith16-384.pth.tar
Classifier: vith16-384-k400-probe.pth.tar
Config: vith16_384-k400_16x8x3
Example data, first 5 videos of k400 val-set, label is "abseiling":
0wR5jVB-WPk_000417_000427.mp4
3caPS4FHFF8_000036_000046.mp4
3yaoNwz99xM_000062_000072.mp4
6IbvOJxXnOo_000047_000057.mp4
6_4kjPiQr7w_000191_000201.mp4
Result:
Index: 0 , Predict: 257
Index: 1 , Predict: 268
Index: 2 , Predict: 288
Index: 3 , Predict: 11
Index: 4 , Predict: 153
Result:
Index: 0 , Predict: 198
Index: 1 , Predict: 198
Index: 2 , Predict: 198
Index: 3 , Predict: 211
Index: 4 , Predict: 198
Seems that the results are random. I wonder if the pretrained models are correctly loaded.
Hi @uniquezhengjie, have you figured out why the model outputs random predictions? I also got some results that do not make sense. Is there any trick to loading the pretrained model and attentive probe?
I also got random predictions with the code on main. Reverting https://github.com/facebookresearch/jepa/commit/787b04ae6c573be587d6afccea5eca6b9fde9039 fixed the issue for me. I'd guess this linear layer https://github.com/facebookresearch/jepa/blob/787b04ae6c573be587d6afccea5eca6b9fde9039/src/models/utils/modules.py#L137 was never used when the probe was trained.
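One way to test a guess like this is to compare the parameter names the current model expects with the names stored in the probe checkpoint: a layer that exists in the code but not in the checkpoint keeps its random initialization. The sketch below is plain Python and uses made-up key names (`pooler.proj.weight` and the others are hypothetical, not the repository's actual names). With PyTorch, the two inputs would typically be `model.state_dict().keys()` and the keys of the loaded checkpoint dict.

```python
def compare_keys(model_keys, checkpoint_keys, prefix="module."):
    """Return (missing, unexpected) parameter names.

    missing:    expected by the model but absent from the checkpoint
    unexpected: present in the checkpoint but unknown to the model
    A leading `prefix` (as added by distributed wrappers) is ignored.
    """
    def strip(key):
        return key[len(prefix):] if key.startswith(prefix) else key

    model = {strip(k) for k in model_keys}
    ckpt = {strip(k) for k in checkpoint_keys}
    return sorted(model - ckpt), sorted(ckpt - model)


# Hypothetical key names, for illustration only.
model_keys = [
    "pooler.query_tokens",
    "pooler.proj.weight",   # stands in for a layer added after the probe was trained
    "pooler.proj.bias",
    "linear.weight",
    "linear.bias",
]
checkpoint_keys = [
    "module.pooler.query_tokens",
    "module.linear.weight",
    "module.linear.bias",
]

missing, unexpected = compare_keys(model_keys, checkpoint_keys)
print("missing from checkpoint:", missing)
print("unexpected in checkpoint:", unexpected)
```

A non-empty `missing` list would be consistent with the guess above, since those parameters would never receive trained weights.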
Thank you very much for sharing your great work! I tried to reproduce the video recognition results but got very low accuracy. Can you give me some advice in case I missed something, or kindly provide a script that reproduces the accuracy in the table?
I tested the model with this script: jepa/evals/video_classification_frozen/eval.py, with the training-related code removed.
Model & config:
Encoder: vith16.pth.tar
Classifier: vith16-k400-probe.pth.tar
Config: vith16_k400_16x8x3
Example data, first 5 videos of k400 val-set, label is "abseiling":
0wR5jVB-WPk_000417_000427.mp4
3caPS4FHFF8_000036_000046.mp4
3yaoNwz99xM_000062_000072.mp4
6IbvOJxXnOo_000047_000057.mp4
6_4kjPiQr7w_000191_000201.mp4
Result:
Index: 0 , Predict: 198
Index: 1 , Predict: 198
Index: 2 , Predict: 198
Index: 3 , Predict: 211
Index: 4 , Predict: 198
Label "abseiling" should be 0, according to willprice/KINETICS_LABELS.md. So the predictions are all wrong?
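A small sketch of how predictions could be scored when the dataset and the probe number their classes differently: match the two conventions by class name, then compare in the probe's ID space. The dataset IDs below (abseiling 0, zumba 399) are the willprice/KINETICS_LABELS.md values quoted in this thread, and the probe IDs (abseiling 198, rock_climbing 211, zumba 381) come from the `K400_CLASS_TEMPLATES` list posted in this thread. The function names are illustrative assumptions.

```python
def build_remap(dataset_ids, probe_ids):
    """Map dataset label ID -> probe label ID by matching class names."""
    return {ds_id: probe_ids[name] for name, ds_id in dataset_ids.items()}


def top1_accuracy(predictions, targets, remap):
    """Fraction of predictions equal to the remapped ground-truth ID."""
    hits = sum(pred == remap[target] for pred, target in zip(predictions, targets))
    return hits / len(predictions)


# IDs as quoted in the thread for the two conventions.
dataset_ids = {"abseiling": 0, "zumba": 399}
probe_ids = {"abseiling": 198, "rock_climbing": 211, "zumba": 381}

remap = build_remap(dataset_ids, probe_ids)

# The five "abseiling" clips from the post above.
predictions = [198, 198, 198, 211, 198]
targets = [0] * 5

accuracy = top1_accuracy(predictions, targets, remap)
print(accuracy)
```

Read this way, four of the five clips are classified as abseiling and the remaining one as rock_climbing (ID 211 in the probe's list).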