amazon-science / tubelet-transformer

This is an official implementation of TubeR: Tubelet Transformer for Video Action Detection
https://openaccess.thecvf.com/content/CVPR2022/supplemental/Zhao_TubeR_Tubelet_Transformer_CVPR_2022_supplemental.pdf
Apache License 2.0

Inference on AVA and JHMDB Needs Maintenance and Necessary Files #14

Open DanLuoNEU opened 1 year ago

DanLuoNEU commented 1 year ago

For the version I am using, AVA 2.1 inference needs several modifications:

  1. https://github.com/amazon-science/tubelet-transformer/blob/f610c97251e5539256095508570563ca2dc8c7a1/datasets/ava_frame.py#L135

    In the loadvideo function, the frame list should be built using the video name: video_frame_list = sorted(glob(video_frame_path + vid + '/*.jpg'))

  2. Change the annotation path here: https://github.com/amazon-science/tubelet-transformer/blob/f610c97251e5539256095508570563ca2dc8c7a1/evaluates/evaluate_ava.py#L36

  3. The fixes above reproduce the numbers listed in the README table, but TensorBoard still raises an EOFError. Add the following after https://github.com/amazon-science/tubelet-transformer/blob/f610c97251e5539256095508570563ca2dc8c7a1/eval_tuber_ava.py#L48 so the writer is closed on the rank-0 process:

if cfg.DDP_CONFIG.GPU_WORLD_RANK == 0:
    writer.close()
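The frame-listing fix in step 1 can be sketched as a standalone helper (the function name `load_frame_list` is hypothetical, and `video_frame_path` is assumed to end with a path separator, as in the snippet above):

```python
from glob import glob

def load_frame_list(video_frame_path, vid):
    # Mirrors the loadvideo fix above: collect this video's frames
    # by the video name, and sort them explicitly, because glob()
    # does not guarantee any ordering and the clip must be read in
    # temporal order.
    return sorted(glob(video_frame_path + vid + '/*.jpg'))
```

The explicit `sorted()` matters: on most filesystems `glob()` returns entries in arbitrary order, so without it the frames of a clip can be shuffled.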

AVA 2.2 inference results:

per_class [0.49119732        nan 0.32108856 0.58690862 0.1453127  0.25250868
 0.05269343 0.55119903 0.47336599 0.58118356 0.83511073 0.85809156
 0.4264426  0.79215918 0.7533182         nan 0.61339698        nan
        nan 0.04726829        nan 0.16529978        nan 0.23965087
        nan 0.04494236 0.306021   0.55275175 0.36725148 0.07057226
        nan        nan        nan 0.12159738        nan 0.03173127
 0.02196539 0.2641557         nan        nan 0.67544085        nan
 0.00367732        nan 0.01473403 0.03833153 0.03002702 0.37160171
 0.53368705        nan 0.21649021 0.1374056         nan 0.29578147
        nan 0.03978733 0.10253565 0.03219929 0.33915299 0.01752664
 0.28362901 0.3223239  0.14873739 0.52285939 0.14770317 0.11950478
 0.44886859 0.17733113 0.06789831 0.27917222        nan 0.46795067
 0.06238106 0.71983267        nan 0.05018591 0.31590126 0.09531384
 0.8376019  0.70844574]
{'PascalBoxes_Precision/mAP@0.5IOU': 0.30985340450933535, 'PascalBoxes_PerformanceByCategory/AP@0.5IOU/bend/bow (at the waist)': 0.4911973183134509, 'PascalBoxes_PerformanceByCategory/AP@0.5IOU/crouch/kneel': 0.3210885611841083, 'PascalBoxes_PerformanceByCategory/AP@0.5IOU/dance': 0.5869086163647963, 'PascalBoxes_PerformanceByCategory/AP@0.5IOU/fall down': 0.14531270272554303, 'PascalBoxes_PerformanceByCategory/AP@0.5IOU/get up': 0.25250867821227696, 'PascalBoxes_PerformanceByCategory/AP@0.5IOU/jump/leap': 0.05269343043207558, 'PascalBoxes_PerformanceByCategory/AP@0.5IOU/lie/sleep': 0.5511990313327797, 'PascalBoxes_PerformanceByCategory/AP@0.5IOU/martial art': 0.47336599427812304, 'PascalBoxes_PerformanceByCategory/AP@0.5IOU/run/jog': 0.5811835550049768, 'PascalBoxes_PerformanceByCategory/AP@0.5IOU/sit': 0.8351107282724392, 'PascalBoxes_PerformanceByCategory/AP@0.5IOU/stand': 0.8580915605931295, 'PascalBoxes_PerformanceByCategory/AP@0.5IOU/swim': 0.42644259946642094, 'PascalBoxes_PerformanceByCategory/AP@0.5IOU/walk': 0.7921591772441756, 'PascalBoxes_PerformanceByCategory/AP@0.5IOU/answer phone': 0.7533181965878357, 'PascalBoxes_PerformanceByCategory/AP@0.5IOU/carry/hold (an object)': 0.613396976906247, 'PascalBoxes_PerformanceByCategory/AP@0.5IOU/climb (e.g., a mountain)': 0.047268291513739374, 'PascalBoxes_PerformanceByCategory/AP@0.5IOU/close (e.g., a door, a box)': 0.16529978105316412, 'PascalBoxes_PerformanceByCategory/AP@0.5IOU/cut': 0.239650870599096, 'PascalBoxes_PerformanceByCategory/AP@0.5IOU/dress/put on clothing': 0.04494235744272522, 'PascalBoxes_PerformanceByCategory/AP@0.5IOU/drink': 0.30602100382076136, 'PascalBoxes_PerformanceByCategory/AP@0.5IOU/drive (e.g., a car, a truck)': 0.5527517520577403, 'PascalBoxes_PerformanceByCategory/AP@0.5IOU/eat': 0.3672514840844659, 'PascalBoxes_PerformanceByCategory/AP@0.5IOU/enter': 0.07057225556756908, 'PascalBoxes_PerformanceByCategory/AP@0.5IOU/hit (an object)': 0.12159737681929804, 
'PascalBoxes_PerformanceByCategory/AP@0.5IOU/lift/pick up': 0.03173127096825363, 'PascalBoxes_PerformanceByCategory/AP@0.5IOU/listen (e.g., to music)': 0.021965385905557883, 'PascalBoxes_PerformanceByCategory/AP@0.5IOU/open (e.g., a window, a car door)': 0.2641556990694153, 'PascalBoxes_PerformanceByCategory/AP@0.5IOU/play musical instrument': 0.6754408509957595, 'PascalBoxes_PerformanceByCategory/AP@0.5IOU/point to (an object)': 0.0036773150722066972, 'PascalBoxes_PerformanceByCategory/AP@0.5IOU/pull (an object)': 0.01473402768023624, 'PascalBoxes_PerformanceByCategory/AP@0.5IOU/push (an object)': 0.038331529680086275, 'PascalBoxes_PerformanceByCategory/AP@0.5IOU/put down': 0.03002701544153771, 'PascalBoxes_PerformanceByCategory/AP@0.5IOU/read': 0.3716017145811048, 'PascalBoxes_PerformanceByCategory/AP@0.5IOU/ride (e.g., a bike, a car, a horse)': 0.5336870531261757, 'PascalBoxes_PerformanceByCategory/AP@0.5IOU/sail boat': 0.21649020512834088, 'PascalBoxes_PerformanceByCategory/AP@0.5IOU/shoot': 0.13740559748226708, 'PascalBoxes_PerformanceByCategory/AP@0.5IOU/smoke': 0.2957814682780021, 'PascalBoxes_PerformanceByCategory/AP@0.5IOU/take a photo': 0.03978732762876234, 'PascalBoxes_PerformanceByCategory/AP@0.5IOU/text on/look at a cellphone': 0.10253564997258985, 'PascalBoxes_PerformanceByCategory/AP@0.5IOU/throw': 0.03219929211064902, 'PascalBoxes_PerformanceByCategory/AP@0.5IOU/touch (an object)': 0.33915299353156436, 'PascalBoxes_PerformanceByCategory/AP@0.5IOU/turn (e.g., a screwdriver)': 0.017526643108955034, 'PascalBoxes_PerformanceByCategory/AP@0.5IOU/watch (e.g., TV)': 0.28362901476702795, 'PascalBoxes_PerformanceByCategory/AP@0.5IOU/work on a computer': 0.322323903124391, 'PascalBoxes_PerformanceByCategory/AP@0.5IOU/write': 0.1487373880589133, 'PascalBoxes_PerformanceByCategory/AP@0.5IOU/fight/hit (a person)': 0.5228593870747025, 'PascalBoxes_PerformanceByCategory/AP@0.5IOU/give/serve (an object) to (a person)': 0.14770317484649234, 
'PascalBoxes_PerformanceByCategory/AP@0.5IOU/grab (a person)': 0.11950477963584528, 'PascalBoxes_PerformanceByCategory/AP@0.5IOU/hand clap': 0.44886858836133026, 'PascalBoxes_PerformanceByCategory/AP@0.5IOU/hand shake': 0.17733112595251085, 'PascalBoxes_PerformanceByCategory/AP@0.5IOU/hand wave': 0.06789830556787521, 'PascalBoxes_PerformanceByCategory/AP@0.5IOU/hug (a person)': 0.27917221591712854, 'PascalBoxes_PerformanceByCategory/AP@0.5IOU/kiss (a person)': 0.4679506698404774, 'PascalBoxes_PerformanceByCategory/AP@0.5IOU/lift (a person)': 0.062381058259554645, 'PascalBoxes_PerformanceByCategory/AP@0.5IOU/listen to (a person)': 0.7198326661128859, 'PascalBoxes_PerformanceByCategory/AP@0.5IOU/push (another person)': 0.050185914377705816, 'PascalBoxes_PerformanceByCategory/AP@0.5IOU/sing to (e.g., self, a person, a group)': 0.31590125934914154, 'PascalBoxes_PerformanceByCategory/AP@0.5IOU/take (an object) from (a person)': 0.09531383956904724, 'PascalBoxes_PerformanceByCategory/AP@0.5IOU/talk to (e.g., self, a person, a group)': 0.8376018955287321, 'PascalBoxes_PerformanceByCategory/AP@0.5IOU/watch (a person)': 0.7084457445779531}
mAP: 0.30985
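As far as I can tell, the overall mAP above is the mean of the per-class APs with the nan entries (classes excluded from the AVA evaluation subset) ignored. A quick sanity check with NumPy (using only the first few per-class values from the dump above):

```python
import numpy as np

# First few per-class APs from the dump above; nan marks classes
# excluded from the evaluation subset.
per_class = np.array([0.49119732, np.nan, 0.32108856, 0.58690862])

# np.nanmean averages only the valid (non-nan) entries.
mean_ap = np.nanmean(per_class)
```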
DanLuoNEU commented 1 year ago

For JHMDB Inference

To avoid a shape-mismatch error when loading the pretrained DETR weights, modify the checkpoint-loading code so the query_embed weights match the built model's dimensions: https://github.com/amazon-science/tubelet-transformer/blob/f610c97251e5539256095508570563ca2dc8c7a1/utils/model_utils.py#L25

Replace pretrained_dict.update({k: v[:query_size]}) with:

if query_size == model.module.query_embed.weight.shape[0]: continue
if v.shape[0] < model.module.query_embed.weight.shape[0]:  # in case the pretrained model does not align
    query_embed_zeros = torch.zeros(model.module.query_embed.weight.shape)
    pretrained_dict.update({k: query_embed_zeros})
else:
    pretrained_dict.update({k: v[:model.module.query_embed.weight.shape[0]]})
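The alignment logic above can be illustrated on plain arrays (NumPy here for brevity; the real code operates on torch tensors, and the helper name `align_query_embed` is hypothetical):

```python
import numpy as np

def align_query_embed(pretrained_w, target_shape):
    # Illustrative version of the fix above:
    # - same number of queries: keep the pretrained weights as-is
    # - fewer queries in the checkpoint than the model expects:
    #   fall back to a zero-initialized embedding
    # - more queries: truncate to the model's query count
    if pretrained_w.shape[0] == target_shape[0]:
        return pretrained_w
    if pretrained_w.shape[0] < target_shape[0]:
        return np.zeros(target_shape)
    return pretrained_w[:target_shape[0]]
```

Zero-initializing when the checkpoint is too small simply discards the pretrained queries rather than partially reusing them, which matches the snippet above.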

I got a slightly different mAP from the one in the README table:

per_class [0.96529908 0.4870422  0.81740977 0.64671594 0.99981187 0.48678173
 0.72522214 0.70157535 0.99132313 0.99332738 0.92539198 0.63780982
 0.6607778  0.89695387 0.78694818 0.42965094 0.26324953 0.94429166
 0.27346689 0.68134081 0.87238637        nan        nan        nan]
{'PascalBoxes_Precision/mAP@0.5IOU': 0.7231798302410739, 'PascalBoxes_PerformanceByCategory/AP@0.5IOU/Basketball': 0.9652990848728149, 'PascalBoxes_PerformanceByCategory/AP@0.5IOU/BasketballDunk': 0.4870421987013735, 'PascalBoxes_PerformanceByCategory/AP@0.5IOU/Biking': 0.8174097664543525, 'PascalBoxes_PerformanceByCategory/AP@0.5IOU/CliffDiving': 0.6467159401389935, 'PascalBoxes_PerformanceByCategory/AP@0.5IOU/CricketBowling': 0.9998118686054533, 'PascalBoxes_PerformanceByCategory/AP@0.5IOU/Diving': 0.48678173366600064, 'PascalBoxes_PerformanceByCategory/AP@0.5IOU/Fencing': 0.7252221388068574, 'PascalBoxes_PerformanceByCategory/AP@0.5IOU/FloorGymnastics': 0.7015753486207187, 'PascalBoxes_PerformanceByCategory/AP@0.5IOU/GolfSwing': 0.9913231289322941, 'PascalBoxes_PerformanceByCategory/AP@0.5IOU/HorseRiding': 0.9933273801597415, 'PascalBoxes_PerformanceByCategory/AP@0.5IOU/IceDancing': 0.9253919821730238, 'PascalBoxes_PerformanceByCategory/AP@0.5IOU/LongJump': 0.637809816668955, 'PascalBoxes_PerformanceByCategory/AP@0.5IOU/PoleVault': 0.6607777957457814, 'PascalBoxes_PerformanceByCategory/AP@0.5IOU/RopeClimbing': 0.8969538737505489, 'PascalBoxes_PerformanceByCategory/AP@0.5IOU/SalsaSpin': 0.7869481765834933, 'PascalBoxes_PerformanceByCategory/AP@0.5IOU/SkateBoarding': 0.42965094009542815, 'PascalBoxes_PerformanceByCategory/AP@0.5IOU/Skiing': 0.26324952994810963, 'PascalBoxes_PerformanceByCategory/AP@0.5IOU/Skijet': 0.9442916605769802, 'PascalBoxes_PerformanceByCategory/AP@0.5IOU/SoccerJuggling': 0.27346688938240526, 'PascalBoxes_PerformanceByCategory/AP@0.5IOU/Surfing': 0.681340807090747, 'PascalBoxes_PerformanceByCategory/AP@0.5IOU/TennisSwing': 0.8723863740884812, 'PascalBoxes_PerformanceByCategory/AP@0.5IOU/TrampolineJumping': nan, 'PascalBoxes_PerformanceByCategory/AP@0.5IOU/VolleyballSpiking': nan, 'PascalBoxes_PerformanceByCategory/AP@0.5IOU/WalkingWithDog': nan}
mAP: 0.72318
CKK-coder commented 1 year ago

> [quotes the JHMDB fix and mAP results from the previous comment]

Thank you for your correction. Did you find any code for video-mAP inference? I want to reproduce the video-mAP on UCF101-24.

FransHk commented 1 month ago

Thanks for taking the time to write this; it helped me greatly. It's a shame that the codebase for this model is such a mess as-is.