How to reproduce video recognition Acc in the Table?

Thank you very much for sharing your great work! I tried to reproduce the video recognition results but got very low accuracy. Could you give me some advice on what I might have missed, or kindly provide a script that reproduces the Acc reported in the Table?

I tested the model with this script: jepa/evals/video_classification_frozen/eval.py

Config: vitl16_ssv2_16x2x3.yaml

    nodes: 8
    tasks_per_node: 8
    tag: ssv2-16x2x3
    eval_name: video_classification_frozen
    resume_checkpoint: false
    data:
      dataset_train: xx/ssv2_train.csv
      dataset_val: xx/ssv2_val.csv
      dataset_type: VideoDataset
      num_classes: 174
      frames_per_clip: 16
      num_segments: 1 #2
      num_views_per_segment: 3
      frame_step: 4
    optimization:
      attend_across_segments: true
      num_epochs: 20
      resolution: 224
      batch_size: 16 #4
      weight_decay: 0.01
      lr: 0.001
      start_lr: 0.001
      final_lr: 0.0
      warmup: 0.
      use_bfloat16: true
    pretrain:
      model_name: vit_large
      checkpoint_key: target_encoder
      clip_duration: null
      frames_per_clip: 16
      tubelet_size: 2
      uniform_power: true
      use_silu: false
      tight_silu: false
      use_sdpa: true
      patch_size: 16
      folder: xx/JEPA
      checkpoint: vitl16.pth
      write_tag: jepa
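To make the deviations in this config easier to see, here is a minimal sketch that compares the values used in the run with the values in the trailing comments. It assumes that `#2` and `#4` record the original settings for `num_segments` and `batch_size`, and that `batch_size` is per process with one process per task. Neither assumption is confirmed by the repository, and the helper names `summarize` and `diff` are hypothetical.

```python
# Values copied from the config above (only the keys needed for the comparison).
used = {
    "nodes": 8,
    "tasks_per_node": 8,
    "batch_size": 16,
    "num_segments": 1,
    "num_views_per_segment": 3,
}

# ASSUMPTION: the trailing "#2" and "#4" comments are the original values.
assumed_original = dict(used, batch_size=4, num_segments=2)


def summarize(cfg):
    """Derive two quantities that change when the edited keys change."""
    # ASSUMPTION: batch_size is per process and each task is one process.
    global_batch = cfg["nodes"] * cfg["tasks_per_node"] * cfg["batch_size"]
    # Clips sampled per video: temporal segments times spatial views.
    clips_per_video = cfg["num_segments"] * cfg["num_views_per_segment"]
    return {"global_batch": global_batch, "clips_per_video": clips_per_video}


def diff(a, b):
    """Return {key: (value_in_a, value_in_b)} for keys whose values differ."""
    return {k: (a[k], b[k]) for k in a if a[k] != b[k]}


changed = diff(assumed_original, used)
print("changed keys:", changed)
print("assumed original:", summarize(assumed_original))
print("this run:", summarize(used))
```

Under these assumptions the run uses a global batch of 1024 instead of 256 while `lr` stays at 0.001, and samples 3 clips per video instead of 6. Whether either difference explains the low accuracy is not established here; the sketch only lists what differs.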

facebookresearch / jepa

How to reproduce video recognition Acc in the Table? #67