Dawn-LX / OpenVoc-VidVRD

Official code for the ICLR2023 paper Compositional Prompt Tuning with Motion Cues for Open-vocabulary Video Relation Detection
37 stars 1 forks source link

Official code for our ICLR2023 paper: "Compositional Prompt Tuning with Motion Cues for Open-Vocabulary Video Relation Detection" openreview link ICLR2023_poster-2700x1806.jpg

[Update] train & eval code for VidVRD dataset is ready

[Update] All traj data for VidOR has been released

Data-release summarize

Actually, the raw video data (.mp4 files) is not required to run this repo. We provide the pre-prepared traj data (include bbox and features)

Overview: There are 3 types of data

For each type of the above data, it includes gt and det, i.e., ground-truth traj bboxes and detection traj bboxes, with their features/embds. (certainly, we don't need Seq-NMS to perform tracking for gt)

Code for prepare the above traj data

Please refer to this repo VidSGG-TrajDataPrepare for how to prepare the above traj data.

VidVRD

Pre-prepared traj data (MEGA cloud link)

In detail, there are the following files: (where data0/ refers to /home/gkf/project/)

Download the above data and format as, e.g.,

data0/
|   ALPRO/-------------------------------------------------------------------------------------------------------------(num_folders:1, num_files=0),num_videos=0
|   |   extract_features_output/---------------------------------------------------------------------------------------(num_folders:3, num_files=1),num_videos=0
|   |   |   VidVRDtest_seg30_TrajFeatures256_gt/------------------------------------------------------------------(num_folders:0, num_files=2884),num_videos=200
|   |   |   vidvrd_seg30_TrajFeatures256/-----------------------------------------------------------------------(num_folders:0, num_files=18348),num_videos=1000
|   |   |   vidvrd_seg30_TrajFeatures256_gt/----------------------------------------------------------------------(num_folders:0, num_files=5855),num_videos=800
|   scene_graph_benchmark/---------------------------------------------------------------------------------------------(num_folders:1, num_files=0),num_videos=0
|   |   output/--------------------------------------------------------------------------------------------------------(num_folders:6, num_files=0),num_videos=0
|   |   |   VidVRD_gt_traj_features_seg30/------------------------------------------------------------------------(num_folders:0, num_files=5855),num_videos=800
|   |   |   VidVRD_traj_features_seg30_th-15-5/-----------------------------------------------------------------(num_folders:0, num_files=18348),num_videos=1000
|   |   |   VidVRD_traj_features_seg30/-------------------------------------------------------------------------(num_folders:0, num_files=18348),num_videos=1000
|   |   |   VidVRDtest_gt_traj_features_seg30/--------------------------------------------------------------------(num_folders:0, num_files=2884),num_videos=200
|   |   |   VidVRDtest_tracking_results_gt/-----------------------------------------------------------------------(num_folders:0, num_files=2884),num_videos=200
|   |   |   VidVRD_tracking_results_gt/---------------------------------------------------------------------------(num_folders:0, num_files=5855),num_videos=800
|   VidVRD-II/---------------------------------------------------------------------------------------------------------(num_folders:1, num_files=0),num_videos=0
|   |   tracklets_results/---------------------------------------------------------------------------------------------(num_folders:2, num_files=0),num_videos=0
|   |   |   VidVRD_segment30_tracking_results_th-15-5/----------------------------------------------------------(num_folders:0, num_files=18348),num_videos=1000
|   |   |   VidVRD_segment30_tracking_results/------------------------------------------------------------------(num_folders:0, num_files=18348),num_videos=1000
|   VidVRD_VidOR/------------------------------------------------------------------------------------------------------(num_folders:2, num_files=0),num_videos=0
|   |   vidvrd-dataset/------------------------------------------------------------------------------------------------(num_folders:2, num_files=0),num_videos=0
|   |   |   train/-------------------------------------------------------------------------------------------------(num_folders:0, num_files=800),num_videos=800
|   |   |   test/--------------------------------------------------------------------------------------------------(num_folders:0, num_files=200),num_videos=200
|   |   vidor-dataset/-------------------------------------------------------------------------------------------------(num_folders:0, num_files=0),num_videos=0

Model weights

VidOR

We backup the video data here in case the official link not work.

Pre-prepared traj data (MEGA cloud link). It contains the following files:

Model Weights:

Trajectory Classification Module

First add the env path: export PYTHONPATH=$PYTHONPATH:"/your/path/OpenVoc-VidVRD/"

Train

refer to the commands in tools/train_traj_cls_both.py, for both VidVRD & VidOR datasets, e.g.,

    CUDA_VISIBLE_DEVICES=3 python tools/train_traj_cls_both.py \
        --dataset_class VidVRDTrajDataset \
        --model_class OpenVocTrajCls_NoBgEmb \
        --cfg_path experiments/TrajCls_VidVRD/NoBgEmb/cfg_.py \
        --output_dir experiments/TrajCls_VidVRD/NoBgEmb \
        --save_tag bs128

NOTE:

Test

refer to the commands in tools/eval_traj_cls_both.py

RelationCls Module (VidVRD dataset)

1) label assignment

refer to the commands in tools/VidVRD_label_assignment.py, e.g.,

python tools/VidVRD_label_assignment.py \
        --traj_len_th 15 \
        --min_region_th 5 \
        --vpoi_th 0.9 \
        --cache_tag PredSplit_v2_FullySupervise \
        --is_save

2) Train

e.g., refer to the commands in tools/train_relation_cls.py for other settings (ablation studies)

### Table-2 (RePro with both base and novel training data) (RePro_both_BaseNovel_training)
    # stage-1  (A-100 24G memory, 50 epochs total 3.5 hour)
    TOKENIZERS_PARALLELISM=false CUDA_VISIBLE_DEVICES=1 python tools/train_relation_cls.py \
        --use_gt_only_data \
        --model_class AlproPromptTrainer_Grouped \
        --train_dataset_class VidVRDGTDatasetForTrain_GIoU \
        --eval_dataset_class VidVRDUnifiedDataset_GIoU \
        --cfg_path  experiments/RelationCls_VidVRD/RePro_both_BaseNovel_training/stage1/cfg_.py \
        --output_dir experiments/RelationCls_VidVRD/RePro_both_BaseNovel_training/stage1/ \
        --eval_split_traj all \
        --eval_split_pred all \
        --save_tag bsz32

    # stage-2  (A-100 15G, (about 14791 M), 50 epochs total 2.5 hour )  
    TOKENIZERS_PARALLELISM=false CUDA_VISIBLE_DEVICES=0 python tools/train_relation_cls.py \
        --model_class OpenVocRelCls_stage2_Grouped \
        --train_dataset_class VidVRDUnifiedDataset_GIoU \
        --eval_dataset_class VidVRDUnifiedDataset_GIoU \
        --cfg_path experiments/RelationCls_VidVRD/RePro_both_BaseNovel_training/stage2/cfg_.py \
        --output_dir experiments/RelationCls_VidVRD/RePro_both_BaseNovel_training/stage2/ \
        --save_tag bsz32

3) Test

refer to tools/eval_relation_cls.py for different test settings