UARK-AICV / VLTinT

[AAAI 2023 Oral] VLTinT: Visual-Linguistic Transformer-in-Transformer for Coherent Video Paragraph Captioning
https://uark-aicv.github.io/VLTinT/

Extract bbox bug for multi gpu processing #6

Closed fake-warrior8 closed 1 year ago

fake-warrior8 commented 1 year ago

Hi, I found that extracting the agent features is too slow with the following command:

cd SlowFast
python tools/run_net.py --cfg configs/Kinetics/SLOWONLY_8x8_R50.yaml --feature_extraction --num_features 100 --video_dir path/to/dir/rescaled --feat_dir path/to/data/[anet/yc2]/c3d_agent MODEL.NUM_CLASSES 200 TEST.CHECKPOINT_TYPE caffe2 TEST.CHECKPOINT_FILE_PATH models/SLOWONLY_8x8_R50.pkl NUM_GPUS 1 TEST.BATCH_SIZE 1 DATA.PATH_TO_BBOX_DIR path/to/dir/bbox DETECTION.ENABLE True DETECTION.SPATIAL_SCALE_FACTOR 32 DATA.SAMPLING_RATE 1 DATA.NUM_FRAMES 16 RESNET.SPATIAL_STRIDES [[1],[2],[2],[1]] RESNET.SPATIAL_DILATIONS [[1],[1],[1],[2]] DATA.PATH_TO_TMP_DIR /tmp/agent_0/

which uses only a single GPU (it would take 20+ days). However, when I set NUM_GPUS 8 and TEST.BATCH_SIZE 8, it raises an error:

"SlowFast/tools/feature_extraction.py", line 74, in perform_bbox_feature_extract
    'features': features.cpu().tolist(),
AttributeError: 'list' object has no attribute 'cpu'

Could you provide a multi-GPU version of this code?
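The AttributeError above suggests that, with more than one GPU, `features` reaches line 74 as a Python list rather than a single tensor. The extraction code itself is not shown in this thread, so that reading is an assumption. As a rough sketch only, a hypothetical helper (`features_to_list` is not part of SlowFast or VLTinT) could accept either form by duck-typing on `.cpu()`:

```python
def features_to_list(features):
    """Convert a tensor-like object, or a (nested) list/tuple of them,
    into plain Python lists.

    Hypothetical helper: assumes each tensor-like object exposes
    .cpu() and .tolist(), as torch.Tensor does.
    """
    if isinstance(features, (list, tuple)):
        # Multi-GPU case (assumed): convert each element in turn.
        return [features_to_list(f) for f in features]
    if hasattr(features, "cpu"):
        # Single-tensor case: same call as in the traceback.
        return features.cpu().tolist()
    # Already a plain Python value; return unchanged.
    return features
```

This would only avoid the crash at that line. Whether the resulting nested lists have the layout the downstream code expects is a separate question that this sketch does not answer.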

Kashu7100 commented 1 year ago

Thank you for reaching out. Unfortunately, we don't plan to implement a multi-GPU version of the preprocessing code. As I recall, we processed the data in chunks and ran several threads in parallel to speed up the processing.
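The chunked approach described above could look roughly like the sketch below. It is not the authors' script. It assumes the videos have already been split into one directory per chunk, that each worker is an independent single-GPU run of the command from the issue, and that each worker is pinned to a GPU through `CUDA_VISIBLE_DEVICES`. The helper names (`split_into_chunks`, `build_command`, `launch_workers`) are hypothetical.

```python
import os
import subprocess


def split_into_chunks(items, num_chunks):
    """Split items into num_chunks contiguous, nearly equal chunks."""
    base, extra = divmod(len(items), num_chunks)
    chunks, start = [], 0
    for i in range(num_chunks):
        size = base + (1 if i < extra else 0)
        chunks.append(items[start:start + size])
        start += size
    return chunks


def build_command(flags, overrides, chunk_dir, worker_id):
    """Build the argv for one worker.

    flags:     the '--name value' options from the issue's command,
               without --video_dir
    overrides: the trailing 'KEY value' config overrides, without
               DATA.PATH_TO_TMP_DIR
    The '--' options are kept ahead of the config overrides, matching
    the order of the original command.
    """
    return (
        ["python", "tools/run_net.py"]
        + list(flags)
        + ["--video_dir", chunk_dir]
        + list(overrides)
        # Separate tmp dir per worker (assumed to avoid clashes).
        + ["DATA.PATH_TO_TMP_DIR", f"/tmp/agent_{worker_id}/"]
    )


def launch_workers(flags, overrides, chunk_dirs):
    """Start one process per chunk directory, one GPU each."""
    procs = []
    for worker_id, chunk_dir in enumerate(chunk_dirs):
        # Each worker sees only its own GPU.
        env = dict(os.environ, CUDA_VISIBLE_DEVICES=str(worker_id))
        cmd = build_command(flags, overrides, chunk_dir, worker_id)
        procs.append(subprocess.Popen(cmd, env=env))
    return procs
```

Each worker keeps NUM_GPUS 1 and TEST.BATCH_SIZE 1 from the original command, so the failing multi-GPU code path is never used. Calling `wait()` on each returned process blocks until every chunk has finished.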

To save you time, I have made the pre-extracted features available here, so feel free to download them! I will update the README accordingly.