facebookresearch / SlowFast

PySlowFast: video understanding codebase from FAIR for reproducing state-of-the-art video models.
Apache License 2.0

Questions about GPU memory of the SlowFast network and I3D. #78

Open guancheng817 opened 4 years ago

guancheng817 commented 4 years ago

Hi, thanks for the SlowFast codebase. I am using the SlowFast and I3D networks in the AVA pipeline, but I found that the I3D network, which has fewer parameters, needs more GPU memory than the SlowFast network when the other config settings are the same. Here are my SLOWFAST_32x2_R50.yaml and I3D_32x2_R50.yaml:

SLOWFAST_32x2_R50.yaml
TRAIN:
  ENABLE: True
  DATASET: ava
  BATCH_SIZE: 4
  EVAL_PERIOD: 1
  CHECKPOINT_PERIOD: 1
  AUTO_RESUME: True
  # CHECKPOINT_FILE_PATH: path to pretrain model
  CHECKPOINT_TYPE: caffe2
DATA:
  NUM_FRAMES: 32
  SAMPLING_RATE: 2
  TRAIN_JITTER_SCALES: [256, 320]
  TRAIN_CROP_SIZE: 224
  TEST_CROP_SIZE: 256
  INPUT_CHANNEL_NUM: [3, 3]
  #INPUT_CHANNEL_NUM: [3]
DETECTION:
  ENABLE: True
  ALIGNED: False
AVA:
  BGR: False
  DETECTION_SCORE_THRESH: 0.8
  TEST_PREDICT_BOX_LISTS: ["ava_val_predicted_boxes.csv"]
SLOWFAST:
  ALPHA: 4
  BETA_INV: 8
  FUSION_CONV_CHANNEL_RATIO: 2
  FUSION_KERNEL_SZ: 7
RESNET:
  ZERO_INIT_FINAL_BN: True
  WIDTH_PER_GROUP: 64
  NUM_GROUPS: 1
  DEPTH: 50
  TRANS_FUNC: bottleneck_transform
  STRIDE_1X1: False
  NUM_BLOCK_TEMP_KERNEL: [[3, 3], [4, 4], [6, 6], [3, 3]]
  #NUM_BLOCK_TEMP_KERNEL: [[3], [4], [6], [3]]
  SPATIAL_DILATIONS: [[1, 1], [1, 1], [1, 1], [2, 2]]
  SPATIAL_STRIDES: [[1, 1], [2, 2], [2, 2], [1, 1]]
NONLOCAL:
  LOCATION: [[[], []], [[], []], [[], []], [[], []]]
  #LOCATION: [[[]], [[]], [[]], [[]]]
  GROUP: [[1, 1], [1, 1], [1, 1], [1, 1]]
  #GROUP: [[1], [1], [1], [1]]
  INSTANTIATION: dot_product
  POOL: [[[1, 2, 2], [1, 2, 2]], [[1, 2, 2], [1, 2, 2]], [[1, 2, 2], [1, 2, 2]], [[1, 2, 2], [1, 2, 2]]]
BN:
  USE_PRECISE_STATS: False
  NUM_BATCHES_PRECISE: 200
  MOMENTUM: 0.1
  WEIGHT_DECAY: 0.0
SOLVER:
  MOMENTUM: 0.9
  WEIGHT_DECAY: 1e-7
  OPTIMIZING_METHOD: sgd
MODEL:
  NUM_CLASSES: 80
  ARCH: slowfast
  LOSS_FUNC: bce
  DROPOUT_RATE: 0.5
TEST:
  ENABLE: False
  DATASET: ava
  BATCH_SIZE: 8
DATA_LOADER:
  NUM_WORKERS: 2
  PIN_MEMORY: True
NUM_GPUS: 8
NUM_SHARDS: 1
RNG_SEED: 0
OUTPUT_DIR: ./work_dir/SLOWFAST/

I3D_32x2_R50.yaml
TRAIN:
  ENABLE: True
  DATASET: ava
  BATCH_SIZE: 4
  EVAL_PERIOD: 1
  CHECKPOINT_PERIOD: 1
  AUTO_RESUME: True
  # CHECKPOINT_FILE_PATH: path to pretrain model
  CHECKPOINT_TYPE: caffe2
DATA:
  NUM_FRAMES: 32
  SAMPLING_RATE: 2
  TRAIN_JITTER_SCALES: [256, 320]
  TRAIN_CROP_SIZE: 224
  TEST_CROP_SIZE: 256
  #INPUT_CHANNEL_NUM: [3, 3]
  INPUT_CHANNEL_NUM: [3]
DETECTION:
  ENABLE: True
  ALIGNED: False
AVA:
  BGR: False
  DETECTION_SCORE_THRESH: 0.8
  TEST_PREDICT_BOX_LISTS: ["ava_val_predicted_boxes.csv"]
SLOWFAST:
  ALPHA: 4
  BETA_INV: 8
  FUSION_CONV_CHANNEL_RATIO: 2
  FUSION_KERNEL_SZ: 7
RESNET:
  ZERO_INIT_FINAL_BN: True
  WIDTH_PER_GROUP: 64
  NUM_GROUPS: 1
  DEPTH: 50
  TRANS_FUNC: bottleneck_transform
  STRIDE_1X1: False
  #NUM_BLOCK_TEMP_KERNEL: [[3, 3], [4, 4], [6, 6], [3, 3]]
  NUM_BLOCK_TEMP_KERNEL: [[3], [4], [6], [3]]
  #SPATIAL_DILATIONS: [[1, 1], [1, 1], [1, 1], [2, 2]]
  #SPATIAL_STRIDES: [[1, 1], [2, 2], [2, 2], [1, 1]]
NONLOCAL:
  #LOCATION: [[[], []], [[], []], [[], []], [[], []]]
  LOCATION: [[[]], [[]], [[]], [[]]]
  #GROUP: [[1, 1], [1, 1], [1, 1], [1, 1]]
  GROUP: [[1], [1], [1], [1]]
  INSTANTIATION: dot_product
  #POOL: [[[1, 2, 2], [1, 2, 2]], [[1, 2, 2], [1, 2, 2]], [[1, 2, 2], [1, 2, 2]], [[1, 2, 2], [1, 2, 2]]]
BN:
  USE_PRECISE_STATS: False
  NUM_BATCHES_PRECISE: 200
  MOMENTUM: 0.1
  WEIGHT_DECAY: 0.0
SOLVER:
  MOMENTUM: 0.9
  WEIGHT_DECAY: 1e-7
  OPTIMIZING_METHOD: sgd
MODEL:
  NUM_CLASSES: 80
  ARCH: i3d
  LOSS_FUNC: bce
  DROPOUT_RATE: 0.5
TEST:
  ENABLE: False
  DATASET: ava
  BATCH_SIZE: 8
DATA_LOADER:
  NUM_WORKERS: 2
  PIN_MEMORY: True
NUM_GPUS: 1
NUM_SHARDS: 1
RNG_SEED: 0
OUTPUT_DIR: ./work_dir/I3D/

guancheng817 commented 4 years ago

When I use SLOWFAST_32x2_R50.yaml, training needs about 6700 MB of GPU memory, while I3D_32x2_R50.yaml runs out of memory.
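
One possible reading of the gap, as a back-of-the-envelope sketch rather than a measurement: training memory for video models tends to be driven by activations more than by parameter count, and activation size grows roughly with frames times channel width. With the values in the configs above (NUM_FRAMES: 32, ALPHA: 4, BETA_INV: 8), a single-pathway network sees all 32 frames at full width, whereas SlowFast sees 32 / ALPHA frames at full width plus 32 frames at 1 / BETA_INV of the width. The function name and the cost model are assumptions made for illustration. The sketch ignores lateral connections, any temporal pooling inside the network, the detection head, and allocator overhead.

```python
# Rough relative activation cost, in units of "frames at full channel width".
# Assumption: cost scales with frames * channel width at equal spatial size.

def relative_activation_cost(num_frames, alpha, beta_inv):
    """Return (single_pathway_cost, slowfast_cost) under the stated assumption."""
    single = float(num_frames)            # all frames, full channel width
    slow = float(num_frames // alpha)     # slow pathway: fewer frames, full width
    fast = num_frames / beta_inv          # fast pathway: all frames, thin channels
    return single, slow + fast

# Values taken from the configs posted above.
single, slowfast = relative_activation_cost(num_frames=32, alpha=4, beta_inv=8)
print(f"single pathway: {single}, SlowFast: {slowfast}, ratio: {single / slowfast:.2f}")
```

Under these assumptions the single-pathway input costs noticeably more than the two SlowFast pathways combined, which is at least consistent with the behaviour reported here. Profiling both runs would be needed to confirm it.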

yangsusanyang commented 4 years ago

I wonder whether you have solved the GPU issue and are able to run SLOWFAST. I encountered the following runtime error when I tested SLOWFAST on the AVA dataset: "RuntimeError: CUDA out of memory. Tried to allocate 80.00 MiB (GPU 0; 7.76 GiB total capacity; 6.90 GiB already allocated; 26.12 MiB free; 5.63 MiB cached)"

If you have solved the issue, maybe I can get your help. Thanks!

kaixinguor commented 4 years ago

Have you tried this before model.eval()? It may reduce the memory cost at test time:

    for p in model.parameters():
        p.requires_grad = False
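
The suggestion above can be sketched in a self-contained way as follows. The small `nn.Sequential` model is a hypothetical stand-in, not the PySlowFast network, and the `torch.no_grad()` context is an addition to the original suggestion: `model.eval()` by itself only changes layer behaviour (dropout, batch norm) and does not stop autograd from keeping activations.

```python
import torch
import torch.nn as nn

# Hypothetical stand-in for the detection model; not the PySlowFast network.
model = nn.Sequential(nn.Linear(16, 32), nn.ReLU(), nn.Linear(32, 4))

# The suggestion from the comment above: stop tracking gradients for weights.
for p in model.parameters():
    p.requires_grad = False

model.eval()  # switches dropout / batch norm to inference behaviour

# torch.no_grad() ensures no autograd graph is built, so intermediate
# activations are not retained for a backward pass.
with torch.no_grad():
    out = model(torch.randn(2, 16))

print(out.shape, out.requires_grad)
```

Whether this is enough to avoid the out-of-memory error depends on how much of the usage comes from retained activations versus the forward pass itself.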

Uniquene commented 4 years ago

I wonder whether you have solved the GPU issue and are able to run SLOWFAST. I encountered the following runtime error when I tested SLOWFAST on the AVA dataset: "RuntimeError: CUDA out of memory. Tried to allocate 80.00 MiB (GPU 0; 7.76 GiB total capacity; 6.90 GiB already allocated; 26.12 MiB free; 5.63 MiB cached)"

If you have solved the issue, maybe I can get your help. Thanks!

Maybe you could set a smaller TRAIN.BATCH_SIZE.
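
To see what a smaller batch size would mean per device, here is a minimal sketch. It assumes the configured TRAIN.BATCH_SIZE is the total across GPUs and is split evenly between them. That split is an assumption for illustration and should be checked against the data loader code. The helper name `per_gpu_batch` is hypothetical.

```python
def per_gpu_batch(total_batch_size: int, num_gpus: int) -> int:
    """Assumed split: the configured batch is divided evenly across GPUs."""
    return total_batch_size // max(1, num_gpus)

# I3D config posted above: BATCH_SIZE: 4, NUM_GPUS: 1.
i3d_per_gpu = per_gpu_batch(total_batch_size=4, num_gpus=1)

# Halving TRAIN.BATCH_SIZE on the same single GPU.
halved = per_gpu_batch(total_batch_size=2, num_gpus=1)

print(i3d_per_gpu, halved)
```

Note that the posted I3D config sets NUM_GPUS: 1 while the SlowFast config sets NUM_GPUS: 8, so under this assumption the two runs may not place the same number of clips on each GPU.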