brjathu / LART

Code repository for the paper "On the Benefits of 3D Pose and Tracking for Human Action Recognition" (CVPR 2023)
https://github.com/brjathu/LART

CUDA Out of Memory Error when Running the Demo Script #8

Closed zhangy76 closed 9 months ago

zhangy76 commented 1 year ago

Hi,

Thanks for releasing this great work! When running the demo script on two 4090 GPUs, I get the following error:

File "/home/zhangy76/anaconda3/envs/lart/lib/python3.10/site-packages/slowfast/models/attention.py", line 112, in cal_rel_pos_spatialattn[:, :, sp_idx:, sp_idx:].view(B, -1, q_t, q_h, q_w, k_t, k_h, k_w) torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 5.25 GiB (GPU 0; 23.65 GiB total capacity; 17.74 GiB already allocated; 3.99 GiB free; 19.04 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF Set the environment variable HYDRA_FULL_ERROR=1 for a complete stack trace.

Do you have any suggestions on the possible reasons for this error?

Yufei

brjathu commented 1 year ago

Thanks for your interest. In that case, could you please try the instructions in the Colab demo? Since Colab only provides 16 GB of memory, we added support for half precision.

For that, simply switch to the dev branch and then run python scripts/demo.py video.source="assets/jump.mp4" +half=True.
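The steps would look roughly like this (a minimal sketch, assuming you are working from a local clone of this repository):

# inside your local clone of https://github.com/brjathu/LART
git fetch origin
git checkout dev    # switch to the dev branch with half-precision support

# run the demo in half precision
python scripts/demo.py video.source="assets/jump.mp4" +half=True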

Please let me know if you face any other issues.

zhangy76 commented 1 year ago

Thanks for the reply. I encounter the same error when running python scripts/demo.py video.source="assets/jump.mp4" +half=True. The full error is:

 [2023-07-17 21:34:05,743][slowfast.visualization.predictor][INFO] - Start loading model weights.
[2023-07-17 21:34:05,743][slowfast.utils.checkpoint][INFO] - Loading network weights from /home/zhangy76/.cache/phalp/ava/mvit.pyth.
missing keys: []
unexpected keys: []
[2023-07-17 21:34:06,231][slowfast.visualization.predictor][INFO] - Finish loading model weights
Error executing job with overrides: ['video.source=assets/jump.mp4', '+half=True']
Traceback (most recent call last):
  File "/home/zhangy76/LART/scripts/demo.py", line 103, in main
    lart_model.postprocessor.run_lart(pkl_path)
  File "/home/zhangy76/anaconda3/envs/lart/lib/python3.10/site-packages/phalp/visualize/postprocessor.py", line 102, in run_lart
    final_visuals_dic  = self.post_process(final_visuals_dic, save_fast_tracks=self.cfg.post_process.save_fast_tracks, video_pkl_name=video_pkl_name)
  File "/home/zhangy76/anaconda3/envs/lart/lib/python3.10/site-packages/phalp/visualize/postprocessor.py", line 36, in post_process
    smoothed_fast_track_ = self.phalp_tracker.pose_predictor.smooth_tracks(fast_track_, moving_window=True, step=32, window=32)
  File "/home/zhangy76/LART/lart/utils/wrapper_phalp.py", line 226, in smooth_tracks
    fast_track = self.add_slowfast_features(fast_track)
  File "/home/zhangy76/LART/lart/utils/wrapper_phalp.py", line 203, in add_slowfast_features
    task_      = SlowFastWrapper(t_, cfg, list_of_all_frames, mid_bbox_, video_model, center_crop=center_crop)
  File "/home/zhangy76/LART/lart/utils/wrapper_pyslowfast.py", line 61, in SlowFastWrapper
    task = video_model(task)
  File "/home/zhangy76/anaconda3/envs/lart/lib/python3.10/site-packages/slowfast/visualization/predictor.py", line 110, in __call__
    preds, feats  = self.model(inputs, bboxes)
  File "/home/zhangy76/anaconda3/envs/lart/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/zhangy76/anaconda3/envs/lart/lib/python3.10/site-packages/slowfast/models/video_model_builder.py", line 1239, in forward
    x, thw = blk(x, thw)
  File "/home/zhangy76/anaconda3/envs/lart/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/zhangy76/anaconda3/envs/lart/lib/python3.10/site-packages/fairscale/nn/checkpoint/checkpoint_activations.py", line 171, in _checkpointed_forward
    return original_forward(module, *args, **kwargs)
  File "/home/zhangy76/anaconda3/envs/lart/lib/python3.10/site-packages/slowfast/models/attention.py", line 547, in forward
    x_block, thw_shape_new = self.attn(x_norm, thw_shape)
  File "/home/zhangy76/anaconda3/envs/lart/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/zhangy76/anaconda3/envs/lart/lib/python3.10/site-packages/slowfast/models/attention.py", line 407, in forward
    attn = cal_rel_pos_spatial(
  File "/home/zhangy76/anaconda3/envs/lart/lib/python3.10/site-packages/slowfast/models/attention.py", line 112, in cal_rel_pos_spatial
    attn[:, :, sp_idx:, sp_idx:].view(B, -1, q_t, q_h, q_w, k_t, k_h, k_w)
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 5.25 GiB (GPU 0; 23.65 GiB total capacity; 17.74 GiB already allocated; 3.99 GiB free; 19.04 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation.  See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF

Set the environment variable HYDRA_FULL_ERROR=1 for a complete stack trace.

The Colab demo can be run successfully.
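For reference, the allocator setting that the error message itself suggests would be applied like this before launching the demo (just a sketch of the standard PYTORCH_CUDA_ALLOC_CONF variable; I have not verified that it avoids the OOM here, and the 128 MB value is only an example):

# reduce allocator fragmentation, as suggested by the OOM message
export PYTORCH_CUDA_ALLOC_CONF=max_split_size_mb:128
python scripts/demo.py video.source="assets/jump.mp4" +half=True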

Thanks, Yufei

zhangy76 commented 1 year ago

Hi,

May I ask what GPU you are using? The problem can occur even on a video with only 15 frames.

Yufei

brjathu commented 1 year ago

Apologies for the late reply. I have tested this on a V100 (32 GB) and an A100 (40 GB), but the half-precision demo should work with 16 GB of memory. Did you switch to the dev branch when running it on your local machine?

brjathu commented 1 year ago

I tested it just now: since the main branch does not support +half, it takes about 26 GB. Could you please switch to the dev branch and try?
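For context, half-precision inference in PyTorch generally looks something like the sketch below; this is a generic illustration of the idea, not the exact code on the dev branch:

import torch

def run_fp16_inference(model, inputs, bboxes):
    """Generic fp16 inference sketch (not the repo's actual dev-branch code)."""
    model = model.cuda().eval()
    # autocast runs the forward pass in float16 where it is numerically safe,
    # roughly halving activation memory compared to full precision
    with torch.no_grad(), torch.autocast(device_type="cuda", dtype=torch.float16):
        # mirrors the call shape in slowfast/visualization/predictor.py from the traceback above
        return model(inputs, bboxes)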

zhangy76 commented 1 year ago

Thanks, that may be the reason. May I ask how to switch to the dev branch?

Yufei

ql390962 commented 1 year ago

> Apologies for the late reply. I have tested this on a V100 (32 GB) and an A100 (40 GB), but the half-precision demo should work with 16 GB of memory. Did you switch to the dev branch when running it on your local machine?

I have also encountered this error. Does it support multi-GPU inference? If so, what needs to be modified?

brjathu commented 9 months ago

closing due to inactivity, please reopen if you have any questions.