facebookresearch / sam2

The repository provides code for running inference with the Meta Segment Anything Model 2 (SAM 2), links for downloading the trained model checkpoints, and example notebooks that show how to use the model.

GPU Memory Remains Occupied After Processing Videos with SAM2 #258

Open nku-zhichengzhang opened 2 months ago

nku-zhichengzhang commented 2 months ago

Description

When running SAM2 (Segment Anything Model 2) for multiple videos sequentially, I've noticed that the GPU memory remains occupied even after processing a video. This prevents the efficient use of GPU resources for subsequent video processing tasks, despite attempts to clear the memory.

Steps to Reproduce

  1. Set up a loop to process multiple videos sequentially using SAM2.
  2. For each video:
     a. Build a SAM2 video predictor
     b. Initialize the predictor state
     c. Add new points or box
     d. Propagate through the video
     e. Delete the predictor and attempt to clear CUDA cache
  3. Observe that GPU memory is not fully released between videos

Code Snippet

import torch
from tqdm import tqdm

from sam2.build_sam import build_sam2_video_predictor

# model_cfg, sam2_checkpoint, device, vidnames, viddir, ann_frame_idx,
# ann_obj_id, and ann_obj_box are defined elsewhere.
for vidname in tqdm(vidnames):
    predictor = build_sam2_video_predictor(model_cfg, sam2_checkpoint, device=device, apply_postprocessing=False)
    inference_state = predictor.init_state(
        video_path=viddir,
        offload_video_to_cpu=True,
        offload_state_to_cpu=False,
    )
    predictor.reset_state(inference_state)

    _, out_obj_ids, out_mask_logits = predictor.add_new_points_or_box(
        inference_state=inference_state,
        frame_idx=ann_frame_idx,
        obj_id=ann_obj_id,
        box=ann_obj_box,
    )

    video_segments = {}  # frame_idx -> {obj_id: binary mask}
    for out_frame_idx, out_obj_ids, out_mask_logits in tqdm(
        predictor.propagate_in_video(inference_state), desc="propagate in video"
    ):
        video_segments[out_frame_idx] = {
            out_obj_id: (out_mask_logits[i] > 0.0).cpu().numpy()
            for i, out_obj_id in enumerate(out_obj_ids)
        }

    # Attempt to free GPU memory before the next video
    del predictor
    torch.cuda.empty_cache()

Expected Behavior

The GPU memory should be largely released after processing each video, allowing for efficient processing of subsequent videos without memory accumulation.

Actual Behavior

Despite using del predictor and torch.cuda.empty_cache(), the GPU memory is not fully released between video processing tasks. This leads to accumulating memory usage and potential out-of-memory errors for long-running processes or large datasets.
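
One way to see the accumulation is to print PyTorch's allocator statistics at the end of each loop iteration, for example:

print(f"allocated: {torch.cuda.memory_allocated() / 1024**2:.1f} MiB | "
      f"reserved: {torch.cuda.memory_reserved() / 1024**2:.1f} MiB")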

Request for Assistance

I would appreciate any insights into why the GPU memory is not being fully released and how to resolve this issue. Are there any known memory management best practices for SAM2 or similar deep learning models that I should be implementing?

fducau commented 2 months ago

The memory is not released after processing single images either.
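
For reference, a rough sketch of the single-image path with the same cleanup attempt (the image path, prompt box, and config/checkpoint variables here are placeholders, not code from this repo's examples):

import gc
import numpy as np
import torch
from PIL import Image

from sam2.build_sam import build_sam2
from sam2.sam2_image_predictor import SAM2ImagePredictor

# model_cfg / sam2_checkpoint / device as in the video example above
predictor = SAM2ImagePredictor(build_sam2(model_cfg, sam2_checkpoint, device=device))

image = np.array(Image.open("frame.jpg").convert("RGB"))  # placeholder image
predictor.set_image(image)
masks, scores, logits = predictor.predict(box=np.array([100, 100, 400, 400]))  # placeholder box

del predictor
gc.collect()
torch.cuda.empty_cache()
# GPU memory often remains partly occupied here as well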

yondonfu commented 1 month ago

The inference_state object is used to cache tracking data within add_new_points_or_box() and propagate_in_video(). It looks like the code snippet in the OP deletes the predictor, but does not perform any cleanup for inference_state after it is used.

reset_state(inference_state) should clear the tracking data from the object, which should release some GPU memory. The object will still contain the frame data that was loaded during the call to init_state(), though, so if you want to release the associated GPU memory for that as well you can call del inference_state. In both cases, the following will release the GPU memory afterwards:

import gc
gc.collect()
torch.cuda.empty_cache()
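
Putting it together, a minimal sketch of the per-video loop with this cleanup added (reusing the variable names from the snippet in the issue description):

for vidname in tqdm(vidnames):
    predictor = build_sam2_video_predictor(model_cfg, sam2_checkpoint, device=device, apply_postprocessing=False)
    inference_state = predictor.init_state(video_path=viddir, offload_video_to_cpu=True)

    # ... add_new_points_or_box() and propagate_in_video() as before ...

    predictor.reset_state(inference_state)  # clear cached tracking data
    del inference_state                     # drop frame data cached by init_state()
    del predictor
    gc.collect()
    torch.cuda.empty_cache()
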
lili-stajer-technolynx commented 1 month ago

I believe that building the predictor in every loop iteration creates the issue, since build_sam2_video_predictor sends the model to the device each time. For me, it helped to create a single global model instead of building one in every iteration; using the empty_cache() and gc.collect() methods did not.
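
For reference, a rough sketch of that suggestion (reusing the hypothetical variables from the original snippet, combined with the per-video cleanup mentioned above):

import gc
import torch
from tqdm import tqdm

from sam2.build_sam import build_sam2_video_predictor

# Build the model once, outside the loop
predictor = build_sam2_video_predictor(model_cfg, sam2_checkpoint, device=device, apply_postprocessing=False)

for vidname in tqdm(vidnames):
    inference_state = predictor.init_state(video_path=viddir, offload_video_to_cpu=True)

    # ... add_new_points_or_box() / propagate_in_video() as in the original snippet ...

    predictor.reset_state(inference_state)  # clear cached tracking data
    del inference_state                     # drop frames cached by init_state()
    gc.collect()
    torch.cuda.empty_cache()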