IDEA-Research / Grounded-SAM-2

Grounded SAM 2: Ground and Track Anything in Videos with Grounding DINO, Florence-2 and SAM 2
https://arxiv.org/abs/2401.14159
Apache License 2.0
791 stars, 61 forks

The GPU memory is insufficient? #51

Closed Li-jinnan closed 3 days ago

Li-jinnan commented 1 week ago

My computer has a 4060 graphics card with 8 GB of VRAM. I ran 'python grounded_sam2_tracking_demo_with_continuous_id_gd1.5.py' on 192 images and found that it needs more GPU memory than my machine has. However, I don't have this problem when I run SAM 2's video tracking feature on its own. As I understand it, Grounded-SAM-2 runs Grounding DINO and then SAM 2 in turn when doing video tracking, so this confuses me. It looks as if the video memory required grows with the number of images processed at once, and becomes very large when the number of images is very large.
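To make the scaling concrete, here is a rough back-of-the-envelope sketch. It is not a measurement from this demo: the function name `frame_buffer_bytes` is hypothetical, and the 1024x1024, 3-channel, float32 frame layout is an assumption chosen purely for illustration. It only shows that a buffer holding every frame at once grows linearly with the frame count.

```python
def frame_buffer_bytes(num_frames, height=1024, width=1024,
                       channels=3, bytes_per_value=4):
    """Bytes needed to hold all frames in memory at once.

    The default shape (1024x1024, 3 channels, 4-byte floats) is an
    assumption for illustration, not a value read from the demo script.
    """
    return num_frames * height * width * channels * bytes_per_value


if __name__ == "__main__":
    # Doubling the number of frames doubles the buffer size.
    for n in (48, 96, 192):
        gib = frame_buffer_bytes(n) / 2**30
        print(f"{n:>4} frames -> {gib:.2f} GiB")
```

Under these assumptions the frame buffer alone would take a noticeable share of an 8 GB card at 192 frames, before model weights and any per-frame tracking state are counted.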

rentainhe commented 5 days ago

Hello @Li-jinnan, in this demo we need to call Grounding DINO repeatedly to detect objects at fixed frame intervals. I think the best way to process this with low memory cost is:

Or you can try to use a smaller SAM 2 model.
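As an illustration of the "fixed steps of frames" idea, the sketch below splits a frame sequence into fixed-size windows so that only one window's worth of frames and tracking state needs to be held at a time. This is not the demo script's code: `fixed_step_windows` and the step value of 20 are hypothetical, and the detection and tracking calls appear only as comments.

```python
def fixed_step_windows(num_frames, step):
    """Yield (start, end) index pairs covering range(num_frames).

    Each window holds at most `step` frames; `end` is exclusive.
    """
    if step <= 0:
        raise ValueError("step must be a positive integer")
    for start in range(0, num_frames, step):
        yield start, min(start + step, num_frames)


if __name__ == "__main__":
    num_frames = 192   # number of images mentioned in the question
    step = 20          # hypothetical step size, chosen for illustration
    for start, end in fixed_step_windows(num_frames, step):
        # Placeholder for the real work, under the assumptions above:
        #   1. run the detector on frame `start`
        #   2. track the detected objects through frames start..end-1
        #   3. save the masks, then drop this window's state before continuing
        print(f"window: frames {start}..{end - 1}")
```

With this structure, peak memory depends on the window size rather than on the total number of frames, at the cost of restarting tracking at each window boundary.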

Li-jinnan commented 4 days ago

Ok, I've solved the problem, thanks for the answer!

rentainhe commented 3 days ago

You're welcome