IDEA-Research / Grounded-SAM-2

Grounded SAM 2: Ground and Track Anything in Videos with Grounding DINO, Florence-2 and SAM 2
https://arxiv.org/abs/2401.14159
Apache License 2.0

How about 3D version? #20

Open StarsTesla opened 3 weeks ago

StarsTesla commented 3 weeks ago

I wonder whether we can track objects in 3D with Grounded SAM 2?

rentainhe commented 3 weeks ago

Grounded SAM 2 can currently only handle 2D images. If it needs to be applied in a 3D scene, I think you first need to project a certain view onto a 2D plane and then apply Grounded SAM 2.

BTW, what do you mean by tracking objects in 3D scenarios? Could you describe the scenario you need in more detail? That will help us brainstorm solutions with you.
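
To illustrate the "project a view onto a 2D plane" step, here is a minimal sketch in plain Python. It assumes a simple pinhole camera with known intrinsics (`fx`, `fy`, `cx`, `cy`) and 3D points already expressed in camera coordinates. The function names `project_point` and `project_to_bbox` are hypothetical helpers for this example and are not part of the Grounded SAM 2 codebase. The resulting 2D box is the kind of region you could then hand to a 2D model as a prompt.

```python
from typing import Iterable, Optional, Tuple

Point3D = Tuple[float, float, float]
Pixel = Tuple[float, float]
BBox = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max)


def project_point(
    point_cam: Point3D, fx: float, fy: float, cx: float, cy: float
) -> Optional[Pixel]:
    """Project a 3D point (camera coordinates) with the pinhole model.

    u = fx * X / Z + cx, v = fy * Y / Z + cy.
    Returns None for points at or behind the camera (Z <= 0).
    """
    x, y, z = point_cam
    if z <= 0:
        return None
    return (fx * x / z + cx, fy * y / z + cy)


def project_to_bbox(
    points_cam: Iterable[Point3D], fx: float, fy: float, cx: float, cy: float
) -> Optional[BBox]:
    """Project several 3D points and return the 2D box enclosing them.

    Points behind the camera are skipped. Returns None if nothing is visible.
    """
    pixels = [
        p
        for p in (project_point(pt, fx, fy, cx, cy) for pt in points_cam)
        if p is not None
    ]
    if not pixels:
        return None
    us = [u for u, _ in pixels]
    vs = [v for _, v in pixels]
    return (min(us), min(vs), max(us), max(vs))


if __name__ == "__main__":
    # Assumed intrinsics for a 640x480 image; values are illustrative only.
    corners = [(1.0, 2.0, 4.0), (-1.0, -2.0, 4.0), (0.0, 0.0, -1.0)]
    print(project_to_bbox(corners, fx=100.0, fy=100.0, cx=320.0, cy=240.0))
    # → (295.0, 190.0, 345.0, 290.0)
```

This sketch ignores lens distortion and the world-to-camera transform, so a real pipeline would need to apply the camera extrinsics first.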

rentainhe commented 3 weeks ago

I have seen a similar example in SEEM: https://github.com/UX-Decoder/Segment-Everything-Everywhere-All-At-Once?tab=readme-ov-file#tulip-nerf-examples. Is this the scenario you need?

StarsTesla commented 3 weeks ago

For example, tracking an object in 3D could be more precise than tracking it in 2D, and multi-object tracking is very common in street scenarios (self-driving) and in fields such as dynamic SLAM and indoor robotics.