Can T-Rex deal with real-time webcam frames provided by OpenCV and only get the instance mask of the object within my rectangle?

IDEA-Research / T-Rex

[ECCV2024] API code for T-Rex2: Towards Generic Object Detection via Text-Visual Prompt Synergy

https://deepdataspace.com/blog/T-Rex

Can T-Rex deal with real-time webcam frames provided by OpenCV and only get the instance mask of the object within my rectangle? #15

Closed Hanning-Liu closed 11 months ago

Hanning-Liu commented 11 months ago

Hi, @Mountchicken and @spacewalk01 !

I've tried the T-Rex demo on this website https://deepdataspace.com/playground/ivp, and the result is awesome!

I have three questions about T-Rex which are also written in the screenshot above:

1. Is it possible to get only the instance mask of the object within my rectangle?
2. Is it compatible with real-time webcam frames provided by OpenCV, and can it keep the same intuitive interaction method when dealing with those frames?
3. Can it keep the impressive detection quality even after the object disappears from the real-time video frames and then appears again?

Thank you for your wonderful work and looking forward to your reply!

Mountchicken commented 11 months ago

Hi @Hanning-Liu

"To get the instance mask of the object within your rectangle": This is what SAM does, and we currently don't have plans to implement it in the demo.
"real-time webcam frame": That's a good idea. However, T-Rex doesn't run in real time. Inference can take up to 0.2s per image, even for detection alone. We will keep optimizing the speed.
"keep the detection effect even after the object disappears from the real-time video frames and appears again": I am not sure what you are asking, but this does sound like the cross-image detection workflow of T-Rex, where you only need to draw one box on one image and T-Rex will detect the object in the rest of the images.

Hanning-Liu commented 11 months ago

Hi @Mountchicken , Thank you for your detailed reply!

As you currently don't have plans to implement this feature in T-Rex, could you please direct me to a repository where this feature has already been implemented? I think native SAM can’t do this. I'm new to the field of AI and am looking to integrate a similar feature into my project.
I'm really eager to see your further progress on real-time image processing!
I may explain this in more detail in the next few days!

Thank you!

Mountchicken commented 11 months ago

@Hanning-Liu For Q1: SAM can segment the object inside a drawn bounding box. I guess this is the feature you want?

Hanning-Liu commented 11 months ago

Hi @Mountchicken , Yes! I just came up with an idea: interactively cut out a rectangular area using OpenCV, apply the SAM algorithm only to that area, and finally override the original pixels with the segmented mask.

Mountchicken commented 11 months ago

Actually, you don't need to crop the image. You can just pass the whole image and the corresponding rectangle coordinates to SAM, and SAM will only segment the thing inside that rectangle.
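
For readers who want to try this, here is a minimal sketch of the box-prompt approach, assuming the `segment_anything` package (`SamPredictor`, `sam_model_registry`) and OpenCV are installed. The helper names (`roi_to_xyxy`, `mask_in_box`, `keep_only_mask`, `run_webcam`), the checkpoint path, and the `vit_b` model choice are illustrative assumptions and are not part of T-Rex or of this thread. The rectangle is drawn once with `cv2.selectROI` and reused for every frame; the loop will not be real-time on most hardware.

```python
import numpy as np


def roi_to_xyxy(roi):
    """Convert an OpenCV ROI (x, y, w, h) to the XYXY box format SAM expects."""
    x, y, w, h = roi
    return np.array([x, y, x + w, y + h], dtype=np.float32)


def mask_in_box(predictor, frame_rgb, box_xyxy):
    """Pass the WHOLE frame plus the box to the predictor; return an HxW bool mask."""
    predictor.set_image(frame_rgb)  # no cropping needed
    masks, _scores, _low_res = predictor.predict(
        box=np.asarray(box_xyxy, dtype=np.float32),
        multimask_output=False,  # ask for a single mask for the box
    )
    return masks[0].astype(bool)


def keep_only_mask(frame, mask):
    """Black out every pixel outside the mask."""
    out = np.zeros_like(frame)
    out[mask] = frame[mask]
    return out


def run_webcam(checkpoint_path, model_type="vit_b"):
    """Hypothetical demo loop: draw one rectangle, then segment inside it per frame."""
    # Heavy dependencies are imported here so the helpers above stay importable.
    import cv2
    from segment_anything import SamPredictor, sam_model_registry

    sam = sam_model_registry[model_type](checkpoint=checkpoint_path)
    predictor = SamPredictor(sam)

    cap = cv2.VideoCapture(0)
    try:
        ok, first = cap.read()
        if not ok:
            return
        # Let the user drag a rectangle on the first frame.
        box = roi_to_xyxy(cv2.selectROI("select", first))
        while True:
            ok, frame_bgr = cap.read()
            if not ok:
                break
            frame_rgb = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2RGB)
            mask = mask_in_box(predictor, frame_rgb, box)
            cv2.imshow("object", keep_only_mask(frame_bgr, mask))
            if (cv2.waitKey(1) & 0xFF) == ord("q"):
                break
    finally:
        cap.release()
        cv2.destroyAllWindows()
```

Calling `run_webcam("path/to/sam_checkpoint.pth")` (placeholder path) would start the loop. Because `set_image` recomputes the image embedding for every frame, running SAM on every Nth frame is one way to trade accuracy for speed.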

Hanning-Liu commented 11 months ago

Thank you! I will give it a try!