facebookresearch / habitat-sim

A flexible, high-performance 3D simulator for Embodied AI research.
https://aihabitat.org/
MIT License

Determine the visual overlap between two cameras? #2258

Open janblumenkamp opened 1 year ago

janblumenkamp commented 1 year ago

I would like to determine the visual overlap between two cameras on two independent agents. Is there any way to do this from within habitat? Thanks!

0mdc commented 1 year ago

Hello @janblumenkamp,

What exactly do you mean by visual overlap? Which type of sensor are you working with? What kind of output are you looking for?

janblumenkamp commented 12 months ago

I am thinking of any kind of camera, e.g. RGB, depth or semantic. Specifically, I would like to know how much of a scene observed through one camera is also observed through a second camera.

In the simplest case, I would like to know what percentage of the two viewing frustums overlap. This is a trivial calculation given the camera extrinsics and intrinsics, assuming a plane. Realistically, though, there are occlusions, so the same scene viewed by two different cameras might have a smaller visual overlap.
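As a rough illustration of the occlusion-free case, the sketch below intersects one ray per pixel of camera A with a plane and counts how many of those points also project inside camera B's image. Everything here is an assumption made for illustration and not part of habitat-sim: a pinhole model with a shared intrinsic matrix `K`, 4x4 camera-to-world poses, a camera that looks along +z in its own frame (habitat-sim follows the OpenGL convention of looking along -z, so poses would need converting), and the plane `z = plane_z`. The helper names are hypothetical.

```python
import numpy as np

def pixel_centers(width, height):
    """Return an (N, 2) array of pixel-center coordinates (u, v)."""
    u, v = np.meshgrid(np.arange(width) + 0.5, np.arange(height) + 0.5)
    return np.stack([u.ravel(), v.ravel()], axis=1)

def planar_overlap(K, T_wc_a, T_wc_b, width, height, plane_z=0.0):
    """Fraction of camera A's pixels whose ray hits the plane z = plane_z
    at a point that also projects inside camera B's image.

    Assumes both cameras share K and the image size, and that poses are
    camera-to-world matrices with the camera looking along +z.
    """
    pix = pixel_centers(width, height)
    homog = np.hstack([pix, np.ones((pix.shape[0], 1))])
    dirs_cam = homog @ np.linalg.inv(K).T       # ray directions, camera A frame
    R_a, t_a = T_wc_a[:3, :3], T_wc_a[:3, 3]
    dirs_w = dirs_cam @ R_a.T                   # ray directions, world frame

    # Solve t_a.z + s * dir.z = plane_z for the ray parameter s.
    with np.errstate(divide="ignore", invalid="ignore"):
        s = (plane_z - t_a[2]) / dirs_w[:, 2]
    hits_plane = np.isfinite(s) & (s > 0)
    pts_w = t_a + s[hits_plane, None] * dirs_w[hits_plane]

    # Project the plane points into camera B.
    T_cw_b = np.linalg.inv(T_wc_b)
    pts_b = pts_w @ T_cw_b[:3, :3].T + T_cw_b[:3, 3]
    pts_b = pts_b[pts_b[:, 2] > 1e-9]           # keep points in front of B
    proj = pts_b @ K.T
    uv = proj[:, :2] / proj[:, 2:3]
    inside = (
        (uv[:, 0] >= 0) & (uv[:, 0] < width)
        & (uv[:, 1] >= 0) & (uv[:, 1] < height)
    )
    # Pixels of A that miss the plane count as non-overlapping.
    return float(inside.sum()) / pix.shape[0]
```

Note that the ratio is relative to camera A's pixels, so it is not symmetric: swapping the two cameras can give a different number.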

Essentially, it would probably require casting a ray through each pixel of camera A and checking, for each resulting hit point, whether camera B can also cast a ray to that point without first hitting another object in the scene. Does that make sense?
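One way to approximate this per-pixel check without explicit ray casting is to reuse depth images, since a depth image already records where each pixel's ray first hits the scene. This is a swapped-in technique, not something the thread or habitat-sim prescribes. The sketch below back-projects camera A's depth image to 3D, projects the points into camera B, and counts a point as co-visible when its depth in B's frame matches B's observed depth within a tolerance. It assumes z-depth images in metres, a shared pinhole intrinsic matrix `K`, 4x4 camera-to-world poses, and a camera looking along +z; all function names are hypothetical.

```python
import numpy as np

def backproject(depth, K, T_wc):
    """Lift a z-depth image to an (H*W, 3) array of world-space points."""
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w) + 0.5, np.arange(h) + 0.5)
    z = depth.ravel()
    x = (u.ravel() - K[0, 2]) / K[0, 0] * z
    y = (v.ravel() - K[1, 2]) / K[1, 1] * z
    pts_cam = np.stack([x, y, z], axis=1)
    return pts_cam @ T_wc[:3, :3].T + T_wc[:3, 3]

def covisible_fraction(depth_a, depth_b, K, T_wc_a, T_wc_b, tol=0.05):
    """Fraction of camera A's pixels whose surface point is also seen by B.

    A point counts as seen when it projects inside B's image and its depth
    in B's frame agrees with B's depth image to within `tol`; a larger
    mismatch means something else is in front of it (or behind it) in B.
    """
    h, w = depth_b.shape
    keep = depth_a.ravel() > 0                  # skip pixels with no depth
    pts_w = backproject(depth_a, K, T_wc_a)[keep]

    T_cw_b = np.linalg.inv(T_wc_b)
    pts_b = pts_w @ T_cw_b[:3, :3].T + T_cw_b[:3, 3]
    z_b = pts_b[:, 2]
    front = z_b > 1e-6                          # points in front of camera B
    safe_z = np.where(front, z_b, 1.0)          # avoid dividing by zero
    col = np.floor(K[0, 0] * pts_b[:, 0] / safe_z + K[0, 2]).astype(int)
    row = np.floor(K[1, 1] * pts_b[:, 1] / safe_z + K[1, 2]).astype(int)
    inside = front & (col >= 0) & (col < w) & (row >= 0) & (row < h)

    observed = depth_b[row[inside], col[inside]]
    seen = np.abs(observed - z_b[inside]) <= tol
    return float(seen.sum()) / depth_a.size
```

The tolerance trades false occlusions (too small, sensitive to pixel quantization on slanted surfaces) against missed occlusions (too large), so it would need tuning per scene scale.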

0mdc commented 12 months ago

That makes sense.

I don't think there's a way to do this on the GPU right now. However, you can probably do it using physics and the Python API.

To get a ray through a specific viewport point, you can use RenderCamera.unproject(). You can then pass that ray to the physics raycast, Simulator.cast_ray(). Note that this requires the Bullet build.
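A minimal sketch of that loop is shown below. It only relies on the two calls named above plus the `has_hits()`, `hits` and `point` members of the raycast result, as I understand them from the example viewer; the exact signatures may differ between habitat-sim versions and should be checked against the API docs. The function name, the `stride` parameter and the `to_viewport_point` callback are hypothetical: the callback builds whatever pixel type `unproject()` expects (presumably `magnum.Vector2i` when used with the real bindings).

```python
def sample_hit_points(sim, render_camera, width, height, stride, to_viewport_point):
    """Cast one physics ray per sampled viewport pixel and collect hit points.

    `sim` is expected to offer cast_ray(ray=...) and `render_camera` is
    expected to offer unproject(point). Both are duck-typed here so the
    sketch does not import habitat-sim itself.
    """
    hit_points = []
    for y in range(0, height, stride):
        for x in range(0, width, stride):
            # Viewport pixel -> world-space ray through that pixel.
            ray = render_camera.unproject(to_viewport_point(x, y))
            results = sim.cast_ray(ray=ray)
            if results.has_hits():
                # Assumes hits are ordered nearest-first, as the example
                # viewer appears to assume when it reads hits[0].
                hit_points.append(results.hits[0].point)
    return hit_points
```

A larger `stride` trades accuracy for speed, since each sample is a separate CPU-side physics query.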

The example viewer handles mouse clicks using this method. In your case, you would iterate over sampled points of the sensor viewport instead of using the mouse position.

After that, I suppose the hit positions from the two sensors would need to be stored in some structure so that the visual overlap can be computed.
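One possible structure, sketched here as an assumption and not as an established approach, is a voxel hash: quantize each hit position to a grid cell and compare the two sets of occupied cells. The function names and the default `voxel_size` are hypothetical, and the hit points are assumed to be convertible to plain `(x, y, z)` triples.

```python
import numpy as np

def voxel_keys(points, voxel_size=0.05):
    """Quantize 3D points to integer voxel indices and return them as a set."""
    pts = np.asarray(points, dtype=float).reshape(-1, 3)
    idx = np.floor(pts / voxel_size).astype(np.int64)
    return {tuple(row) for row in idx.tolist()}

def overlap_ratio(hits_a, hits_b, voxel_size=0.05):
    """Share of the voxels seen by camera A that camera B also hit."""
    cells_a = voxel_keys(hits_a, voxel_size)
    cells_b = voxel_keys(hits_b, voxel_size)
    if not cells_a:
        return 0.0
    return len(cells_a & cells_b) / len(cells_a)
```

The voxel size sets what counts as "the same" surface point: it should be at least as large as the spacing between neighbouring ray hits at typical viewing distances, otherwise two cameras looking at the same wall may land in disjoint cells.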

@aclegg3 Thoughts?