RobotLocomotion / drake

Model-based design and verification for robotics.
https://drake.mit.edu

Rendering Camera representations strategically #22133

Open SeanCurtis-TRI opened 2 weeks ago

SeanCurtis-TRI commented 2 weeks ago

In a simulation with multiple cameras, we want the following:

  1. Each camera has a visual representation (i.e., so when a camera looks at another camera, it sees it).
  2. No camera should appear in its own image.
  3. The camera's intrinsics/extrinsics should reflect the real world as closely as is reasonable.

There is no mechanism to intuitively satisfy all three goals. Realistic intrinsics/extrinsics would put the camera sensor on the interior of the camera geometry (which is where it is in the real world). As such, the GL sensor will see the camera geometry in its own image.

Possible solutions based on current technology:

  1. Using <drake:accepting_renderer>
    • This is best for omitting a geometry from rendering entirely by naming a non-existent renderer as the only accepting renderer.
    • To make this work for this domain, we would have to:
      • create a unique render engine per camera with very specific names.
      • have each camera model (e.g., a unique SDF file per camera) enumerate all renderers except its own (see the SDF sketch after this list).
        • Alternatively, this property could be configured in code after parsing, which is also tedious but more adaptive.
  2. Make sure all of the camera geometry renders as one-sided (so the GL camera only sees triangle backs and, therefore, renders nothing).
    • This may not work if the geometry doesn't present only back-facing triangles to a sensor placed at its interior origin (i.e., some triangle fronts would still be visible from inside).
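
For solution 1, here is a minimal sketch of what this might look like in a camera model's SDF file, assuming <drake:accepting_renderer> is used as a repeated child of the <visual> element and that the render engines are named camera0_renderer, camera1_renderer, and camera2_renderer (hypothetical names); this camera's own engine (camera0_renderer) is simply omitted from the list:

    <visual name="camera_body_visual">
      <geometry>
        <mesh><uri>camera_body.gltf</uri></mesh>
      </geometry>
      <!-- Sketch only: accept every renderer except this camera's own
           (camera0_renderer), so the body never appears in its own images. -->
      <drake:accepting_renderer>camera1_renderer</drake:accepting_renderer>
      <drake:accepting_renderer>camera2_renderer</drake:accepting_renderer>
    </visual>

Every camera model must know the names of all other cameras' renderers, which is exactly the tedium described above.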

Possible alternatives:

  1. Change the <drake:accepting_renderer> tag to accept a regular expression instead of literal string matching.
    • This would still require one render engine per camera, and the same model file couldn't simply be parsed multiple times; each instance would need different properties.
  2. Add a tag that would be something more like a "camera filter", which takes a regular expression on the camera's name to determine whether a geometry is visible to it (a hypothetical sketch appears after this list).
    • This would require some deep plumbing, as the camera name (or any other unique identifier associated with the concept of a Drake RenderCamera) is not available at the renderer level.
  3. Some other mechanism, yet to be determined, that allows this kind of filtering.
    • Perhaps it would help if we made use of SDFormat's <sensor> and <camera> tags?
      1. Handle this at the model directives level where loading the same model multiple times can be differentiated into different sensors.
        • This might have to go a level higher, into scene specifications, as that is typically where rendering configuration happens.
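
To make alternative 2 concrete, here is a purely hypothetical sketch of such a tag; neither the tag name nor the attribute exists in Drake today. The regular expression would be matched against the name of the camera being rendered, hiding the geometry from any camera whose name matches:

    <visual name="camera_body_visual">
      <geometry>
        <mesh><uri>camera_body.gltf</uri></mesh>
      </geometry>
      <!-- Hypothetical tag, not part of Drake: hide this visual from any
           camera whose name matches the regular expression. -->
      <drake:camera_filter exclude="camera0_.*"/>
    </visual>

The attraction is that the camera model would be written once, with per-instance differentiation carried by camera names rather than by per-camera render engines.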

The goal would be:

  1. Specify a camera model once.
  2. Instantiate it multiple times knowing that its geometry won't appear in its own rendering by default.
  3. Ideally, be able to share a single RenderEngine instance to reduce runtime cost of updating state.
jwnimmer-tri commented 2 weeks ago

Is it possible to define camera body glTFs with backface culling or somesuch, so that sensors placed inside of them would just not see the mesh in the first place, no configuration changes required?

The recent model that prompted us to write this up has "doubleSided":true, -- can we just change that to "false" and call it a day?
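
For context, doubleSided is a per-material flag in glTF and it defaults to false when omitted, so the change would be a one-line edit in the mesh's materials array, along the lines of (material name hypothetical):

    "materials": [
      {
        "name": "camera_body_material",
        "doubleSided": false
      }
    ]

Whether backface culling alone is sufficient still depends on the camera mesh being closed with outward-facing normals, per the caveat on the one-sided rendering option above.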