gazebosim / gz-sensors

Provides numerous sensor models designed to generate realistic data from simulation environments.
https://gazebosim.org
Apache License 2.0

Add bounding box sensor #135

Open adlarkin opened 3 years ago

adlarkin commented 3 years ago

Related to #134

Desired behavior

Add a bounding box sensor to Ignition.

Alternatives considered

N/A (this is a new feature)

Implementation suggestion

We will need functionality added to the following repositories:

Additional context

A lot of work for implementing this sensor is already underway. Here are all of the related PRs that have been opened:

chapulina commented 3 years ago

It would be interesting to add some more context about the use case for the sensor and what kind of data it should produce. I think it's especially important to define these clearly because this sensor wouldn't be replicating any real physical sensors. In the real world, this kind of data is obtained by operating on data provided by other sensors, like cameras.

adlarkin commented 3 years ago

It would be interesting to add some more context about the use case for the sensor and what kind of data it should produce.

The original thought was to use this sensor to provide users with datasets that could be used for training machine learning models. Manually generating bounding boxes for images can be difficult and/or take a while, so the hope is that the process of data collection can be made easier by using this sensor with various scenes that are created in simulation.

Regarding what data it should produce: the thought is to provide users with an image and a corresponding bounding box message which provides the bounding box information for all of the requested objects in a scene (a new bounding box message type needs to be created in ign-msgs). The bounding box message can look similar to the bounding box message type used in darknet_ros. So, once a user creates a simulation world and attaches labels to objects that the user wants bounding boxes for (see https://github.com/ignitionrobotics/ign-gazebo/pull/853), an image would be captured of the scene and bounding box data would be specified for the objects in the scene requested by the user (assuming that the objects are visible, of course). To take this a step further, the scene could be altered (changing the camera position, object positions, lighting, etc.) to generate another image with bounding box data. If we add a way to programmatically alter the scene, this would allow users to quickly generate hundreds of various scenes with corresponding bounding box data.
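
For illustration only (the message and field names here are hypothetical, not an actual ign-msgs proposal), the per-image output could carry one entry per labeled object that is visible in the frame, loosely following the darknet_ros layout:

message ImageAnnotations   // hypothetical name; Header and Vector2d are existing ign-msgs types
{
  /// \brief Optional header data (time stamp, camera frame, etc.)
  Header header = 1;

  /// \brief One entry per labeled object visible in the rendered image
  repeated LabeledBox boxes = 2;
}
message LabeledBox   // hypothetical name
{
  /// \brief Box corners in image (pixel) coordinates
  Vector2d min_corner = 1;
  Vector2d max_corner = 2;

  /// \brief Label that the user attached to the object in the simulation world
  string label = 3;
}

The exact layout would be settled in the ign-msgs work; the main point is that each rendered frame yields an image plus a corresponding message covering all of the labeled, visible objects.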

this sensor wouldn't be replicating any real physical sensors. In the real world, this kind of data is obtained by operating on data provided by other sensors, like cameras.

Yep :+1: this sensor is meant to provide a way for automatic bounding box generation so that users don't need to operate on data from things like cameras. As mentioned earlier in my comment, the main use case for this would probably be dataset generation for ML models, but perhaps users will find this useful for other use cases as well.


@chapulina does that help answer your questions and/or clarify the design/motivation behind this sensor? Or is there anything else that needs to be addressed/considered?

chapulina commented 3 years ago

Thank you for all the clarification, this is great context :+1:

The bounding box message can look similar to the bounding box message type used in darknet_ros.

As a reference, I'll also link to vision_msgs, which provides some different bounding box messages.
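
Roughly speaking, vision_msgs describes a 2D box by a (possibly rotated) center plus its extents rather than by corners; paraphrased in proto form for comparison (this is an adaptation, not the actual ROS .msg definition):

message BoundingBox2D   // paraphrase of vision_msgs/BoundingBox2D
{
  /// \brief Center of the box in pixel coordinates, plus its rotation
  double center_x = 1;
  double center_y = 2;
  double theta = 3;

  /// \brief Full extent of the box along each image axis, in pixels
  double size_x = 4;
  double size_y = 5;
}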


Just some clarifications about the bounding box data:

Will the boxes always be aligned?

Will the boxes be 2D, 3D, or both?

adlarkin commented 3 years ago

Will the boxes always be aligned?

Yes, I believe so - are there reasons/use cases that should make us consider using bounding boxes that aren't axis aligned?

Will the boxes be 2D, 3D, or both?

Right now, development is being done for 2D bounding boxes only. However, I am sure that 3D bounding boxes would also be useful, so perhaps we can add in 3D functionality once the 2D implementation has been completed.

chapulina commented 3 years ago

are there reasons/use cases that should make us consider using bounding boxes that aren't axis aligned?

I'm not sure, but bounding box messages in ROS are oriented, so I think it's worth documenting the rationale for orienting it or not. As a reference, here's some discussion on the vision message standards for ROS that could be useful. The vision_msgs README also has useful information.

perhaps we can add in 3D functionality once the 2D implementation has been completed

My only concern is not to have to go back and change messages and APIs if we decide to support 3D later. If we want to leave the way open for it, we may want to use 3D types on the APIs for future-proofing.

AmrElsersy commented 3 years ago

Bounding boxes in 3D datasets are always oriented boxes, but in 2D datasets they are usually axis-aligned. The 3D box format in datasets is usually center (x, y, z), width, height, depth, and orientation, just like the ROS msgs you mentioned, Louise.

My only concern is not to have to go back and change messages and APIs if we decide to support 3D later. If we want to leave the way open for it, we may want to use 3D types on the APIs for future-proofing.

Ok, I will work on that: providing 3D boxes in rendering first, then modifying all of the bounding box PRs to use the new format (oriented boxes, just like the ROS msg).

Does that make sense?

AmrElsersy commented 3 years ago

Another thing: should we have separate msgs for 2D boxes and 3D boxes, like the ROS msgs (BoundingBox3D, BoundingBox2D)? Or could we use the 3D box format for the 2D boxes as well and say "it's ok, the 2D boxes have the same format but with z = 0 and zero orientation"?

Also, about the label: in the ROS msgs, the bounding box msgs just have the dim/rotation information, and there is a separate msg for the detection (combining the label with the box). Should we do that? I think it is ok to just combine the label with the dim/rotation info directly, so is that ok as well?

chapulina commented 3 years ago

should we have separate msgs for 2D boxes and 3D boxes

I'm ok with that. Come to think of it, keeping them separate may help with setters and conversions.

just combine the label with the dim/rotation info directly

My slight preference is for having a separate message to combine the box with the label, so users can choose to use the one with or without the label. We already have a 3D box message without a label.

AmrElsersy commented 3 years ago

You mean that we should use the existing AxisAlignedBox msg for the 2D boxes, and make a msg that combines it with the label? The problem is that this msg is axis-aligned but in 3D, so I don't think it would be suitable for 2D. I think we should create a new msg for 2D boxes with the new format (and the same for 3D).

chapulina commented 3 years ago

Ah sorry, so I meant something like this:

message AxisAligned2DBox
{
  /// \brief Optional header data
  Header header  = 1;

  /// \brief Minimum corner of the axis aligned bounding box in the global frame.
  Vector2d min_corner  = 2;

  /// \brief Maximum corner of the axis aligned bounding box in the global frame.
  Vector2d max_corner  = 3;
}
message AnnotatedAxisAligned2DBox
{
  /// \brief Optional header data
  Header header  = 1;

  AxisAligned2DBox box  = 2;

  string label  = 3;
}

But I don't have a strong preference.

adlarkin commented 3 years ago

Ah sorry, so I meant something like this: ... But I don't have a strong preference.

I like what you suggested for 2D axis-aligned bounding boxes, @chapulina (having separate message types for the bounding box and the annotated bounding box). We can take the same approach of separate message types for 3D bounding boxes as well, but there we will probably need to create a new message type instead of reusing the existing AxisAlignedBox message type, since the 3D boxes are oriented, as @AmrElsersy mentioned.

For the 3D oriented bounding boxes, what if we did something like this?

message Oriented3DBox
{
  /// \brief Optional header data
  Header header  = 1;

  /// \brief Center and orientation of the bounding box
  Pose pose = 2;

  /// \brief The size of the bounding box (width/height/depth)
  Vector3d boxSize = 3;
}
message AnnotatedOriented3DBox
{
  /// \brief Optional header data
  Header header  = 1;

  Oriented3DBox box  = 2;

  string label  = 3;
}

Also, do we want the label in the annotated message types to be a string or something like an unsigned int? It looks like the vision_msgs/ObjectHypothesis message type uses a numerical ID as a label for objects, and makes use of a "metadata database" to perform the conversion between a numerical ID and human-readable label. I like this approach of a numerical ID and then a separate lookup mechanism because using ints over strings is more efficient. I also believe that using int labels is more common than string labels in computer vision applications.
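
For example, if we went the numeric-ID route, the annotated messages sketched above could swap the string for an integer and keep the ID-to-name mapping outside the message (a hypothetical sketch, not a final proposal):

message AnnotatedAxisAligned2DBox
{
  /// \brief Optional header data
  Header header = 1;

  AxisAligned2DBox box = 2;

  /// \brief Numeric class ID. The mapping from ID to a human-readable name
  /// would live in a separate label map / metadata database, similar to the
  /// vision_msgs ObjectHypothesis approach.
  uint32 label = 3;
}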

scpeters commented 2 years ago

is there a discussion about merging oriented bounding boxes into a single box?

AmrElsersy commented 2 years ago

is there a discussion about merging oriented bounding boxes into a single box?

We started discussing it in the rendering PR: https://github.com/ignitionrobotics/ign-rendering/pull/334#issuecomment-884783383