Closed: alex-mitrevski closed this 4 years ago
I'm not actually convinced that objects and people can be the same thing, since the tasks involved with both can differ significantly (anything speech related, for example, though I realize that's not relevant here in `mas_perception_msgs`). An alternative may be creating a visual information/categorization message and reusing it in both the object and person messages.
> I'm not actually convinced that objects and people can be the same thing, since the tasks involved with both can differ significantly
In what sense are they not the same? As I mentioned in the PR description, the messages are added specifically for a visual recognition task; they are not meant to be used for other purposes. To put it in other terms, they are data structures that will make it possible to write a generic visual recognition algorithm.
> An alternative may be creating a visual information/categorization message and reusing it in both the object and person messages.
I'm not sure what you mean by this; could you please elaborate?
> In what sense are they not the same?
The tasks are different, with possibly more interactions than just picking and placing. The question is whether this will be affected by how the message is defined. Potentially, something like age or gender will be recognised for people only. This can maybe be dealt with via a list of attributes, but we have to be careful about mixing types.
> I'm not sure what you mean by this; could you please elaborate?
It may just be a naming issue ('subject' may be better than 'object'?). So is this meant to replace `Object.msg` and `Person.msg`? What I have in mind is essentially, instead of having `ObjectViews.msg`, just adding `ObjectView.msg` to `Person.msg` and `Object.msg` to replace the other cloud/image fields.
> The tasks are different, with possibly more interactions than just picking and placing. The question is whether this will be affected by how the message is defined. Potentially, something like age or gender will be recognised for people only. This can maybe be dealt with via a list of attributes, but we have to be careful about mixing types.
You are thinking in terms of tasks here, but visual recognition doesn't necessarily have anything to do with a task, nor with attributes (though attributes can be used to update the confidence in the recognition). In any case, we try to answer questions of the following type: is the face I see Minh's, Alex's, or someone else's; or is the cup I see Alex's cup, Minh's cup, or an unknown cup? We try to answer this given a list of "views" of each face or each cup.
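To make the intended use concrete, here is a minimal sketch (in Python, with hypothetical names; the PR does not specify a particular matching method) of answering "which known subject does this view belong to?" given a list of stored view embeddings per subject:

```python
import numpy as np

def recognise(query_embedding, known_subjects, threshold=0.7):
    """Return the best-matching subject name, or None if unknown.

    known_subjects maps a label (e.g. 'alex_cup') to a list of
    stored view embeddings (one per memorised view of that subject).
    """
    best_name, best_score = None, -1.0
    for name, view_embeddings in known_subjects.items():
        for stored in view_embeddings:
            # cosine similarity between the query and a stored view
            score = float(np.dot(query_embedding, stored)
                          / (np.linalg.norm(query_embedding)
                             * np.linalg.norm(stored)))
            if score > best_score:
                best_name, best_score = name, score
    # below the threshold, report the subject as unknown
    return best_name if best_score >= threshold else None
```

The per-subject list of views is exactly what the `ObjectView` array is meant to store.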
> It may just be a naming issue ('subject' may be better than 'object'?). So is this meant to replace `Object.msg` and `Person.msg`?
No, replacement is not what I had in mind. The new messages are supposed to supplement the existing messages (particularly since not all applications need a recognition functionality).
> What I have in mind is essentially, instead of having `ObjectViews.msg`, just adding `ObjectView.msg` to `Person.msg` and `Object.msg` to replace the other cloud/image fields.
This doesn't fully work because of the multiple view assumption (i.e. we may have a collection of images/clouds of the same object/person/face from multiple views). Unless I add the `ObjectViews` message to both `Person` and `Object`; that would work, I suppose.
I meant adding `ObjectView[]` to the fields in `Person` and `Object` and using `name` and `category` from those messages.

If it's to supplement those messages, then I don't really have any issue with the change.
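For illustration, the amended `Person.msg` could then look roughly like this (a sketch only; fields other than the added `ObjectView[]` array are placeholders, not the actual definitions in this repository):

```
# Person.msg (illustrative sketch, not the actual merged definition)
std_msgs/Header header
string name              # reused as the recognition label
string category
ObjectView[] views       # stored views of this person for recognition
```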
> I meant adding `ObjectView[]` to the fields in `Person` and `Object` and using `name` and `category` from those messages.
OK, that makes sense. I added an `ObjectView` array to `Person`, `Face`, and `Object`. I will, however, change the PR to WIP now since these are breaking changes (requiring changes in at least `mas_domestic_robotics` and `mas_perception_libs`).

@sthoduka @deebuls Will these changes break anything in the @work code?
Yes, this will break most of our perception code, since we use `Object.msg` in it.
@mhwasil I was afraid that would be the case. But you will use `noetic-devel` from now on, right? If that's the case, I could still merge the changes to `kinetic-devel`, since I'll continue working with that branch for the time being.
Yes, we will use `noetic-devel`. Nevertheless, changing our code is an easy task :-), so we do not have a problem with that.
So regarding the naming of the message: instead of `ObjectView.msg`, would it be better to use `ObjectRepresentation.msg`, or some other name? `ObjectView` could be confused with an object seen from different viewpoints.
> Yes, we will use `noetic-devel`. Nevertheless, changing our code is an easy task :-), so we do not have a problem with that.
OK, that's great!
> So regarding the naming of the message: instead of `ObjectView.msg`, would it be better to use `ObjectRepresentation.msg`, or some other name? `ObjectView` could be confused with an object seen from different viewpoints.
Actually, that's exactly what the list of `ObjectView` messages is supposed to represent: the same object as seen from different viewpoints. That's why I used that name for the message.
I thought it was about different representations too, but I didn't read carefully enough. `ObjectView` sounds fine for different viewpoints.
Thanks everyone for your suggestions. I'm going to merge this now so that I can also proceed with https://github.com/b-it-bots/mas_perception_libs/pull/18 and https://github.com/b-it-bots/mas_domestic_robotics/pull/234.
## Summary
Related to https://github.com/b-it-bots/mas_domestic_robotics/issues/26 and https://github.com/b-it-bots/mas_domestic_robotics/pull/233
The PR adds messages that are useful for permanently memorising object views. This is useful, for example, for object and face recognition, where we need to store a (small) set of prototype views of objects/faces.
I particularly added three messages to support the representation of views:
- `ObjectEmbedding`: A (low-dimensional) embedding of an object, for instance found by a Siamese network
- `ObjectView`: An object view represented by (i) an image, (ii) a point cloud, and (iii) an embedding
- ~~`ObjectViews`: A list of individual object views~~ Based on the discussion below, I removed this message and instead added an `ObjectView` array to the `Person`, `Face`, and `Object` messages

I decided to represent an object view with three different types of information - a cloud, an image, and an embedding - since different modalities may be useful in different contexts: in particular, the cloud could be used for registration, while the image/embedding could be used for image-based recognition.
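Under the description above, the two messages might be sketched roughly as follows (field names are illustrative, not necessarily the actual definitions in the PR):

```
# ObjectEmbedding.msg (sketch)
float32[] embedding               # low-dimensional feature vector,
                                  # e.g. from a Siamese network

# ObjectView.msg (sketch)
sensor_msgs/Image image           # the view as an image
sensor_msgs/PointCloud2 cloud     # the view as a cloud, e.g. for registration
ObjectEmbedding embedding         # embedding computed from the image
```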
The image and the embedding are redundant (the message encodes an implicit assumption that the embedding is found from the associated image), but they could serve different purposes: the embedding is useful to have for fast(er) recognition (so that we don't have to recompute it every time we want to recognise an object); the image is primarily useful for transparency (for example, we might want to identify the images that were responsible for recognising an object).
## Need for the PR
Currently, the repository contains separate messages for people and objects. The added messages unify objects and people into a single representation that is specifically designed for a visual recognition task. The expected use is that the view messages will be filled from the data in the object/person messages, which are used during online detection.