Closed: alex-mitrevski closed this 4 years ago
I'm not actually convinced that objects and people can be the same thing, since the tasks involved with both can differ significantly (anything speech related, for example, though I realize that's not relevant here in `mas_perception_msgs`). An alternative may be creating a visual information/categorization message and reusing it in both the object and person messages.
> I'm not actually convinced that objects and people can be the same thing, since the tasks involved with both can differ significantly
In what sense are they not the same? As I mentioned in the PR description, the messages are added specifically for a visual recognition task; they are not meant to be used for other purposes. To put it in other terms, they are data structures that will make it possible to write a generic visual recognition algorithm.
> An alternative may be creating a visual information/categorization message and reusing it in both the object and person messages.
I'm not sure what you mean by this; could you please elaborate?
> In what sense are they not the same?
The tasks are different, with possibly more interactions than just picking and placing. The question is whether this will be affected by how the message is defined. Potentially, something like age or gender will be recognised for people only. This can maybe be dealt with via a list of attributes, but we have to be careful about mixing types.
> I'm not sure what you mean by this; could you please elaborate?
It may just be a naming issue ('subject' may be better than 'object'?). So is this meant to replace `Object.msg` and `Person.msg`? What I have in mind is essentially, instead of having `ObjectViews.msg`, just adding `ObjectView.msg` to `Person.msg` and `Object.msg` to replace the other cloud/image fields.
> The tasks are different, with possibly more interactions than just picking and placing. The question is whether this will be affected by how the message is defined. Potentially, something like age or gender will be recognised for people only. This can maybe be dealt with via a list of attributes, but we have to be careful about mixing types.
You are thinking in terms of tasks here, but visual recognition doesn't necessarily have anything to do with a task, nor with attributes (though attributes can be used to update the confidence in the recognition). In any case, we try to answer questions of the following type: is the face I see Minh's, Alex's, or someone else's; or is the cup I see Alex's cup, Minh's cup, or an unknown cup? We try to answer this given a list of "views" of each face or each cup.
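To make the intended use concrete, here is a minimal sketch (in Python, with hypothetical names; the PR does not specify a particular matching method) of answering "which known subject does this view belong to?" given a list of stored view embeddings per subject:

```python
import numpy as np

def recognise(query_embedding, known_subjects, threshold=0.7):
    """Return the best-matching subject name, or None if unknown.

    known_subjects maps a label (e.g. 'alex_cup') to a list of
    stored view embeddings (one per memorised view of that subject).
    """
    best_name, best_score = None, -1.0
    for name, view_embeddings in known_subjects.items():
        for stored in view_embeddings:
            # cosine similarity between the query and a stored view
            score = float(np.dot(query_embedding, stored)
                          / (np.linalg.norm(query_embedding)
                             * np.linalg.norm(stored)))
            if score > best_score:
                best_name, best_score = name, score
    # below the threshold, report the subject as unknown
    return best_name if best_score >= threshold else None
```

The per-subject list of views is exactly what the `ObjectView` array is meant to store.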
> It may just be a naming issue ('subject' may be better than 'object'?). So is this meant to replace `Object.msg` and `Person.msg`?
No, replacement is not what I had in mind. The new messages are supposed to supplement the existing messages (particularly since not all applications need a recognition functionality).
> What I have in mind is essentially, instead of having `ObjectViews.msg`, just adding `ObjectView.msg` to `Person.msg` and `Object.msg` to replace the other cloud/image fields.
This doesn't fully work because of the multiple view assumption (i.e. we may have a collection of images/clouds of the same object/person/face from multiple views). Unless I add the `ObjectViews` message to both `Person` and `Object`; that would work, I suppose.
I meant adding `ObjectView[]` to the fields in `Person` and `Object` and using `name` and `category` from those messages.

If it's to supplement those messages, then I don't really have any issue with the change.
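For illustration, the amended `Person.msg` could then look roughly like this (a sketch only; fields other than the added `ObjectView[]` array are placeholders, not the actual definitions in this repository):

```
# Person.msg (illustrative sketch, not the actual merged definition)
std_msgs/Header header
string name              # reused as the recognition label
string category
ObjectView[] views       # stored views of this person for recognition
```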
> I meant adding `ObjectView[]` to the fields in `Person` and `Object` and using `name` and `category` from those messages.
OK, that makes sense. I added an `ObjectView` array to `Person`, `Face`, and `Object`. I will, however, change the PR to WIP now since these are breaking changes (requiring changes in at least `mas_domestic_robotics` and `mas_perception_libs`).

@sthoduka @deebuls Will these changes break anything in the @work code?
Yes, this will break most of our perception code, since we use `Object.msg` in it.
@mhwasil I was afraid that would be the case. But you will use `noetic-devel` from now on, right? If that's the case, I could still merge the changes to `kinetic-devel`, since I'll continue working with that branch for the time being.
Yes, we will use `noetic-devel`. Nevertheless, changing our code is an easy task :-), so we do not have a problem with that.
So regarding the naming of the message: instead of `ObjectView.msg`, would it be better to use `ObjectRepresentation.msg`, or some other name? `ObjectView` could be confused with an object seen from different viewpoints.
> Yes, we will use `noetic-devel`. Nevertheless, changing our code is an easy task :-), so we do not have a problem with that.
OK, that's great!
> So regarding the naming of the message: instead of `ObjectView.msg`, would it be better to use `ObjectRepresentation.msg`, or some other name? `ObjectView` could be confused with an object seen from different viewpoints.
Actually, that's exactly what the list of `ObjectView` messages is supposed to represent: the same object as seen from different viewpoints. That's why I used that name for the message.
I thought it was about different representations too, but I didn't read carefully enough. `ObjectView` sounds fine for different viewpoints.
Thanks everyone for your suggestions. I'm going to merge this now so that I can also proceed with https://github.com/b-it-bots/mas_perception_libs/pull/18 and https://github.com/b-it-bots/mas_domestic_robotics/pull/234.
## Summary
Related to https://github.com/b-it-bots/mas_domestic_robotics/issues/26 and https://github.com/b-it-bots/mas_domestic_robotics/pull/233
The PR adds messages that are useful for permanently memorising object views. This is useful, for example, for object and face recognition, where we need to store a (small) set of prototype views of objects/faces.
I particularly added three messages to support the representation of views:
- `ObjectEmbedding`: A (low-dimensional) embedding of an object, for instance found by a Siamese network
- `ObjectView`: An object view represented by (i) an image, (ii) a point cloud, and (iii) an embedding
- ~~`ObjectViews`: A list of individual object views~~ Based on the discussion below, I removed this message and instead added an `ObjectView` array to the `Person`, `Face`, and `Object` messages

I decided to represent an object view with three different types of information - a cloud, an image, and an embedding - since different modalities may be useful in different contexts: in particular, the cloud could be used for registration, while the image/embedding could be used for image-based recognition.
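Under the description above, the two messages might be sketched roughly as follows (field names are illustrative, not necessarily the actual definitions in the PR):

```
# ObjectEmbedding.msg (sketch)
float32[] embedding               # low-dimensional feature vector,
                                  # e.g. from a Siamese network

# ObjectView.msg (sketch)
sensor_msgs/Image image           # the view as an image
sensor_msgs/PointCloud2 cloud     # the view as a cloud, e.g. for registration
ObjectEmbedding embedding         # embedding computed from the image
```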
The image and the embedding are redundant (the message encodes an implicit assumption that the embedding is found from the associated image), but they could serve different purposes: the embedding is useful to have for fast(er) recognition (so that we don't have to recompute it every time we want to recognise an object); the image is primarily useful for transparency (for example, we might want to identify the images that were responsible for recognising an object).
## Need for the PR
Currently, the repository contains separate messages for people and objects. The added messages unify objects and people into a single representation that is specifically designed for a visual recognition task. The expected use is that the view messages will be filled from the data in the object/person messages, which are used during online detection.