b-it-bots / mas_domestic_robotics

Robot-independent ROS packages for domestic applications
GNU General Public License v3.0

Person recognition #26

Closed argenos closed 4 years ago

argenos commented 6 years ago

A good place to start is to take a look at the detect_person and gender_recognition actions; the task would be to create an action just like these.

We don't necessarily need to use it in the end, but OpenCV obviously has tools for this. Any other interesting approach is also welcome.

gecortesh commented 6 years ago

Is this for recognizing, e.g., members of the team and then saying their names or following them?

argenos commented 6 years ago

Yeah, the action after recognizing the person will vary from task to task, but that is indeed the first part of the problem. The second thing to consider is that we might want to recognize people without associating them with a name, i.e. tracking persons inside the apartment and being able to distinguish between them.

alex-mitrevski commented 6 years ago

If someone wants to try out CNNs for this, a Siamese network might be useful. This is a paper in which the model is described; here's a useful blog post as well.
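For anyone unfamiliar with the idea, here is a minimal sketch of what a Siamese setup computes: the same embedding function is applied to both inputs, and a loss pulls embeddings of the same person together while pushing different people at least a margin apart. The toy `embed` layer, the `weights` values, and the contrastive loss are assumptions chosen for illustration; the linked paper may use a different architecture and loss.

```python
import math

def embed(x, weights):
    """Toy shared embedding (hypothetical): one linear layer followed by tanh."""
    return [math.tanh(sum(w * xi for w, xi in zip(row, x))) for row in weights]

def euclidean(a, b):
    """Euclidean distance between two embeddings of equal length."""
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))

def contrastive_loss(distance, same_person, margin=1.0):
    """Contrastive loss for one pair.

    Same-person pairs are penalised by their squared distance;
    different-person pairs are penalised only if closer than the margin.
    """
    if same_person:
        return distance ** 2
    return max(0.0, margin - distance) ** 2

# The same weights are used for both inputs of a pair: this sharing is
# what makes the network "Siamese".
weights = [[0.5, -0.2, 0.1], [0.3, 0.8, -0.5]]

face_a = [1.0, 0.0, 0.5]    # made-up feature vectors, not real face data
face_b = [1.0, 0.1, 0.5]    # close to face_a (same person in this toy example)
face_c = [-1.0, 2.0, 0.0]   # far from face_a (different person)

d_ab = euclidean(embed(face_a, weights), embed(face_b, weights))
d_ac = euclidean(embed(face_a, weights), embed(face_c, weights))

loss_same = contrastive_loss(d_ab, same_person=True)
loss_diff = contrastive_loss(d_ac, same_person=False)
```

Once trained, only the embedding and the distance are needed at run time, so a new person can be enrolled from a single image without retraining.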

gecortesh commented 6 years ago

I was thinking of bag of features; it may be simpler.

alex-mitrevski commented 6 years ago

Yeah, it's definitely simpler; I don't know how the accuracies compare though (as the set of people grows).

argenos commented 6 years ago

I'd like to add a comment here that we might need to consider: it's important to be able to recognize the role as well, e.g. postman, delivery man, police, etc., so we might want to do something that relies not only on the face of the person but also on their clothing.

minhnh commented 6 years ago

How's the progress on this? With the Single Shot MultiBox Detector (SSD), detecting people seems to be quite reliable. As for classifying roles, genders, and emotions, I think we can use the same ImageRecognition state currently used in perceive_plane_action. The image recognition service that the action interacts with takes a model name in its request, so we can specify which model to use there.

alex-mitrevski commented 6 years ago

Detection is not the only thing here though; it's important to recognise the actual person as well (so learning from one/only a few examples is needed).

argenos commented 6 years ago

I think it was probably abandoned. Feel free to open a pull request if you’re interested!

minhnh commented 6 years ago

@alex-mitrevski can't we train a binary classifier on a single picture of one person? @argenos yeah, I'll start putting something together for this.

alex-mitrevski commented 6 years ago

I suppose we could use something like the exemplar SVMs described here (in this case, they're used for detection, but we could use the same idea for recognition), or the Siamese networks I suggested above.

alex-mitrevski commented 5 years ago

The model described here seems to be very appropriate; trying that one out is the most reasonable thing to do (there is an existing implementation here).

oarriaga commented 5 years ago

I believe there was a person at HBRS computer science working on this subject. As far as I can remember, he simply used a pre-trained network (e.g. VGG16), took its output for a resized image of the face, and compared it against each sample in a set of saved face features using cosine similarity. You can extend this by applying data augmentation to the sampled image to obtain a more robust description.
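A minimal sketch of the comparison step, assuming the face features have already been extracted by some pre-trained network. The `gallery` contents, the `identify` helper, and the 0.8 threshold are made up for illustration; real embeddings would have far more dimensions, and the threshold would need tuning on real data.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two feature vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat a zero vector as matching nothing
    return dot / (norm_a * norm_b)

def identify(query, gallery, threshold=0.8):
    """Compare a query feature against every saved sample.

    Returns (name, score) for the best match, or (None, score) when the
    best score is below the threshold, i.e. the person is unknown.
    """
    best_name, best_score = None, -1.0
    for name, samples in gallery.items():
        for sample in samples:
            score = cosine_similarity(query, sample)
            if score > best_score:
                best_name, best_score = name, score
    if best_score < threshold:
        return None, best_score
    return best_name, best_score

# Saved face features per person (toy 3-D vectors). Storing several
# samples per person is where features of augmented images would go.
gallery = {
    "alice": [[0.9, 0.1, 0.0], [0.8, 0.2, 0.1]],
    "bob": [[0.0, 0.2, 0.9]],
}

known_name, known_score = identify([0.85, 0.15, 0.05], gallery)
unknown_name, unknown_score = identify([0.0, 1.0, 0.0], gallery)
```

Returning `None` below the threshold also covers the case mentioned earlier in the thread of telling people apart without having a name for them: an unmatched feature can be stored under a fresh anonymous key.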

minhnh commented 5 years ago

I recently noticed in the PR list of @oarriaga's face_classification repo a recommendation to use dlib instead of OpenCV for face detection, which sounds interesting: oarriaga/face_classification#25.

Additionally, I believe the classification models trained by Octavio can be loaded into ImageRecognitionServer, just like with object recognition. Then there will be no extra code needed for gender and emotion classification. All you'll need in the repo are the model files. The action can then simply call the service with the appropriate model name, and a classification result will be returned.
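To illustrate the idea of selecting a classifier by model name, here is a rough, ROS-free sketch. `RecognitionServiceStub`, `register_model`, `recognize`, and the two lambda "models" are all hypothetical stand-ins; this is not the actual ImageRecognitionServer interface, only the dispatch pattern described above.

```python
class RecognitionServiceStub:
    """Hypothetical stand-in for a recognition service keyed by model name."""

    def __init__(self):
        self._models = {}

    def register_model(self, name, classify_fn):
        """Make a classifier available under the given model name."""
        self._models[name] = classify_fn

    def recognize(self, model_name, image):
        """Run the classifier selected by model_name on the image."""
        if model_name not in self._models:
            raise KeyError("unknown model: " + model_name)
        return self._models[model_name](image)

service = RecognitionServiceStub()

# Placeholder classifiers that ignore their input; real ones would wrap
# the loaded model files.
service.register_model("gender", lambda image: "label_from_gender_model")
service.register_model("emotion", lambda image: "label_from_emotion_model")

# The calling action only changes the model name in its request.
face_crop = [[0, 0], [0, 0]]  # dummy image data
gender_result = service.recognize("gender", face_crop)
emotion_result = service.recognize("emotion", face_crop)
```

With this pattern, adding a role classifier would only mean registering one more model, without touching the calling action.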

argenos commented 5 years ago

Should this be closed in favour of #152?

minhnh commented 5 years ago

A unique discussion here is the mention of Siamese networks for learning new faces, which is a feature for the person identity model. The other issue deals with refactoring and architecture.

alex-mitrevski commented 4 years ago

A Siamese network implementation was added with https://github.com/b-it-bots/dataset_interface/pull/11

Recognition based on extracted face embeddings was added with https://github.com/b-it-bots/mas_knowledge_base

I'll close the issue since these two PRs basically resolve it.