Face sentiments - Githubissues

behzadshomali / Image-Describe-Pipe

This app outputs the name, coordinates, and sentiment of each extracted face, along with a brief description of the scene's context, for each input image.

Face sentiments #21

Closed behroozomidvar closed 3 years ago

behroozomidvar commented 3 years ago

The descriptions generated by ImageDescribe should not be limited to face names and objects; they should also report the sentiments shown in the faces.

For instance, in this famous photo by Ellen DeGeneres, ImageDescribe should be able to describe the following elements:

Names of people in the photo, including Bradley Cooper, Angelina Jolie, Brad Pitt, and Jennifer Lawrence.
The scene, which can be described for instance with words such as "ceremony" or "soirée".
The emotions and sentiments in the faces, e.g., "Bradley Cooper is smiling", "Jared Leto is surprised", and "Julia Roberts is laughing".

behzadshomali commented 3 years ago

@behroozomidvar to be able to describe the scene, I just found a GitHub repository that does this using PyTorch. It has three versions that can be used. Here is the output of each version for the image you provided:

v3: A group of people standing next to each other holding a cell phone
v1, v2: A group of people standing next to each other

behzadshomali commented 3 years ago

@yasminesmati as my hardware setup is not really up to the task, could you please clone this repository and follow the README file? Then take several images (say 5) and generate the output 3 or 4 times for each image with each version. Then please record the following results:

the average time it took to output the result
a comparison of the three versions
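
The measurement described above could be scripted roughly as follows. This is only a sketch: `benchmark` and the `captioners` mapping are hypothetical names, and each captioner is assumed to be a plain function that takes an image path and returns a caption string. How each version of the captioning repository is actually invoked depends on its README and is not shown here.

```python
import time
from statistics import mean
from typing import Callable, Dict, Iterable


def benchmark(
    captioners: Dict[str, Callable[[str], str]],
    image_paths: Iterable[str],
    repeats: int = 3,
) -> Dict[str, dict]:
    """Run every captioner on every image `repeats` times and average the wall-clock time."""
    image_paths = list(image_paths)
    if repeats < 1 or not image_paths:
        raise ValueError("need at least one image and one repeat")

    results = {}
    for version, caption in captioners.items():
        timings = []
        captions = {}
        for path in image_paths:
            for _ in range(repeats):
                start = time.perf_counter()
                text = caption(path)  # hypothetical call into one model version
                timings.append(time.perf_counter() - start)
            # Keep the last caption per image so the versions can be compared side by side.
            captions[path] = text
        results[version] = {"avg_seconds": mean(timings), "captions": captions}
    return results
```

Calling `benchmark({"v1": run_v1, "v2": run_v2, "v3": run_v3}, paths, repeats=3)`, where the `run_*` functions are placeholders for the real model calls, would return one average time and one set of captions per version.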

behzadshomali commented 3 years ago

@behroozomidvar among the 3 tasks you mentioned, we can already perform the first one without any problem. The 3rd task is just an extra feature of DeepFace, which we are already using in the project, so it shouldn't be a pain in the neck to implement. Finally, for the 2nd one, we will start working on it as soon as possible.
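
As a rough illustration of the 3rd task, the sketch below turns a recognized name plus a set of per-emotion scores into a sentence like the ones requested in this issue. It assumes the scores use emotion labels such as `happy`, `surprise`, and `neutral`, as DeepFace's emotion analysis is understood to report. The `describe_face` helper, the `PHRASES` wording, and the example scores are all hypothetical and are not taken from the project.

```python
from typing import Dict

# Assumed wording for each emotion label; the phrases are illustrative choices.
PHRASES = {
    "happy": "is smiling",
    "surprise": "is surprised",
    "sad": "looks sad",
    "angry": "looks angry",
    "fear": "looks afraid",
    "disgust": "looks disgusted",
    "neutral": "has a neutral expression",
}


def describe_face(name: str, emotion_scores: Dict[str, float]) -> str:
    """Build a one-line description from the highest-scoring emotion."""
    if not emotion_scores:
        raise ValueError("emotion_scores must not be empty")
    dominant = max(emotion_scores, key=emotion_scores.get)
    # Fall back to the raw label if it is not in the phrase table.
    phrase = PHRASES.get(dominant, f"shows '{dominant}'")
    return f"{name} {phrase}"


# Made-up scores, only to show the shape of the input.
print(describe_face("Bradley Cooper", {"happy": 92.1, "neutral": 6.0, "surprise": 1.9}))
# prints "Bradley Cooper is smiling"
```

Note that a label-based approach like this cannot distinguish "smiling" from "laughing"; both would fall under `happy` unless a finer-grained model is used.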

behroozomidvar commented 3 years ago

Sounds great.

yasaminesmati commented 3 years ago

@behzadshomali Regarding the difference between the three versions, I think there is no obvious difference between them. I checked the output for several images and the captions weren't very different. Here is an example of the outputs for one of the images: "a man riding on the back of a brown horse" "a man riding a horse on a lush green hillside"

And here are the average times:

v1: 11s
v2: 14s
v3: 13s

In conclusion, from my point of view, although there is no particular difference between the outputs, the versions do differ in how long they take to produce them, so based on the average times we can simply use the fastest one.

Please let me know if you have any comments.

behzadshomali commented 3 years ago

@yasminesmati I totally agree with you. As the results are nearly the same, we should choose among these three versions based on their average times. By the way, the average times look really promising; they are much faster than I expected (in particular V1)! In conclusion, based on your measurements, it is reasonable to choose version 1 (V1) to proceed with the project.
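
For reference, the selection can be checked with a few lines of Python using the averages reported earlier in this thread (v1: 11s, v2: 14s, v3: 13s). The `rank_by_speed` helper is a hypothetical name used only for this sketch.

```python
from typing import Dict, List, Tuple


def rank_by_speed(avg_seconds: Dict[str, float]) -> List[Tuple[str, float, float]]:
    """Sort versions by average time and report how much slower each is than the fastest, in percent."""
    fastest = min(avg_seconds.values())
    ranked = sorted(avg_seconds.items(), key=lambda item: item[1])
    return [(version, secs, (secs - fastest) / fastest * 100) for version, secs in ranked]


# Averages as reported in the thread.
for version, secs, slower in rank_by_speed({"v1": 11, "v2": 14, "v3": 13}):
    print(f"{version}: {secs}s ({slower:.0f}% slower than the fastest)")
```

With these numbers v1 comes first, followed by v3 and then v2.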

behzadshomali commented 3 years ago

As @behroozomidvar said, we have the following tasks to consider:

[x] Names of people in the photo
[x] The scene
[x] The emotions and sentiments in the faces

I will work on the third one (in a separate branch), and once it is finished I will merge it into master. In the meantime, @yasminesmati please create a new branch and start developing the code for the second task. Once we are both finished, we will merge our work with what has been done so far.