Closed behroozomidvar closed 3 years ago
@behroozomidvar To describe the scene, I found a GitHub repository that does this using PyTorch. It offers three versions. Here is the output of each version for the image you provided:
@yasminesmati Since my hardware setup is limited, could you please clone this repository and follow its README file? Then take several images (say 5) and run each version 3 or 4 times per image. Please record the following results:
@behroozomidvar Of the 3 tasks you mentioned, we can already perform the first one without any problem. The 3rd task is just an extra feature of DeepFace, which we already use in the project, so it should be easy to fulfill. As for the 2nd one, we will start working on it as soon as possible.
Sounds great.
@behzadshomali Regarding the three versions, I don't see an obvious difference between them. I checked the output for several images and the results were very similar. Here is an example of the outputs for one of the images: "a man riding on the back of a brown horse" "a man riding a horse on a lush green hillside"
As for the average time per version, here are the results: v1: 11 s, v2: 14 s, v3: 13 s.
In conclusion, from my point of view, the outputs differ only slightly in how they are worded and not in any substantial way, so based on the average times we can simply use the fastest version.
Please let me know if you have any comments.
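The comparison described above (several images, a few runs per image, per version) could be scripted roughly as follows. This is only a minimal sketch: `benchmark_captioners` and `fastest_version` are hypothetical helpers, and each captioner is assumed to be a plain function that takes an image path and returns a caption string, which is not necessarily how the repository exposes its three versions.

```python
import time
from statistics import mean
from typing import Callable, Dict, Iterable, List


def benchmark_captioners(
    captioners: Dict[str, Callable[[str], str]],
    image_paths: Iterable[str],
    runs_per_image: int = 3,
) -> Dict[str, dict]:
    """Time every captioner on every image and keep the captions it produced.

    `captioners` maps a version label (e.g. "v1") to a function that takes
    an image path and returns a caption. These functions are assumptions;
    wrap each version of the real model in such a function.
    """
    paths = list(image_paths)
    results: Dict[str, dict] = {}
    for name, caption_fn in captioners.items():
        timings: List[float] = []
        captions: Dict[str, List[str]] = {path: [] for path in paths}
        for path in paths:
            for _ in range(runs_per_image):
                start = time.perf_counter()
                caption = caption_fn(path)
                timings.append(time.perf_counter() - start)
                captions[path].append(caption)
        results[name] = {"mean_seconds": mean(timings), "captions": captions}
    return results


def fastest_version(results: Dict[str, dict]) -> str:
    """Return the label of the version with the lowest mean run time."""
    return min(results, key=lambda name: results[name]["mean_seconds"])
```

Collecting the captions alongside the timings makes it easy to check by eye whether the versions really produce similar descriptions before picking the fastest one.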
@yasminesmati I totally agree with you. Since the results are nearly the same, we should choose among these three versions based on their average times. By the way, the average times look really promising; they are much faster than I expected (in particular V1)! In conclusion, based on your records, it would be reasonable to choose version 1 (V1) to proceed with the project.
As @behroozomidvar said we have the following tasks to be considered:
I will work on the third one (in a separate branch), and once it is finished I will merge it into master. Meanwhile, @yasminesmati please create a new branch and start developing the code for the second task. Once we are all finished, we can merge our work with what has been done so far.
The descriptions generated by ImageDescribe should not be limited to face names and objects; they should also report the sentiments shown on the faces.
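One possible way to get face sentiments is DeepFace's `analyze` function with the `emotion` action. The sketch below is an assumption about how this could be wired in, not the project's actual code: `analyze_emotions` and `describe_emotions` are hypothetical helpers, and the handling of both a single dict and a list of dicts is there because the return shape of `DeepFace.analyze` differs between library versions.

```python
from typing import Any, Dict, List, Optional


def analyze_emotions(image_path: str) -> List[Dict[str, Any]]:
    """Run DeepFace emotion analysis and return one result dict per face."""
    # Imported lazily so the formatting helper below works without DeepFace.
    from deepface import DeepFace

    result = DeepFace.analyze(img_path=image_path, actions=["emotion"])
    # Older releases return a single dict, newer ones a list of dicts.
    return result if isinstance(result, list) else [result]


def describe_emotions(
    faces: List[Dict[str, Any]],
    names: Optional[List[str]] = None,
) -> str:
    """Turn per-face results into a short text fragment.

    `names` is an optional, hypothetical list of recognized names in the
    same order as `faces`; faces without a name are labeled by position.
    """
    parts = []
    for index, face in enumerate(faces):
        if names and index < len(names):
            label = names[index]
        else:
            label = f"Face {index + 1}"
        emotion = face.get("dominant_emotion", "unknown")
        parts.append(f"{label}: {emotion}")
    return "; ".join(parts)
```

For example, `describe_emotions(analyze_emotions("photo.jpg"))` would produce a fragment that could be appended to the generated scene description; `photo.jpg` is a placeholder path.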
For instance, in this famous photo of Ellen DeGeneres, ImageDescribe should be able to describe the following elements: