Closed — JunyaoHu closed this issue 7 months ago
Thank you for your question. We will upload our supplementary materials in the next few days.
For Emo-A, we report the average of the eight per-emotion accuracies. As for the other outputs, we wanted to observe the performance differences when generating images for the various emotions.
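As a rough sketch of how such a macro-average could be computed (not the authors' actual implementation; the category names and counts below are illustrative assumptions):

```python
# Hypothetical sketch of Emo-A as the mean of per-emotion accuracies.
# The eight categories here follow the Mikels emotion wheel commonly used
# in visual emotion analysis; the counts are placeholder values.
EMOTIONS = ["amusement", "awe", "contentment", "excitement",
            "anger", "disgust", "fear", "sadness"]

def emo_a(correct, total):
    """Macro-average: mean of the eight per-emotion accuracies."""
    accuracies = [correct[e] / total[e] for e in EMOTIONS]
    return sum(accuracies) / len(accuracies)

correct = {e: 80 for e in EMOTIONS}   # placeholder: 80 correct per emotion
total = {e: 100 for e in EMOTIONS}    # placeholder: 100 images per emotion
print(emo_a(correct, total))          # ≈ 0.8
```

A macro average like this weights every emotion equally, so a model cannot hide poor performance on one emotion behind strong performance on another.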
For Sem-C, for each image we take the maximum probability between the scene classifier and the object classifier. This is a new evaluation metric we propose for this task; it is not drawn from other papers.
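A minimal sketch of this per-image max-then-average idea, assuming we already have each classifier's top confidence per image (the probabilities below are placeholder numbers, not real model outputs):

```python
# Hypothetical sketch of Sem-C: for each image, keep the larger of the
# scene-classifier and object-classifier top probabilities, then average
# over all images.
def sem_c(scene_probs, object_probs):
    per_image = [max(s, o) for s, o in zip(scene_probs, object_probs)]
    return sum(per_image) / len(per_image)

scene_probs = [0.9, 0.2, 0.6]    # placeholder scene confidences
object_probs = [0.5, 0.7, 0.4]   # placeholder object confidences
print(sem_c(scene_probs, object_probs))  # (0.9 + 0.7 + 0.6) / 3 ≈ 0.733
```

Taking the maximum lets an image count as semantically consistent if either its scene content or its object content is recognized confidently.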
For Sem-D, we used pixel-level MSE as the evaluation metric. Since cosine distance and Euclidean distance are equivalent for normalized vectors, and the difference between them is negligible in practical calculations, we chose the latter. The 10 image pairs are taken from the supplementary material of one of the papers we follow; you can find that paper cited in the LPIPS section of our main text.
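The equivalence mentioned above follows from the identity that, for L2-normalized vectors u and v, the squared Euclidean distance equals 2 · (1 − cos(u, v)), so both distances induce the same ranking. An illustrative check (not the authors' code):

```python
import math

# For unit vectors: ||u - v||^2 = ||u||^2 + ||v||^2 - 2 u.v = 2 - 2 u.v
#                              = 2 * (1 - cos(u, v))
def normalize(v):
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]

def cosine_sim(u, v):
    # For already-normalized vectors the dot product is the cosine.
    return sum(a * b for a, b in zip(u, v))

def sq_euclidean(u, v):
    return sum((a - b) ** 2 for a, b in zip(u, v))

u = normalize([1.0, 2.0, 3.0])   # arbitrary example vectors
v = normalize([2.0, 1.0, 0.5])
assert abs(sq_euclidean(u, v) - 2 * (1 - cosine_sim(u, v))) < 1e-9
```

Because the two are monotonically related on normalized vectors, choosing one over the other changes only the scale of the scores, not any comparison between models.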
Thank you very much for your patience!
First of all, thank you for your work. It makes an important contribution to combining image generation with emotion computing, which is rare and commendable.
Could you please provide the supplementary materials? I would like to know more details about the emotion generation quality metrics; I believe these questions are explained in detail there.
I have read the relevant code for the three metrics:
https://github.com/JingyuanYY/EmoGen/blob/f560012bf56ff68f5c6edc3dfb9728e9c856ad91/training/inference.py#L285-L292
https://github.com/JingyuanYY/EmoGen/blob/f560012bf56ff68f5c6edc3dfb9728e9c856ad91/metrics/other_metrics.py#L151-L153
https://github.com/JingyuanYY/EmoGen/blob/f560012bf56ff68f5c6edc3dfb9728e9c856ad91/metrics/other_metrics.py#L107-L114
https://github.com/JingyuanYY/EmoGen/blob/f560012bf56ff68f5c6edc3dfb9728e9c856ad91/metrics/other_metrics.py#L160-L164
Thank you!