adymaharana / StoryViz

MIT License

Evaluation metrics #4

Open KyonP opened 2 years ago

KyonP commented 2 years ago

Hello, I hope your research goes well. 😀

I am trying to evaluate my model with the metrics you proposed.

I have read your paper, but I would like to ask you to double-check a few points, since my results seem a bit odd and off the scale. 😢

  1. I presume that the "character F1" score is the "micro avg" F1 reported by your eval_classifier.py code? Am I correct?
  2. Also, does "Frame accuracy" correspond to the "eval Image Exact Match Acc" output of the same eval_classifier.py code?
  3. Are the BLEU-2 and BLEU-3 scores scaled by 100? I tested your translate.py code on my generated images and got scores around 0.04, so I want to confirm that the reported numbers are multiplied by 100 (see the sketch after this list for how I am computing these).
  4. Lastly, the R-precision evaluation method is unclear to me. Do I need to train your H-DAMSM code myself? If so, when is the right time to stop training and benchmark my model?
  5. For a fair comparison, would it be possible to share your pretrained H-DAMSM weights?
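
To make sure we are talking about the same quantities, here is a minimal sketch of how I am currently interpreting points 1–3, on made-up toy data. The arrays and sentences below are placeholders, and this is not your eval_classifier.py or translate.py code; I only want to confirm that the aggregation and scaling match yours.

```python
import numpy as np
from sklearn.metrics import f1_score
from nltk.translate.bleu_score import corpus_bleu

# Hypothetical multi-label character predictions, shape (num_frames, num_characters).
y_true = np.array([[1, 0, 1], [0, 1, 0], [1, 1, 0]])
y_pred = np.array([[1, 0, 0], [0, 1, 0], [1, 1, 1]])

# 1. "Character F1": micro-averaged F1 over all character labels,
#    i.e. the "micro avg" row of sklearn's classification_report.
char_f1 = f1_score(y_true, y_pred, average="micro")

# 2. "Frame accuracy": exact match -- a frame counts only if every
#    character label for that frame is predicted correctly.
frame_acc = (y_true == y_pred).all(axis=1).mean()

# 3. BLEU: nltk returns values in [0, 1]; I am assuming the paper
#    reports these multiplied by 100.
references = [[["the", "little", "cat", "sits", "on", "the", "mat"]]]
hypotheses = [["the", "little", "cat", "sat", "on", "the", "mat"]]
bleu2 = corpus_bleu(references, hypotheses, weights=(0.5, 0.5)) * 100
bleu3 = corpus_bleu(references, hypotheses, weights=(1/3, 1/3, 1/3)) * 100

print(f"character F1 (micro): {char_f1:.4f}")
print(f"frame exact-match accuracy: {frame_acc:.4f}")
print(f"BLEU-2 / BLEU-3 (x100): {bleu2:.2f} / {bleu3:.2f}")
```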

I am currently stuck on the R-precision evaluation with H-DAMSM, so I was thinking of using the recent CLIP R-precision instead; I am opening this issue first to avoid any fairness concerns in the comparison. A rough sketch of what I mean is below.
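
For reference, this is roughly what I had in mind for the CLIP-based variant. The model name, sampling scheme, and helper function are my own assumptions (not your H-DAMSM setup): for each generated image I rank its ground-truth caption against 99 randomly drawn mismatched captions by CLIP similarity and count how often the true caption ranks first.

```python
import random

import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

device = "cuda" if torch.cuda.is_available() else "cpu"
model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32").to(device).eval()
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

@torch.no_grad()
def clip_r_precision(image_paths, captions, num_distractors=99, seed=0):
    """Fraction of generated images whose ground-truth caption is ranked first
    against `num_distractors` randomly sampled mismatched captions.
    Assumes len(captions) > num_distractors."""
    rng = random.Random(seed)
    hits = 0
    for i, path in enumerate(image_paths):
        # Candidate texts: the true caption (index 0) plus random distractors.
        distractors = rng.sample(
            [c for j, c in enumerate(captions) if j != i], num_distractors
        )
        texts = [captions[i]] + distractors

        inputs = processor(
            text=texts, images=Image.open(path),
            return_tensors="pt", padding=True, truncation=True,
        ).to(device)
        # logits_per_image has shape (1, num_texts): image-text similarity scores.
        logits = model(**inputs).logits_per_image[0]
        hits += int(logits.argmax().item() == 0)
    return hits / len(image_paths)
```

I picked 99 distractors only because that seems to be the usual R-precision setup in text-to-image papers; please correct me if your protocol uses a different candidate pool.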