DinoMan / speech-driven-animation


Implementation of Metrics #53

Closed Aithu-Snehith closed 3 years ago

Aithu-Snehith commented 3 years ago

I am trying to implement your model. I have read your paper and would like more information about how the metrics were implemented.

  1. Could you please explain how you implemented the metrics mentioned in the paper?

  2. For metrics like PSNR and SSIM, which image is each generated frame compared against to compute the metric? Is it the given reference image for all generated frames?

  3. It would be really helpful if you could provide the formulas or pseudocode used for the metrics.

Thanks

DinoMan commented 3 years ago

PSNR and SSIM are well-established full-reference metrics in image processing, which means they compare each generated frame to the corresponding ground-truth frame. There are many implementations already available in Python. The formula for PSNR is described here.
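As a reference for the formula, a minimal per-frame PSNR can be written in a few lines of NumPy (this is the standard definition, not the repo's exact code):

```python
import numpy as np

def psnr(ref, gen, max_val=255.0):
    """Peak signal-to-noise ratio between a reference frame and a
    generated frame: 10 * log10(MAX^2 / MSE)."""
    mse = np.mean((ref.astype(np.float64) - gen.astype(np.float64)) ** 2)
    if mse == 0:
        # Identical frames: PSNR is unbounded.
        return float("inf")
    return 10.0 * np.log10(max_val ** 2 / mse)
```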

To calculate SSIM I think I used this function from the scikit library.
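For reference, in current scikit-image releases that function lives at `skimage.metrics.structural_similarity` (older releases exposed it as `skimage.measure.compare_ssim`). A minimal usage sketch on grayscale frames:

```python
import numpy as np
from skimage.metrics import structural_similarity

# Hypothetical data: a reference frame and a slightly noisy generated frame,
# both grayscale with values in [0, 1].
ref = np.random.rand(64, 64)
gen = np.clip(ref + 0.05 * np.random.rand(64, 64), 0.0, 1.0)

# data_range must match the value range of the inputs.
score = structural_similarity(ref, gen, data_range=1.0)
```

For color frames, pass `channel_axis=-1` so SSIM is averaged over channels.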

I measure PSNR on the whole video rather than on individual frames as stated here: https://in.mathworks.com/matlabcentral/fileexchange/12455-psnr-of-yuv-videos.
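Following the approach in that link, the key point is to average the squared error over all frames of the video and only then take the logarithm, giving one PSNR value per video rather than a mean of per-frame PSNRs. A sketch:

```python
import numpy as np

def video_psnr(ref_frames, gen_frames, max_val=255.0):
    """PSNR over a whole video: accumulate the MSE across ALL frames
    first, then apply the log once (one PSNR per video)."""
    ref = np.stack(ref_frames).astype(np.float64)
    gen = np.stack(gen_frames).astype(np.float64)
    mse = np.mean((ref - gen) ** 2)
    if mse == 0:
        return float("inf")
    return 10.0 * np.log10(max_val ** 2 / mse)
```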

The CPBD metric is also implemented in Python.

My blink detector repo is available here.

I used a pretrained version of lipnet to perform lipreading.