Tobias-Fischer / rt_gene

RT-GENE: Real-Time Eye Gaze and Blink Estimation in Natural Environments
http://www.imperial.ac.uk/personal-robotics

Question regarding reducing image size from 224x224 to 36x60 #113

Closed GrantZheng86 closed 2 years ago

GrantZheng86 commented 2 years ago

Hello there,

Happy holidays to you all!

I noticed a change in the data input size from 224x224 to 36x60 in this file at line 20. Will this change affect the accuracy of the predictions? Also, what was the original rationale for using 224x224 images?

Thanks in advance for helping out!

Tobias-Fischer commented 2 years ago

Hi @GrantZheng86,

Happy holidays to you, too!

The input images have a resolution of 36x60. As far as I remember, @ahmed-alhindawi (who implemented the PyTorch version) told me that the backbone at some stage required a minimum resolution of 224x224, so the images originally were upsampled to 224x224. This minimum resolution does not seem to be required anymore.

As the original images only had 36x60 resolution anyway, the predictions should not change (significantly), since the information content is the same.

Hope that helps!

Best, Tobi

GrantZheng86 commented 2 years ago

Thanks a lot for your quick response! That makes sense; the original VGG, without any modification, does require a 224x224 input size.
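
The point made above, that upsampling a 36x60 patch to 224x224 adds no new information, can be sketched in plain Python. This is a hypothetical illustration, not code from the repository (the PyTorch version would use something like `torch.nn.functional.interpolate` instead); the nearest-neighbour variant makes the argument easiest to see, since every output pixel is a verbatim copy of an input pixel.

```python
def upsample_nearest(img, out_h, out_w):
    """Nearest-neighbour upsample of a 2-D list-of-lists image.

    Every output pixel is a copy of some input pixel, so upsampling
    introduces no new information -- which is why dropping the
    36x60 -> 224x224 resize should not change predictions much.
    """
    in_h, in_w = len(img), len(img[0])
    return [
        [img[y * in_h // out_h][x * in_w // out_w] for x in range(out_w)]
        for y in range(out_h)
    ]

# A dummy 36x60 "eye patch" (values are placeholders, not real pixels).
patch = [[(r * 60 + c) % 256 for c in range(60)] for r in range(36)]
big = upsample_nearest(patch, 224, 224)

assert len(big) == 224 and len(big[0]) == 224
# Only pixel values already present in the 36x60 patch can appear:
assert set(v for row in big for v in row) <= set(v for row in patch for v in row)
```

Bilinear upsampling (as VGG-style pipelines typically use) interpolates between those same input pixels, so it likewise cannot add information; it only smooths.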