Tobias-Fischer / rt_gene

RT-GENE: Real-Time Eye Gaze and Blink Estimation in Natural Environments
http://www.imperial.ac.uk/personal-robotics

Training improvements #94

Closed ahmed-alhindawi closed 3 years ago

ahmed-alhindawi commented 3 years ago

Hello, this is a small, cumulative pull request of improvements and bug fixes that haven't really had a dedicated place to be committed individually.

I've added the required PyTorch models, having recently trained them with the correct input size and architecture.

Let me know what you think.

Tobias-Fischer commented 3 years ago

Hi @ahmed-alhindawi, many thanks for that! One question - what happens for users that have the previous version of rt_gene on their machine (including the old pytorch models for the larger input sizes) and then update? Will it break their installation? We should avoid this scenario. Also, can we clean up the model zoo - we're downloading lots and lots of models now; do we still need all of them? (I think this is ok)

/cc @VytasMule

ahmed-alhindawi commented 3 years ago

what happens for users that have the previous version of rt_gene on their machine (including the old pytorch models for the larger input sizes) and then update? Will it break their installation? We should avoid this scenario.

I'm not entirely sure how to handle this scenario, to be honest - one could argue that the larger input sizes of the pytorch implementation deviate from the tensorflow/paper version and hence shouldn't be used. In my experience, because the resolution of the eye patches in the training data is quite low (owing to the distance between the sensor and the subject), there isn't an advantage to the higher input sizes - they just add more overhead.

I'd be interested to know how you would tackle it?

Tobias-Fischer commented 3 years ago

Sorry, let me rephrase. Assume a user is running pytorch RT-GENE currently from the master branch. After merging this PR and pulling from master, will this user need to do any manual changes to continue using RT-GENE, or will everything just magically continue to work, including downloading and using the new models? Just want to make sure we're not breaking user code.

ahmed-alhindawi commented 3 years ago

Ah, I see - sorry for the misunderstanding.

You're right, this scenario breaks users' code silently. Users who have already downloaded the repository from the main branch and then re-pull will get the new code and the new models. However, if they've modified their estimate_gaze.launch and estimate_blink.launch files to point at the old model files, RT-GENE/RT-BENE will not work as intended - i.e. the results will be wrong and there won't be an error.

Perhaps I can add a function that checks that the md5 hashes of the loaded models are correct? Or can you think of an alternative way?

Tobias-Fischer commented 3 years ago

Hmm. If an exception is thrown (it should be, shouldn't it?), can we catch it instead and provide a useful error message saying that an incompatible model is likely being loaded? The problem with md5 hashes is that they wouldn't allow users to train their own models...

ahmed-alhindawi commented 3 years ago

No, there isn't an exception - the convolutional/pooling layers reduce the spatial size prior to the linear layers, so regardless of the input size (within reason) the model will output something. It's just that if the model is trained on 224x224 and a 36x60 patch is fed in, a result comes out that's wrong - users will then get the impression that RT-GENE's performance is rubbish.
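To illustrate the point, here is a minimal toy sketch (not the actual RT-GENE architecture - just an assumed conv/adaptive-pool/linear stack): both patch sizes flow through without any exception, so the only symptom of a size mismatch is a wrong result.

    import torch
    import torch.nn as nn

    # Toy network for illustration only - the adaptive pooling collapses any
    # spatial size before the linear layer, so no shape error is ever raised.
    toy_model = nn.Sequential(
        nn.Conv2d(3, 16, kernel_size=3, padding=1),
        nn.ReLU(),
        nn.AdaptiveAvgPool2d((4, 4)),
        nn.Flatten(),
        nn.Linear(16 * 4 * 4, 2),  # e.g. gaze yaw/pitch
    )

    small_patch = torch.randn(1, 3, 36, 60)    # 36x60 eye patch (paper resolution)
    large_patch = torch.randn(1, 3, 224, 224)  # 224x224 patch - mismatched size
    print(toy_model(small_patch).shape)  # torch.Size([1, 2])
    print(toy_model(large_patch).shape)  # torch.Size([1, 2]) - no error, just different numbers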

WRT the md5 hashes, we can have a default list as a parameter which can be overridden by users' own hashes when they train their own models; something like:

import hashlib

def _check_model_hashes(model_paths, hashes=("1234", "3455")):  # placeholder default hashes
    # MD5 digest of each model file on disk
    _model_hashes = [hashlib.md5(open(path, "rb").read()).hexdigest() for path in model_paths]
    # every loaded model must match one of the known (or user-supplied) hashes
    return all(h in hashes for h in _model_hashes)

What do you think?

Tobias-Fischer commented 3 years ago

Yeah, that's a good idea re the hashes then. Let's add this and a meaningful error message when the hashes don't match. Also for the next release let's make sure to clearly say that there is a breaking change.
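As a rough sketch of what that could look like (illustrative wiring only, not the actual rt_gene loading code; model_paths and known_hashes are assumed to come from the launch-file parameters), the check from above fails loudly with a hint instead of letting the estimator run on an incompatible model:

    # Illustrative only - users who train their own models would override known_hashes
    # with the md5 digests of their own files to skip the default list.
    if not _check_model_hashes(model_paths, hashes=known_hashes):
        raise RuntimeError(
            "MD5 mismatch for one or more model files. You are probably loading models "
            "from an older rt_gene release that are no longer compatible; re-download the "
            "models, or pass the hashes of your own trained models to override the defaults."
        )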

ahmed-alhindawi commented 3 years ago

Slightly modified from the above pseudo-code, but it now works. I've tested it and it behaves as advertised... Let me know what you think.

Tobias-Fischer commented 3 years ago

Beautiful. Many thanks for this @ahmed-alhindawi :)