Tobias-Fischer / rt_gene

RT-GENE: Real-Time Eye Gaze and Blink Estimation in Natural Environments
http://www.imperial.ac.uk/personal-robotics
Other
365 stars 68 forks

Pytorch training #63

Closed ahmed-alhindawi closed 4 years ago

ahmed-alhindawi commented 4 years ago

Good evening all,

Welcome to amateur rebasing hour. In an effort to keep the pytorch_training branch up to date with the recent pull requests, I've made several rebases/commits/pushes. I have no idea how they've all stacked up so high, but the end result is that they all work (I think by chance!).

Either way; this branch does several things:

I have tested this code and it runs both the tensorflow and the pytorch paths correctly. Let me know what you think :)

Fix #46

ahmed-alhindawi commented 4 years ago

Some notes on model inference across different backends. This is purely for inference on the model, i.e. running the same patches/head pose, already on the GPU, through the model 5000 times and averaging the inference frequency. Memory usage is as reported by nvidia-smi. The test can be found in gaze_estimation_models_pytorch.py.

| Backend | Frequency (Hz) | Memory usage (MiB) |
|---|---|---|
| Resnet-50 | 70 | 1591 |
| ResneXt-50 | 45 | 1651 |
| Resnet-18 | 160 | 1219 |
| VGG-16 | 175 | 1857 |
| MobilenetV2 | 75 | 1259 |
| MNAS | 80 | 1233 |
| Shufflenet | 60 | 1119 |

It seems that VGG-16 and Resnet-18 are roughly equal in inference time, but Resnet-18 has lower memory usage. Will update on accuracy after some more training.
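For anyone wanting to reproduce this kind of measurement, here is a minimal sketch of the timing loop. It is not the code in gaze_estimation_models_pytorch.py. The helper name `measure_frequency` and its parameters are made up for illustration, and the workload is a stand-in for a model forward pass.

```python
import time


def measure_frequency(run_once, iterations=5000, warmup=10, sync=None):
    """Return the average number of run_once() calls per second (Hz).

    run_once: zero-argument callable, e.g. one forward pass on fixed inputs.
    sync: optional zero-argument callable that blocks until queued work is
          finished (hypothetical hook; needed when the work runs asynchronously).
    """
    # Warm-up calls are excluded from the timing (lazy initialisation, caches).
    for _ in range(warmup):
        run_once()
    if sync is not None:
        sync()

    start = time.perf_counter()
    for _ in range(iterations):
        run_once()
    if sync is not None:
        sync()  # make sure all timed work has actually completed
    elapsed = time.perf_counter() - start
    return iterations / elapsed


# Stand-in workload; replace with a model forward pass on inputs kept on the GPU.
frequency = measure_frequency(lambda: sum(range(1000)), iterations=5000)
print(f"{frequency:.0f} Hz")
```

With PyTorch on a GPU, one would typically pass `sync=torch.cuda.synchronize` and run the forward pass under `torch.no_grad()`, since CUDA kernels are launched asynchronously and the loop would otherwise mostly time the launches.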

Tobias-Fischer commented 4 years ago

Some more remarks:

Other questions:

ahmed-alhindawi commented 4 years ago

Some more remarks:

"Generate Left/Eye right patches using our new pipeline with new face detector/new eye patches into the inpainted folder of the rt_gene dataset (called left_new and right_new)" -> I think this again needs documentation; are these patches then used for training RT-GENE?

Yes, they are. There is a possibility of merging the H5 dataset generation with this, but it would be convoluted and not very modular. GenerateEyePatchesDataset.py uses the new face detector and landmark extractor to extract the eye patches into left_new and right_new for each inpainted subject. Those two folders per subject are then used alongside label_combined.txt to generate the H5 dataset using GenerateRTGeneH5Dataset.py.

The reason I did this is that we lose some data with the new patch extraction technique, around 0.5 - 1% of the data; i.e. the new pipeline doesn't think there is an eye patch there but the old pipeline did. I wanted to give the user/trainer the option of using the new-pipeline dataset, which has fewer samples, or the older dataset, which has more samples. I've documented the stages required to get the training underway in the README.md.
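To make the data-loss point concrete, here is a rough sketch of how the labelled samples could be matched against the patches that the new pipeline actually produced. It is not taken from GenerateRTGeneH5Dataset.py. The function name and the idea of identifying samples by an integer index are assumptions, and the real file naming and label format may differ.

```python
def retained_samples(label_ids, left_ids, right_ids):
    """Keep only the labelled samples that have both a left and a right patch.

    label_ids: sample indices listed in the label file (assumed format).
    left_ids / right_ids: sample indices for which a patch exists in
        left_new / right_new (assumed to be derivable from the file names).
    Returns the kept indices and the fraction of labelled samples lost.
    """
    available = set(left_ids) & set(right_ids)
    kept = [i for i in label_ids if i in available]
    lost_fraction = 1 - len(kept) / len(label_ids) if label_ids else 0.0
    return kept, lost_fraction


# Toy example: 200 labelled samples, the new pipeline missed one patch per eye.
labels = list(range(1, 201))
left = [i for i in labels if i != 7]
right = [i for i in labels if i != 150]
kept, lost = retained_samples(labels, left, right)
print(len(kept), f"{lost:.1%}")
```

Dropping a sample when either eye patch is missing is one possible policy; the thread only says that the new pipeline yields around 0.5 - 1% fewer samples.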

Is the inference now faster when running the gaze estimation and blink estimation at the same time?

Not sure yet, still working on the models. Getting VGG-16 to train takes a long time compared to Resnet...

In the future, do you plan a PyTorch version of RT-BENE?

Yes.

Can the training be run on a "normal" GPU? What is the minimum requirement? Would it train on something like a 1070?

Oh yes, it trains fine, just with a smaller batch size that's all.

When using the PyTorch backend, can we get rid of the tensorflow dependency?

Yes, the pipeline (besides the blink estimation) wouldn't require tensorflow, so tensorflow can be removed as a dependency.
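One common way to keep a framework optional is to import it only when its backend is selected. The following is a sketch of that idea, not how rt_gene is actually structured; `load_backend` and the `BACKEND_MODULES` mapping are hypothetical names.

```python
import importlib

# Hypothetical mapping from a backend name to the top-level module it needs.
BACKEND_MODULES = {"pytorch": "torch", "tensorflow": "tensorflow"}


def load_backend(name, modules=BACKEND_MODULES):
    """Import and return only the framework module needed by the chosen backend."""
    try:
        module_name = modules[name]
    except KeyError:
        raise ValueError(
            f"unknown backend {name!r}; expected one of {sorted(modules)}"
        ) from None
    # The import happens here, so unselected frameworks need not be installed.
    return importlib.import_module(module_name)
```

With this pattern, calling `load_backend("pytorch")` never touches tensorflow, so a PyTorch-only installation would work as long as no tensorflow-only component (such as the blink estimation mentioned above) is requested.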

ahmed-alhindawi commented 4 years ago

Before merging, I think we should briefly mention in the appropriate README files that there are two ways of doing training/inference now.

Agreed, but can we hold off until I have the models fully trained and in deployable storage? I don't want a user to think they can run on pytorch and then not have any trained models.

Tobias-Fischer commented 4 years ago

Looks pretty much ready to merge now. I agree that it's best to wait until the models are trained. Many thanks again!

Tobias-Fischer commented 4 years ago

Ahhh one thing: Do you have a script that does k-fold evaluation, too? Something equivalent to https://github.com/Tobias-Fischer/rt_gene/blob/pytorch_training/rt_gene_model_training/tensorflow/evaluate_model.py?

ahmed-alhindawi commented 4 years ago

Ahhh one thing: Do you have a script that does k-fold evaluation, too? Something equivalent to https://github.com/Tobias-Fischer/rt_gene/blob/pytorch_training/rt_gene_model_training/tensorflow/evaluate_model.py?

Nope - will create one as soon as I can.
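Until that script exists, here is a minimal sketch of what a k-fold evaluation loop could look like. It is not a port of the linked evaluate_model.py. It assumes folds are split by subject (which this thread does not state), and `subject_folds`, `cross_validate` and the `train_and_score` callback are hypothetical names.

```python
from statistics import mean


def subject_folds(subjects, k=3):
    """Split subjects into k disjoint folds and yield (train, test) lists."""
    folds = [subjects[i::k] for i in range(k)]  # round-robin assignment
    for i, test in enumerate(folds):
        train = [s for j, fold in enumerate(folds) if j != i for s in fold]
        yield train, test


def cross_validate(subjects, train_and_score, k=3):
    """Call train_and_score(train, test) once per fold and average the scores.

    train_and_score is expected to train (or load) a model for the given
    training subjects and return a single number, e.g. the mean angular
    error on the test subjects.
    """
    scores = [train_and_score(train, test) for train, test in subject_folds(subjects, k)]
    return mean(scores), scores


# Toy usage with a dummy scorer that just counts the test subjects.
subjects = [f"s{i:03d}" for i in range(6)]
average, per_fold = cross_validate(subjects, lambda train, test: len(test))
print(average, per_fold)
```

Splitting by subject rather than by sample keeps all images of a person on one side of the split, which is the usual choice when the goal is to measure generalisation to unseen people.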

Tobias-Fischer commented 4 years ago

This PR fixes #46

ahmed-alhindawi commented 4 years ago

Sorry it's taken me a while; each model takes several days to train, but now we have 4 models (VGG) and thus the pytorch branch is now at feature parity with the tensorflow one.

Tobias-Fischer commented 4 years ago

Finally merged - many thanks @ahmed-alhindawi! Great work.