Tobias-Fischer / rt_gene

RT-GENE: Real-Time Eye Gaze and Blink Estimation in Natural Environments
http://www.imperial.ac.uk/personal-robotics

Incorporate blink estimation #35

Closed Tobias-Fischer closed 4 years ago

Tobias-Fischer commented 4 years ago

We will present a method for blink estimation based on the RT-GENE dataset at the ICCV 2019 workshops. We'll need to clean up the code and merge it into this repo.

Tobias-Fischer commented 4 years ago

Most of the code (training+inference in ROS and as standalone) as well as the dataset/labels are now merged into master.

Items left to do:

Tobias-Fischer commented 4 years ago

@ngageorange - did you have a chance to test the MobileNet models? Hopefully they will be faster than the DenseNet ones, although I still do not understand why they are so slow in ROS but run faster in our test code. If the MobileNet models work fine I can upload them.

ahmed-alhindawi commented 4 years ago

I just had a play with the MobileNet models - do they require a different threshold? The ensemble of the three folds doesn't seem to work; it never reports that I'm blinking.

They are however much faster, ~22Hz rather than 10Hz...

EDIT: Even in the standalone version, if one were to loop the BlinkEstimatorImagePair test over the same image, one would get ~10Hz. It seems that the call to predict() itself is costly. However, inference time per image is super fast - the BlinkEstimatorFolderPair test completes 5593 pairs in 18 seconds (about 3ms per image pair!). Any thoughts as to why that might be?
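As a rough check on the figures above, the gap between the two tests can be expressed as a per-call overhead. This is a back-of-the-envelope sketch, not a measurement; the only inputs are the numbers quoted in this comment.

```python
# Figures quoted in the comment above.
pairs = 5593            # image pairs processed by the folder test
folder_seconds = 18.0   # wall-clock time of the folder test
loop_hz = 10.0          # approximate rate when looping the single-pair test

per_pair_ms = folder_seconds / pairs * 1000.0   # time per pair in the folder test
folder_hz = pairs / folder_seconds              # equivalent throughput
per_call_ms = 1000.0 / loop_hz                  # time per predict() call in the loop
overhead_ms = per_call_ms - per_pair_ms         # time not explained by inference itself

print(f"folder test: {per_pair_ms:.1f} ms/pair ({folder_hz:.0f} Hz)")
print(f"looped test: {per_call_ms:.0f} ms/call, ~{overhead_ms:.0f} ms unexplained per call")
```

If the folder test batches its inputs (an assumption; the test code is not shown here), almost all of the looped cost would be fixed per-call overhead rather than network compute.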

Tobias-Fischer commented 4 years ago

@Twarz: It would be great if you could investigate why the MobileNets don't perform well. Did you use the use_weight_balancing flag when training these models? If not, I would not be surprised that they do not work. use_weight_balancing should always be enabled - in the upcoming pull request, can you remove the flag and assume it is enabled?
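For readers unfamiliar with weight balancing: blink frames are typically far rarer than open-eye frames, so an unweighted loss can be driven down by never predicting a blink. The sketch below shows the common inverse-frequency heuristic. It is an illustration only; the function name `balanced_class_weights` and the example label counts are hypothetical and are not taken from the rt_gene training code.

```python
from collections import Counter


def balanced_class_weights(labels):
    """Return {class: weight} with weight = n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {cls: n_samples / (n_classes * count) for cls, count in counts.items()}


# Hypothetical label distribution: 0 = eyes open, 1 = blink.
labels = [0] * 950 + [1] * 50
weights = balanced_class_weights(labels)
print(weights)  # the rare class (1) receives the larger weight
```

A dictionary of this shape can be passed as the `class_weight` argument of Keras `Model.fit`, which is one way such balancing could be wired into training.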

@ngageorange @Twarz: 22Hz is still quite slow for MobileNet, considering that VGG for the eye gaze runs at 30Hz. I have no idea why this is the case. @Twarz, do you have any ideas? Could you measure just the forward pass (the call to tf.keras.Model.predict)?
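One way to isolate the forward pass is a small harness that discards a few warm-up calls and then times a fixed number of repetitions. The sketch below is a generic stand-in, not code from this repository: `measure_frequency` and `fake_forward_pass` are hypothetical names, and the stand-in function would be swapped for the real model call.

```python
import time


def measure_frequency(fn, arg, warmup=5, runs=50):
    """Call fn(arg) repeatedly and return the steady-state rate in Hz."""
    for _ in range(warmup):          # warm-up calls are excluded from the timing
        fn(arg)
    start = time.perf_counter()
    for _ in range(runs):
        fn(arg)
    elapsed = max(time.perf_counter() - start, 1e-9)  # guard against a zero interval
    return runs / elapsed


# Stand-in for the real forward pass; replace with the model's predict call.
def fake_forward_pass(batch):
    return [sum(batch) / len(batch)]


hz = measure_frequency(fake_forward_pass, [0.1, 0.2, 0.3])
print(f"Frequency: {hz:.2f} Hz")
```

Excluding the warm-up matters because the first calls to a model often include one-off setup costs that would otherwise be averaged into the reported rate.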

KevinCortacero commented 4 years ago

@Tobias-Fischer, @ngageorange I'll give it a try this afternoon. The difference between BlinkEstimatorImagePair and BlinkEstimatorFolderPair is weird :weary:

Tobias-Fischer commented 4 years ago

The speed issue is fixed in TensorFlow 2.1.0. See https://github.com/tensorflow/tensorflow/issues/31975 for the background issue. The DenseNet runs at 35 Hz and the MobileNet at 100 Hz on my machine (1080 Ti).

@Twarz: Can you please retrain some MobileNet + ResNet + DenseNet models with the latest code (run git pull --rebase)? Weight balancing is now enabled by default.

@ngageorange: Please check whether the speed issue is resolved for you using the newest TF version.

ahmed-alhindawi commented 4 years ago

I'm afraid I am on TensorFlow 2.1.0 and the speed issue remains. I'm on the Tesla V. I've tried it in a fresh Python 3.7 virtualenv with the required packages, and it is still slow.

I'll try on my laptop with a 1080....

KevinCortacero commented 4 years ago

I got the same estimate as @ngageorange on Icubunicorn (1060 Ti?)

@Tobias-Fischer, @ngageorange the new models are trained; I have put them on Juicer

Tobias-Fischer commented 4 years ago

So here is my output (using blink_model_1):

(rtgene3) tobias@QUTLAB:~/catkin_ws_rtgene/src/rt_gene$ ./rt_gene/scripts/estimate_blink_standalone.py --left ~/Downloads/tobias.jpg --right ~/Downloads/tobias.jpg --vis_blink=False
2020-02-26 07:22:23.742697: I tensorflow/core/platform/cpu_feature_guard.cc:142] Your CPU supports instructions that this TensorFlow binary was not compiled to use: SSE4.1 SSE4.2 AVX AVX2 FMA
2020-02-26 07:22:23.766889: I tensorflow/core/platform/profile_utils/cpu_utils.cc:94] CPU Frequency: 4200000000 Hz
2020-02-26 07:22:23.767021: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x55d568495700 initialized for platform Host (this does not guarantee that XLA will be used). Devices:
2020-02-26 07:22:23.767034: I tensorflow/compiler/xla/service/service.cc:176]   StreamExecutor device (0): Host, Default Version
2020-02-26 07:22:23.767628: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcuda.so.1
2020-02-26 07:22:23.782759: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:981] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-02-26 07:22:23.783034: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1555] Found device 0 with properties: 
pciBusID: 0000:01:00.0 name: GeForce GTX 1080 computeCapability: 6.1
coreClock: 1.7335GHz coreCount: 20 deviceMemorySize: 7.93GiB deviceMemoryBandwidth: 298.32GiB/s
2020-02-26 07:22:23.783164: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudart.so.10.1
2020-02-26 07:22:23.784290: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcublas.so.10
2020-02-26 07:22:23.785359: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcufft.so.10
2020-02-26 07:22:23.785536: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcurand.so.10
2020-02-26 07:22:23.786693: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusolver.so.10
2020-02-26 07:22:23.787313: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusparse.so.10
2020-02-26 07:22:23.789695: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudnn.so.7
2020-02-26 07:22:23.789786: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:981] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-02-26 07:22:23.790077: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:981] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-02-26 07:22:23.790294: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1697] Adding visible gpu devices: 0
2020-02-26 07:22:23.790323: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudart.so.10.1
2020-02-26 07:22:23.834254: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1096] Device interconnect StreamExecutor with strength 1 edge matrix:
2020-02-26 07:22:23.834277: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1102]      0 
2020-02-26 07:22:23.834283: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] 0:   N 
2020-02-26 07:22:23.834408: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:981] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-02-26 07:22:23.834722: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:981] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-02-26 07:22:23.835009: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:981] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-02-26 07:22:23.835262: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1241] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 2434 MB memory) -> physical GPU (device: 0, name: GeForce GTX 1080, pci bus id: 0000:01:00.0, compute capability: 6.1)
2020-02-26 07:22:23.836449: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x55d5691d4a10 initialized for platform CUDA (this does not guarantee that XLA will be used). Devices:
2020-02-26 07:22:23.836461: I tensorflow/compiler/xla/service/service.cc:176]   StreamExecutor device (0): GeForce GTX 1080, Compute Capability 6.1
WARNING:tensorflow:From /home/tobias/catkin_ws_rtgene/src/rt_gene/rt_gene/src/rt_bene/estimate_blink_base.py:29: The name tf.keras.backend.set_session is deprecated. Please use tf.compat.v1.keras.backend.set_session instead.

Load model /home/tobias/catkin_ws_rtgene/src/rt_gene/rt_gene/scripts/../model_nets/blink_model_1.h5
WARNING:tensorflow:From /home/tobias/anaconda3/envs/rtgene3/lib/python3.7/site-packages/tensorflow_core/python/ops/resource_variable_ops.py:1635: calling BaseResourceVariable.__init__ (from tensorflow.python.ops.resource_variable_ops) with constraint is deprecated and will be removed in a future version.
Instructions for updating:
If using Keras pass *_constraint arguments to layers.
2020-02-26 07:22:24.003227: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:981] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-02-26 07:22:24.003550: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1555] Found device 0 with properties: 
pciBusID: 0000:01:00.0 name: GeForce GTX 1080 computeCapability: 6.1
coreClock: 1.7335GHz coreCount: 20 deviceMemorySize: 7.93GiB deviceMemoryBandwidth: 298.32GiB/s
2020-02-26 07:22:24.003591: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudart.so.10.1
2020-02-26 07:22:24.003603: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcublas.so.10
2020-02-26 07:22:24.003613: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcufft.so.10
2020-02-26 07:22:24.003623: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcurand.so.10
2020-02-26 07:22:24.003633: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusolver.so.10
2020-02-26 07:22:24.003642: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusparse.so.10
2020-02-26 07:22:24.003652: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudnn.so.7
2020-02-26 07:22:24.003695: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:981] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-02-26 07:22:24.003981: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:981] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-02-26 07:22:24.004229: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1697] Adding visible gpu devices: 0
WARNING:tensorflow:Large dropout rate: 0.6 (>0.5). In TensorFlow 2.x, dropout() uses dropout rate instead of keep_prob. Please ensure that this is intended.
WARNING:tensorflow:Large dropout rate: 0.6 (>0.5). In TensorFlow 2.x, dropout() uses dropout rate instead of keep_prob. Please ensure that this is intended.
WARNING:tensorflow:Large dropout rate: 0.6 (>0.5). In TensorFlow 2.x, dropout() uses dropout rate instead of keep_prob. Please ensure that this is intended.
Loaded model /home/tobias/catkin_ws_rtgene/src/rt_gene/rt_gene/scripts/../model_nets/blink_model_1.h5
2020-02-26 07:22:46.356763: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcublas.so.10
2020-02-26 07:22:46.455458: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudnn.so.7
Loaded 1 model(s)
Ready
Frequency: 34.32
(array([[0.04294029]], dtype=float32), array([[False]]))
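
The final line of the output pairs a blink probability with a boolean decision. A minimal sketch of how several fold models could be combined into one such decision follows. Averaging the per-model probabilities and the 0.5 threshold are assumptions made for illustration, not the confirmed behaviour of estimate_blink_standalone.py, and `ensemble_blink` is a hypothetical name.

```python
def ensemble_blink(probabilities, threshold=0.5):
    """Average per-model blink probabilities and compare against a threshold."""
    if not probabilities:
        raise ValueError("need at least one model output")
    mean_prob = sum(probabilities) / len(probabilities)
    return mean_prob, mean_prob >= threshold


# Single model, using the probability printed above.
print(ensemble_blink([0.04294029]))
# Three hypothetical fold outputs for one image pair.
print(ensemble_blink([0.62, 0.48, 0.71]))
```

If an ensemble never reports a blink, printing the per-model probabilities alongside the mean is a quick way to see whether the threshold or the models themselves are at fault.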

My conda environment (you can load it via conda env create -f environment.yml):

(rtgene3) tobias@QUTLAB:~/catkin_ws_rtgene/src/rt_gene$ conda env export
name: rtgene3
channels:
  - pytorch
  - defaults
  - conda-forge
dependencies:
  - _libgcc_mutex=0.1=main
  - _tflow_select=2.1.0=gpu
  - absl-py=0.9.0=py37_0
  - asn1crypto=1.3.0=py37_0
  - astor=0.8.0=py37_0
  - backcall=0.1.0=py37_0
  - blas=1.0=mkl
  - blinker=1.4=py37_0
  - bzip2=1.0.8=h7b6447c_0
  - c-ares=1.15.0=h7b6447c_1001
  - ca-certificates=2020.1.1=0
  - cachetools=3.1.1=py_0
  - cairo=1.14.12=h8948797_3
  - certifi=2019.11.28=py37_0
  - cffi=1.14.0=py37h2e261b9_0
  - chardet=3.0.4=py37_1003
  - click=7.0=py37_0
  - cryptography=2.8=py37h1ba5d50_0
  - cudatoolkit=10.1.243=h6bb024c_0
  - cudnn=7.6.5=cuda10.1_0
  - cupti=10.1.168=0
  - cycler=0.10.0=py37_0
  - dbus=1.13.12=h746ee38_0
  - decorator=4.4.1=py_0
  - expat=2.2.6=he6710b0_0
  - ffmpeg=4.0=hcdf2ecd_0
  - fontconfig=2.13.0=h9420a91_0
  - freeglut=3.0.0=hf484d3e_5
  - freetype=2.9.1=h8a8886c_1
  - gast=0.2.2=py37_0
  - glib=2.63.1=h5a9c865_0
  - google-auth=1.11.2=py_0
  - google-auth-oauthlib=0.4.1=py_2
  - google-pasta=0.1.8=py_0
  - graphite2=1.3.13=h23475e2_0
  - grpcio=1.27.2=py37hf8bcb03_0
  - gst-plugins-base=1.14.0=hbbd80ab_1
  - gstreamer=1.14.0=hb453b48_1
  - h5py=2.8.0=py37h989c5e5_3
  - harfbuzz=1.8.8=hffaf4a1_0
  - hdf5=1.10.2=hba1933b_1
  - icu=58.2=h9c2bf20_1
  - idna=2.8=py37_0
  - intel-openmp=2020.0=166
  - ipython=7.12.0=py37h5ca1d4c_0
  - ipython_genutils=0.2.0=py37_0
  - jasper=2.0.14=h07fcdf6_1
  - jedi=0.16.0=py37_0
  - jpeg=9b=h024ee3a_2
  - keras-applications=1.0.8=py_0
  - keras-preprocessing=1.1.0=py_1
  - kiwisolver=1.1.0=py37he6710b0_0
  - ld_impl_linux-64=2.33.1=h53a641e_7
  - libedit=3.1.20181209=hc058e9b_0
  - libffi=3.2.1=hd88cf55_4
  - libgcc-ng=9.1.0=hdf63c60_0
  - libgfortran-ng=7.3.0=hdf63c60_0
  - libglu=9.0.0=hf484d3e_1
  - libopencv=3.4.2=hb342d67_1
  - libopus=1.3=h7b6447c_0
  - libpng=1.6.37=hbc83047_0
  - libprotobuf=3.11.4=hd408876_0
  - libstdcxx-ng=9.1.0=hdf63c60_0
  - libtiff=4.1.0=h2733197_0
  - libuuid=1.0.3=h1bed415_2
  - libvpx=1.7.0=h439df22_0
  - libxcb=1.13=h1bed415_1
  - libxml2=2.9.9=hea5a465_1
  - markdown=3.1.1=py37_0
  - matplotlib=3.1.3=py37_0
  - matplotlib-base=3.1.3=py37hef1b27d_0
  - mkl=2020.0=166
  - mkl-service=2.3.0=py37he904b0f_0
  - mkl_fft=1.0.15=py37ha843d7b_0
  - mkl_random=1.1.0=py37hd6b4f25_0
  - ncurses=6.2=he6710b0_0
  - ninja=1.9.0=py37hfd86e86_0
  - numpy=1.18.1=py37h4f9e942_0
  - numpy-base=1.18.1=py37hde5b4d6_1
  - oauthlib=3.1.0=py_0
  - olefile=0.46=py37_0
  - opencv=3.4.2=py37h6fd60c2_1
  - openssl=1.1.1d=h7b6447c_4
  - opt_einsum=3.1.0=py_0
  - parso=0.6.1=py_0
  - pcre=8.43=he6710b0_0
  - pexpect=4.8.0=py37_0
  - pickleshare=0.7.5=py37_0
  - pillow=7.0.0=py37hb39fc2d_0
  - pip=20.0.2=py37_1
  - pixman=0.38.0=h7b6447c_0
  - prompt_toolkit=3.0.3=py_0
  - protobuf=3.11.4=py37he6710b0_0
  - ptyprocess=0.6.0=py37_0
  - py-opencv=3.4.2=py37hb342d67_1
  - pyasn1=0.4.8=py_0
  - pyasn1-modules=0.2.7=py_0
  - pycparser=2.19=py37_0
  - pygments=2.5.2=py_0
  - pyjwt=1.7.1=py37_0
  - pyopenssl=19.1.0=py37_0
  - pyparsing=2.4.6=py_0
  - pyqt=5.9.2=py37h05f1152_2
  - pysocks=1.7.1=py37_0
  - python=3.7.6=h0371630_2
  - python-dateutil=2.8.1=py_0
  - pytorch=1.4.0=py3.7_cuda10.1.243_cudnn7.6.3_0
  - qt=5.9.7=h5867ecd_1
  - readline=7.0=h7b6447c_5
  - requests=2.22.0=py37_1
  - requests-oauthlib=1.3.0=py_0
  - rsa=4.0=py_0
  - scipy=1.4.1=py37h0b6359f_0
  - setuptools=45.2.0=py37_0
  - sip=4.19.8=py37hf484d3e_0
  - six=1.14.0=py37_0
  - sqlite=3.31.1=h7b6447c_0
  - tensorboard=2.1.0=py3_0
  - tensorflow=2.1.0=gpu_py37h7a4bb67_0
  - tensorflow-base=2.1.0=gpu_py37h6c5654b_0
  - tensorflow-estimator=2.1.0=pyhd54b08b_0
  - tensorflow-gpu=2.1.0=h0d30ee6_0
  - termcolor=1.1.0=py37_1
  - tk=8.6.8=hbc83047_0
  - torchvision=0.5.0=py37_cu101
  - tornado=6.0.3=py37h7b6447c_3
  - tqdm=4.42.1=py_0
  - traitlets=4.3.3=py37_0
  - urllib3=1.25.8=py37_0
  - wcwidth=0.1.8=py_0
  - werkzeug=1.0.0=py_0
  - wheel=0.34.2=py37_0
  - wrapt=1.11.2=py37h7b6447c_0
  - xz=5.2.4=h14c3975_4
  - zlib=1.2.11=h7b6447c_3
  - zstd=1.3.7=h0b5b093_0

Tobias-Fischer commented 4 years ago

Regarding the speed issue: we were comparing different things, the full pipeline in one case and just the blink estimation in the other, hence the significant differences in the reported frequencies.

Another to-do item: provide sample inputs so that the standalone version runs without any arguments.
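
A minimal sketch of what that could look like with argparse is shown below. The flag names --left, --right and --vis_blink are taken from the command shown earlier in this thread; the default sample paths, the default value of --vis_blink, and the helper names are hypothetical.

```python
import argparse

# Hypothetical sample image locations; no such files are confirmed to exist in the repo.
DEFAULT_LEFT = "samples/left_eye.png"
DEFAULT_RIGHT = "samples/right_eye.png"


def str_to_bool(value):
    """Parse values such as 'True'/'False' so that --vis_blink=False works."""
    if value.lower() in ("true", "1", "yes"):
        return True
    if value.lower() in ("false", "0", "no"):
        return False
    raise argparse.ArgumentTypeError(f"expected a boolean, got {value!r}")


def build_parser():
    parser = argparse.ArgumentParser(description="Standalone blink estimation (sketch)")
    parser.add_argument("--left", default=DEFAULT_LEFT, help="left eye image")
    parser.add_argument("--right", default=DEFAULT_RIGHT, help="right eye image")
    # The default of True is an assumption made for this sketch.
    parser.add_argument("--vis_blink", type=str_to_bool, default=True,
                        help="visualise the result")
    return parser


args = build_parser().parse_args([])  # empty list simulates running with no arguments
print(args.left, args.right, args.vis_blink)
```

With defaults in place, explicit arguments still override the samples, so existing invocations would keep working.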

Tobias-Fischer commented 4 years ago

Ping @Twarz - how is the evaluation code coming along?

Tobias-Fischer commented 4 years ago

Ping @Twarz - this is long overdue