No further work will be done on this project.
This repository contains a variety of tools to build up an experimental ecosystem for recognizing signs of German Sign Language (DGS). Our goal is an experimental attempt at live subtitling of gestures: we train a deep learning model (an RNN) to predict the signs made by a person on camera. To do so, we use MediaPipe, a framework for building ML pipelines, to extract face and hand positions, including multiple coordinates for each finger.
The following table lists all supported words with a link to SignDict and the English translation.
German | English |
---|---|
Bier | Beer |
Computer | Computer |
Deutschland | Germany |
du | you |
essen | (to) eat |
Foto | Photo |
Fußball | Soccer |
gestern | yesterday |
haben | (to) have |
Hallo | Hello |
Haus | House |
heute | today |
Hose | Pants |
Hunger | Hunger |
ich | I |
Land | Country |
lernen | (to) learn |
lieben | (to) love |
Mainz | Mainz (city in Germany) |
morgen | tomorrow |
rennen | (to) run |
Software | Software |
Sonne | Sun |
Tag | Day |
trinken | (to) drink |
Universität | University |
unser | our(s) |
Welt | World |
Wetter | Weather |
zeigen | (to) show |
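For training, each of these words becomes a class label. A minimal sketch of the mapping (the label order and helper names are our assumption, not taken from the repository):

```python
# Hypothetical label mapping for the 30 supported signs (order is an assumption).
SIGNS = [
    "Bier", "Computer", "Deutschland", "du", "essen", "Foto", "Fußball",
    "gestern", "haben", "Hallo", "Haus", "heute", "Hose", "Hunger", "ich",
    "Land", "lernen", "lieben", "Mainz", "morgen", "rennen", "Software",
    "Sonne", "Tag", "trinken", "Universität", "unser", "Welt", "Wetter",
    "zeigen",
]

# Map sign -> class index and back, as needed for one-hot encoded RNN targets.
LABEL_TO_INDEX = {sign: i for i, sign in enumerate(SIGNS)}
INDEX_TO_LABEL = {i: sign for sign, i in LABEL_TO_INDEX.items()}
```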
To run the Jupyter notebooks, we recommend installing Anaconda. Install TensorFlow 2.2.0 with conda, see https://anaconda.org/anaconda/tensorflow-gpu.

To build and run the prediction pipeline:

```shell
cd src/
./build_prediction.sh
./run_prediction.sh
```
For training we need many example videos for each sign we want to predict. These examples are generated by users of our platform Gebärdenfutter.
To extract multi-hand and face detections for each frame of the videos and save them, we built a pipeline with MediaPipe; for example, have a look at the DetectionsToCSVCalculator we implemented. It simply writes the detections made by MediaPipe to CSV files.
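These per-frame detections can then be loaded as coordinate sequences for training. A minimal reading sketch (the column layout shown here is an assumption; the real schema is defined by the DetectionsToCSVCalculator):

```python
import csv
import io

# Hypothetical CSV layout: one row per frame, columns are flattened
# x/y coordinates of the detected landmarks.
sample = io.StringIO(
    "frame,x0,y0,x1,y1\n"
    "0,0.12,0.55,0.48,0.61\n"
    "1,0.13,0.54,0.47,0.60\n"
)

frames = []
for row in csv.DictReader(sample):
    # Drop the frame index; keep landmark coordinates as floats.
    frames.append([float(v) for k, v in row.items() if k != "frame"])

# 'frames' is now a (num_frames x num_coordinates) sequence for the RNN.
```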
The CSV files are used to train a deep learning model with Keras, a high-level API for TensorFlow.
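A minimal sketch of such a sequence classifier (the layer sizes, sequence length, and feature count are assumptions, not the repository's actual architecture):

```python
import tensorflow as tf

NUM_FRAMES = 60      # frames per clip (assumption)
NUM_FEATURES = 126   # flattened hand/face coordinates per frame (assumption)
NUM_SIGNS = 30       # supported vocabulary size (see the table above)

# An LSTM-based RNN that maps a landmark sequence to sign probabilities.
model = tf.keras.Sequential([
    tf.keras.layers.LSTM(64, input_shape=(NUM_FRAMES, NUM_FEATURES)),
    tf.keras.layers.Dense(NUM_SIGNS, activation="softmax"),
])
model.compile(optimizer="adam", loss="categorical_crossentropy",
              metrics=["accuracy"])
```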
To find the best hyperparameter sets we use Weights & Biases Sweeps. Check out the lab folder.
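A sweep is configured with a small dictionary describing the search. This is only a sketch; the parameter names and ranges are assumptions, not the repository's actual sweep:

```python
# Hypothetical sweep configuration; wandb.sweep(sweep_config) would
# register it with Weights & Biases.
sweep_config = {
    "method": "bayes",  # search strategy: grid, random, or bayes
    "metric": {"name": "val_accuracy", "goal": "maximize"},
    "parameters": {
        "learning_rate": {"min": 1e-4, "max": 1e-2},
        "lstm_units": {"values": [32, 64, 128]},
        "batch_size": {"values": [16, 32, 64]},
    },
}
```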
*Visualization of the MediaPipe graph*
The trained model is used for prediction on a live video stream. See the SignLangRecognitionCalculator for further details on how we try to use the model for live predictions. Currently it does not work as well as we expected, but it provides an infrastructure for experiments and testing. Do you have ideas for improvements? Let us know!