AI4Bharat / OpenHands

👐OpenHands: Making Sign Language Recognition Accessible. | **NOTE:** No longer actively maintained. If you are interested in taking ownership of this project and moving it forward, please raise an issue.
https://openhands.readthedocs.io
Apache License 2.0
97 stars, 15 forks

What is "Temporal Attention" as used alongside the RNN in your paper? #52

Open argadewanata opened 3 months ago

argadewanata commented 3 months ago

I read your paper, "OpenHands: Making Sign Language Recognition Accessible with Pose-based Pretrained Models across Languages" (https://arxiv.org/abs/2110.05877).

On page 4, you state: "For the RNN model, we use a 4-layered bidirectional LSTM with a hidden layer dimension of 128, which takes as input the frame-wise pose representation of 27 keypoints with 2 coordinates each, resulting in a vector of 54 points per frame. We also use a temporal attention layer to weight the most effective frames for classification."

However, I couldn't find a definition of "temporal attention" anywhere in the paper. Could you please explain how it is computed and how it is applied to the LSTM outputs?
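
To make the question concrete, my guess is that "temporal attention" here means attention pooling over time: each frame's LSTM output gets a scalar score, the scores are passed through a softmax across frames, and the weighted sum of the frame outputs is fed to the classifier. The sketch below illustrates only that guess in plain Python. It is an assumption on my part, not taken from the paper or the OpenHands code; `temporal_attention_pool` and `score_weights` are hypothetical names, and in a real model the scoring vector would be learned and the frame vectors would be the BiLSTM outputs.

```python
import math

def temporal_attention_pool(frames, score_weights):
    """Collapse T per-frame vectors into a single vector (assumed scheme).

    frames: list of T vectors, each a list of D floats (stand-ins for
            per-frame BiLSTM outputs).
    score_weights: hypothetical scoring vector of D floats (learned in a
            real model, fixed here for illustration).
    Returns (pooled_vector, attention_weights).
    """
    # One scalar score per frame: dot product with the scoring vector.
    scores = [sum(w * x for w, x in zip(score_weights, frame)) for frame in frames]

    # Softmax over the time axis (subtract the max for numerical stability).
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]

    # Weighted sum of the frame vectors, one output value per feature.
    dim = len(frames[0])
    pooled = [sum(a * frame[d] for a, frame in zip(weights, frames)) for d in range(dim)]
    return pooled, weights

# Toy input: 3 frames with 2 features each.
frames = [[0.0, 1.0], [2.0, 0.0], [0.5, 0.5]]
pooled, weights = temporal_attention_pool(frames, score_weights=[1.0, 0.0])
print(weights)  # the second frame has the highest score, so the largest weight
print(pooled)
```

If this matches what you did, the attention weights would be the per-frame importance values that "weight the most effective frames". If you used a different formulation (for example, self-attention across frames), please correct me.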