Two Stream CNN was proposed in "Skeleton-Based Action Recognition with Convolutional Neural Networks" for skeleton-based action recognition. It maps a skeleton sequence to an image (the coordinates x, y, z become the image channels R, G, B). The authors also designed a Skeleton Transformer module that rearranges and selects important skeleton joints automatically.
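The coordinate-to-channel mapping can be illustrated with a short NumPy sketch. It is not the code from this repository: the function names, the `(frames, joints, 3)` layout, and the global min-max scaling to 0..255 are assumptions made for illustration. The second helper computes the frame difference that serves as the input of the second stream.

```python
import numpy as np


def skeleton_to_image(seq):
    """Map a (frames, joints, 3) skeleton sequence to a uint8 RGB image.

    Assumption: rows are frames, columns are joints, and the x, y, z
    coordinates are scaled with one global min-max into the R, G, B range.
    """
    seq = np.asarray(seq, dtype=np.float64)
    lo, hi = seq.min(), seq.max()
    scale = hi - lo if hi > lo else 1.0  # avoid division by zero
    return np.rint((seq - lo) / scale * 255.0).astype(np.uint8)


def frame_difference(seq):
    """Return the difference between consecutive frames (temporal motion)."""
    seq = np.asarray(seq, dtype=np.float64)
    return seq[1:] - seq[:-1]


# Hypothetical example: 32 frames, 25 joints, 3 coordinates per joint.
rng = np.random.default_rng(0)
sequence = rng.normal(size=(32, 25, 3))
image = skeleton_to_image(sequence)   # shape (32, 25, 3), dtype uint8
motion = frame_difference(sequence)   # shape (31, 25, 3)
```

The frame difference has one frame fewer than the raw sequence, so a real pipeline would need to pad or resize it before both streams share an input size.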
The network consists mainly of four modules: Skeleton Transformer, ConvNet, Feature Fusion, and Classification. The inputs of the two streams are the raw data (x, y, z) and the frame difference, respectively. As shown below:
layers/transformer: the Skeleton Transformer layer, implemented in Keras
network/: this folder contains four files, each using a different feature fusion method
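For orientation, the idea behind the Skeleton Transformer can be sketched in NumPy. This is not the Keras layer in `layers/transformer`. It assumes the module is a linear map over the joint axis whose weight matrix would be learned during training; the name `skeleton_transform` and the weight shapes are hypothetical.

```python
import numpy as np


def skeleton_transform(seq, weights):
    """Apply a linear map over the joint axis of a skeleton sequence.

    seq:     (frames, joints_in, 3) array of joint coordinates
    weights: (joints_in, joints_out) matrix (trainable in a real layer)
    returns: (frames, joints_out, 3) array of recombined joints
    """
    return np.einsum("tjc,jm->tmc", seq, weights)


# A permutation matrix as weights simply reorders the joints, which shows
# how such a layer can "rearrange" them; a learned dense matrix would
# instead weight and mix joints.
seq = np.arange(2 * 3 * 3, dtype=np.float64).reshape(2, 3, 3)
order = [2, 0, 1]
perm = np.eye(3)[:, order]
reordered = skeleton_transform(seq, perm)
```

With fewer output columns than input joints, the same map reduces the number of joints, which corresponds to selecting or summarizing the important ones.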
Model | Accuracy (cross-subject) |
---|---|
Baseline | 83.2% |
My model | 80.7% |
Introducing an attention mechanism into the Skeleton Transformer module raises the accuracy to 82.1%.
If you have any questions, please feel free to contact me.
Duohan Liang (duohanl@outlook.com)