VIPL-SLP / pointlstm-gesture-recognition-pytorch

This repo holds the code of the paper: An Efficient PointLSTM for Point Clouds Based Gesture Recognition (CVPR 2020).
https://openaccess.thecvf.com/content_CVPR_2020/html/Min_An_Efficient_PointLSTM_for_Point_Clouds_Based_Gesture_Recognition_CVPR_2020_paper.html
Apache License 2.0

Experimental Protocol for MSRAction3d #8

Closed alexbgl closed 3 years ago

alexbgl commented 3 years ago

Hi @Blueprintf,

I am currently trying to reproduce your experiments on the MSR Action 3D dataset, and I am not sure whether I have understood the experimental protocol correctly.

1) In reference [17] of your paper (Action recognition based on a bag of 3d points), the authors split the dataset into 3 subsets according to categories and consider each subset separately, i.e. they train on a portion of subset x and evaluate on the remaining part of the same subset. Did you use the same splitting in your experiments, or did you consider the dataset as a whole?

2) I understand your splitting by subjects as follows: you randomly split the dataset into 2 groups of 5 subjects each. You then train on one of these groups and evaluate on the other, and vice versa. This procedure is repeated 5 times, so that you obtain 10 scores in total. Did I get that right?

3) In your paper, you say that you removed 10 noisy sequences from the dataset. Which sequences did you remove? I couldn't find it in the referenced papers.

It would be a great help if you could elaborate on that.

Thanks

ycmin95 commented 3 years ago

Hello @alexbgl, thanks for your attention,

  1. We didn't split the dataset into subsets; we considered the dataset as a whole.

  2. Yes, this is correct. The corresponding subject splits are [ ([1, 2, 3, 4, 5], [6, 7, 8, 9, 10]), ([1, 3, 5, 7, 9], [2, 4, 6, 8, 10]), ([1, 4, 7, 10, 3], [2, 5, 6, 8, 9]), ([1, 5, 9, 3, 7], [2, 4, 6, 8, 10]), ([1, 6, 2, 7, 3], [4, 5, 8, 9, 10]) ]. We just found a bug: the second group is the same as the fourth group (because we adopted (subject_idx + offset) % 10 to generate the splits). The corresponding results of PointLSTM-late are [ 93.13, 87.59, 97.07, 89.08, 92.78, 94.36, 97.07, 89.08, 91.11, 91.64 ], and the corrected accuracy is 92.10 ± 2.78.
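The corrected accuracy above can be reproduced from the 10 listed scores by counting the duplicated split only once. A minimal sketch (assuming the two repeated scores, 97.07 and 89.08, are the fourth split and that the reported ± value is the population standard deviation):

```python
# Scores of PointLSTM-late over the 10 runs listed above.
scores = [93.13, 87.59, 97.07, 89.08, 92.78,
          94.36, 97.07, 89.08, 91.11, 91.64]

# The second and fourth subject splits coincide due to the
# (subject_idx + offset) % 10 bug, so drop the repeated pair
# (indices 6 and 7), leaving 8 unique scores.
unique = scores[:6] + scores[8:]

mean = sum(unique) / len(unique)
# Population variance (divide by N, not N - 1).
var = sum((s - mean) ** 2 for s in unique) / len(unique)
std = var ** 0.5

print(f"{mean:.2f} +/- {std:.2f}")
```

This recovers the reported 92.10 ± 2.78 (up to rounding).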

  3. We followed the code of "Mining actionlet ensemble for action recognition with depth cameras"; you can find the details in that repo. The removed sequences are [[3, 2, 2], [3, 4, 1], [4, 7, 1], [9, 13, 1], [9, 13, 2], [9, 13, 3], [3, 14, 1], [7, 20, 1], [7, 20, 3], [10, 20, 3]], but there is no noticeable performance difference after removing these sequences.
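For anyone filtering these sequences while loading the dataset, here is a small sketch. It assumes (not confirmed in this thread) that each triple is ordered [subject, action, instance] and that files follow the usual MSR Action3D naming pattern aAA_sSS_eEE; adjust the ordering if your copy of the removed list is indexed differently:

```python
import re

# Assumed ordering: [subject, action, instance] for each removed triple.
REMOVED = {(3, 2, 2), (3, 4, 1), (4, 7, 1), (9, 13, 1), (9, 13, 2),
           (9, 13, 3), (3, 14, 1), (7, 20, 1), (7, 20, 3), (10, 20, 3)}

def keep_sequence(filename):
    """Return False if the file matches one of the removed sequences.

    Expects names like 'a02_s03_e02_sdepth.bin' (a=action, s=subject,
    e=instance); files that don't match the pattern are kept.
    """
    m = re.match(r"a(\d+)_s(\d+)_e(\d+)", filename)
    if m is None:
        return True
    action, subject, instance = map(int, m.groups())
    return (subject, action, instance) not in REMOVED

print(keep_sequence("a02_s03_e02_sdepth.bin"))  # removed -> False
print(keep_sequence("a01_s01_e01_sdepth.bin"))  # kept -> True
```

Applying this filter while building the file list drops exactly 10 sequences and leaves the rest of the dataset untouched.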

alexbgl commented 3 years ago

Thanks!