For problem 1: pose_train.csv has 487 samples as compared to 758 samples in video_train.csv
I explained this in my paper, in Sec. 3: the pose-level method does not need to predict the number of repetitions during training; it only needs to learn the mapping between salient poses and actions. Therefore, we do not need to capture every action event in the training set and can instead choose high-quality actions. In this regard, the annotation cost is also lower than for video-level methods.
To sum up, it is also fine to annotate all the actions that appear in a video, but it is not necessary.
We only use the keyframes where the salient poses are located for training, so even a subset of the actions is enough to train the network to learn this mapping. (For example, if a video has 10 actions, maybe only 6 of them are enough for training.)
In the testing stage, we of course use all actions from the test set for a fair comparison.
For problem 2: different categories
Strictly speaking, the categories of pose_train.csv and video_train.csv are the same. That is to say, front_raise and frontraise are the same, jump_jack and jump_jacks are the same, pull_up and pullups are the same, etc.
The above is the explanation from the authors of the RepCount dataset: many people annotated the data together, so some inconsistencies crept into the category naming.
When I annotated pose_train.csv, I just merged these categories.
Thanks for the quick reply @MiracleDance. How do you deal with the others category, which is part of the test set?
In the test set, only one sample belongs to the others category, which is not representative, so it can be removed directly. Or just keep it; it will not be recognized as one of the eight categories we are dealing with.
In addition, there are samples belonging to the battle_rope category in the training set, but no samples of this category in the test set. Maybe these are small flaws of the RepCount dataset, even though the dataset is still very good. Therefore, the re-annotation in pose_train.csv still has to be based on the overall information of the dataset.
This task has not been extensively explored; a larger, more comprehensive, and complete dataset might serve it better.
In general, for PoseRAC to solve the task of RepCount, the categories to be processed are:
front_raise, pull_up, push_up, jump_jack, pommelhorse, squat, situp, bench_pressing
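As a rough sketch of how this merging and filtering could look in practice (the label-column name "type" below is an assumption, not a confirmed RepCount header; adjust it to the actual CSV):

```python
import pandas as pd

# Raw names that appear in video_train.csv mapped to the merged names used in
# pose_train.csv (the three pairs named above; extend the dict if needed).
MERGE = {
    "frontraise": "front_raise",
    "jump_jacks": "jump_jack",
    "pullups": "pull_up",
}

# The eight categories PoseRAC handles on RepCount.
KEEP = {
    "front_raise", "pull_up", "push_up", "jump_jack",
    "pommelhorse", "squat", "situp", "bench_pressing",
}

def load_and_merge(csv_path: str, label_col: str = "type") -> pd.DataFrame:
    """Load an annotation CSV, merge duplicate category names, and keep only
    the eight categories above (this also drops e.g. others and battle_rope)."""
    df = pd.read_csv(csv_path)
    df[label_col] = df[label_col].replace(MERGE)
    return df[df[label_col].isin(KEEP)].reset_index(drop=True)

# Example usage:
# train_df = load_and_merge("video_train.csv")
# print(train_df["type"].value_counts())
```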
This makes sense. Thanks @MiracleDance.
Apologies for re-opening this ticket. I have another question regarding the difference between pose_train.csv and video_train.csv:
If I understand correctly, the L1, L2, ... columns refer to the start and end locations. Is that correct? For the same file (example shown below), the values of L1, L2 are different. Why is there a difference?
In pose_train.csv:
175,situp,stu1_64.mp4,200,236,301,333,356,392,426,456,488,531,577,605,636,666,698,738,777,811,850,889,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
In video_train.csv:
254,situp,stu1_64.mp4,9,278,321,321,390,390,460,460,528,528,599,599,666,666,734,734,807,807,886,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
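To make the discrepancy concrete, here is a small sketch comparing the two rows above under my (possibly wrong) assumption that the L1, L2, ... values come in start/end pairs:

```python
# Non-empty L1, L2, ... values copied from the two stu1_64.mp4 rows quoted above.
pose_vals = [200, 236, 301, 333, 356, 392, 426, 456, 488, 531,
             577, 605, 636, 666, 698, 738, 777, 811, 850, 889]
video_vals = [9, 278, 321, 321, 390, 390, 460, 460, 528, 528,
              599, 599, 666, 666, 734, 734, 807, 807, 886]

# My current reading (an assumption): consecutive values form (start, end) pairs.
pose_pairs = list(zip(pose_vals[0::2], pose_vals[1::2]))
video_pairs = list(zip(video_vals[0::2], video_vals[1::2]))

print(pose_pairs[:3])   # [(200, 236), (301, 333), (356, 392)]
print(video_pairs[:3])  # [(9, 278), (321, 321), (390, 390)]
# The pairs clearly do not line up, which is why I am unsure what the
# L1, L2, ... columns mean in each file.
```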
Oh... the annotations in pose_train.csv and video_train.csv refer to different locations.
For the specific concepts, you can refer to my paper. I think Figure 2 in the paper already answers your question accurately. If you have further questions, please leave a message!
Here is Figure 2:
Got it, thanks!
Hi,
I'm trying to understand the dataset, and I'm wondering what the difference is between pose_train.csv and video_train.csv. pose_train.csv has 487 samples as compared to 758 samples in video_train.csv. Also, pose_train and video_train list different sets of categories. So I wonder: how was the training subset in pose_train selected?