Hi @Jaceyxy,
The sequence labels for multiple-action videos are collected as answers to the question "If you had to describe the whole sequence as one action, what would it be?" All raw annotation labels collected in BABEL have been mapped to ~250 action categories. This means there is an action category for each of the smaller segments of a motion sequence containing multiple actions. BABEL-60 consists of the 60 most frequent of BABEL's action categories.
The format of the annotation is explained in the section "Data Format" here.
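As a rough illustration of "the 60 most frequent action categories", the sketch below counts `act_cat` entries and keeps the top `k`. It is not the authors' script: the helper name `top_k_categories` and the toy data are hypothetical, and it assumes counts are taken over `frame_ann` segments and that `frame_ann` or `act_cat` may be missing for some entries.

```python
from collections import Counter

def top_k_categories(sequences, k=60):
    """Return the k most frequent action categories (hypothetical helper)."""
    counts = Counter()
    for seq in sequences:
        # Assumption: some sequences may have no dense annotation.
        frame_ann = seq.get("frame_ann") or {}
        for seg in frame_ann.get("labels", []):
            # A segment may carry several categories; count each one.
            counts.update(seg.get("act_cat") or [])
    return [cat for cat, _ in counts.most_common(k)]

# Toy data that mimics the annotation layout, not real BABEL entries.
toy_sequences = [
    {"frame_ann": {"labels": [{"act_cat": ["walk"]}, {"act_cat": ["turn"]}]}},
    {"frame_ann": {"labels": [{"act_cat": ["walk"]}, {"act_cat": ["place something"]}]}},
    {"frame_ann": {"labels": [{"act_cat": ["walk", "turn"]}]}},
    {"frame_ann": None},
]
top2 = top_k_categories(toy_sequences, k=2)
print(top2)
```

With real data, `sequences` would be the values of the loaded annotation file, and `k=60` would give a BABEL-60-style label set under the assumptions above.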
Hello, so is your babel60 generated based on sequence tags? When there are many sequence tags, which one should I choose as the sequence tag for the classification task?
Hey @Jaceyxy, BABEL-60, which we use for the action recognition challenge, is based on the "Dense" annotations. In other words, yes, it is based on `frame_ann`. It is not generated from the sequence tags.
It is indeed possible for multiple action categories to be present in the same segment (even when using `frame_ann`). In such a case, we treat all the action categories as equally valid labels. Concretely, in the action recognition challenge, say there exists a segment $s$ whose 'act_cat' contains the categories $[l_1, l_2]$. We treat this as two separate instances when creating a training or testing dataset, i.e., $[(s, l_1), (s, l_2)]$.
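A minimal sketch of this expansion, to make the $(s, l_1), (s, l_2)$ idea concrete. The function name `expand_segments` and the toy entry are hypothetical, and the code only assumes the `frame_ann` / `labels` / `act_cat` / `seg_id` fields that appear in the annotation files; it is not the actual challenge pipeline.

```python
from typing import Any

def expand_segments(seq: dict[str, Any]) -> list[tuple[str, str]]:
    """Return one (seg_id, label) pair per action category of each segment."""
    # Assumption: frame_ann or act_cat may be None for some entries.
    frame_ann = seq.get("frame_ann") or {}
    samples = []
    for seg in frame_ann.get("labels", []):
        for cat in seg.get("act_cat") or []:
            # Each category becomes its own sample for the same segment.
            samples.append((seg["seg_id"], cat))
    return samples

# Toy entry that mimics the annotation layout, not a real BABEL sequence.
example = {
    "frame_ann": {
        "labels": [
            {"seg_id": "seg-a", "act_cat": ["fill", "lift something"],
             "start_t": 0.0, "end_t": 1.5},
            {"seg_id": "seg-b", "act_cat": ["walk"],
             "start_t": 1.5, "end_t": 3.0},
        ]
    }
}
pairs = expand_segments(example)
print(pairs)
```

Here the segment `seg-a` yields two samples, one per category, while `seg-b` yields one.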
You are right in your observation that the duration of a segment can be <= 5 sec. We describe how we handle this in the paper ("Data pre-processing", page 8), and in the response to this issue.
I hope this answers your question.
Hello, could you explain how you generate the BABEL-60 dataset? Here is an example of one of your annotations:
We are visualizing annotations for seq ID: 5788 in "train.json" {'babel_sid': 5788, 'dur': 6.77, 'feat_p': 'BMLrub/BioMotionLab_NTroje/rub055/0020_lifting_heavy2_poses.npz', 'frame_ann': {'anntr_id': '6e0e9098-e2a1-4019-a59b-ba1711d82c07', 'babel_lid': '450b32c4-4d51-45d4-bd4e-2884e2a626cb', 'labels': [{'act_cat': ['place something'], 'end_t': 4.023, 'proc_label': 'place', 'raw_label': 'place', 'seg_id': '0e4b04d8-9be4-4863-997b-34d40fda345f', 'start_t': 2.44}, {'act_cat': ['turn'], 'end_t': 5.127, 'proc_label': 'turn', 'raw_label': 'turn', 'seg_id': '168cb328-35c9-4501-b914-e474af7ab329', 'start_t': 4.19}, {'act_cat': ['walk'], 'end_t': 1.523, 'proc_label': 'walk', 'raw_label': 'walk', 'seg_id': '5866b366-0386-42d5-aa1a-846f47ac2372', 'start_t': 0}, {'act_cat': ['walk'], 'end_t': 6.767, 'proc_label': 'walk', 'raw_label': 'walk', 'seg_id': '48334837-6265-458d-a92e-3a00931e9450', 'start_t': 5.127}, {'act_cat': ['take/pick something up'], 'end_t': 2.44, 'proc_label': 'take', 'raw_label': 'take', 'seg_id': '247a098f-bcca-4a6b-8fc4-50508f96a55d', 'start_t': 1.752}, {'act_cat': ['transition'], 'end_t': 1.752, 'proc_label': 'transition', 'raw_label': 'transition', 'seg_id': 'fa15e7de-1966-4d80-af7a-3968b5f510c2', 'start_t': 1.523}, {'act_cat': ['transition'], 'end_t': 4.19, 'proc_label': 'transition', 'raw_label': 'transition', 'seg_id': '85752b5d-1df9-4b1a-b234-ac1a2029b992', 'start_t': 4.023}], 'mul_act': True}, 'seq_ann': {'anntr_id': '9872fc75-d3a1-4335-9c9f-c64810f48c4d', 'babel_lid': '06225ed2-2f81-4057-951b-b41c736021d3', 'labels': [{'act_cat': ['walk'], 'proc_label': 'walk', 'raw_label': 'walk', 'seg_id': '02cb19b9-1b85-4e48-a6c9-00b10d129904'}, {'act_cat': ['fill', 'lift something'], 'proc_label': 'scoop', 'raw_label': 'scoop', 'seg_id': 'ff8ee348-9917-49fb-b995-69577712dbaa'}, {'act_cat': ['place something'], 'proc_label': 'place', 'raw_label': 'place', 'seg_id': '6d5e7ca4-22a3-45d9-b6a0-1979f9de98c3'}, {'act_cat': ['turn'], 'proc_label': 'turn', 'raw_label': 
'turn', 'seg_id': 'fbdcd31e-4178-401b-ab61-d4acaed80353'}, {'act_cat': ['walk'], 'proc_label': 'walk', 'raw_label': 'walk', 'seg_id': '0a45afa6-46ae-4023-80ce-acf7c74f4207'}], 'mul_act': True}, 'url': 'https://babel-renders.s3.eu-central-1.amazonaws.com/005788.mp4'}

I want to know how you process this file. If you use frame_ann, many labels have a duration (end_t - start_t) of less than 5 seconds. If you use the sequence labels, the whole sequence is treated as one action, but seq_ann contains many "act_cat" values, such as "walk", "fill", and "lift something". How do you choose one "act_cat" as the motion label?