fredzzhang / upt

[CVPR'22] Official PyTorch implementation for paper "Efficient Two-Stage Detection of Human–Object Interactions with a Novel Unary–Pairwise Transformer"
https://fredzzhang.com/unary-pairwise-transformers
BSD 3-Clause "New" or "Revised" License

confused about the vcoco dataset #60

Closed ltttpku closed 1 year ago

ltttpku commented 1 year ago

There are some useful properties of the VCOCO dataset class you implemented:

  - "object_to_action" gives the list of actions for each object, i.e. {1: [0, 3, 11, 15], 2: [0, 1, 2, 3, 11], ......}
  - "objects" returns the list of objects, i.e. ['background', 'person', 'bicycle', .......]
  - "actions" returns the list of actions, i.e. ['hold obj', 'sit instr', 'ride instr', .......]

However, I'm confused about the relationships among them:

  1. Which object does the key 1 in "1: [0, 3, 11, 15]", the first item of object_to_action, represent?
  2. Which actions do the values [0, 3, 11, 15] in "1: [0, 3, 11, 15]" represent?

According to the lists of actions and objects, actions 0, 3, 11 and 15 represent hold obj, look obj, carry obj and cut obj respectively, while object 1 represents person, which seems odd.

fredzzhang commented 1 year ago

Hi @ltttpku,

As the name suggests, object_to_action shows the correspondence between objects and actions. The key 1 is the object index, and the list [0, 3, 11, 15] contains the indices of the valid actions for that object.
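To make the lookup concrete, here is a minimal sketch that resolves the indices into names. It assumes only the entries quoted in this thread, so `objects` and `actions` are written as partial dicts rather than the full lists the dataset returns, and `name_valid_actions` is a hypothetical helper, not a function from the repo.

```python
# Partial name tables: only the entries quoted in this thread are filled in.
# In the dataset these are full lists; plain indexing works the same way.
objects = {0: "background", 1: "person", 2: "bicycle"}
actions = {
    0: "hold obj", 1: "sit instr", 2: "ride instr",
    3: "look obj", 11: "carry obj", 15: "cut obj",
}
object_to_action = {1: [0, 3, 11, 15], 2: [0, 1, 2, 3, 11]}


def name_valid_actions(object_to_action, objects, actions):
    """Hypothetical helper: map each object name to its valid action names."""
    return {
        objects[obj_idx]: [actions[act_idx] for act_idx in act_indices]
        for obj_idx, act_indices in object_to_action.items()
    }


named = name_valid_actions(object_to_action, objects, actions)
for obj_name, act_names in named.items():
    print(f"{obj_name}: {act_names}")
```

Because the helper only uses `[]` indexing, the same call should also work when `objects` and `actions` are the full lists.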

It does seem a bit odd to have person as a potential object for cut obj. But this correspondence was generated from the training data, which means there must be training examples for that particular combination. I will need to take a look at the dataset later.
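As an illustration of how such a correspondence can be derived from annotations, here is a rough sketch. The `toy_pairs` list is made-up data, and `build_object_to_action` is a hypothetical helper. It is not the code used in the repo, only one way to collect every action observed with each object.

```python
from collections import defaultdict


def build_object_to_action(pairs):
    """Hypothetical helper: collect the actions observed with each object.

    `pairs` is an iterable of (object_index, action_index) tuples, one per
    annotated human-object pair in the training set.
    """
    mapping = defaultdict(set)
    for obj_idx, act_idx in pairs:
        mapping[obj_idx].add(act_idx)
    # Sort for a stable, readable result; duplicates are already removed.
    return {obj: sorted(acts) for obj, acts in sorted(mapping.items())}


# Made-up annotations for illustration only.
toy_pairs = [(1, 15), (1, 0), (2, 2), (1, 0), (2, 0)]
toy_mapping = build_object_to_action(toy_pairs)
print(toy_mapping)
```

Under this construction, a single training example is enough for a pair such as (person, cut obj) to appear in the mapping.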

Fred.

fredzzhang commented 1 year ago

Hi @ltttpku,

I have checked the dataset, and can confirm that all those actions do exist.

In particular, the cut obj and person pair can occur in a surgery scene or a barber's shop. You can use the dataset navigator utility provided in the repo to check these images. The image indices are 559, 2173, 2363 and 4416.

Fred.

ltttpku commented 1 year ago

That makes sense! Closing the issue :p