GT-Vision-Lab / abstract_scenes_v002

The second version of the interface for Abstract Scenes research project.

About bounding boxes and sequence images #3

Closed · barisbatuhan closed this issue 3 years ago

barisbatuhan commented 3 years ago

Hi,

In the explanation in this repository, I have seen that the center x and y coordinates of each object are provided for the clip art images. However, I did not see any explanation related to bounding boxes. Is there any annotation available that includes the bounding box for each object in the scene (like [x_min, y_min, x_max, y_max])?

Additionally, I wonder whether any of these images form a sequence that, as a group, depicts a short event or action.

Thank you very much!

StanislawAntol commented 3 years ago

Hi Barış,

I never did too much myself with the abstract scenes dataset, so I never needed the bounding boxes. The majority of objects (everything except humans, if I remember correctly) are single images, so you should be able to get their dimensions and compute the bounding box from there. You would probably need to factor in the "scene depth" to scale the dimensions accordingly (looking at the Python rendering code would probably be helpful). Since humans can be rotated and are also deformable (where each body part is a separate image), computing the bounding box would be a little more complicated, but should be doable.
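For the rigid, single-image objects, a minimal sketch of that approach might look like the following. It assumes the stored (x, y) is the sprite's center in scene coordinates and that each depth level applies a uniform scale; the `DEPTH_SCALE` values and the file name in the example are hypothetical placeholders, and the actual scale factors should be taken from the repository's Python rendering code:

```python
from PIL import Image

# Hypothetical per-depth scale factors (placeholders only; the real
# values are defined in this repository's Python rendering code).
DEPTH_SCALE = {0: 1.0, 1: 0.7, 2: 0.49}

def clipart_bbox(png_path, center_x, center_y, depth):
    """Approximate [x_min, y_min, x_max, y_max] for a rigid (non-human)
    clip art object, assuming (center_x, center_y) is the sprite's
    center in scene coordinates and depth scales it uniformly."""
    with Image.open(png_path) as im:
        w, h = im.size  # native sprite dimensions in pixels
    s = DEPTH_SCALE[depth]
    half_w, half_h = w * s / 2.0, h * s / 2.0
    return [center_x - half_w, center_y - half_h,
            center_x + half_w, center_y + half_h]

# Example call (hypothetical file name and coordinates):
# bbox = clipart_bbox("Pngs/s_0s.png", 250.0, 180.0, depth=1)
```

Humans would need extra handling on top of this: union the per-body-part boxes (each part is its own image) and account for rotation before taking the min/max.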

I haven't followed any of the work that used the abstract scenes dataset (e.g., for the VQA competition), but perhaps one of those projects already has available code that calculates the bounding boxes.

No, none of these abstract scenes (the ones that are part of VQA) were collected as a sequence. There was an earlier abstract scenes/clipart project that collected short sequences of scenes: http://web.eecs.umich.edu/~fouhey//2014/dynamics/fouhey_zitnick_dynamics.pdf. You can try contacting the first author (now a professor), http://web.eecs.umich.edu/~fouhey/, to see if the dataset still exists (though it would not be in the same JSON format).

Best of luck,

Stan