GT-Vision-Lab / abstract_scenes_v002

The second version of the interface for Abstract Scenes research project.

About bounding boxes and sequence images #3

Closed · barisbatuhan closed this issue 3 years ago

barisbatuhan commented 3 years ago

Hi,

In the explanation in this repository, I have seen that the center x and y coordinates of each object are provided for the clip art images. However, I did not see any explanation related to bounding boxes. Is there any annotation available that includes the bounding box for each object in the scene (like [x_min, y_min, x_max, y_max])?

Additionally, I wonder whether any of these images form a sequence that, as a group, depicts a short event or action.

Thank you very much!

StanislawAntol commented 3 years ago

Hi Barış,

I never did too much myself with the abstract scenes dataset, so I never needed the bounding boxes. The majority of objects (everything except humans, if I remember correctly) are single images, so you should be able to get their dimensions and compute the bounding box from there. You would probably need to factor in the "scene depth" to scale the dimensions accordingly (looking at the Python rendering code would probably be helpful). Since humans can be rotated and are also deformable (where each body part is a separate image), computing the bounding box would be a little more complicated, but should be doable.
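For the rigid, single-image objects, a minimal sketch of that approach might look like the following. It assumes the stored (x, y) is the sprite's center in scene coordinates and that each depth level applies a uniform scale; the `DEPTH_SCALE` values and the file name in the example are hypothetical placeholders, and the actual scale factors should be taken from the repository's Python rendering code:

```python
from PIL import Image

# Hypothetical per-depth scale factors (placeholders only; the real
# values are defined in this repository's Python rendering code).
DEPTH_SCALE = {0: 1.0, 1: 0.7, 2: 0.49}

def clipart_bbox(png_path, center_x, center_y, depth):
    """Approximate [x_min, y_min, x_max, y_max] for a rigid (non-human)
    clip art object, assuming (center_x, center_y) is the sprite's
    center in scene coordinates and depth scales it uniformly."""
    with Image.open(png_path) as im:
        w, h = im.size  # native sprite dimensions in pixels
    s = DEPTH_SCALE[depth]
    half_w, half_h = w * s / 2.0, h * s / 2.0
    return [center_x - half_w, center_y - half_h,
            center_x + half_w, center_y + half_h]

# Example call (hypothetical file name and coordinates):
# bbox = clipart_bbox("Pngs/s_0s.png", 250.0, 180.0, depth=1)
```

Humans would need extra handling on top of this: union the per-body-part boxes (each part is its own image) and account for rotation before taking the min/max.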

I haven't followed any of the work that used the abstract scenes dataset (e.g., for the VQA competition), but perhaps one of those projects already has available code that calculates the bounding boxes.

No, none of these abstract scenes (the ones that are part of VQA) were collected as a sequence. There was an earlier abstract scenes/clipart project that collected short sequences of scenes: http://web.eecs.umich.edu/~fouhey//2014/dynamics/fouhey_zitnick_dynamics.pdf. You can try contacting the first author (now a professor), http://web.eecs.umich.edu/~fouhey/, to see if the dataset still exists (though it would not be in the same JSON format).

Best of luck,

Stan