chuangg / CLEVRER

PyTorch implementation of ICLR 2020 paper "CLEVRER: CoLlision Events for Video REpresentation and Reasoning"

Mapping from 3D object coordinates to 2D pixel coordinates #7

Open Natithan opened 2 years ago

Natithan commented 2 years ago

Hi,

I'm looking for the function that maps a 3D object coordinate from the annotations to a pixel position (pix_height, pix_width) in the 320x480 output image. For example, CLEVRER/train/annotation_train/annotation_00000-01000/annotation_00000.json['motion_trajectory'][0]['objects'][0]['location'] holds a value like (coord_x, coord_y, coord_z) = (1.3234, 2.7147, 0.2).

I can approximate it roughly linearly with

    pix_width = int(coord_y * 80 + 240)
    pix_height = int(coord_x * 34 + 160)

but the relation doesn't appear to be exactly linear (see attached image).

Hence, I was wondering if there is some ground-truth mapping I overlooked somewhere which could get me the exact mapping :)
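One possible reason for the nonlinearity is perspective. If the object centers lie roughly on a plane of constant height (the example above has coord_z = 0.2), a perspective camera maps (coord_x, coord_y) to pixels through a homography, and the linear formula is the special case where the bottom row of that matrix is (0, 0, 1). The sketch below is not the dataset's actual camera model, which isn't given in this thread. It only shows how such a mapping could be fitted from four hand-labelled correspondences. The names `fit_homography` and `project` are hypothetical helpers, and `u`/`v` stand for pix_width/pix_height.

```python
def solve_linear(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("degenerate correspondences (collinear points?)")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):  # back substitution
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def fit_homography(world_xy, pixel_uv):
    """Fit a 3x3 homography (h33 fixed to 1) from exactly four point pairs.

    world_xy: four (coord_x, coord_y) tuples on the ground plane.
    pixel_uv: the four matching (pix_width, pix_height) tuples.
    """
    if len(world_xy) != 4 or len(pixel_uv) != 4:
        raise ValueError("this sketch needs exactly four correspondences")
    a, b = [], []
    for (x, y), (u, v) in zip(world_xy, pixel_uv):
        # u = (h11 x + h12 y + h13) / (h31 x + h32 y + 1), same for v
        a.append([x, y, 1.0, 0.0, 0.0, 0.0, -u * x, -u * y])
        b.append(u)
        a.append([0.0, 0.0, 0.0, x, y, 1.0, -v * x, -v * y])
        b.append(v)
    h = solve_linear(a, b)
    return [h[0:3], h[3:6], [h[6], h[7], 1.0]]


def project(h, x, y):
    """Map a ground-plane point to (pix_width, pix_height) using h."""
    w = h[2][0] * x + h[2][1] * y + h[2][2]
    u = (h[0][0] * x + h[0][1] * y + h[0][2]) / w
    v = (h[1][0] * x + h[1][1] * y + h[1][2]) / w
    return u, v
```

In use, one would read four object locations from an annotation file, mark the matching pixel positions by hand in the corresponding frame, call `fit_homography`, and then check `project` against further objects. With more than four pairs, a least-squares fit would be more robust to labelling noise than this exact solve.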

Thanks!

Natithan commented 2 years ago

As a follow-up question: the README mentions that visual masks can be found here. I was thinking I could use the mask annotations to get each object's center pixel location. When I extract the tar.gz at that link, I get a list of JSON files for each video, where each JSON looks like this (see attached image). I couldn't figure out where the mask information is stored. My guess is that it's in the 'counts' field, the one with the long random-looking string value (highlighted in yellow), but I'm not sure how to decode that string. Could you help with this? :)
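A 'counts' field holding a long string looks like the compressed run-length encoding (RLE) used by the COCO API. If that is what these files contain, `pycocotools.mask.decode` should turn each entry into a binary mask. That is an assumption, not something confirmed in this thread. The dependency-free sketch below follows the same decoding scheme, as far as I understand it, to show what the string would represent. It also assumes the entry carries the mask height and width (COCO stores them as 'size': [height, width]). The function names are hypothetical.

```python
def decode_counts(s):
    """Decode a COCO-style compressed RLE string into a list of run lengths."""
    counts = []
    pos = 0
    while pos < len(s):
        x, k, more = 0, 0, True
        while more:
            c = ord(s[pos]) - 48           # each character carries 6 bits
            x |= (c & 0x1F) << (5 * k)     # low 5 bits are payload
            more = bool(c & 0x20)          # bit 5 is the continuation flag
            pos += 1
            k += 1
            if not more and (c & 0x10):
                x |= -1 << (5 * k)         # sign-extend a negative value
        if len(counts) > 2:
            x += counts[-2]                # later runs are stored as deltas
        counts.append(x)
    return counts


def rle_to_mask(counts, height, width):
    """Expand run lengths into a height x width mask (list of rows).

    Runs alternate between 0 and 1, starting with 0, and fill the mask
    column by column (column-major order), as in the COCO convention.
    """
    flat = []
    value = 0
    for run in counts:
        flat.extend([value] * run)
        value = 1 - value
    if len(flat) != height * width:
        raise ValueError("run lengths do not match the mask size")
    return [[flat[col * height + row] for col in range(width)]
            for row in range(height)]


def mask_center(mask):
    """Return the (pix_height, pix_width) centroid of the mask, or None."""
    pixels = [(r, c) for r, row in enumerate(mask)
              for c, v in enumerate(row) if v]
    if not pixels:
        return None
    n = len(pixels)
    return (sum(r for r, _ in pixels) / n, sum(c for _, c in pixels) / n)
```

As a toy check (not taken from the dataset), the string "3240M" decodes to the runs [3, 2, 4, 2, 1], which for a 3x4 mask gives two foreground pixels in each of columns 1 and 3. The centroid from `mask_center` could then stand in for the object-center pixel location.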