apple / ml-hypersim

Hypersim: A Photorealistic Synthetic Dataset for Holistic Indoor Scene Understanding

how to align labels with bounding boxes #49

Closed SyanneL closed 1 year ago

SyanneL commented 1 year ago

Hi, I have a question about matching each label with its corresponding bounding box in a scene. I found the object information in metadata_objects.csv, but how can I align each object with the bounding boxes generated by scene_generate_images_bounding_box.py? Thanks!

mikeroberts3000 commented 1 year ago

Hi! In order to address this question, we need to understand exactly how labeling works in Hypersim.

Each scene has O low-level objects. Each low-level object has a name (stored in metadata_objects.csv), a semantic ID (stored in mesh_objects_si.hdf5), and a semantic instance ID (stored in mesh_objects_sii.hdf5). Each of these files has exactly O entries, because there is one entry per low-level object.

Each semantic ID is in the range [-1, 40], and corresponds to an NYU40 label. A semantic ID of -1 means unlabeled, and a semantic ID of 0 never occurs in our data. Multiple low-level objects can have the same semantic ID.

Each semantic instance ID is in the range [-1, S]. A semantic instance ID of -1 means unlabeled, and a semantic instance ID of 0 never occurs in our data. Multiple low-level objects can have the same semantic instance ID (e.g., each part of a chair might be a distinct low-level object, but all the parts are grouped into a single semantic instance). Each semantic instance ID has a bounding box (stored in metadata_semantic_instance_bounding_box_*.hdf5). Each of these bounding box files has exactly S+1 entries, because there is one entry per semantic instance. The 0th entry in each of these files is set to inf, since a semantic instance ID of 0 never occurs in our data.
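The constraints above can be turned into a quick sanity check. The following is only a sketch: the function name and arguments are hypothetical, and it assumes the per-object names, semantic IDs, and semantic instance IDs have already been loaded into plain sequences, and that `num_bbox_entries` is the number of rows in one of the bounding box files.

```python
def check_label_schema(names, semantic_ids, instance_ids, num_bbox_entries):
    """Return a list of problems; an empty list means the inputs look consistent.

    Hypothetical helper, not part of the Hypersim toolkit.
    """
    problems = []

    # One entry per low-level object in each per-object file.
    num_objects = len(names)
    if len(semantic_ids) != num_objects or len(instance_ids) != num_objects:
        problems.append("per-object arrays differ in length")

    # Semantic IDs are -1 (unlabeled) or an NYU40 ID in [1, 40]; 0 never occurs.
    if any(s == 0 or s < -1 or s > 40 for s in semantic_ids):
        problems.append("semantic ID outside the expected range")

    # Bounding box files have S+1 entries, so the largest valid instance ID is S.
    max_instance_id = num_bbox_entries - 1
    if any(i == 0 or i < -1 or i > max_instance_id for i in instance_ids):
        problems.append("semantic instance ID outside the expected range")

    return problems
```

An empty result only means the arrays agree with the ranges described here; it does not say anything about whether the labels themselves are correct.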

This data schema implies that we must use a bit of fancy indexing to obtain a semantic label for each bounding box. Essentially, we must find the semantic label corresponding to each semantic instance ID. We can use the following strategy.

  1. For each semantic instance ID, we find all the low-level objects with that semantic instance ID. Now we have a set of low-level object IDs.
  2. We look up the semantic label for each low-level object ID we found in step 1. Now we have a set of semantic labels.

Most of the time, all of the semantic labels for a particular semantic instance ID will be identical, but this is not guaranteed by our data schema. In the rare case that you obtain multiple distinct semantic labels, you can just take a majority vote, or if you're being really strict, you could assume that such a bounding box doesn't have a label.
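The two steps and the majority vote can be sketched as follows. This is an illustration rather than code from the Hypersim repository: `instance_labels` and its `strict` flag are hypothetical names, and the sketch assumes the contents of mesh_objects_si.hdf5 and mesh_objects_sii.hdf5 are already loaded as two equal-length sequences of integers. Objects with a semantic ID of -1 take part in the vote like any other label, which is a choice made for this sketch.

```python
from collections import Counter, defaultdict

UNLABELED = -1


def instance_labels(semantic_ids, instance_ids, strict=False):
    """Map each semantic instance ID to a single semantic ID.

    Hypothetical helper. With strict=True, instances whose low-level
    objects disagree on the semantic ID are mapped to None.
    """
    # Step 1: group low-level objects by semantic instance ID.
    # Step 2: collect the semantic ID of every object in each group.
    votes = defaultdict(Counter)
    for sem_id, inst_id in zip(semantic_ids, instance_ids):
        sem_id, inst_id = int(sem_id), int(inst_id)
        if inst_id == UNLABELED:
            continue  # object does not belong to any semantic instance
        votes[inst_id][sem_id] += 1

    labels = {}
    for inst_id, counter in votes.items():
        if strict and len(counter) > 1:
            labels[inst_id] = None  # conflicting labels: treat as unlabeled
        else:
            labels[inst_id] = counter.most_common(1)[0][0]  # majority vote
    return labels
```

Row k of a bounding box file would then be labeled with `labels.get(k)`. An instance ID that is missing from the returned dictionary has no low-level objects assigned to it.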

SyanneL commented 1 year ago

Thanks for your help! Can I find the NYU40 labels in the repo?

mikeroberts3000 commented 1 year ago

Yes! We store a mapping from semantic IDs to string labels here:

https://github.com/apple/ml-hypersim/blob/main/code/cpp/tools/scene_annotation_tool/semantic_label_descs.csv

This is the same standard label mapping (and color palette) as the one defined here:

https://github.com/shelhamer/fcn.berkeleyvision.org/blob/master/data/nyud/classes.txt

https://github.com/ScanNet/ScanNet/blob/master/BenchmarkScripts/util.py
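A mapping file like this can be read into a dictionary from semantic ID to string label. The sketch below rests on assumptions: the function is hypothetical, and the column names `semantic_id` and `semantic_name` are guesses that should be checked against the actual header of semantic_label_descs.csv. Header names and cell values are stripped of surrounding whitespace in case the file pads its columns.

```python
import csv


def load_semantic_names(csv_lines, id_column="semantic_id", name_column="semantic_name"):
    """Build {semantic ID: label string} from an open CSV file or a list of lines.

    Hypothetical helper; the default column names are assumptions.
    """
    reader = csv.reader(csv_lines)
    header = [h.strip() for h in next(reader)]
    id_idx = header.index(id_column)
    name_idx = header.index(name_column)

    names = {}
    for row in reader:
        if not row:
            continue  # skip blank lines
        names[int(row[id_idx].strip())] = row[name_idx].strip()
    return names
```

With a real file, this could be called as `load_semantic_names(open(path, newline=""))`, where `path` points to a local copy of the CSV.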

OrangeSodahub commented 6 months ago

@mikeroberts3000 Thanks for your guidance on how to find the semantic ID for each bounding box. I ran into a problem: for some instances (bounding boxes), I cannot find any corresponding low-level objects.

For example, in scene ai_053_004, after loading the si, sii, and metadata_semantic_instance_bounding_box_object_aligned_2d_positions.hdf5 files:

>>> si.max()
38 # which means the maximum NYU40 ID stored in this scene is 38
>>> sii.max()
130 # which means the maximum instance ID stored in this scene is 130
>>> box.shape
(131, 3) # which means there are 130 labeled bounding boxes in this scene
>>> sii[sii==129]
array([], dtype=int64) # the issue

As the last line of the code above shows, I tried to find the low-level objects whose instance ID is 129, but found none.

Also, if the low-level objects corresponding to an instance are unlabeled (semantic ID == -1), then I can't get the NYU40 ID for that instance, right?
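For reference, the instance IDs affected by the first problem can be listed with a few lines of Python. This is a minimal sketch with a hypothetical function name; it assumes `instance_ids` holds the contents of mesh_objects_sii.hdf5 and `num_bbox_entries` is the first dimension of the bounding box array (131 in the example above).

```python
def instance_ids_without_objects(instance_ids, num_bbox_entries):
    """Return instance IDs in [1, S] that no low-level object refers to.

    Hypothetical helper. Entry 0 is skipped because a semantic
    instance ID of 0 never occurs in the data.
    """
    present = {int(i) for i in instance_ids}
    return [k for k in range(1, num_bbox_entries) if k not in present]
```

If the result is non-empty, the listed bounding box rows cannot be given a semantic label through the low-level objects.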