vadimkantorov opened 3 years ago
What are the valid integer codes for color and material? Non-zero only? Colors and materials are encoded as uint8 in CLEVR-with-masks, but as strings in CLEVR.
An example of the dump that I get:
{
'color': [0, 1, 2, 3, 1, 1, 4, 5, 0, 0, 0],
'material': [0, 1, 1, 2, 2, 1, 2, 2, 0, 0, 0],
'shape': [0, 1, 2, 1, 1, 3, 3, 3, 0, 0, 0],
'size': [0, 1, 1, 2, 2, 2, 2, 1, 0, 0, 0],
'visibility': [1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 0.0, 0.0, 0.0],
'pixel_coords': [[0.0, 0.0, 0.0], [216.0, 92.0, 11.397212982177734], [184.0, 127.0, 9.41761589050293], [116.0, 81.0, 13.153035163879395], [51.0, 121.0, 10.44654655456543], [123.0, 129.0, 10.018261909484863], [36.0, 109.0, 11.129423141479492], [160.0, 176.0, 7.559253692626953], [0.0, 0.0, 0.0], [0.0, 0.0, 0.0], [0.0, 0.0, 0.0]],
'rotation': [0.0, 206.87570190429688, 158.63943481445312, 330.7286071777344, 31.30453109741211, 198.59092712402344, 2.6792359352111816, 243.39840698242188, 0.0, 0.0, 0.0],
'x': [0.0, 0.7548010349273682, 1.6617735624313354, -2.911912679672241, -1.656102180480957, 0.1737128645181656, -2.6883020401000977, 2.8262786865234375, 0.0, 0.0, 0.0],
'y': [0.0, 1.7722225189208984, -0.6132868528366089, 0.35031434893608093, -2.893202543258667, -1.555863618850708, -2.886444568634033, -2.4974899291992188, 0.0, 0.0, 0.0],
'z': [0.0, 0.699999988079071, 0.699999988079071, 0.3499999940395355, 0.3499999940395355, 0.3499999940395355, 0.3499999940395355, 0.699999988079071, 0.0, 0.0, 0.0]
}
As you can see, the first object seems to have zeros everywhere, but visibility is 1.0 :/
Hi Vadim,
The first object has all-zero attributes (color, material, shape, and size) because it represents the background. As you may have observed, the first segmentation mask (for any scene) contains the background pixels.
The mapping from integers to words for CLEVR features is as follows:
{
"material": {"metal": 2, "rubber": 1},
"size": {"large": 1, "small": 2},
"color": {"cyan": 2, "red": 1, "brown": 5, "gray": 6, "purple": 7, "yellow": 8, "blue": 4, "green": 3},
"shape": {"cube": 3, "sphere": 1, "cylinder": 2}
}
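The mapping above can be inverted to decode integer labels back to attribute strings. A minimal sketch (the dict copies the mapping from this thread; treating index 0 as the background/padding slot is an assumption based on Rish's explanation above):

```python
# Mapping from this thread: CLEVR-with-masks integer codes -> CLEVR strings.
CLEVR_CODES = {
    "material": {"metal": 2, "rubber": 1},
    "size": {"large": 1, "small": 2},
    "color": {"cyan": 2, "red": 1, "brown": 5, "gray": 6, "purple": 7,
              "yellow": 8, "blue": 4, "green": 3},
    "shape": {"cube": 3, "sphere": 1, "cylinder": 2},
}

# Invert each attribute dict to integer -> string.
DECODE = {attr: {code: name for name, code in names.items()}
          for attr, names in CLEVR_CODES.items()}

def decode(attr, code):
    # Code 0 is assumed to be the background / padding slot.
    return "background" if code == 0 else DECODE[attr][code]
```

For example, `decode("color", 4)` returns `"blue"` and `decode("shape", 3)` returns `"cube"`.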
And you can find the number of visible objects in any scene using the visibility vector. Note that it codes both the background and foreground objects as 1.0.
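Since the visibility vector also marks the background slot with 1.0, the number of visible foreground objects is one less than its sum. A quick sketch using the dump above:

```python
# Visibility vector copied from the example dump in this thread.
visibility = [1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 0.0, 0.0, 0.0]

num_visible_slots = int(sum(visibility))  # counts the background slot too
num_foreground = num_visible_slots - 1    # subtract the background
```

Here `num_foreground` is 7, matching the seven non-zero object entries in the dump.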
Hope this helps, Rish
Thanks! It would help to add these to the README!
I've got another question: how were the train/test splits done (for both CLEVR6 and CLEVR10)? Could you provide the file lists?
In Multi-Object Representation Learning with Iterative Variational Inference, we trained our model only on images containing 3-6 visible foreground objects (an inclusive range), i.e. CLEVR6. We then assessed the model's generalization to the full dataset (where scenes can contain up to 10 objects).
You can construct the train split from CLEVR (with masks) by writing a filtering function which returns True when sum(visibility) <= 7. Sorry, it won't be possible to provide an exact file list.
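That filtering can be sketched as follows. The `scenes` iterable of dicts shaped like the dump above is hypothetical; the threshold of 7 comes from Rish's comment (the visibility sum counts the background slot, so <= 7 means <= 6 foreground objects):

```python
def is_clevr6(scene):
    # <= 6 visible foreground objects; sum(visibility) includes
    # the background slot, hence the threshold of 7.
    return sum(scene["visibility"]) <= 7

# Hypothetical usage over scene dicts shaped like the dump above:
scenes = [
    {"visibility": [1.0] * 7 + [0.0] * 4},  # 6 foreground objects -> kept
    {"visibility": [1.0] * 10 + [0.0]},     # 9 foreground objects -> dropped
]
train = [s for s in scenes if is_clevr6(s)]
```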
Do I understand correctly that sum(visibility) <= 7 was train and sum(visibility) > 7 was test? Or did test also contain some (or all?) of the sum(visibility) <= 7 images?
How many images were in train/test?
Basically, I'm trying to figure out the object discovery evaluation setup for Slot Attention which I think matched your setup (https://github.com/google-research/google-research/issues/595)
Thank you!
CLEVR6 := sum(visibility) <= 7, whereas CLEVR10 was the whole dataset (any number of visible objects). That should reflect the terminology in the Slot Attention paper.
Does the test split ensure it doesn't intersect too much with train? E.g., are train images excluded? Is there any filtering w.r.t. object properties? Do you still have the train/test sizes somewhere, maybe in comments inside the arXiv submission? :)
Is it true that all fields are padded with zeros up to 11 objects? How can one find out the true number of objects?