Tencent / tencent-ml-images

Largest multi-label image database; ResNet-101 model; 80.73% top-1 acc on ImageNet

Category mapping from Open Images to WordNet #36

Open soskek opened 5 years ago

soskek commented 5 years ago

Thank you for the great work! The paper says

We firstly map the categories from both ImageNet and Open Images to the WordIDs in WordNet. According to the WordIDs, we construct the semantic hierarchy among these 11,166 categories

How did you make the mapping? And, is the mapping list available in this repository?

wubaoyuan commented 5 years ago

@soskek Thanks for your interest. The general process of the mapping is:

1) Remove rare tags, i.e., tags with very few corresponding images, and manually remove visually vague tags such as "event" and "summer".
2) Search for each remaining tag from ImageNet or Open Images in WordNet and obtain its ID (like hair-n01900150). This step is very time-consuming, because one tag may have multiple meanings/IDs; you have to check the corresponding images and pick one ID for the tag.
3) According to the obtained IDs, construct the semantic hierarchy, and merge synonymous IDs into one unique ID.

Generally, this process is very time-consuming. You can find the mapping between the ID and the tag in the file "data/dictionary_and_semantic_hierarchy.txt". Hope it helps.
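
For readers who want to load that file, here is a minimal sketch. It assumes each data row has four whitespace-separated fields (category index, WordNet ID, parent index, category name) and that any header line does not start with a digit. This layout and the names `Category` and `parse_dictionary` are assumptions for illustration, not something confirmed by the maintainers.

```python
from typing import NamedTuple


class Category(NamedTuple):
    index: int          # row index of the category (assumed first column)
    wordnet_id: str     # WordNet ID such as "n07734744"
    parent_index: int   # index of the parent category (assumed third column)
    names: tuple        # comma-separated synonyms from the "category name" column


def parse_dictionary(lines):
    """Parse rows of the dictionary file; skip header or malformed lines."""
    rows = []
    for line in lines:
        # Split on any whitespace, at most 3 times, so the name keeps its spaces.
        parts = line.strip().split(None, 3)
        if len(parts) < 4 or not parts[0].isdigit():
            continue
        index, wordnet_id, parent, name = parts
        synonyms = tuple(n.strip() for n in name.split(","))
        rows.append(Category(int(index), wordnet_id, int(parent), synonyms))
    return rows


# Two rows quoted later in this thread, used here as sample input.
sample = [
    "118\tn07734744\t34\tmushroom",
    "9208\tn13001930\t5178\tshiitake, shiitake mushroom, Chinese black mushroom",
]
rows = parse_dictionary(sample)
```

With the real file, the same function could be called as `parse_dictionary(open("data/dictionary_and_semantic_hierarchy.txt", encoding="utf-8"))`, assuming the layout above holds.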

soskek commented 5 years ago

I see. So the original raw tags from Open Images or ImageNet are written in the "category name" column of the corresponding row in the file. Thank you for the quick response!

soskek commented 5 years ago

I'm still confused about how to read the actual alignment from OpenImages to this dataset (or to WordNet synsets) from the mapping file, "data/dictionary_and_semantic_hierarchy.txt".

OpenImages lists its labels in class-descriptions.csv (https://storage.googleapis.com/openimages/v5/class-descriptions.csv). For example, OpenImages has the category /m/052sf,Mushroom, so the category name to look up should be mushroom (many categories have to be lowercased). The mapping file contains the following lines with the string mushroom:

118     n07734744       34      mushroom
822     n07734879       792     stuffed mushroom
8265    n01917882       8262    mushroom coral
9208    n13001930       5178    shiitake, shiitake mushroom, Chinese black mushroom, golden oak mushroom, Oriental black mushroom, Lentinus edodes
9245    n13049953       9232    polypore, pore fungus, pore mushroom
9247    n12997919       9232    mushroom
9251    n13001041       9246    mushroom
9252    n13005984       9246    inky cap, inky-cap mushroom, Coprinus atramentarius
9253    n13000891       9246    mushroom

Even with exact matching, more than one line matches: 118, 9247, 9251, and 9253. Is this an ambiguous, multi-label case? (Does no complete mapping exist, and if we want one, should we refer directly to a human-validated file like train_urls_from_openimages.txt?)
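
To make the ambiguity concrete, here is a minimal sketch that indexes the rows quoted above by each comma-separated name and looks up an exact match. The four-column layout and the helper name `build_name_index` are assumptions for illustration.

```python
from collections import defaultdict

# The rows quoted above (long synonym lists kept as in the file excerpt).
EXCERPT = [
    "118     n07734744       34      mushroom",
    "822     n07734879       792     stuffed mushroom",
    "8265    n01917882       8262    mushroom coral",
    "9208    n13001930       5178    shiitake, shiitake mushroom, Chinese black mushroom, golden oak mushroom, Oriental black mushroom, Lentinus edodes",
    "9245    n13049953       9232    polypore, pore fungus, pore mushroom",
    "9247    n12997919       9232    mushroom",
    "9251    n13001041       9246    mushroom",
    "9252    n13005984       9246    inky cap, inky-cap mushroom, Coprinus atramentarius",
    "9253    n13000891       9246    mushroom",
]


def build_name_index(lines):
    """Map each individual category name to the (index, WordNet ID) rows using it."""
    name_index = defaultdict(list)
    for line in lines:
        parts = line.strip().split(None, 3)
        if len(parts) < 4 or not parts[0].isdigit():
            continue  # skip header or malformed lines
        index, wordnet_id, _parent, name = parts
        for synonym in name.split(","):
            name_index[synonym.strip()].append((int(index), wordnet_id))
    return name_index


name_index = build_name_index(EXCERPT)
matches = name_index.get("mushroom", [])
for index, wordnet_id in matches:
    print(index, wordnet_id)
```

On this excerpt the exact name "mushroom" resolves to four different WordNet IDs, so a name-only lookup cannot pick a single row without extra information such as the images or the parent category.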

My second question: OpenImages has /m/01h44,Bat (Animal) in the reference https://storage.googleapis.com/openimages/v5/class-descriptions.csv, but because of its "(...)" suffix it cannot be matched with any line in "data/dictionary_and_semantic_hierarchy.txt" (although the file does contain 2353 n02806379 2344 bat). Could you tell us what normalization was used for cases like this?
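
As an illustration of the kind of normalization being asked about, here is one guess: lowercase the display name and drop any parenthesized qualifier. This is only an assumption, not the normalization the authors actually used, and `normalize_label` is a hypothetical helper.

```python
import re


def normalize_label(display_name: str) -> str:
    """Lowercase an OpenImages display name and drop parenthesized qualifiers.

    This is a guessed rule for illustration, not the authors' confirmed one.
    """
    without_qualifier = re.sub(r"\s*\([^)]*\)", "", display_name)
    # Collapse any leftover runs of whitespace into single spaces.
    return " ".join(without_qualifier.lower().split())


print(normalize_label("Bat (Animal)"))  # bat
print(normalize_label("Mushroom"))      # mushroom
```

Note that dropping the qualifier also discards the sense information ("Animal"), so a row matched this way may refer to a different meaning of the word and would still need the manual check described earlier in the thread.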