EscVM / OIDv4_ToolKit

Download and visualize single or multiple classes from the huge Open Images v4 dataset
GNU General Public License v3.0
800 stars 633 forks

Multi-word class names #57

Open mikkleini opened 4 years ago

mikkleini commented 4 years ago

When label files are created, it would be nice if multi-word class names like "adhesive tape" and "brown bear" were put in quotes, or had their spaces replaced with underscores. Otherwise it's a little bit problematic to process those files.

PS. The readme suggests using underscores in the classes file, but such classes (e.g. adhesive_tape) aren't found. It happily accepts the natural names, though, so I don't know what's going on.

shuuse commented 4 years ago

@mikkleini The underscore works for me, so it's probably your capitalization that is off. Did you try:

python main.py downloader --classes Adhesive_tape --type_csv validation

Seems to work. Cheers / Simen

mikkleini commented 4 years ago

Casing helps, but not completely.

This works: python main.py downloader --classes Adhesive_tape ....

This works: python main.py downloader --classes classes.txt .... classes.txt contains: Adhesive tape

This doesn't work: python main.py downloader --classes classes.txt .... classes.txt contains: Adhesive_tape

Nevertheless, what I was asking for is a feature to produce label files with underscores as well. Instead of:

Adhesive tape 81.27999877929688 45.485349521040916 843.52001953125 660.4991886019707

get:

Adhesive_tape 81.27999877929688 45.485349521040916 843.52001953125 660.4991886019707

Maybe do that replacement only if the input class was also given with an underscore...

Why I ask: if you write another program that reads those label files, you can't use scanf, because it matches the words from the left. My solution was to split the string and parse the words from the right: first come the 4 coordinates, and then everything else is the class name in reverse. But you know, it's just a little bit messy...
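Until such an option exists, the underscore form can be produced as a post-processing step on the generated label files. This is only a minimal sketch, assuming every label line is a class name followed by exactly four whitespace-separated numbers, as in the example above. `underscore_label_line` is a hypothetical helper name, not part of OIDv4_ToolKit:

```python
def underscore_label_line(line):
    """Join a multi-word class name with underscores, keeping the 4 coordinates.

    Hypothetical helper: assumes the line is '<class name> <4 numbers>'.
    """
    tokens = line.split()
    if len(tokens) < 5:
        raise ValueError("expected a class name and 4 coordinates: %r" % line)
    # The last four tokens are the box coordinates; everything before them
    # belongs to the class name.
    name_words, coords = tokens[:-4], tokens[-4:]
    return " ".join(["_".join(name_words)] + coords)


line = "Adhesive tape 81.27999877929688 45.485349521040916 843.52001953125 660.4991886019707"
print(underscore_label_line(line))
```

Single-word class names pass through unchanged, so the same function could be run over every line of a label file.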

virusapex commented 4 years ago

Hiya,

In case people are still interested in an easier way to parse the multi-word classes, there is a solution: just split the string with .rsplit(' ', 4). That splits at most four times, starting from the right, so you get your four coordinates and the class name stays intact, since it's not touched. I personally used this with the MMDetection toolbox.
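For illustration, the .rsplit(' ', 4) approach might look like the sketch below. It assumes the fields are separated by single spaces and that the line ends with four numbers. `parse_label_line` is a hypothetical name chosen for this example, not a function from OIDv4_ToolKit or MMDetection:

```python
def parse_label_line(line):
    """Split a label line into (class_name, [four floats]).

    Hypothetical helper: assumes single spaces between fields.
    """
    # At most 4 splits from the right: 1 class name + 4 coordinate strings.
    parts = line.strip().rsplit(" ", 4)
    if len(parts) != 5:
        raise ValueError("expected a class name and 4 coordinates: %r" % line)
    name = parts[0]
    coords = [float(value) for value in parts[1:]]
    return name, coords


name, coords = parse_label_line(
    "Adhesive tape 81.27999877929688 45.485349521040916 843.52001953125 660.4991886019707"
)
print(name)
print(coords)
```

If the files might contain runs of spaces or tabs, calling rsplit(None, 4) instead splits on any whitespace.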