VOC 2007 metric? - Githubissues

ndcuong91 commented 6 years ago

Hi @Cartucho. As reported here: https://github.com/pjreddie/darknet/issues/956. Can we modify our tool to use the VOC 2007 metric (reject difficult objects)?

Cartucho commented 6 years ago

Yes, that's a great idea! Any recommendations or ideas on how this should be implemented?

Idea 0: We could add an optional `difficult` flag at the end of each line of the ground-truth files. The ground-truth files would then have the following format (objects marked as difficult in the ground truth would be ignored):
```
<class_name> <left> <top> <right> <bottom> [<difficult>]
```

Example of a ground-truth file using Idea 0:

    tvmonitor 2 10 173 238 difficult
    book 439 157 556 241
    book 437 246 518 351
    pottedplant 272 190 316 259

What do you think?
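
A minimal sketch of how a line in the Idea 0 format could be parsed. The names `GroundTruthBox` and `parse_gt_line` are hypothetical and are not taken from the tool's code; the sketch also assumes class names contain no spaces.

```python
from typing import NamedTuple


class GroundTruthBox(NamedTuple):
    class_name: str
    left: float
    top: float
    right: float
    bottom: float
    difficult: bool


def parse_gt_line(line: str) -> GroundTruthBox:
    """Parse '<class_name> <left> <top> <right> <bottom> [difficult]'."""
    parts = line.split()
    # The trailing token is optional; when present it must be the word "difficult".
    difficult = bool(parts) and parts[-1] == "difficult"
    if difficult:
        parts = parts[:-1]
    if len(parts) != 5:
        raise ValueError(f"malformed ground-truth line: {line!r}")
    class_name, left, top, right, bottom = parts
    return GroundTruthBox(class_name, float(left), float(top),
                          float(right), float(bottom), difficult)


lines = [
    "tvmonitor 2 10 173 238 difficult",
    "book 439 157 556 241",
]
boxes = [parse_gt_line(text) for text in lines]
for box in boxes:
    print(box.class_name, box.difficult)
```

Keeping the flag as a trailing word means existing ground-truth files without it stay valid.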

ndcuong91 commented 6 years ago

@Cartucho That's an acceptable solution. Do you have any plans to implement it? By the way, I think this tool is too slow. When I tested it on VOC 2007, it took about 15 minutes to get the final result (I had already turned off image saving and the animation). With AlexeyAB's tool (https://github.com/AlexeyAB/darknet#how-to-calculate-map-on-pascalvoc-2007) I need only 30 s to get the final mAP, so I have switched to that tool for now.

Cartucho commented 6 years ago

@titikid Yes, I will implement it then.

Thanks for noticing that, could you please tell me the output of: python -m cProfile main.py -na -np

This way I will know what to improve.
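
The default cProfile report is ordered by name, which makes hot spots hard to find. Passing `-s cumtime` on the command line (`python -m cProfile -s cumtime main.py -na -np`) sorts by cumulative time instead. The same can be done from Python with `pstats`; the sketch below uses a hypothetical `slow_sum` function as a stand-in for the real workload.

```python
import cProfile
import io
import pstats


def profile_top(func, limit=10):
    """Run func() under cProfile and return a report sorted by cumulative time."""
    profiler = cProfile.Profile()
    profiler.enable()
    try:
        func()
    finally:
        profiler.disable()
    buffer = io.StringIO()
    stats = pstats.Stats(profiler, stream=buffer)
    # Show only the `limit` most expensive entries by cumulative time.
    stats.sort_stats("cumulative").print_stats(limit)
    return buffer.getvalue()


def slow_sum():
    # Hypothetical stand-in for the real workload (not part of main.py).
    return sum(i * i for i in range(100_000))


report = profile_top(slow_sum)
print(report)
```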

Cartucho commented 6 years ago

@titikid I just added the difficult feature. Let me know if it works for you!

ndcuong91 commented 6 years ago

@Cartucho Great! I will try it and give you feedback.

ndcuong91 commented 6 years ago

@Cartucho this is the output of python -m cProfile main.py -na -np

22.73% = backpack AP
85.94% = bed AP
17.52% = book AP
14.29% = bookcase AP
23.48% = bottle AP
31.86% = bowl AP
7.93% = cabinetry AP
53.84% = chair AP
4.55% = coffeetable AP
19.05% = countertop AP
42.50% = cup AP
39.66% = diningtable AP
0.00% = doll AP
20.69% = door AP
7.69% = heater AP
71.43% = nightstand AP
42.86% = person AP
17.71% = pictureframe AP
13.01% = pillow AP
62.31% = pottedplant AP
73.21% = remote AP
0.00% = shelf AP
16.33% = sink AP
90.48% = sofa AP
1.39% = tap AP
0.00% = tincan AP
63.25% = tvmonitor AP
18.75% = vase AP
45.45% = wastecontainer AP
23.53% = windowblind AP
mAP = 31.05%

    174450 function calls (174350 primitive calls) in 0.118 seconds

    Ordered by: standard name

       ncalls  tottime  percall  cumtime  percall filename:lineno(function)
            3    0.000    0.000    0.000    0.000 UserDict.py:103(contains)
            9    0.000    0.000    0.000    0.000 UserDict.py:35(getitem)
            3    0.000    0.000    0.000    0.000 UserDict.py:91(get)
          115    0.004    0.000    0.021    0.000 init.py:122(dump)
          267    0.000    0.000    0.004    0.000 init.py:193(dumps)
          480    0.001    0.000    0.013    0.000 init.py:258(load)
          480    0.000    0.000    0.011    0.000 init.py:294(loads)
            1    0.000    0.000    0.002    0.002 init.py:99()
            1    0.000    0.000    0.000    0.000 argparse.py:1000(_VersionAction)
            1    0.000    0.000    0.000    0.000 argparse.py:1025(_SubParsersAction)
            1    0.000    0.000    0.000    0.000 argparse.py:1027(_ChoicesPseudoAction)
            1    0.000    0.000    0.000    0.000 argparse.py:1109(FileType)
            1    0.000    0.000    0.000    0.000 argparse.py:112(_AttributeHolder)
            1    0.000    0.000    0.000    0.000 argparse.py:1153(Namespace)
            1    0.000    0.000    0.000    0.000 argparse.py:1160(init)
            1    0.000    0.000    0.000    0.000 argparse.py:1180(_ActionsContainer)
            3    0.000    0.000    0.000    0.000 argparse.py:1182(init)
           34    0.000    0.000    0.000    0.000 argparse.py:1234(register)
           12    0.000    0.000    0.000    0.000 argparse.py:1238(_registry_get)
            6    0.000    0.000    0.000    0.000 argparse.py:1263(add_argument)
            2    0.000    0.000    0.000    0.000 argparse.py:1310(add_argument_group)
            6    0.000    0.000    0.000    0.000 argparse.py:1320(_add_action)
            6    0.000    0.000    0.000    0.000 argparse.py:1400(_get_optional_kwargs)
            6    0.000    0.000    0.000    0.000 argparse.py:1435(_pop_action_class)
            3    0.000    0.000    0.000    0.000 argparse.py:1439(_get_handler)
            6    0.000    0.000    0.000    0.000 argparse.py:1448(_check_conflict)
            1    0.000    0.000    0.000    0.000 argparse.py:147(HelpFormatter)
            1    0.000    0.000    0.000    0.000 argparse.py:1484(_ArgumentGroup)
            2    0.000    0.000    0.000    0.000 argparse.py:1486(init)
            6    0.000    0.000    0.000    0.000 argparse.py:1508(_add_action)
            1    0.000    0.000    0.000    0.000 argparse.py:1518(_MutuallyExclusiveGroup)
            1    0.000    0.000    0.000    0.000 argparse.py:1538(ArgumentParser)
            6    0.000    0.000    0.000    0.000 argparse.py:154(init)
            1    0.000    0.000    0.001    0.001 argparse.py:1556(init)
            6    0.000    0.000    0.000    0.000 argparse.py:1680(_add_action)
            1    0.000    0.000    0.000    0.000 argparse.py:1692(_get_positional_actions)
            1    0.000    0.000    0.000    0.000 argparse.py:1700(parse_args)
            1    0.000    0.000    0.000    0.000 argparse.py:1707(parse_known_args)
            1    0.000    0.000    0.000    0.000 argparse.py:1742(_parse_known_args)
            2    0.000    0.000    0.000    0.000 argparse.py:1789(take_action)
            2    0.000    0.000    0.000    0.000 argparse.py:1810(consume_optional)
            1    0.000    0.000    0.000    0.000 argparse.py:1887(consume_positionals)
            1    0.000    0.000    0.000    0.000 argparse.py:197(_Section)
            6    0.000    0.000    0.000    0.000 argparse.py:199(init)
            2    0.000    0.000    0.000    0.000 argparse.py:2020(_match_argument)
            1    0.000    0.000    0.000    0.000 argparse.py:2039(_match_arguments_partial)
            2    0.000    0.000    0.000    0.000 argparse.py:2055(_parse_optional)
            2    0.000    0.000    0.000    0.000 argparse.py:2156(_get_nargs_pattern)
            2    0.000    0.000    0.000    0.000 argparse.py:2200(_get_values)
            6    0.000    0.000    0.000    0.000 argparse.py:2326(_get_formatter)
            6    0.000    0.000    0.000    0.000 argparse.py:557(_metavar_formatter)
            6    0.000    0.000    0.000    0.000 argparse.py:566(format)
            6    0.000    0.000    0.000    0.000 argparse.py:573(_format_args)
            1    0.000    0.000    0.000    0.000 argparse.py:62()
            1    0.000    0.000    0.000    0.000 argparse.py:629(RawDescriptionHelpFormatter)
            1    0.000    0.000    0.000    0.000 argparse.py:640(RawTextHelpFormatter)
            1    0.000    0.000    0.000    0.000 argparse.py:651(ArgumentDefaultsHelpFormatter)
            1    0.000    0.000    0.000    0.000 argparse.py:685(ArgumentError)
            1    0.000    0.000    0.000    0.000 argparse.py:705(ArgumentTypeError)
            1    0.000    0.000    0.000    0.000 argparse.py:714(Action)
            6    0.000    0.000    0.000    0.000 argparse.py:765(init)
            1    0.000    0.000    0.000    0.000 argparse.py:805(_StoreAction)
            2    0.000    0.000    0.000    0.000 argparse.py:807(init)
            1    0.000    0.000    0.000    0.000 argparse.py:840(_StoreConstAction)
            3    0.000    0.000    0.000    0.000 argparse.py:842(init)
            2    0.000    0.000    0.000    0.000 argparse.py:859(call)
            1    0.000    0.000    0.000    0.000 argparse.py:863(_StoreTrueAction)
            3    0.000    0.000    0.000    0.000 argparse.py:865(init)
            1    0.000    0.000    0.000    0.000 argparse.py:880(_StoreFalseAction)
            1    0.000    0.000    0.000    0.000 argparse.py:897(_AppendAction)
            1    0.000    0.000    0.000    0.000 argparse.py:934(_AppendConstAction)
           12    0.000    0.000    0.000    0.000 argparse.py:95(_callable)
            1    0.000    0.000    0.000    0.000 argparse.py:960(_CountAction)
            1    0.000    0.000    0.000    0.000 argparse.py:981(_HelpAction)
            1    0.000    0.000    0.000    0.000 argparse.py:983(init)
            1    0.000    0.000    0.000    0.000 collections.py:11()
            1    0.000    0.000    0.000    0.000 collections.py:38(OrderedDict)
            1    0.000    0.000    0.000    0.000 collections.py:407(Counter)
            1    0.000    0.000    0.000    0.000 decoder.py:17(_floatconstants)
            1    0.000    0.000    0.001    0.001 decoder.py:2()
            1    0.000    0.000    0.000    0.000 decoder.py:272(JSONDecoder)
            1    0.000    0.000    0.000    0.000 decoder.py:302(init)
          480    0.001    0.000    0.010    0.000 decoder.py:359(decode)
          480    0.009    0.000    0.009    0.000 decoder.py:370(raw_decode)
            1    0.000    0.000    0.000    0.000 encoder.py:101(init)
          267    0.001    0.000    0.004    0.000 encoder.py:186(encode)
            1    0.000    0.000    0.000    0.000 encoder.py:2()
          382    0.003    0.000    0.003    0.000 encoder.py:212(iterencode)
          115    0.000    0.000    0.000    0.000 encoder.py:272(_make_iterencode)
        16134    0.004    0.000    0.011    0.000 encoder.py:288(_iterencode_list)
        15904    0.004    0.000    0.006    0.000 encoder.py:341(_iterencode_dict)
        16134    0.003    0.000    0.014    0.000 encoder.py:417(_iterencode)
            1    0.000    0.000    0.000    0.000 encoder.py:70(JSONEncoder)
            1    0.000    0.000    0.000    0.000 fnmatch.py:11()
            2    0.000    0.000    0.000    0.000 fnmatch.py:45(filter)
            1    0.000    0.000    0.000    0.000 fnmatch.py:85(translate)
          197    0.000    0.000    0.001    0.000 genericpath.py:23(exists)
           85    0.000    0.000    0.000    0.000 genericpath.py:46(isdir)
            6    0.000    0.000    0.000    0.000 gettext.py:132(_expand_lang)
            3    0.000    0.000    0.000    0.000 gettext.py:424(find)
            3    0.000    0.000    0.000    0.000 gettext.py:479(translation)
            3    0.000    0.000    0.000    0.000 gettext.py:545(dgettext)
            3    0.000    0.000    0.000    0.000 gettext.py:583(gettext)
            1    0.000    0.000    0.000    0.000 glob.py:1()
            2    0.000    0.000    0.001    0.000 glob.py:18(glob)
          172    0.000    0.000    0.001    0.000 glob.py:29(iglob)
            2    0.000    0.000    0.000    0.000 glob.py:71(glob1)
          170    0.000    0.000    0.000    0.000 glob.py:82()
            6    0.000    0.000    0.000    0.000 glob.py:99(has_magic)
            1    0.000    0.000    0.000    0.000 heapq.py:31()
            1    0.000    0.000    0.000    0.000 keyword.py:11()
            6    0.000    0.000    0.000    0.000 locale.py:365(normalize)
            1    0.025    0.025    0.118    0.118 main.py:1()
         2720    0.007    0.000    0.018    0.000 main.py:134(file_lines_to_list)
          450    0.000    0.000    0.000    0.000 main.py:398()
           30    0.000    0.000    0.000    0.000 main.py:85(voc_ap)
            2    0.000    0.000    0.000    0.000 os.py:136(makedirs)
            2    0.000    0.000    0.000    0.000 os.py:209(walk)
         2636    0.001    0.000    0.001    0.000 posixpath.py:112(basename)
            4    0.000    0.000    0.000    0.000 posixpath.py:132(islink)
         2635    0.004    0.000    0.005    0.000 posixpath.py:329(normpath)
            2    0.000    0.000    0.000    0.000 posixpath.py:44(normcase)
          420    0.000    0.000    0.000    0.000 posixpath.py:61(join)
            4    0.000    0.000    0.000    0.000 posixpath.py:82(split)
            2    0.000    0.000    0.000    0.000 re.py:138(match)
           23    0.000    0.000    0.002    0.000 re.py:192(compile)
            4    0.000    0.000    0.000    0.000 re.py:208(escape)
           25    0.000    0.000    0.002    0.000 re.py:230(_compile)
            1    0.000    0.000    0.001    0.001 scanner.py:2()
          4/2    0.000    0.000    0.012    0.006 shutil.py:210(rmtree)
            1    0.000    0.000    0.000    0.000 shutil.py:31(Error)
            1    0.000    0.000    0.000    0.000 shutil.py:34(SpecialFileError)
            1    0.000    0.000    0.000    0.000 shutil.py:38(ExecError)
            1    0.001    0.001    0.001    0.001 shutil.py:5()
           20    0.000    0.000    0.000    0.000 sre_compile.py:228(_compile_charset)
           20    0.000    0.000    0.000    0.000 sre_compile.py:256(_optimize_charset)
            6    0.000    0.000    0.000    0.000 sre_compile.py:411(_mk_bitmap)
           15    0.000    0.000    0.000    0.000 sre_compile.py:428(_simple)
           12    0.000    0.000    0.000    0.000 sre_compile.py:433(_compile_info)
           24    0.000    0.000    0.000    0.000 sre_compile.py:546(isstring)
           12    0.000    0.000    0.001    0.000 sre_compile.py:552(_code)
           12    0.000    0.000    0.002    0.000 sre_compile.py:567(compile)
        41/12    0.000    0.000    0.000    0.000 sre_compile.py:64(_compile)
           81    0.000    0.000    0.000    0.000 sre_parse.py:137(len)
            4    0.000    0.000    0.000    0.000 sre_parse.py:139(delitem)
          142    0.000    0.000    0.000    0.000 sre_parse.py:141(getitem)
           15    0.000    0.000    0.000    0.000 sre_parse.py:145(setitem)
           50    0.000    0.000    0.000    0.000 sre_parse.py:149(append)
        56/27    0.000    0.000    0.000    0.000 sre_parse.py:151(getwidth)
           12    0.000    0.000    0.000    0.000 sre_parse.py:189(init)
          175    0.000    0.000    0.000    0.000 sre_parse.py:193(__next)
          106    0.000    0.000    0.000    0.000 sre_parse.py:206(match)
          142    0.000    0.000    0.000    0.000 sre_parse.py:212(get)
           18    0.000    0.000    0.000    0.000 sre_parse.py:236(_class_escape)
           14    0.000    0.000    0.000    0.000 sre_parse.py:268(_escape)
        20/12    0.000    0.000    0.001    0.000 sre_parse.py:317(_parse_sub)
        23/13    0.000    0.000    0.001    0.000 sre_parse.py:395(_parse)
           12    0.000    0.000    0.000    0.000 sre_parse.py:67(init)
           12    0.000    0.000    0.001    0.000 sre_parse.py:706(parse)
            7    0.000    0.000    0.000    0.000 sre_parse.py:74(opengroup)
            7    0.000    0.000    0.000    0.000 sre_parse.py:85(closegroup)
           41    0.000    0.000    0.000    0.000 sre_parse.py:92(init)
          230    0.000    0.000    0.000    0.000 stat.py:24(S_IFMT)
          226    0.000    0.000    0.000    0.000 stat.py:40(S_ISDIR)
            4    0.000    0.000    0.000    0.000 stat.py:55(S_ISLNK)
         6130    0.001    0.000    0.001    0.000 {_json.encode_basestring_ascii}
           12    0.000    0.000    0.000    0.000 {_sre.compile}
            2    0.000    0.000    0.000    0.000 {_struct.unpack}
           32    0.000    0.000    0.000    0.000 {chr}
            2    0.000    0.000    0.000    0.000 {filter}
            3    0.000    0.000    0.000    0.000 {getattr}
           32    0.000    0.000    0.000    0.000 {hasattr}
         1249    0.000    0.000    0.000    0.000 {id}
        16319    0.003    0.000    0.003    0.000 {isinstance}
            1    0.000    0.000    0.000    0.000 {iter}
    1275/1253    0.000    0.000    0.000    0.000 {len}
         2147    0.000    0.000    0.000    0.000 {max}
            4    0.000    0.000    0.000    0.000 {method 'add' of 'set' objects}
         7572    0.001    0.000    0.001    0.000 {method 'append' of 'list' objects}
            1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}
          960    0.000    0.000    0.000    0.000 {method 'end' of '_sre.SRE_Match' objects}
          468    0.000    0.000    0.000    0.000 {method 'endswith' of 'str' objects}
            9    0.000    0.000    0.000    0.000 {method 'extend' of 'list' objects}
           62    0.000    0.000    0.000    0.000 {method 'find' of 'bytearray' objects}
           18    0.000    0.000    0.000    0.000 {method 'find' of 'str' objects}
           63    0.000    0.000    0.000    0.000 {method 'format' of 'str' objects}
           72    0.000    0.000    0.000    0.000 {method 'get' of 'dict' objects}
            2    0.000    0.000    0.000    0.000 {method 'group' of '_sre.SRE_Match' objects}
           60    0.000    0.000    0.000    0.000 {method 'insert' of 'list' objects}
           12    0.000    0.000    0.000    0.000 {method 'items' of 'dict' objects}
         1136    0.000    0.000    0.000    0.000 {method 'iteritems' of 'dict' objects}
         2913    0.001    0.000    0.001    0.000 {method 'join' of 'str' objects}
            2    0.000    0.000    0.000    0.000 {method 'keys' of 'dict' objects}
            6    0.000    0.000    0.000    0.000 {method 'lstrip' of 'str' objects}
         1143    0.001    0.000    0.001    0.000 {method 'match' of '_sre.SRE_Pattern' objects}
           12    0.000    0.000    0.000    0.000 {method 'pop' of 'dict' objects}
          480    0.002    0.000    0.002    0.000 {method 'read' of 'file' objects}
         2720    0.005    0.000    0.005    0.000 {method 'readlines' of 'file' objects}
            9    0.000    0.000    0.000    0.000 {method 'remove' of 'list' objects}
           10    0.000    0.000    0.000    0.000 {method 'replace' of 'str' objects}
            6    0.000    0.000    0.000    0.000 {method 'reverse' of 'list' objects}
         2640    0.000    0.000    0.000    0.000 {method 'rfind' of 'str' objects}
            2    0.000    0.000    0.000    0.000 {method 'rstrip' of 'str' objects}
            6    0.000    0.000    0.000    0.000 {method 'search' of '_sre.SRE_Pattern' objects}
           72    0.000    0.000    0.000    0.000 {method 'setdefault' of 'dict' objects}
           32    0.000    0.000    0.000    0.000 {method 'sort' of 'list' objects}
        21273    0.004    0.000    0.004    0.000 {method 'split' of 'str' objects}
         1277    0.000    0.000    0.000    0.000 {method 'split' of 'unicode' objects}
         3103    0.000    0.000    0.000    0.000 {method 'startswith' of 'str' objects}
        16000    0.001    0.000    0.001    0.000 {method 'strip' of 'str' objects}
           12    0.000    0.000    0.000    0.000 {method 'translate' of 'str' objects}
        16387    0.002    0.000    0.002    0.000 {method 'write' of 'file' objects}
         1750    0.000    0.000    0.000    0.000 {min}
         3585    0.012    0.000    0.012    0.000 {open}
           25    0.000    0.000    0.000    0.000 {ord}
            7    0.000    0.000    0.000    0.000 {posix.listdir}
          145    0.000    0.000    0.000    0.000 {posix.lstat}
            2    0.000    0.000    0.000    0.000 {posix.mkdir}
          139    0.011    0.000    0.011    0.000 {posix.remove}
            4    0.000    0.000    0.000    0.000 {posix.rmdir}
          282    0.000    0.000    0.000    0.000 {posix.stat}
           85    0.000    0.000    0.000    0.000 {range}
            7    0.000    0.000    0.000    0.000 {setattr}
            3    0.000    0.000    0.000    0.000 {sorted}
            1    0.000    0.000    0.000    0.000 {zip}

Cartucho commented 6 years ago

@titikid Thank you. I believe what's making it slow is writing temporary files during the computation. If I keep things in memory instead of in files, it should be faster.

Stinky-Tofu commented 5 years ago

When testing mAP with the VOC2007 data set, should the bbox marked as "difficult" be removed?
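
For context, the PASCAL VOC evaluation, as it is commonly described, does not just delete difficult boxes. A detection that matches a difficult ground-truth box is ignored (counted neither as a true positive nor as a false positive), and difficult boxes are left out of the number of positives used for recall. A minimal sketch under those assumptions for one class in one image; `iou` and `match_detections` are hypothetical names, and the devkit's +1 pixel convention for box widths is omitted.

```python
def iou(a, b):
    """Intersection over union of two (left, top, right, bottom) boxes."""
    iw = min(a[2], b[2]) - max(a[0], b[0])
    ih = min(a[3], b[3]) - max(a[1], b[1])
    if iw <= 0 or ih <= 0:
        return 0.0
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union


def match_detections(detections, gt_boxes, gt_difficult, min_overlap=0.5):
    """Label detections (sorted by descending confidence) as TP or FP.

    Returns (tp, fp, n_positives). Detections whose best match is a
    difficult ground-truth box are skipped entirely.
    """
    used = [False] * len(gt_boxes)
    tp, fp = [], []
    for det in detections:
        best_iou, best_idx = 0.0, -1
        for idx, gt in enumerate(gt_boxes):
            overlap = iou(det, gt)
            if overlap > best_iou:
                best_iou, best_idx = overlap, idx
        if best_iou >= min_overlap:
            if gt_difficult[best_idx]:
                continue  # ignored: neither TP nor FP
            if not used[best_idx]:
                used[best_idx] = True
                tp.append(1)
                fp.append(0)
            else:
                # Duplicate detection of an already matched object.
                tp.append(0)
                fp.append(1)
        else:
            tp.append(0)
            fp.append(1)
    # Difficult objects do not count towards the recall denominator.
    n_positives = sum(1 for flag in gt_difficult if not flag)
    return tp, fp, n_positives


gt_boxes = [(0, 0, 10, 10), (20, 20, 30, 30)]
gt_difficult = [True, False]
detections = [(0, 0, 10, 10), (20, 20, 30, 30), (50, 50, 60, 60)]
tp, fp, n_positives = match_detections(detections, gt_boxes, gt_difficult)
print(tp, fp, n_positives)
```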

Cartucho commented 5 years ago

Oh right... the script that converts the XML to our format does not currently check whether the objects are tagged as difficult. I can add that.

Did you use this script?
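
A minimal sketch of such a check, assuming the standard VOC annotation layout (`<object>` elements with `<name>`, `<difficult>` and `<bndbox>` children). The function name `voc_xml_to_lines` and the sample XML are illustrative and are not taken from the repository's conversion script.

```python
import xml.etree.ElementTree as ET

# Hypothetical two-object annotation in the VOC XML layout.
SAMPLE_XML = """<annotation>
  <object>
    <name>tvmonitor</name>
    <difficult>1</difficult>
    <bndbox><xmin>2</xmin><ymin>10</ymin><xmax>173</xmax><ymax>238</ymax></bndbox>
  </object>
  <object>
    <name>book</name>
    <difficult>0</difficult>
    <bndbox><xmin>439</xmin><ymin>157</ymin><xmax>556</xmax><ymax>241</ymax></bndbox>
  </object>
</annotation>"""


def voc_xml_to_lines(xml_text):
    """Convert one VOC annotation to '<class> <l> <t> <r> <b> [difficult]' lines."""
    root = ET.fromstring(xml_text)
    lines = []
    for obj in root.iter("object"):
        name = obj.findtext("name").strip()
        box = obj.find("bndbox")
        coords = [box.findtext(tag).strip()
                  for tag in ("xmin", "ymin", "xmax", "ymax")]
        # A missing <difficult> tag is treated as "not difficult".
        is_difficult = (obj.findtext("difficult") or "0").strip() == "1"
        fields = [name, *coords]
        if is_difficult:
            fields.append("difficult")
        lines.append(" ".join(fields))
    return lines


converted = voc_xml_to_lines(SAMPLE_XML)
print("\n".join(converted))
```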

Stinky-Tofu commented 5 years ago

Thanks, I didn't notice your script and wrote one myself. After removing the bboxes marked as "difficult", the mAP on VOC2007 improved from 81.5 to 85. https://github.com/Stinky-Tofu/YOLO_V3

kocica commented 5 years ago

I have also recently seen comparisons of AP across object sizes (AP_small, AP_medium, AP_large). I think some people might find this useful, and in my opinion it would not be hard to implement. See, e.g., page 3 of the YOLOv3 paper: https://pjreddie.com/media/files/papers/YOLOv3.pdf.

Cartucho commented 5 years ago

Hello @kocica, that sounds like a great idea! However, how do you define which objects in the image are small, medium and large? Is there a standard rule?

We could also cluster the objects by area, since we already know we want 3 clusters.
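
On the question of a standard rule: the COCO evaluation defines the three sizes by object area, with thresholds at 32² and 96² pixels. A minimal sketch of that bucketing, assuming the bounding-box area is used in place of COCO's annotated object area; `size_bucket` is a hypothetical helper name.

```python
SMALL_MAX_AREA = 32 ** 2   # below this area: small
MEDIUM_MAX_AREA = 96 ** 2  # below this area (and not small): medium


def size_bucket(left, top, right, bottom):
    """Classify a box as 'small', 'medium' or 'large' by its pixel area."""
    area = max(0.0, right - left) * max(0.0, bottom - top)
    if area < SMALL_MAX_AREA:
        return "small"
    if area < MEDIUM_MAX_AREA:
        return "medium"
    return "large"


# Boxes taken from the ground-truth example earlier in this thread.
print(size_bucket(2, 10, 173, 238))     # tvmonitor
print(size_bucket(272, 190, 316, 259))  # pottedplant
```

AP per size could then be computed by running the usual evaluation on the ground-truth boxes of each bucket separately.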

ndcuong91 commented 5 years ago

Hi @Cartucho, I just checked the latest code. It's good, and so is the speed. I added a function to the script "convert_gt_xml.py" that converts VOC's ground-truth files to your format with the "difficult" label, and I will make a PR for that. Another request: can you update the calculation to support the VOC2007 metric? (Your code uses the VOC2012 metric, and sometimes that is not enough for benchmarking.)
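
For comparison, here is a minimal sketch of the two AP definitions: the 11-point interpolated AP used by VOC 2007, and the area under the monotonically decreasing precision envelope used from VOC 2010 onward. The function names `voc07_ap` and `area_ap` are hypothetical and are not the tool's actual API; both take recall and precision lists ordered by descending detection confidence.

```python
def voc07_ap(recall, precision):
    """11-point interpolated AP (VOC 2007 style)."""
    ap = 0.0
    for i in range(11):
        threshold = i / 10.0
        # Best precision among all points whose recall reaches the threshold.
        candidates = [p for r, p in zip(recall, precision) if r >= threshold]
        ap += (max(candidates) if candidates else 0.0) / 11.0
    return ap


def area_ap(recall, precision):
    """Area under the precision envelope (VOC 2010+ style)."""
    mrec = [0.0] + list(recall) + [1.0]
    mpre = [0.0] + list(precision) + [0.0]
    # Make precision monotonically decreasing, scanning from right to left.
    for i in range(len(mpre) - 2, -1, -1):
        mpre[i] = max(mpre[i], mpre[i + 1])
    # Sum rectangle areas wherever recall changes.
    return sum((mrec[i] - mrec[i - 1]) * mpre[i]
               for i in range(1, len(mrec)) if mrec[i] != mrec[i - 1])


# Toy curve: TP, FP, TP with two ground-truth objects.
recall = [0.5, 0.5, 1.0]
precision = [1.0, 0.5, 2.0 / 3.0]
print(voc07_ap(recall, precision), area_ap(recall, precision))
```

The two definitions generally give different numbers on the same curve, which is why results reported with one metric are not directly comparable to results reported with the other.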

Cartucho / mAP

VOC 2007 metric? #24