Tencent / tencent-ml-images

Largest multi-label image database; ResNet-101 model; 80.73% top-1 acc on ImageNet
Other
3.05k stars 514 forks

[bug] duplicate image names can cause overwrite problem #32

Open linrongc opened 5 years ago

linrongc commented 5 years ago

In the download script, saving each image to the path "save_dir + im_name" overwrites any earlier image that has the same file name.

For example, the following URLs all have the same image name, 3.jpg:

http://i.ytimg.com/vi/6rMwgpPSJyU/3.jpg 8486:1 8479:1 8473:1 5175:1 5170:1 1042:1 865:1 2:1

http://web.mit.edu/admissions/blogs/photos/jenny-whitesox/3.jpg 10591:1 1914:1 1897:1 1829:1 1054:1 1041:1 865:1 2:1

http://bp2.blogger.com/_u3lFqBksmrE/Rgoqe1STw-I/AAAAAAAACKI/sl1nY4Q4RAc/s400/3.jpg 9199:1 9170:1 8585:1 5177:1 5170:1 1042:1 865:1 2:1 ....
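To make the collision concrete, here is a minimal Python sketch that groups URL lines by the file name a naive "save_dir + im_name" scheme would produce. It assumes that im_name is the last path component of the URL and that each line is a URL followed by whitespace-separated label fields. The function name basename_collisions is hypothetical and not part of the repository.

```python
import posixpath
from collections import defaultdict
from urllib.parse import urlparse


def basename_collisions(lines):
    """Group URLs by the file name a naive save_dir + im_name scheme would use.

    Assumption (hypothetical helper): im_name is the last path component of
    the URL. Returns only the names that more than one URL maps to.
    """
    by_name = defaultdict(list)
    for line in lines:
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        url = fields[0]
        # ".../vi/6rMwgpPSJyU/3.jpg" -> "3.jpg"
        im_name = posixpath.basename(urlparse(url).path)
        by_name[im_name].append(url)
    return {name: urls for name, urls in by_name.items() if len(urls) > 1}


if __name__ == "__main__":
    # The three URLs from the report, with label lists shortened.
    sample = [
        "http://i.ytimg.com/vi/6rMwgpPSJyU/3.jpg 8486:1 8479:1",
        "http://web.mit.edu/admissions/blogs/photos/jenny-whitesox/3.jpg 10591:1",
        "http://bp2.blogger.com/_u3lFqBksmrE/Rgoqe1STw-I/AAAAAAAACKI/sl1nY4Q4RAc/s400/3.jpg 9199:1",
    ]
    for name, urls in basename_collisions(sample).items():
        print(f"{name}: {len(urls)} URLs")  # prints: 3.jpg: 3 URLs
```

All three URLs map to one path, so only the last download to finish survives on disk while the URL file still lists three differently labeled images.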

wubaoyuan commented 5 years ago

@linrongc Thanks for this feedback. We will try to fix it asap.

linrongc commented 5 years ago

Using line_num as the save name, together with a mapping file, would be an easy fix.
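A minimal sketch of the suggested fix, assuming the same URL-list format (a URL followed by whitespace-separated label fields). Each line gets a save path built from its line number, and a tab-separated mapping file records line number, save path, and original URL. The helper names save_name_plan and write_mapping, the 0-based numbering, and the ".jpg" fallback extension are illustrative assumptions, not the change actually made in 'download_urls_multithreading.py'.

```python
import csv
import os
import posixpath
from urllib.parse import urlparse


def save_name_plan(lines, save_dir, default_ext=".jpg"):
    """Assign each URL line a unique save path derived from its line number.

    Hypothetical helper. Returns (line_num, save_path, url) tuples;
    nothing is downloaded here.
    """
    plan = []
    for line_num, line in enumerate(lines):  # 0-based line numbers (assumption)
        fields = line.split()
        if not fields:
            continue  # blank lines keep their number but get no file
        url = fields[0]
        # Keep the URL's extension if it has one; otherwise fall back (assumption).
        ext = posixpath.splitext(urlparse(url).path)[1] or default_ext
        # Line numbers are unique, so two URLs ending in "3.jpg" get different paths.
        save_path = os.path.join(save_dir, f"{line_num}{ext}")
        plan.append((line_num, save_path, url))
    return plan


def write_mapping(plan, mapping_path):
    """Write a tab-separated mapping file: line_num, save_path, original URL."""
    with open(mapping_path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f, delimiter="\t")
        writer.writerows(plan)
```

Each download worker would then write to save_path instead of "save_dir + im_name". Because the file name is the line number, the labels on that line can be re-attached later through the mapping file.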

wwfnwg commented 5 years ago

This is a huge pitfall.

wubaoyuan commented 5 years ago

@linrongc good suggestion.

wubaoyuan commented 5 years ago

@linrongc @wwfnwg I have updated lines 27 and 28 of 'download_urls_multithreading.py'. Sorry for this bug; if you find any other bugs, please let me know. Thanks.

wwfnwg commented 5 years ago

There is still a problem: some entries have exactly the same URL but different labels.
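Renaming files does not resolve this case, because a single URL carries more than one label set. The following minimal sketch only detects such entries. It assumes each line is a URL followed by whitespace-separated label fields and compares label fields as sets, so a mere reordering is not flagged. The function name conflicting_duplicate_urls is hypothetical. How the repository's README says to handle these entries is not reproduced here.

```python
from collections import defaultdict


def conflicting_duplicate_urls(lines):
    """Find URLs that appear on several lines with different label fields.

    Hypothetical helper. Returns {url: [(line_num, labels), ...]} for
    conflicting URLs only; exact repeats and reordered labels are ignored.
    """
    seen = defaultdict(list)
    for line_num, line in enumerate(lines):
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        url, labels = fields[0], tuple(fields[1:])
        seen[url].append((line_num, labels))
    return {
        url: entries
        for url, entries in seen.items()
        # Conflict = repeated URL whose label sets are not all identical.
        if len(entries) > 1 and len({frozenset(labels) for _, labels in entries}) > 1
    }
```

Listing the conflicting line numbers makes it possible to decide per case whether to drop, merge, or keep such entries before training.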

wubaoyuan commented 5 years ago

@wwfnwg Please see the updated README about the repeated URLs.