Tencent / tencent-ml-images

Largest multi-label image database; ResNet-101 model; 80.73% top-1 acc on ImageNet

Image cannot be downloaded correctly #8

Closed qianxuyidian-2018 closed 6 years ago

qianxuyidian-2018 commented 6 years ago

When I ran the script to download images from the URLs listed in train_url_tiny.txt, I got a bad image. When I copied the URL into a web browser, I could see the correct image. Do you know the cause of the problem?

gongwk commented 6 years ago

You should add "url = url.split(' ')[0]" after "url = sp[0]". I printed the url and found that the output contains more than just the image's URL.

gongwk commented 6 years ago

Maybe on Linux the separator in the file is read as '\t', but on Windows it is just a space.
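
A minimal sketch of a separator-agnostic version of that fix, assuming each line of the URL list starts with the image URL followed by further fields separated by tabs or spaces. The helper name `extract_url` and the sample lines are hypothetical and are not taken from the repository's script.

```python
def extract_url(line):
    """Return the first whitespace-delimited field of a line, or None if the line is blank."""
    # str.split() with no argument splits on any run of whitespace
    # (tabs, spaces, '\r', '\n'), so the same code handles both separators.
    fields = line.split()
    return fields[0] if fields else None


if __name__ == "__main__":
    # Hypothetical sample lines; the fields after the URL are placeholders.
    tab_line = "http://example.com/a.jpg\t123:1\t456:0.9\n"
    space_line = "http://example.com/a.jpg 123:1 456:0.9\r\n"
    print(extract_url(tab_line))
    print(extract_url(space_line))
```

Splitting on any whitespace avoids hard-coding '\t' or ' ', at the cost of assuming that the URL itself contains no unencoded whitespace.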

wubaoyuan commented 6 years ago

@gongwk @qianxuyidian-2018 Thanks for your feedback. The demo works on my server. I will check it again.

Actually, we have realized that invalid URLs and slow download speed are the main barriers to using our data. We are trying to find a way to make the data easier to access.

wubaoyuan commented 6 years ago

@gongwk @qianxuyidian-2018 We have provided a new file, 'download_urls_multithreading.sh', which downloads much faster. Please refer to the README for more details.

qianxuyidian-2018 commented 6 years ago

@gongwk You are right, I ran the script on Windows. It works after I changed "\t" to " ". Thank you! @wubaoyuan Thanks, I will try the new script.