Tencent / tencent-ml-images

Largest multi-label image database; ResNet-101 model; 80.73% top-1 acc on ImageNet

Download shell get more invalid urls #33

Open AmberCheng opened 5 years ago

AmberCheng commented 5 years ago

Hi,

I have been downloading the training dataset over the past few days. Because it is so large, I divided the URLs into 34 parts, so each part contains about 200,000 (20w) images. I then used your download script on each part. Strangely, the number of invalid URLs plus the number of downloaded images comes to more than 200,000. I checked one part and found that some of the URLs listed as invalid correspond to images that were downloaded successfully. Have you run into this situation?
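For anyone wanting to reproduce this check, here is a minimal sketch of how the overlap could be measured. It assumes you can build three collections of URL strings for one part: the URLs in the part, the URLs reported as invalid, and the URLs whose images were saved. The function name `reconcile` and its inputs are hypothetical; how the download script actually records invalid URLs is not shown in this thread.

```python
def reconcile(part_urls, invalid_urls, downloaded_urls):
    """Compare the invalid list and the downloaded set for one part.

    All three arguments are iterables of URL strings (hypothetical inputs;
    adapt them to whatever your download logs actually contain).
    """
    part = set(part_urls)
    invalid = set(invalid_urls) & part
    downloaded = set(downloaded_urls) & part
    return {
        # URLs flagged invalid even though an image file was saved
        "invalid_and_downloaded": sorted(invalid & downloaded),
        # URLs that appear in neither list
        "unaccounted": sorted(part - invalid - downloaded),
    }
```

If `invalid_and_downloaded` is non-empty, the two counts overlap, which would explain a total above the part size.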

wubaoyuan commented 5 years ago

@AmberCheng I guess, when the url is valid, it also saves an image showing "not available".

AmberCheng commented 5 years ago

@wubaoyuan I have just checked. The file is an actual image, not a "not available" placeholder, but it is broken. I wonder why you don't package the images, since downloading them individually is much more trouble...
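A rough way to find such broken files, assuming the downloads are JPEGs, is to check that each file starts with the JPEG start-of-image marker (`FF D8`) and ends with the end-of-image marker (`FF D9`). This is only a heuristic sketch: a truncated download usually lacks the end marker, but some valid JPEGs carry trailing bytes and would be flagged too. The function names and the `*.jpg` pattern are assumptions, not part of the repository's script.

```python
from pathlib import Path


def looks_like_complete_jpeg(data: bytes) -> bool:
    """Heuristic: JPEG data should start with FF D8 and end with FF D9."""
    return (
        len(data) >= 4
        and data.startswith(b"\xff\xd8")
        and data.endswith(b"\xff\xd9")
    )


def find_suspect_images(image_dir, pattern="*.jpg"):
    """Return paths under image_dir whose bytes fail the marker check."""
    return sorted(
        p for p in Path(image_dir).rglob(pattern)
        if p.is_file() and not looks_like_complete_jpeg(p.read_bytes())
    )
```

Files returned by `find_suspect_images` are candidates for re-download or manual inspection, not confirmed failures.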

wubaoyuan commented 5 years ago

@AmberCheng Personally, I would like to share the images, but doing so poses a copyright risk for our company.

wubaoyuan commented 5 years ago

@AmberCheng Please follow this suggestion: download all of the ImageNet images, then use the list we provide to extract the ones used in ML-Images. The URLs from Open Images are valid.
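The extraction step could look roughly like the sketch below. It assumes a local ImageNet copy under some root directory and that the provided list can be reduced to image file names (with or without extension); the actual format of the list is not described in this thread, so `select_listed_images` and its inputs are hypothetical.

```python
from pathlib import Path


def select_listed_images(imagenet_root, wanted_names):
    """Return files under imagenet_root whose stem appears in wanted_names.

    wanted_names: iterable of file names or ids taken from the provided
    list (assumed format; adjust the parsing to the real list).
    """
    wanted = {Path(name).stem for name in wanted_names}
    return sorted(
        p for p in Path(imagenet_root).rglob("*")
        if p.is_file() and p.stem in wanted
    )
```

The returned paths can then be copied or symlinked into the ML-Images training directory.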