Open LCorleone opened 6 years ago
I think you can use proxy servers to accelerate your access to these images on Amazon.
@LCorleone You can use multi-threading or proxy servers in Python to speed up downloading the images. The code is available.
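For anyone who wants a starting point, here is a minimal sketch of the multi-threading plus proxy idea using only the Python standard library. It is not the code referred to above. The function names (`build_opener`, `fetch`, `download_one`, `download_all`), the worker count, and the timeout are all illustrative assumptions.

```python
import os
import urllib.request
from concurrent.futures import ThreadPoolExecutor


def build_opener(proxy_url=None):
    """Return a urllib opener, routed through proxy_url if one is given.

    proxy_url is a hypothetical value such as "http://127.0.0.1:1080".
    """
    if proxy_url:
        handler = urllib.request.ProxyHandler(
            {"http": proxy_url, "https": proxy_url}
        )
        return urllib.request.build_opener(handler)
    return urllib.request.build_opener()


def fetch(url, opener, timeout=10):
    """Read the raw bytes behind url using the given opener."""
    with opener.open(url, timeout=timeout) as resp:
        return resp.read()


def download_one(url, dest, fetch_fn):
    """Save url to dest. Return True on success, False on any failure."""
    try:
        data = fetch_fn(url)
    except Exception:
        # Expired links, timeouts, and proxy errors all end up here.
        return False
    with open(dest, "wb") as f:
        f.write(data)
    return True


def download_all(jobs, fetch_fn, workers=16):
    """Download (url, dest) pairs in parallel; results keep the job order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(
            pool.map(lambda job: download_one(job[0], job[1], fetch_fn), jobs)
        )
```

To use it, create one opener with `build_opener(proxy_url)` and pass `lambda url: fetch(url, opener)` as `fetch_fn`. Because failures are returned as `False` rather than raised, one dead link does not stop the whole run, and the failed jobs can be retried later.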
@BraveApple Thanks, nice work!
I don't know why I have to use proxy servers to download the pictures. It is maddening when the requests keep breaking.
I give up.
Actually, the best way to download such datasets is to use a cloud server. I used AWS for this. However, one problem remains: transferring the dataset from the cloud server to our computers in China is very slow. Even with bypy, it still sucks!
I wrote a script yesterday to download the dataset on AWS. After 12 hours, 600k images had been downloaded. (About 20% of the image links no longer exist.) Even though I cropped the faces from the raw images, the dataset is still huge. I expect it to be about 55 GB once the whole dataset is downloaded.
Finally, I finished. It's about 50 GB, with about 17% of the links expired.
@wangx404 Could you share the cropped data?
Could someone kindly upload the downloaded data to BaiduYun?
Why does IMDb-Face.csv only have 1048576 images? Have you downloaded the whole dataset?
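One possible explanation, which nobody in this thread has confirmed: 1048576 is exactly the maximum number of rows an Excel worksheet can hold, so a count read from a spreadsheet program may be truncated. A minimal sketch for counting the rows directly in Python follows. The helper name `count_rows` is hypothetical, and it assumes the first line of the file is a header.

```python
import csv


def count_rows(path, has_header=True):
    """Count the data rows in a CSV file without loading it all into memory."""
    with open(path, newline="", encoding="utf-8") as f:
        reader = csv.reader(f)
        if has_header:
            next(reader, None)  # skip the header line, if there is one
        return sum(1 for _ in reader)
```

Calling `count_rows("IMDb-Face.csv")` gives the row count as stored in the file, independent of any spreadsheet limit.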
@wangx404 Could you share your downloaded data on BaiduYun? Many thanks.
Great job! I use Python's urllib. Maybe because I am in China, downloading from the URLs is too slow. Is there any way to deal with this? Or could anyone share the dataset? Thanks.