Open egrace479 opened 3 months ago
The `requests` `HTTPAdapter` with the urllib3 `Retry` strategy looks good for some of the retry needs. The streaming interruption will still need to be handled separately, though.
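For reference, a minimal sketch of what mounting a `Retry` strategy on a `requests` session could look like. The helper name `build_session`, the retry count, the backoff factor, and the status codes are illustrative assumptions, not what cautious-robot currently does.

```python
import requests
from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry


def build_session(total_retries: int = 5, backoff_factor: float = 1.0) -> requests.Session:
    """Hypothetical helper: return a Session that retries transient failures."""
    retry = Retry(
        total=total_retries,            # assumed cap on retry attempts
        backoff_factor=backoff_factor,  # increasing sleep between attempts
        # Assumed set of "transient" statuses; 403 is deliberately left out
        # so an IP block is not retried.
        status_forcelist=(429, 500, 502, 503, 504),
    )
    adapter = HTTPAdapter(max_retries=retry)
    session = requests.Session()
    # Apply the same retry policy to both schemes.
    session.mount("http://", adapter)
    session.mount("https://", adapter)
    return session
```

This only covers failures while the request is being made and the response headers are received. A connection that drops partway through a streamed body is not retried by the adapter, which is why that case needs separate handling.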
Sometimes when downloading files we end up reaching a threshold where our IP address gets blocked for a while by a remote server. In that case you typically have to wait for a few hours. I wouldn't expect or want the command to wait in this scenario. For that scenario can we re-run the `cautious-robot` command and have it skip already downloaded images?
Right now I believe it relies on adjusting the start index to avoid re-downloading the image. However, I could add a line here checking for the image:
    if os.path.exists(image_dir_path / image_name):
        continue
_Originally posted by @hlapp in https://github.com/Imageomics/cautious-robot/pull/1#discussion_r1626575092_
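A rough sketch of the skip-on-re-run idea from the comment above, pulled out into a standalone function so it can be tried in isolation. The function name `pending_downloads` and the extra non-empty check are assumptions for illustration; only `image_dir_path` and `image_name` come from the snippet in the thread.

```python
from pathlib import Path


def pending_downloads(image_names, image_dir_path):
    """Hypothetical helper: return the names that still need downloading."""
    image_dir = Path(image_dir_path)
    pending = []
    for image_name in image_names:
        target = image_dir / image_name
        # Assumption: a zero-byte file is a leftover from a failed attempt,
        # so only non-empty files count as already downloaded.
        if target.exists() and target.stat().st_size > 0:
            continue
        pending.append(image_name)
    return pending
```

With something like this, re-running the command after an IP block would only request the images that are still missing, without having to adjust the start index by hand.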
Seems reasonable to use `HTTPAdapter`, since it's already using `requests`. Must also consider streaming interruption, as noted here.
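One possible way to handle a streaming interruption, sketched under assumptions: write the body to a temporary `.part` file and only move it into place once every chunk has arrived. The name `save_stream`, the `.part` suffix, and the optional size check are hypothetical, not existing cautious-robot behavior.

```python
import os
from pathlib import Path


def save_stream(chunks, dest, expected_size=None):
    """Hypothetical helper: write an iterable of byte chunks to dest atomically."""
    dest = Path(dest)
    part = dest.with_name(dest.name + ".part")  # assumed temp-file convention
    written = 0
    try:
        with open(part, "wb") as fh:
            for chunk in chunks:  # an interrupted stream raises here
                fh.write(chunk)
                written += len(chunk)
        if expected_size is not None and written != expected_size:
            raise IOError(f"incomplete download: {written} of {expected_size} bytes")
        # Rename only after the full body is on disk.
        os.replace(part, dest)
    except BaseException:
        # Remove the partial file so a truncated image is never mistaken
        # for a finished one, then let the caller decide what to do.
        part.unlink(missing_ok=True)
        raise
    return written
```

With `requests`, the chunks could come from `response.iter_content(chunk_size=8192)` on a response obtained with `stream=True`. Because a truncated file never reaches its final name, a check for existing images on re-run would not skip a half-downloaded one.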