Closed: johnarevalo closed this 3 years ago
Thanks @johnarevalo! Is this ready for review? If so, I think @bethac07 would be the one to review it.
Awesome, thanks for your help with this!
Just FMI: if you ran it with 8, why do you think 4 is the max (and why is 8 the default in the script)? Was it just running into a lot of Slow Down errors?
My concern with having output written only to a log file is that people might never actually check it, especially if the request covers thousands or tens of thousands of files. Can we figure out a way to summarize, or print some sort of console report at the end?
I have experienced limitations on parallel requests to other AWS APIs before. Beyond a certain number of concurrent calls (e.g., 4 or 8), there are no further time savings.
I ran a couple more restorations with 4 and 6 workers for the same number of objects. They took ~35 min and ~26 min respectively. My guess is this: the limit is 4 concurrent calls, and the script requires an additional thread for coordination. So setting the default to 8 is conservative, IMO.
About summarizing: we could print counts per status, something like:

```
REQUESTED 17258
IN_PROGRESS 19
ERROR 1
RESTORED 0
For more info check path/to/log/output.csv
```
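A rough sketch of how that report could be generated, assuming the log is a CSV with a `status` column (the column name is an assumption here, not necessarily the script's actual schema):

```python
# Hypothetical sketch: count restore statuses in the log CSV and print a summary.
import csv
from collections import Counter

def print_summary(log_path):
    with open(log_path, newline="") as f:
        # "status" is an assumed column name in the log CSV
        counts = Counter(row["status"] for row in csv.DictReader(f))
    for status in ("REQUESTED", "IN_PROGRESS", "ERROR", "RESTORED"):
        print(status, counts.get(status, 0))
    print(f"For more info check {log_path}")
```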
Yeah, I've run into throttling elsewhere as well, which is why I initially hadn't dug too deeply into parallelization; I wanted to make sure to stay under the throttling limit.
I think that's a great way of summarizing!
> I have experienced limitations on parallel requests to other AWS APIs before. Beyond a certain number of concurrent calls (e.g., 4 or 8), there are no further time savings.
That's good to know!
@bethac07, last push includes the suggested changes.
I ran this version to restore 17k files with 1 worker and with 8 workers: 1 worker took ~2 h, while 8 workers took ~26 min.
Four workers is probably the limit for this parallelization strategy; a sketch of the strategy is below.
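For reference, a minimal sketch of the worker-based strategy being discussed, not the script itself; the bucket, key list, retrieval tier, and the way throttling errors are classified are all assumptions:

```python
# Sketch: issue S3 Glacier restore requests from a pool of worker threads.
from concurrent.futures import ThreadPoolExecutor

import boto3
from botocore.exceptions import ClientError

s3 = boto3.client("s3")

def restore_one(bucket, key, days=7, tier="Bulk"):
    """Request a restore for one object and classify the outcome."""
    try:
        s3.restore_object(
            Bucket=bucket,
            Key=key,
            RestoreRequest={"Days": days, "GlacierJobParameters": {"Tier": tier}},
        )
        return key, "REQUESTED"
    except ClientError as e:
        if e.response["Error"]["Code"] == "RestoreAlreadyInProgress":
            return key, "IN_PROGRESS"
        return key, "ERROR"  # includes throttling responses such as "SlowDown"

def restore_all(bucket, keys, workers=8):
    """Fan the restore requests out over `workers` threads."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(lambda k: restore_one(bucket, k), keys))
```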