Open jpswinski opened 1 year ago
Here is an error from the logs showing a GET that returned a 500; a subsequent request for the same object succeeded:
2023-09-27 11:49:35 | ip=10.0.172.45 level=critical caller=S3CurlIODriver.cpp:459 msg="S3 get returned http error <500>: data/GEDI/GEDI01_B_2019109210809_O01988_03_T02056_02_005_01_V002.h5"
Every once in a while an S3 GET request will fail with an error code, and subsequent requests to the same object will succeed.
Currently the code handles timeouts and partial responses and will retry the request, but if S3 returns an HTTP error code, it will fail outright and not try again.
Consider looking at the error code and handling it accordingly. For instance, failing outright on a 404 would be fine, but maybe a 500 merits a retry.
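As a rough sketch of that idea, the status code could be classified before deciding whether to retry. The helper name `isRetryableHttpStatus` is hypothetical and not part of the existing driver, and the exact set of retryable codes is an assumption: the issue only suggests 500, and 502/503/504 are included here as other commonly transient server-side errors. The parameter is a `long` because that is the type libcurl reports for `CURLINFO_RESPONSE_CODE`.

```cpp
// Hypothetical helper (not in S3CurlIODriver.cpp): decides whether an HTTP
// status returned by S3 is worth retrying.
// Assumption: only transient server-side errors are retried; everything
// else, including 404 and other 4xx codes, fails outright.
inline bool isRetryableHttpStatus(long http_code)
{
    switch (http_code) {
        case 500: // internal error, the case seen in the log above
        case 502: // bad gateway (assumed transient)
        case 503: // service unavailable / slow down (assumed transient)
        case 504: // gateway timeout (assumed transient)
            return true;
        default:
            return false;
    }
}
```

With something like this in place, the existing retry path for timeouts and partial responses could also be taken when `isRetryableHttpStatus(http_code)` is true, while a 404 would still be reported immediately.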
https://github.com/ICESat2-SlideRule/sliderule/blob/38764aaf2c948eccd14b40094eadad520d430961/packages/aws/S3CurlIODriver.cpp#L444-L464
The `info.index` will likely need to be set back to 0 on a failure. I'm not sure if the headers would need to be reset and what the effect is of reinitializing the curl structure.
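To illustrate the index reset, here is a minimal sketch of a retry loop that is independent of libcurl. Everything in it is an assumption made for illustration: `TransferInfo` is a stand-in for the driver's write-callback state (only `index` is named above), `getWithRetry` is a hypothetical name, and `attempt` represents whatever performs one GET and returns the HTTP status. It does not answer the open questions about headers or reinitializing the curl handle.

```cpp
#include <cstddef>
#include <functional>

// Hypothetical stand-in for the state filled in by the write callback.
// Only `index` is taken from the issue; the other fields are assumptions.
struct TransferInfo {
    unsigned char* buffer; // destination for the object data
    std::size_t    size;   // capacity of buffer in bytes
    std::size_t    index;  // number of bytes written so far
};

// Calls `attempt` up to `max_attempts` times and returns the last HTTP status.
// `attempt` is assumed to perform one GET, advance info.index as data
// arrives, and return the HTTP status code of the response.
inline long getWithRetry(TransferInfo& info, int max_attempts,
                         const std::function<long(TransferInfo&)>& attempt)
{
    long status = 0;
    for (int i = 0; i < max_attempts; ++i) {
        info.index = 0; // discard anything written by a failed attempt
        status = attempt(info);
        // Retry only on 5xx; success and client errors end the loop.
        if (status < 500 || status > 599) break;
    }
    return status;
}
```

Resetting `index` at the top of the loop rather than in the error branch means every attempt starts from a clean buffer position, whatever the reason the previous one failed.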