RETRY_AFTER_MS_HEADER not getting properly "honored"

Consider the following code:

In this code, if RETRY_AFTER_MS_HEADER is present in the response headers, the function sleeps for the requested duration. After the sleep, execution resumes from the point where asyncio.sleep was awaited, not from the start of the function or the start of the while loop, so the next thing evaluated is the loop condition.

The loop condition compares the current time against a timestamp captured when the first throttled response was seen, and that timestamp is never refreshed. So when the RETRY_AFTER_MS_HEADER duration exceeds MAX_RETRY_SECONDS, which is quite likely given how low MAX_RETRY_SECONDS currently is (5s), the condition is already false when the coroutine wakes from the sleep. The while loop exits and the request is never retried.
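To make the arithmetic concrete, here is a minimal simulation of the behaviour described above. It is not the benchmark's actual code: `attempts_made` and its simulated clock are hypothetical, and every response is assumed to be throttled with the same retry-after duration.

```python
MAX_RETRY_SECONDS = 5.0  # the value cited in this issue


def attempts_made(retry_after_s: float, max_retry_s: float = MAX_RETRY_SECONDS) -> int:
    """Count attempts in a retry loop whose deadline is measured from a
    timestamp captured once, assuming every response is throttled."""
    now = 0.0      # simulated clock, in seconds
    start = now    # captured once and never refreshed (the stale value)
    attempts = 0
    while now - start < max_retry_s:
        attempts += 1         # send the request; assume it gets throttled
        now += retry_after_s  # stand-in for awaiting asyncio.sleep(retry_after_s)
    return attempts
```

With a retry-after of 10 seconds, `attempts_made(10.0)` returns 1: the single sleep already exceeds the 5 second budget, so the loop exits without a retry. With a retry-after of 2 seconds, `attempts_made(2.0)` returns 3.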

Suggestions:

The time check in the while loop should not rely on the stale timestamp. It should account for the time spent sleeping because of RETRY_AFTER_MS_HEADER.
The value of MAX_RETRY_SECONDS should not be hardcoded. It should be configurable by the user. See #36
MAX_RETRY_SECONDS should be documented so that users know how it works. See #37
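One possible shape for the first two suggestions is sketched below. It is an illustration, not a patch against the benchmark: `send_with_retry`, the `send` callable, `fallback_delay_s`, and the injectable `sleep` and `clock` parameters are all hypothetical, and the header name `retry-after-ms` is an assumption. The idea is to subtract the time spent honoring the header from the elapsed time, so a long server-requested wait does not use up the retry budget, and to take the budget as a parameter rather than a constant.

```python
import asyncio
import time
from typing import Awaitable, Callable, Mapping, Tuple

RETRY_AFTER_MS_HEADER = "retry-after-ms"  # assumed header name

Response = Tuple[int, Mapping[str, str]]  # (status code, headers)


async def send_with_retry(
    send: Callable[[], Awaitable[Response]],
    max_retry_seconds: float = 5.0,   # configurable instead of hardcoded
    fallback_delay_s: float = 0.5,    # hypothetical delay when no header is sent
    sleep: Callable[[float], Awaitable[None]] = asyncio.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> int:
    """Retry throttled (429) requests and return the final status code."""
    start = clock()
    server_wait = 0.0  # total seconds slept because the server asked us to
    while True:
        status, headers = await send()
        if status != 429:
            return status
        # Time spent honoring the header does not count against the budget.
        if clock() - start - server_wait >= max_retry_seconds:
            return status
        retry_after_ms = headers.get(RETRY_AFTER_MS_HEADER)
        if retry_after_ms is not None:
            delay = float(retry_after_ms) / 1000.0
            await sleep(delay)
            server_wait += delay
        else:
            await sleep(fallback_delay_s)
```

Because the budget check runs only after a response has been received, a request is always retried at least once after the header-driven sleep, even when that sleep is longer than `max_retry_seconds`.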

Azure / azure-openai-benchmark, issue #33