Closed mattjala closed 6 months ago
This issue isn't due to scans taking a long time. The domain scans are actually getting stuck in an infinite wait due to inaccurate timestamps after #346.
Sometimes, a scan would record a completion timestamp that was slightly BEFORE the recorded time at which the rescan request was sent out. Because the check to stop waiting requires a scan-finished timestamp later than the scan request time, the wait would never terminate and the request would eventually return a 503. The inaccuracy occurs because the node that records the scan completion time is different from the node that records the request time.
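To illustrate the failure mode, here is a minimal sketch of a wait loop with the same termination condition. The names `wait_for_scan`, `get_scan_time`, `timeout`, and `poll_interval` are hypothetical and are not taken from the actual code; the timestamps are made-up values chosen only to show the effect of a small skew.

```python
import time

def wait_for_scan(request_time, get_scan_time, timeout=0.2, poll_interval=0.02):
    """Poll until the recorded scan time is later than request_time.

    Returns True if the scan is seen as finished, False on timeout
    (the case that would surface to the client as a 503).
    All names here are hypothetical, for illustration only.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        # Same condition as described above: only a scan timestamp
        # strictly later than the request time ends the wait.
        if get_scan_time() > request_time:
            return True
        time.sleep(poll_interval)
    return False

request_time = 1000.000  # recorded by the node that sent the rescan request

# The scan really did finish, but the other node's clock reads 2 ms earlier.
skewed = wait_for_scan(request_time, lambda: 999.998)

# With consistent clocks the same scan ends the wait immediately.
consistent = wait_for_scan(request_time, lambda: 1000.250)

print(skewed, consistent)
```

In the skewed case the condition can never become true, no matter how long the caller waits.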
I'm not sure why `getNow()` is more inconsistent between nodes than `time.time()`. Even when nodes start at different times, `time.perf_counter() - app["start_time_relative"]` should be a precise measure of how long the node has been online, and `app["start_time"]` should be an OS-precision UNIX timestamp. Adding them should produce a UNIX timestamp for the current time that is no less accurate than `time.time()`. It shouldn't be a problem with async operations, since `perf_counter` continues to count during sleep and is system-wide.
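For reference, this is a minimal sketch of the computation as described above, compared against `time.time()` on a single process. The helpers `make_app` and `get_now` are hypothetical stand-ins; only the `start_time` and `start_time_relative` keys come from the description, and the real `getNow()` may differ.

```python
import time

def make_app():
    # Hypothetical stand-in for the app dict: capture the wall-clock time
    # and the perf_counter reading once, at startup.
    return {
        "start_time": time.time(),
        "start_time_relative": time.perf_counter(),
    }

def get_now(app):
    # Wall-clock start time plus the elapsed time measured by perf_counter.
    uptime = time.perf_counter() - app["start_time_relative"]
    return app["start_time"] + uptime

app = make_app()
time.sleep(0.1)  # perf_counter keeps counting during sleep

# Difference between the reconstructed timestamp and the OS wall clock.
drift = get_now(app) - time.time()
print(f"drift: {drift:.6f} s")
```

On one machine the two values should stay very close. One thing that may be worth checking is that `perf_counter` is not affected by system clock updates while `time.time()` is, so a wall-clock correction on a node after `start_time` was captured would make the two diverge on that node.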
The value is currently hardcoded. Making it configurable would allow the runners to wait longer and avoid failures like this.