Open david-macmahon opened 4 years ago
This bug is not as easy to track down as the others, since the particular timeout error never occurs when testing on the CAM development system. As mentioned above, so far it has only occurred (intermittently) during live observations.
In the earlier katportal_server version, when this error occurred, the current observation would be lost (but the katportal_server would restart, allowing subsequent observations to continue).
I have tracked the problem down to the schedule_blocks sensor, and have made two changes (see 2205214) to try to handle this particular timeout more gracefully.
Firstly, I have manually specified a timeout duration for run_sync, which will hopefully be sufficient. This raises the question: should a timeout duration be specified for all run_sync calls? The error has not been observed for any of the other "once-off" sensors so far.
Secondly, I have wrapped the call in a try block, which will facilitate debugging during the next testing session and at least permit the current observation to continue without intervention (minus the schedule-block information).
I hope to test these improvements during the next testing session (likely 2020-05-28) as I have been unable to replicate the error with the development system.
Following the testing session, it appears that explicitly extending the timeout duration has prevented this error from occurring for the schedule_blocks sensor.
However, we observed the error occurring again for a different run_sync call; therefore it seems likely that these measures will be needed for every run_sync call.
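If the same two measures do end up being applied to every run_sync call, one option is a small shared helper. The following is only a sketch, not the code in 2205214: the helper name `run_sync_guarded`, the 30 s default, the `fake_run_sync` stand-in and the `sb-001` value are all hypothetical, and it assumes the `run_sync` in use accepts a `timeout` keyword (as Tornado's `IOLoop.run_sync(func, timeout=None)` does).

```python
import logging

log = logging.getLogger("katportal_timeout_sketch")


def run_sync_guarded(run_sync, func, timeout_s=30.0, default=None, label="sensor"):
    """Call run_sync(func, timeout=timeout_s); on failure, log and return default.

    run_sync is passed in so the helper does not depend on a specific IO loop.
    """
    try:
        return run_sync(func, timeout=timeout_s)
    except Exception as exc:  # broad on purpose: record whatever the call raised
        log.warning("run_sync for %s failed (timeout %.0f s): %r", label, timeout_s, exc)
        return default


# Hypothetical stand-ins so the sketch runs without a live portal connection.
def fake_run_sync(func, timeout=None):
    return func()


def fetch_ok():
    return ["sb-001"]  # made-up schedule block ID


def fetch_timeout():
    raise TimeoutError("simulated portal timeout")


ok_result = run_sync_guarded(fake_run_sync, fetch_ok, label="schedule_blocks")
failed_result = run_sync_guarded(fake_run_sync, fetch_timeout, default=[], label="schedule_blocks")
```

In real use the first argument would be the IO loop's own `run_sync` method. Returning a default rather than re-raising matches the goal above of letting the current observation continue without the missing sensor information, at the cost of hiding the failure from callers that do not check the log.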
Occasionally, the KATPortalClient's connection to the KATPortal server times out. When this happens, manual intervention is required to get the system back into an operational state. The reason for these timeouts is not understood and may be outside our code base, but regardless of the underlying cause, KATPortalClient should handle this situation more gracefully so that the backend remains in an operational state (to whatever extent that's possible).