Closed podliashanyk closed 3 weeks ago
The code in zinolib.ritz.notifier.poll
either:
None
is returned if an OSError
with errno.EAGAIN
or errno.EWOULDBLOCK
was raised while there was no newline in notifier._buff
. EAGAIN/EWOULDBLOCK on linux means "temporary failure, please feel free to try again".
This code never raises RetryError
.
The code in zinolib.controllers.zino1.UpdateHandler.get_event_update
calls zinolib.ritz.notifier.poll
.
It does not attempt to catch any exceptions.
If .poll
returns None
(after OSError errno.EAGAIN or errno.EWOULDBLOCK) it returns False
, otherwise it handles the update.
The actual handler either returns the id of an updated event or None
if it's a new, unfinished event or False
if it was an unknown/unhandled update type.
Howitz's howitz.endpoints.update_events
runs zinolib.controllers.zino1.UpdateHandler.get_event_update
in a while
-loop.
It catches RetryError
, which will never be raised.
If the event updater returns None
, it breaks the while loop. This is what should trigger a new attempt and is the reason I suspect for the sudden loss of updates.
We should try:
if updated:
updated_ids.add(updated)
instead of
if not updated:
break
updated_ids.add(updated)
and stop catching RetryError
.
Some polish might be to have a counter for how many times we have seen updated == None
and raise a more loud error/check the connection if it is above the magic number. On every successful update the counter is set to zero.
Also, might be smart to have time.sleep(1)
if updated == None
.
I'll make a patch.
Curitz aborts because it attempts to receive from the API socket and the API socket times out. It does not fail because the NTIE socket dies.
This is fixed in zinolib 1.2.0, that is: there is a new method zinolib.controllers.zino1.Zino1EventManager.test_connection()
that will raise a TimeoutError if the server is not reachable. Howitz will have to run the method and do something with the timeout as per #115.
Curitz checks the connection every 60 seconds, this is hard coded.
As per previous comment, closing this. When to run the test-method belongs in a separate issue.
See also Uninett/zinolib#63
Seems like connection loss to Zino server is not "picked up" by Howitz when requesting for NTIE updates. Further investigation is needed.
To reproduce:
Curitz will eventually quit with the error message "Connection lost with the zino server...". Howitz does nothing.