Uninett / zinolib

Python library for zino
Apache License 2.0
1 stars 3 forks source link

Fix clear_flapping method in Zino1EventManager #48

Closed podliashanyk closed 6 months ago

podliashanyk commented 6 months ago

[BUG]: Typo in if-check if event.type == Event.Type.PortState:. PortState type doesnt exist, should be PORTSTATE.

Other desirable changes (nice-to-haves):

hmpf commented 6 months ago
* clear_flapping should accept `event_id` instead of `event` (just like other event modification methods in Zino1EventManager)

clear_flapping takes an event because not any id will do though so the typing would be wrong (and cannot be made correct for some definitions of correct.)

hmpf commented 6 months ago
* clear_flapping should return an updated event instead of `bool` (just like other event modification methods in Zino1EventManager)

I'm guessing clear_flapping returns a bool in the original because it is a signal sent to the router in question (please attempt to clear the flapping!), and the flapping may or may not actually have been cleared. The function returns with True iff the server accepted the signal, not when the event itself has changed.

@lunkwill42, in the old and new servers, how much time would it take between clear_flapping is sent until we know whether it works?

podliashanyk commented 6 months ago
* clear_flapping should accept `event_id` instead of `event` (just like other event modification methods in Zino1EventManager)

clear_flapping takes an event because not any id will do though so the typing would be wrong (and cannot be made correct for some definitions of correct.)

I was thinking of making clear_flapping more alike other methods in Zino1EventManager. Almost all other event read or write methods in Zino1EventManager accept event_id now. Event though many of them have originally accepted event. Like:

def change_admin_state_for_id(self, event_id, admin_state: AdmState) -> Optional[Event]:
...
def add_history_entry_for_id(self, event_id: int, message) -> Optional[Event]:
...
podliashanyk commented 6 months ago
* clear_flapping should return an updated event instead of `bool` (just like other event modification methods in Zino1EventManager)

I'm guessing clear_flapping returns a bool in the original because it is a signal sent to the router in question (please attempt to clear the flapping!), and the flapping may or may not actually have been cleared. The function returns with True iff the server accepted the signal, not when the event itself has changed.

If clear flapping operation is successful, there is at least 1 event attribute that will change, if I understand it correctly. FLAPSTATE: flapping to FLAPSTATE: stable. If so, could we assume that any client (not just Howitz) would need to fetch an updated event after successful clear flapping? And if so, should clear_flapping() in Zino1EventManager therefore return Optional[Event] like other event update methods there?

But if event update by the server after a successful clear flapping is potentially reasonably delayed then I see why the method should simply return bool. 🤔

hmpf commented 6 months ago

clear_flapping ought to have been directly on the PortStateEvent and not on the manager, just like get_downtime. The reason it is not is because clear_flapping needs to talk to the server while get_downtime does not. For ease of testing, anything that do need to talk to the server is in the Manager.

I haven't managed to think of an elegant way to avoid the import loop.

podliashanyk commented 6 months ago

clear_flapping ought to have been directly on the PortStateEvent and not on the manager, just like get_downtime. The reason it is not is because clear_flapping needs to talk to the server while get_downtime does not. For ease of testing, anything that do need to talk to the server is in the Manager.

I haven't managed to think of an elegant way to avoid the import loop.

Oh I see then, that explains it. Then a bugfix with typo will suffice 🙂

lunkwill42 commented 6 months ago

@lunkwill42, in the old and new servers, how much time would it take between clear_flapping is sent until we know whether it works?

In the new server, CLEARFLAP isn't implemented yet.

In the old server, CLEARFLAP doesn't do anything the client needs to specifically wait for the result of. It does what the name says: It clears the internal flapping status of an event/port, meaning that Zino's flapping detection algorithm is reset.

Incidentally, CLEARFLAP also triggers an SNMP poll operation against the router interface in question, likely in order to ensure Zino has the latest actual up/down state of that interface. It's not clear to me that CLEARFLAP actually waits for the result of this poll operation before returning a 200 ok to the client. It appears to only initiate the poll operation and return a 200 ok immediately. The actual response handler runs asynchronously, and the actual SNMP response is not fed back into any API response.