agsh / onvif

ONVIF node.js implementation
http://agsh.github.io/onvif/
MIT License

Restart event loop for pull messages request when cam drops connection #314

Closed akomelj closed 4 months ago

akomelj commented 7 months ago

TP-Link Tapo C220 (firmware 1.1.6) supports ONVIF pull-point subscriptions but unfortunately sets the socket read timeout on the pull endpoint to 10 seconds - i.e. every pull messages request gets closed by the cam after 10 seconds of I/O inactivity.

I tried changing various timeouts at the network and message level and using TCP keep-alives, to no avail: the cam sends a TCP FIN packet roughly 10 seconds after receiving the HTTP request headers, and the connection gets closed prematurely with an ECONNRESET error.

This pull request intercepts the ECONNRESET error before it is propagated to event callbacks and restarts the event request loop (which handles subscription expiration, renewal, etc.).
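The idea can be illustrated with a minimal, self-contained sketch. This is not the code from the pull request: `createFakeCam`, `pull`, and `eventRequest` are hypothetical names, and the fake camera simply resets the first two pulls before delivering a message.

```javascript
// Hypothetical stand-in for a camera that resets the first two pull requests.
// None of these names come from the onvif library.
function createFakeCam() {
  let calls = 0;
  return {
    pull(callback) {
      calls += 1;
      if (calls <= 2) {
        const err = new Error('socket hang up');
        err.code = 'ECONNRESET'; // the code Node.js reports for a reset connection
        return callback(err);
      }
      callback(null, { messages: ['motion'] });
    },
  };
}

// Runs the event loop: a connection reset restarts the loop without
// notifying the handlers, any other error is reported, and a successful
// pull delivers its data.
function eventRequest(cam, handlers, log) {
  cam.pull((err, data) => {
    if (err && err.code === 'ECONNRESET') {
      log.push('restart');
      return eventRequest(cam, handlers, log);
    }
    if (err) {
      return handlers.onError(err);
    }
    handlers.onEvent(data);
  });
}

const log = [];
const received = [];
eventRequest(createFakeCam(), {
  onEvent: (data) => received.push(...data.messages),
  onError: (err) => log.push('error: ' + err.message),
}, log);
```

Here the handlers never see the two resets; they only receive the message from the third pull. A real implementation would presumably schedule the restart asynchronously rather than recursing directly.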

I believe the author's original plan was to handle such cases one level higher, in _eventPull(), by unsubscribing and resubscribing, but due to a bug in the code (?) the unsubscribe() callback is never called and the subscription never gets reinitialized.

This code:

    this.unsubscribe({}, function(_err,_data,_xml) {
        // once the unsubscribe has completed (even if it failed), go around the loop again
        this._eventRequest();
    });

should probably omit the empty object {} passed as the first argument of the unsubscribe call, since that position is the callback parameter:

    this.unsubscribe(function(_err,_data,_xml) {
        // once the unsubscribe has completed (even if it failed), go around the loop again
        this._eventRequest();
    });

Edited to add: fix for the above has already been submitted in #310.
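To see why the extra argument matters, here is a small sketch with a hypothetical `unsubscribe` whose only parameter is the callback. How the real library treats a non-function callback is not shown in this thread; the sketch assumes it is simply skipped.

```javascript
// Hypothetical function whose only parameter is the callback,
// mirroring the single-argument shape described above.
function unsubscribe(callback) {
  // Pretend the request has completed, then notify the caller if possible.
  if (typeof callback === 'function') {
    callback(null, 'unsubscribed');
  }
}

let calledWithExtraArg = false;
// The empty object lands in the callback slot; the function is ignored.
unsubscribe({}, () => { calledWithExtraArg = true; });

let calledCorrectly = false;
unsubscribe(() => { calledCorrectly = true; });

console.log(calledWithExtraArg, calledCorrectly); // false true
```

With the empty object in the first position, the function meant as the callback becomes an unused second argument, so the code that should restart the loop never runs.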

In any case, since the subscription is still active at this point, there is no need to go through the unsubscribe/subscribe cycle again until the subscription expires, and expiry is already handled automatically by _eventRequest(). Unsubscribe doesn't work in the case of the Tapo C220 anyway: the cam returns 500 Internal Server Error. :-)

ECONNRESET handling could be moved to _eventPull(), though. Please advise if any changes are needed.

RogerHardiman commented 7 months ago

Hi, many thanks for this. Andrew was the original author (I think), and I rewrote some of the event subscription code about 18 months ago. When I get a chance I'll take a look. I also have a Tapo C200 and I know someone with the C210.

Many thanks for your contribution. I'll try and take a look later this week.

Roger

agsh commented 4 months ago

@akomelj Hello! Thank you very much for pointing out the problem and suggesting a solution! Sorry for the very late answer. Yes, this behavior will help with connection resets. Event pulling stops when any error occurs, and it seems that we need to continue this infinite loop when there is connection trouble. And yes, we need to move the reconnection code to _eventPull, because pullMessages is a public function from the ONVIF specs and can be used elsewhere in user code. I'll write tests for this and move the error catching to _eventPull. BTW, I ordered a TP-Link Tapo C220 for myself and will test it next weekend :smile:
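A rough sketch of that split, assuming a simplified camera object: the method names echo the ones in this thread, but the bodies and the `createCam` helper are invented for illustration and do not reflect the library's actual code.

```javascript
// Hypothetical camera object; `failures` is how many pulls get reset first.
function createCam(failures) {
  let remaining = failures;
  const cam = {
    events: [],
    // Public call: always reports the raw result, including ECONNRESET.
    pullMessages(callback) {
      if (remaining > 0) {
        remaining -= 1;
        const err = new Error('read ECONNRESET');
        err.code = 'ECONNRESET';
        return callback(err);
      }
      callback(null, ['motion']);
    },
    // Private loop: the only place that decides to retry after a reset.
    _eventPull() {
      cam.pullMessages((err, messages) => {
        if (err && err.code === 'ECONNRESET') {
          return cam._eventPull();
        }
        if (!err) {
          cam.events.push(...messages);
        }
      });
    },
  };
  return cam;
}

// A direct caller still sees the reset and can handle it in its own way.
let directError = null;
createCam(1).pullMessages((err) => { directError = err; });

// The internal loop recovers without surfacing the reset.
const looping = createCam(2);
looping._eventPull();
```

Keeping the retry out of `pullMessages` means user code that calls it directly gets the error unchanged, while the library's own polling loop keeps running.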