Apollon77 / hap-controller-node

Node.js library to implement a HAP (HomeKit) controller
Mozilla Public License 2.0
55 stars 15 forks source link

Catch and forward network errors #193

Closed thkl closed 1 year ago

thkl commented 1 year ago

Is your feature request related to a problem? Please describe. A clear and concise description of what the problem is. Ex. I'm always frustrated when [...]

I am running a small service on a virtual machine using hap-controller-node. I've discovered some crashes which are likely uncatched network errors:

Error: connect EHOSTUNREACH 192.168.178.231:5001 at TCPConnectWrap.afterConnect [as oncomplete] (node:net:1187:16) { errno: -113, code: 'EHOSTUNREACH', syscall: 'connect', address: '192.168.178.231', port: 5001 }

the ip above is assigned to one of my HomeKit Devices (Netatmo)

Describe the solution you'd like A clear and concise description of what you want to happen.

Describe alternatives you've considered A clear and concise description of any alternative solutions or features you've considered.

the alternative solution is a process.on('uncaughtException', (err) ... which is not that elegant

Apollon77 commented 1 year ago

Do you have example code? Please also provide a debug log. Normally there should be NO uncatched/uncatchable errors from my knowledge, so what exactly you refer to? Maybe it is just logged? Because uncatched means that application crashes

thkl commented 1 year ago

I am basically using your sample code to subscribe to some characteristics and forward the events to an mqtt client.

I do have an error log from journalctl:

Jun 19 03:02:00 application node[4079262]: node:internal/process/promises:279
Jun 19 03:02:00 application node[4079262]:             triggerUncaughtException(err, true /* fromPromise */);
Jun 19 03:02:00 application node[4079262]:             ^
Jun 19 03:02:00 application node[4079262]: Error: connect EHOSTUNREACH 192.168.178.231:5001
Jun 19 03:02:00 application node[4079262]:     at TCPConnectWrap.afterConnect [as oncomplete] (node:net:1187:16) {
Jun 19 03:02:00 application node[4079262]:   errno: -113,
Jun 19 03:02:00 application node[4079262]:   code: 'EHOSTUNREACH',
Jun 19 03:02:00 application node[4079262]:   syscall: 'connect',
Jun 19 03:02:00 application node[4079262]:   address: '192.168.178.231',
Jun 19 03:02:00 application node[4079262]:   port: 5001
Jun 19 03:02:00 application node[4079262]: }
Jun 19 03:02:00 application systemd[1]: hkmqtt.service: Main process exited, code=exited, status=1/FAILURE
Jun 19 03:02:00 application systemd[1]: hkmqtt.service: Failed with result 'exit-code'.
Jun 19 03:02:00 application systemd[1]: hkmqtt.service: Consumed 1min 9.326s CPU time.

the applications is crashing ... so its not just logged

for me it looks like the HTTPClient ist not able to connect (or reconnect) to the Device. As far as I can tell, the device has not changed its internal ip (depending on router logs)

thkl commented 1 year ago

Ok I think, can reproduce this issue:

I am connecting to the device using its address and port, reported from the discovery and saved credentials:

ipClient = new HttpClient(acc.id, acc.address, acc.port, pers.pairingData, {
            usePersistentConnections: true,
});

then I do subscribe to some characteristics

const csub =['1.8','1.11'.....];
ipClient.subscribeCharacteristics(csub).then((e) => {
                    console.log(`${acc.name}: Subscribed to ${csub.length} characteristics`);
});

To enforce a network error I've unplugged the device and shortly after:

Disconnected: [
  "1.8",
  "1.11",
  "1.12",
  "1.15",
  "1.18",
  "1.21"
]
Error: connect EHOSTUNREACH 192.168.178.231:5001
    at TCPConnectWrap.afterConnect [as oncomplete] (node:net:1187:16) {
  errno: -113,
  code: 'EHOSTUNREACH',
  syscall: 'connect',
  address: '192.168.178.231',
  port: 5001
} 

I am using your sample to reconnect to the device:

ipClient.on('event-disconnect', async (formerSubscribes) => {
            console.log(`Disconnected: ${JSON.stringify(formerSubscribes, null, 2)}`);
            // resubscribe if wanted:
            await ipClient.subscribeCharacteristics(formerSubscribes);
});

It looks like reconnecting is causing the crash ...

Apollon77 commented 1 year ago

Yes in fact the try to resubscribe again is then crashing ... so add a try/catch around and you are done?

In fact the library is not automatically reconnecting anything, but each call can throw an exception if communication is not possible. This means that you need to cath the error yourself.

In fact this was a design decision also from before I took it over. We have events for subscriptions especially to have no "uncatchable errors" here. All other places is "a call is doijg communication that can break", so it should be error handled on that level. There is no "automatic receonnection".

So yes I agree that the example shpuld be enhanced to add a try/catch there :-) I would be happy about a PR. Thank you for bringing this up.

thkl commented 1 year ago

So yes I've managed to make this case work by wrapping the reconnect into a try catch. Due to the fact that the discovery is still running and will raise an event if the device is available again i can connect and subscribe again ...

I will close this. thank you for your help