Open tbaur opened 1 year ago
More of the same crashing.
[2/23/2023, 7:27:27 AM] [Ring] Request to https://app.ring.com/api/v1/mode/location/0950258e-XXX failed with status 400. Response body: envproxy unknown host
[2/23/2023, 7:27:27 AM] HTTPError: Response code 400 (Bad Request)
at Request.
And then, after repeated failures, crashes, and homebridge restarts, the API starts rejecting requests outright, which causes even more spinning.
[2/23/2023, 7:27:40 AM] [Ring] HTTPError: Response code 429 (Too Many Requests)
@tbaur sorry I didn't see this issue sooner. The 429 is a rate-limiting response, likely because you restarted many times, which caused a large number of requests. Are you still experiencing the 502 and 400 errors which originally caused you trouble?
Yeah, I get the 429. The handling of the 502/400s seems to crash homebridge repeatedly, though, and with all the automatic service restarts it then hits the 429 and doesn't recover well. It's unclear why 502/400s would crash the whole service. Perhaps some improvements in handling upstream API failures would help.
Thanks for the update. In your second comment, I see envproxy unknown host as the error. Do you have some sort of proxy or VPN set up? I haven't seen errors like this before, so I'm guessing it's something related to your network.
There is no proxy or VPN in line with homebridge on my side; envproxy unknown host is in the response body. I'd guess that app.ring.com is a front-end API gateway for backend services. My original point in raising this issue wasn't that app.ring.com had a problem, it was the handling of the API call errors: they shouldn't crash homebridge entirely. The 502 server-side error triggers the crash, then the restart/crash loop repeats, and things go downhill with various 4xx codes.
Here is another from this morning: a server-side error, and the plugin crashed homebridge.
[5/15/2023, 9:34:56 AM] [Ring] Request to https://app.ring.com/api/v1/mode/location/0950258e-XXXX failed with status 503. Response body: {"errors":[4007]}
[5/15/2023, 9:34:56 AM] HTTPError: Response code 503 (Service Unavailable)
at Request.
@tbaur just to make sure I understand your intent with this issue - is the ring plugin generally working for you, but occasionally crashing or crash-looping if an unexpected server error is encountered?
[5/26/2023, 3:16:50 PM] [Ring] Request to https://app.ring.com/api/v1/mode/location/0950258e-XXXX failed with status 503. Response body: {"errors":[4007]}
[5/26/2023, 3:16:50 PM] HTTPError: Response code 503 (Service Unavailable)
at Request.
==> restart and crash again:
[5/26/2023, 3:27:19 PM] [Ring] Request to https://app.ring.com/api/v1/mode/location/0950258e-XXXX failed with status 500. Response body:
[5/26/2023, 3:27:19 PM] HTTPError: Response code 500 (Server Error)
at Request.
Yes, the plugin works until server-side errors from Ring's API crash homebridge.
So this has been out here a long time and it doesn't seem to happen often enough to be critical, but I had a little time tonight and did some digging, and I think this is simply because the request() function in rest-client.ts looks for specific status codes and retries them, but for any return codes that are not explicitly handled (401, 404, 504) it throws the error. However, as far as I can see, none of the callers of request() actually attempt to catch an error, so the unhandled promise rejection leads to the node process exiting, since that has been the default for modern Node versions since v16 (maybe v15 during development). Prior to that, this was probably less of an issue.
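Roughly, the crash path looks like this (a simplified sketch with illustrative names and a plain fetch stand-in, not the actual rest-client.ts code):

```ts
// Simplified sketch of the crash path with illustrative names, not the
// actual rest-client.ts code. Assumes Node 18+ for the global fetch.
async function request(url: string): Promise<unknown> {
  const response = await fetch(url)
  if (!response.ok) {
    // the real code retries/handles 401, 404 and 504 specially; anything
    // else (500/502/503, unexpected 4xx, ...) is simply thrown
    throw new Error(`Response code ${response.status} (${response.statusText})`)
  }
  return response.json()
}

// Callers fire and forget without a .catch(), so a thrown 502/503 becomes an
// unhandled promise rejection. Since Node v15/v16 the default for an
// unhandled rejection is to terminate the process, which is what takes
// homebridge down with it.
setInterval(() => {
  void request('https://app.ring.com/api/v1/mode/location/XXX')
}, 20_000)
```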
I don't see this in ring-mqtt because I use process.on('unhandledRejection') to catch such events for a few other "normal" cases, such as an invalid token, and to make the error messages slightly cleaner by not outputting the stack trace (otherwise users open cases about them), but it also has the advantage of catching these crashing cases. However, I don't think this approach is appropriate for a library.
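For reference, that safety net looks roughly like this (a simplified sketch, not the actual ring-mqtt code):

```ts
// Rough sketch of the ring-mqtt style safety net (not the actual ring-mqtt
// code): catch rejections at the process level and log a clean message
// instead of letting Node exit with a full stack trace.
process.on('unhandledRejection', (reason) => {
  const message = reason instanceof Error ? reason.message : String(reason)
  console.error(`Unhandled rejection: ${message}`)
  // known-fatal cases (e.g. an invalid refresh token) can still choose to
  // exit explicitly here
})
```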
I believe the proper fix is to implement error handling via a catch for every call to request(), since it is certainly possible that any given request may actually fail. However, in ring-client-api alone there are around 70 callers of request(), and certainly some library users will call request() directly for some things as well (I have at least a half-dozen such calls in ring-mqtt).
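For illustration, the per-call version would look something like this at each call site (getLocationMode and the request() signature here are assumptions for the sketch, not the real ring-client-api surface):

```ts
// Sketch of per-call error handling. `restClient.request` and `logger` are
// stand-ins; the exact signature is assumed for illustration, not copied
// from ring-client-api.
declare const restClient: {
  request: (options: { url: string }) => Promise<unknown>
}
declare const logger: { error: (...args: unknown[]) => void }

async function getLocationMode(locationId: string) {
  try {
    return await restClient.request({
      url: `https://app.ring.com/api/v1/mode/location/${locationId}`,
    })
  } catch (e) {
    // contain the failure locally so one bad response can't crash the process
    logger.error('Failed to fetch location mode', e)
    return undefined
  }
}
```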
I don't have time to try to fix all of these right now, so that leaves a few alternatives. The working assumption of most of the code has always been that a call to request() will retry automatically until it succeeds, so my lean is to handle all 500-level responses the same as a gateway timeout.
That still leaves 400-level errors; in general I don't think "Bad Request" errors should be retried, so I'm not sure what to do with them. Maybe just log but don't throw the error? I don't think that behavior would be worse than how the code behaved on Node v14 and earlier, when it generated promise rejections but the node process didn't exit.
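As a rough sketch of the status handling I have in mind (illustrative only, using plain fetch, not a diff against rest-client.ts):

```ts
// Illustrative sketch of the proposed handling, not a diff against
// rest-client.ts: retry every 5xx like the existing gateway-timeout path,
// and log (rather than throw) anything else that isn't OK.
const delay = (ms: number) => new Promise((resolve) => setTimeout(resolve, ms))

async function requestWithRetry(url: string): Promise<unknown> {
  while (true) {
    const response = await fetch(url)
    if (response.ok) {
      return response.json()
    }
    if (response.status >= 500) {
      // treat every 5xx like the existing 504 handling: log and retry
      console.error(
        `Request to ${url} failed with status ${response.status}, retrying in 20 seconds`,
      )
      await delay(20_000)
      continue
    }
    // 4xx: not worth retrying, but also not worth crashing the process over
    console.error(`Request to ${url} failed with status ${response.status}`)
    return undefined
  }
}
```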
Fully open to other suggestions, is there some cleaner option I am missing?
Yeah, it's your call on how, or whether, to solve this; I respect that. I don't feel that upstream errors should crash an entire service as a general rule, and if/when it happens repeatedly here, you also start hitting upstream API lockouts on other services (via other plugins). I addressed it by moving the Ring plugin to a child bridge, so it can now crash all it wants without impacting the broader service. Thanks for looking into it more!
To be fair, it's more because Node changed behavior that it crashes now, since unhandled rejections exit the process by default sometime after Node.js 14. You can always run node with --unhandled-rejections=warn, which was the behavior when the code was originally written many years ago.
Not that I'm disagreeing with you, but I just don't have time to go hunt down every consumer and put in proper error handling. I'm currently testing out not throwing the error at all and just logging it instead.
Makes sense! Thanks.
Bug Report
Describe the Bug
An API call hit a 502 and it crashed homebridge entirely. Logs below. This happened 3-4 times last night. Clearly the Ring API was having issues, but that shouldn't crash homebridge altogether.
Screenshots/Logs
[2/17/2023, 4:54:44 AM] [Ring] Request to https://app.ring.com/api/v1/mode/location/0950258e-XXX failed with status 502. Response body:
502 Bad Gateway
[2/17/2023, 4:54:44 AM] HTTPError: Response code 502 (Bad Gateway) at Request. (/usr/local/lib/node_modules/homebridge-ring/node_modules/got/dist/source/as-promise/index.js:118:42)
at processTicksAndRejections (node:internal/process/task_queues:95:5)
[2/17/2023, 4:54:44 AM] Got SIGTERM, shutting down Homebridge...
Environment
OSX and current versions of node, npm, homebridge, and homebridge-ring