Open tsightler opened 2 years ago
I've played with this a little more and verified the behavior. If the call to oauth server returns anything other than a success, then the code just immediately assumes that the current refresh token isn't valid and sets it to null, which means all future requests also fail. My belief is that this should be the behavior only if the initial connection has never been successfully established.
In the interim I've figured out various ways to deal with this from the application side. One option is to trap the exception using a default exception handler and deal with it there, for example, but triggering an API disconnect/reconnect. Another option is to just monitor for any case where the refresh token is set to undefined and, when detected, force the value back to last known saved token and clear the _authPromise, which will force a re-authenticate on the next attempt.
So the question is really, do you consider such failures something the API should recover from on its own or not. I've personally only seen this failure twice this year, but other users have reported it more often.
I implemented the following crude solution in ring-mqtt back in Oct:
// Check for invalid refreshToken after connection was successfully made
// This usually indicates a Ring service outage impacting authentication
setInterval(() => {
if (this.client && !this.client.restClient.refreshToken) {
debug(colors.yellow('Possible Ring service outage detected, forcing use of refresh token from latest state'))
this.client.restClient.refreshToken = this.refreshToken
this.client.restClient._authPromise = undefined
}
}, 60000)
Code just checks once a minute for the case where the refreshToken in the restClient has been set to null and just forces it back to the last known good refreshToken and clears _authPromise so that ring-client-api code will try again for a session token on the next request. This has eliminated reports of failures due to short Ring outages.
Bug Report
Describe the Bug
Hi @dgreif,
This may not really be a bug, maybe just a design decision, and I could definitely deal with it on the application side, but, as I was researching this behavior, it really seemed like a bug to me, so I thought I would document it here just in case.
What I've noticed is that, if Ring has any temporary issue where requests fail with authentication errors (for example, on Oct 5th, there was an ~1hr service outage where requests were failing with authentication errors) that the API never recovers from this, it just keeps attempting to poll (for example, due to the background camera polling) and returning this error:
This message is incorrect as the refresh token was still valid. For example, if I just restart, using the most recently saved refresh token, the authentication will work, so clearly the token is still valid.
In researching this I believe it has to do with how the getAuth() function treats a failure to acquire an access token. If I understand the flow (and it's possible I don't), whenever any request fails with a "401 Unauthorized" error, the most recent authentication response is immediately cleared (_authPromise set to undefined) and a new authentication attempt is made via await on authPromise, which itself calls getAuth(). Normally getAuth() uses the current refresh token to POST to https://oauth.ring.com/oauth/token and get an updated authentication response with a new access token. However, during any type of temporary service outage, the POST to https://oauth.ring.com/oauth/token may fail, and, if I'm following the logic correctly, the catch then immediately clears the existing, active refresh token.
At this point, all future requests are doomed to fail since there is no longer a valid refresh token in the runtime environment. Calls to REST API continue, each one awaiting for authPromise, which calls getAuth(), which calls getGrantData(), however, since this.refreshToken is now null, getGrantData() will always fall through to the throw and the message noted above will be logged. Once this.refreshToken is null I don't see any way the API could ever recover, it might as well exit at that point as far as I can see.
I'm not 100% sure what should be done here, but it feels to me that a previously working refresh token should not be invalidated immediately on a failure, however, perhaps I'm supposed to be doing something to catch this behavior?
To Reproduce
It's really difficult to reproduce since it requires Ring to reject the current access token and refresh token, but I could probably write up a code example that simulates the behavior by directly manipulating the access and refresh tokens if really needed.
Expected behavior
IMO the API should recover from a temporary service outage without requiring this to be handled on the application side, no different than a network outage.
Screenshots/Logs
Just for the sake of example, here's an example of the logs showing it failing every 20 seconds (due to polling cameras), note that this is after the service was restored:
Additional context
Obviously, this is only an issue when there are unexpected authentication service outages. It's not clear to me how to tell a service failure vs an actual expired refresh token, but, realistically, if the code has made an initial connection using the refresh token provided, and from that point is constantly maintaining the token, then it seems reasonable to expect that this token becoming suddenly invalid is a service failure vs a real authentication failure, although I of course do know there are potential cases where a working token may be invalidated, for example, if a device is removed from allowed devices.
I've noticed that the Ring app seems to recover from this just fine, so it seems like ring-client-api should be able to deal with this as well.
Environment