farmOS / field-kit

A modular, offline-first companion app to farmOS.
https://farmOS.org
GNU General Public License v3.0

Sync fails if access token is deleted. #357

Closed · jgaehring closed this 4 years ago

jgaehring commented 4 years ago

As with #356, I'm not sure if this behavior would be different if an actual invalid token was present, but below is how I simulated such a state on the current develop build using Chrome browser.

Steps to Reproduce

  1. Open the dev tools and navigate to the Application tab; open the site's Local Storage.
  2. Right-click on the token's value column and select "Edit Value".
  3. Replace the access_token with an empty string.
  4. If testing in the dev environment, make sure to delete any cookies as well. (I should probably configure the proxy server so cookies don't get received either)
  5. Sync logs.

Expected Result

Network requests should be stalled and enqueued once the first request fails, and only continue once a fresh token has been obtained. This way, the requests should go through as normal, as far as the user is concerned.

Observed Result

All requests fail except two, which are chained together, and the user is redirected to the Login screen. Note the requests in the network tab (dev environment):

[screenshot: network tab, dev environment]

I'm still trying to figure out the exact sequence of events that leads to that pattern, but as far as I can tell, it's the request to taxonomy_vocabulary.json that must be triggering the token refresh procedure (the request to taxonomy_term.json?vocabulary=3&&page=0 only happens after the first one has completed). You can see the token request in the middle there, but what's really peculiar is the timing displayed in the waterfall graph. First of all, you can see that there are duplicates of all the requests (with the exception of the chained request). The first of each pair of duplicate requests are fired off together, in parallel; they all fail, and only then does the request to token initiate. That request succeeds, and then the second of each pair of duplicate requests are fired off, again in parallel, and again they all fail except the request to taxonomy_vocabulary.json. It's very peculiar and raises several questions.

Note that the timing is displayed differently when running in production:

[screenshot: network tab, production]

One major difference is the presence of the OPTIONS requests, which are all succeeding (identifiable as having an initiator of "Other" and a size of "0 B"). Apart from that, however, I realize the main difference is just the order in which requests are charted. If you walk through it, I believe the sequence of requests and their duplicates is still the same: the first batch fires all at once and fails, then the token refresh succeeds, and finally the second batch fires, all failing except one (and its chained request). Still, it's damned peculiar.

I should also note that after getting rerouted to Login, if you hit the back button to My Logs and try syncing again, all requests succeed.

paul121 commented 4 years ago

Wanted to add some quick thoughts that might help out here...

The idea is that once farmOS.js detects a 403 response, it assumes the token has expired. (It should also be able to detect whether the timestamp has expired before making the request, but that can't always be relied on, because the other possibility is that the token has been revoked on the server side.) After a 403 is detected, it puts all requests that haven't been made yet into a queue. After a successful refresh, it retries all the requests in the queue. That should be the reason for the duplicate requests.
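In rough pseudocode, that pattern looks something like the sketch below. The names (subscribers, onRefreshed, refreshToken) echo snippets that come up later in this thread, but the wiring here is just an illustration, not the actual farmOS.js source:

    const axios = require('axios');

    let isRefreshing = false;
    let subscribers = [];

    // Enqueue a callback to run once a fresh token is available.
    function subscribeTokenRefresh(cb) {
      subscribers.push(cb);
    }

    // Call all subscribers with the fresh token, then clear the queue.
    function onRefreshed(token) {
      subscribers.forEach((cb) => { cb(token); });
      subscribers = [];
    }

    axios.interceptors.response.use(
      response => response,
      (error) => {
        const { config, response } = error;
        // Treat a 403 as an expired (or revoked) access token.
        if (response && response.status === 403) {
          if (!isRefreshing) {
            isRefreshing = true;
            // refreshToken() is assumed to resolve with fresh credentials.
            refreshToken().then((newToken) => {
              isRefreshing = false;
              onRefreshed(newToken.access_token);
            });
          }
          // Park the failed request; retry it when the fresh token arrives.
          return new Promise((resolve) => {
            subscribeTokenRefresh((token) => {
              config.headers.Authorization = `Bearer ${token}`;
              resolve(axios(config));
            });
          });
        }
        return Promise.reject(error);
      },
    );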

I think what's happening is that FK sends multiple requests to farmOS.js, which are all sent out before the first one fails. farmOS.js handles all of these asynchronously. Still, when all of these requests fail, they should be added to the same queue and wait for one single refresh before being retried.

What is weird is that only one of these requests is succeeding after the refresh. It might be due to weird timing, but this seems pretty consistent?

I think you're right about OPTIONS; they should appear in the same order. It's good that we're seeing the same issue, though, so it doesn't seem to be related to the OPTIONS requests. I do think we should be testing all of this locally with farmOS running on a different domain rather than behind the proxy server. I don't think there's any way to get the OPTIONS requests while using the proxy server :-/

I hadn't tested this by manually "nullifying" tokens. Instead I set the token expiration time on the server to be < 30 seconds. With that set, farmOS.js should detect that tokens have expired BEFORE sending a batch of requests and handle all of this much better. Manually messing with tokens on the client side is more of an "unexpected" behavior, but it seems like we need to make some changes to be more resilient!
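A pre-flight check along those lines might look roughly like this (the `token` storage key and `expires` field are assumptions based on how the token shows up in Local Storage):

    // Resolve with a usable token, refreshing proactively if the stored
    // expiration timestamp has already passed, instead of waiting for a 403.
    function getValidToken() {
      const token = JSON.parse(localStorage.getItem('token'));
      if (!token || token.expires <= Date.now()) {
        return refreshToken(); // assumed to resolve with fresh credentials
      }
      return Promise.resolve(token);
    }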

jgaehring commented 4 years ago

Ah, ok. I guess I did wonder whether this test is a fair simulation of tokens expiring; I don't know if it's a test we should expect to pass. Perhaps it would be better to test either by changing the settings on the server as you described, or by overwriting the token's expiration time in localStorage, instead of overwriting the token itself.

> What is weird is that only one of these requests is succeeding after the refresh. It might be due to weird timing, but this seems pretty consistent?

Maybe I should try overwriting the expires property before assuming it's due to timing. Hang on...
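Something like this in the browser console should do it (assuming the token record lives under a `token` key in Local Storage):

    // Overwrite the stored expiration so the token reads as already expired,
    // without touching the access token itself.
    const token = JSON.parse(localStorage.getItem('token'));
    token.expires = Date.now();
    localStorage.setItem('token', JSON.stringify(token));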

jgaehring commented 4 years ago

> Maybe I should try overwriting the expires property before assuming it's due to timing. Hang on...

Ok, I got some interesting results trying that. First the local dev env:

[screenshot: network tab, local dev environment]

And the staging/production server:

[screenshot: network tab, staging/production server]

Again, I think these results are more or less the same, with the exception of the OPTIONS requests; they're only displayed differently for the staging server, so I'm just going to refer to the dev env here.

It seems that now the token request is being fired off first, as hoped, but the first batch of duplicate requests still fails, except for the /log.json endpoint, which presumably is the one initiating the refresh cycle. Then a second token request gets fired (that's weird), then the second batch of requests fires, with only the /taxonomy_vocabulary.json endpoint succeeding (plus its chained request) this time, so presumably that was the request that initiated the second batch.

So maybe the first step is to figure out what's causing the duplicate requests, because that's bound to invalidate the tokens for the other requests, I would think.

Also, I'm just noticing how weird the /log.json request is. It starts at the same time as the other requests in that batch, after the first /oauth2/token request, but does not complete until well after the second /oauth2/token request completes. So maybe it's just a fluke, due to a slow network, and otherwise it would fail?

jgaehring commented 4 years ago

I'm wondering now if we need to get rid of the Promise.resolve() wrapper at line 131:

https://github.com/farmOS/farmOS.js/blob/14bab48c10e6c3b8b96067268b3c0747d9258a16/index.js#L127-L141

So it would be:

      .then(accessToken => ({
        ...config,
        headers: {
          ...config.headers,
          // Only add access token to header.
          Authorization: `Bearer ${accessToken}`,
        },
      }))

Away from my computer today, so I can't test. I'm just thinking about how all our promises are composed together and whether that's throwing off our timing somehow; looking at this, I wonder if it's acceptable to return a promise here instead of an actual config object.

jgaehring commented 4 years ago

Comparing with the axios docs, which don't wrap the config in a promise:

https://github.com/axios/axios#interceptors
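The request interceptor example there returns the config object directly from the fulfillment handler:

    // From the axios README: the fulfillment handler receives the request
    // config and returns it (possibly modified) synchronously.
    axios.interceptors.request.use(function (config) {
      // Do something before the request is sent, e.g. attach auth headers.
      return config;
    }, function (error) {
      // Do something with a request error.
      return Promise.reject(error);
    });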

jgaehring commented 4 years ago

Ah ha! Ok, I think that might have worked. At least, the requests eventually went through, everything updated, and there was no reroute to the Login page. We still have duplicate requests, though, and the first batch is still failing, except for one request, which is not duplicated. Here are the results:

[screenshot: network tab after the interceptor change]

Note that I'm also adding a console log statement to the subscribe fulfiller,

  // Call all subscribers.
  function onRefreshed(token) {
    console.log(`Continuing ${subscribers.length} requests...`);
    subscribers.forEach((cb) => { cb(token); });
  }

which is interesting, because this would show that the requests are being enqueued twice. In the first round, the /log.json endpoint seems to be triggering the refresh, so it's not enqueued, but all of the other 4 requests are. Then in the next round all but 1 are again enqueued, though it's not immediately clear which one that is.

I'm still trying to parse this all out and figure out what's causing the duplicate requests. I may try this again with a console.trace() call to try to figure out exactly which requests are initiating the refresh and how other requests are being enqueued.
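For example, pairing the log above with similar instrumentation on the enqueue side might help pin that down:

    // Hypothetical instrumentation: log a stack trace every time a request
    // is parked in the queue, to see which call sites are enqueueing.
    function subscribeTokenRefresh(cb) {
      console.trace(`Enqueuing request #${subscribers.length + 1}...`);
      subscribers.push(cb);
    }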

jgaehring commented 4 years ago

Ok, found the problem. I introduced an error when linting here:

https://github.com/farmOS/farmOS.js/pull/14/commits/98be3af411c468878647de565a80726ea439b24f#diff-168726dbe96b3ce427e7fedce31bb0bcL73-R77

Specifically, when I changed the name of token to newToken to avoid collisions with the upper scope, I forgot to change it in the call to onRefreshed(). That meant that when the new token came back, it was passed to the first request, /log.json, but the old token was passed to all the enqueued requests. So that batch would fail out and then start another round of requests, the first of which would eventually succeed and the rest of which would fail.

The fix is simple enough; we just need to change it from

onRefreshed(token.access_token);

to

onRefreshed(newToken.access_token);
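For context, the handler around the bug looked roughly like this (a reconstruction, not the exact farmOS.js source; token is the stale record from the enclosing scope):

    // The fresh credentials come back as `newToken`, but the queued
    // requests were being resumed with the stale outer `token`.
    refreshToken().then((newToken) => {
      isRefreshing = false;
      onRefreshed(token.access_token); // BUG: passes the stale outer token
      // onRefreshed(newToken.access_token); // the fix
    });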

I came across a few other things that might be issues down the road, such as the issue with the promise that I mentioned in the last comment, and another function that probably should be wrapped in a promise. I'll commit those to farmOS.js, publish them as 0.1.1, commit an update of the farmos dependency in our package.json, and merge that into develop so we can test these changes live on the staging server.

I also realized in the course of this that blocking cookies for GET requests via the proxy server (in addition to POST and PUT requests) is also preferable in the dev environment, because this particular error doesn't occur if a cookie gets sent with a GET request in development, which wouldn't happen in production. So I'm gonna tack on the commits I have on the proxy branch right now, before the farmOS.js update, and merge all that into develop.
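As a sketch of what that could look like, assuming the dev proxy were built on something like http-proxy-middleware (the actual field-kit proxy setup may differ):

    const { createProxyMiddleware } = require('http-proxy-middleware');

    // Strip cookies from every proxied request (GET included, not just POST
    // and PUT), so the dev environment behaves like cross-origin production.
    const farmOSProxy = createProxyMiddleware({
      target: 'http://localhost', // assumed farmOS dev server address
      changeOrigin: true,
      onProxyReq(proxyReq) {
        proxyReq.removeHeader('cookie');
      },
    });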

jgaehring commented 4 years ago

Pushed the change up to https://develop--dreamy-tesla-8069e3.netlify.app/. It's marked "Version: 0.4.19-hotfix1".

Appears to be working!

To reproduce the test, I run Date.now() in the browser console, copy the timestamp it provides, then paste it in as the token's expires value in Local Storage via the dev tools. Then hit sync. You should see the token request go through first, followed by only one batch of requests (although note once more that the OPTIONS requests are present here too):

[screenshot: network tab on the hotfix build]

Closing.