kidroca closed this issue 3 years ago.
Triggered auto assignment to @kadiealexander (AutoAssignerTriage), see https://stackoverflow.com/c/expensify/questions/4749 for more details.
// Reaching maxRetries is a tell that something is really bad,
// and we can signal another internal system to take over
if (maxRetries === 0) {
    const error = new Error('A request reached the max retries limit');
    error.request = params;
    onMaxRetries(params);
    return Promise.reject(error);
}
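A minimal runnable sketch of how the `maxRetries` check could sit inside a recursive `post`. The factory `createPost`, the injected `send` transport, the default of 3 and the fixed delay are assumptions made for illustration, not the app's real API; here `maxRetries` counts the attempts left and is decremented on every retry until it reaches 0.

```javascript
// Sketch only: `createPost`, `send` and the defaults are illustrative assumptions.
const DEFAULT_MAX_RETRIES = 3;

// Fixed-timeout wait before the next attempt
function waitForRetry(ms) {
    return new Promise(resolve => setTimeout(resolve, ms));
}

function createPost({send, onMaxRetries, retryDelayMs = 1000}) {
    // `maxRetries` is the number of attempts left for this request
    function post(params, maxRetries = DEFAULT_MAX_RETRIES) {
        // Reaching 0 is a tell that something is really bad: report it and stop recursing
        if (maxRetries === 0) {
            const error = new Error('A request reached the max retries limit');
            error.request = params;
            onMaxRetries(params);
            return Promise.reject(error);
        }

        return send(params).catch(() => (
            // Decrement on every retry so the recursion is guaranteed to end
            waitForRetry(retryDelayMs).then(() => post(params, maxRetries - 1))
        ));
    }

    return post;
}
```

With the default of 3, a request that keeps failing is sent three times; `onMaxRetries` is then called once and the returned promise rejects.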
The `onMaxRetries` handler is out of the current scope, but the idea is that it would, for example, set the `offline` flag.

function post(params) {
    // Actual matching logic might be more complex than this
    const pending = _.find(pendingRequests, r => _.isEqual(r.params, params));
    if (pending) {
        return pending.promise;
    }

    const entry = {params};
    entry.promise = sendWithRetry(params)
        // Clean up the completed request from our pending list
        .finally(() => {
            pendingRequests = _.without(pendingRequests, entry);
        });
    pendingRequests.push(entry);
    return entry.promise;
}

// Retries go straight to the transport: calling post() again here would find
// this request's own pending entry and wait on itself
function sendWithRetry(params) {
    return HttpUtils.xhr(params)
        .then(response => onResponse(response))
        .catch((error) => {
            onError(error);
            return waitForRetry(1000).then(() => sendWithRetry(params));
        });
}
I don't know if it's ever the case that an identical request is made at the same time, but this could be helpful for debugging.
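A self-contained sketch of the de-duplication idea, assuming request params are plain JSON-serializable objects so that `JSON.stringify` can serve as the matching key (key order matters, so real matching may need to be smarter). `createDedupedPost` and `send` are hypothetical names. Retries are deliberately left out: a retry that re-entered this `post` would find its own pending promise and end up waiting on itself.

```javascript
// Sketch only: `createDedupedPost` and `send` are hypothetical.
function createDedupedPost(send) {
    const pendingRequests = new Map(); // matching key -> in-flight promise

    return function post(params) {
        const key = JSON.stringify(params);
        const pending = pendingRequests.get(key);
        if (pending) {
            // An identical request is already in flight: share its result
            return pending;
        }

        const requestPromise = send(params)
            // Clean up once settled so later calls make a fresh request
            .finally(() => pendingRequests.delete(key));
        pendingRequests.set(key, requestPromise);
        return requestPromise;
    };
}
```

Two concurrent `post({id: 1})` calls then trigger a single `send`, while a call made after the first one settles sends again.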
Triggered auto assignment to @alex-mechler (Engineering), see https://stackoverflow.com/c/expensify/questions/4319 for more details.
cc @marcaaron and @tgolen since you worked a lot with the network queue early on. I remember we had reasons for going with a queue, rather than a direct approach like this, but they are escaping me atm. Do you remember?
The main reasons for using the queue (as far as I know) are to support the offline features. So, if you are offline the request you attempted to make is placed in the queue and then processed once you are back online.
That said, I can't really think of any benefit to having a write queue that is processed in a loop every second (even when empty). I have some other priorities to get to and can't get too involved atm, but @cead22 might also have some opinions here.
The main reasons for using the queue (as far as I know) are to support the offline features. So, if you are offline the request you attempted to make is placed in the queue and then processed once you are back online.
This is correct
but the interval will still run causing interrupts affecting the main thread
What are the real-world, user-facing consequences of this? AFAIK this doesn't slow anything down. In fact, we check whether we're online and whether the network queue has anything in it, and if not, we return early. I imagine this whole code runs in single-digit milliseconds, if not a fraction of a millisecond:
function processNetworkRequestQueue() {
    // NetInfo tells us whether the app is offline
    if (isOffline) {
        if (!networkRequestQueue.length) {
            return;
        }

        // If we have a request then we need to check if it can be persisted in case we close the tab while offline
        const retryableRequests = _.filter(networkRequestQueue, request => (
            !request.data.doNotRetry && request.data.persist
        ));
        Onyx.set(ONYXKEYS.NETWORK_REQUEST_QUEUE, retryableRequests);
        return;
    }

    // When the queue is empty, return early since nothing needs to be processed
    if (networkRequestQueue.length === 0) {
        return;
    }

    // ... (rest of the function omitted in this excerpt)
}
- I can't trace precisely when this happens, but the proposal below can detect this and allow us to do something about it
Can you reproduce?
- The promise is resolved when we're both back online and not paused
This logic is also built into the network queue
2. Adding a retry limit
This logic can be added to the network queue.
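For comparison, a rough sketch of how a retry limit could live inside an interval-driven queue instead, assuming each queued request carries a hypothetical `retryCount` field and that `processQueue` is what the interval calls each second. `createQueue`, `MAX_ATTEMPTS` and `send` are illustrative names, not the Network.js API.

```javascript
// Sketch only: `createQueue`, `MAX_ATTEMPTS`, `retryCount` and `send` are hypothetical.
const MAX_ATTEMPTS = 3;

function createQueue(send, onMaxRetries) {
    let queue = [];

    function push(request) {
        queue.push({...request, retryCount: 0});
    }

    // One tick of the queue; an interval timer would call this every second
    async function processQueue() {
        if (queue.length === 0) {
            return; // early return: nothing to process
        }

        // Take the current batch; failed requests are re-queued for the next tick
        const batch = queue;
        queue = [];
        await Promise.all(batch.map(request => send(request).catch(() => {
            const retryCount = request.retryCount + 1;
            if (retryCount >= MAX_ATTEMPTS) {
                onMaxRetries(request); // give up instead of looping forever
                return;
            }
            queue.push({...request, retryCount});
        })));
    }

    return {push, processQueue, size: () => queue.length};
}
```

This keeps the polling design but still bounds the endless-retry case; the trade-off is that the interval keeps ticking even while the queue is empty.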
I'm not a main contributor to this repo, but I don't agree that this is a problem, so this refactoring doesn't seem worth it with the information I have so far.
Let's set the performance implications aside.
The current pattern creates unnecessary complexity that gets in the way of writing straightforward code.
Take, for example, `pauseRequestQueue` and `unpauseRequestQueue`: you'd think they just pause/unpause the `setInterval`, but they don't, because some requests still need to run even on a paused queue. This is another thing that can be handled much more gracefully with the promise pattern I've posted.
When the queue is "paused", the early returns do not take effect, and it constantly loops over and re-queues the pending requests; hence the 200+ retry attempts I've posted about here: https://expensify.slack.com/archives/C01GTK53T8Q/p1623747094347300
`setInterval` is dangerous; there are numerous discussions about its pitfalls, and I would avoid it when a more straightforward alternative exists.
There are also cases where a request becomes unresolvable: it fails over and over, and the queue sends it again and again in an endless loop. I can't trace precisely when this happens, but the proposal below can detect this and allow us to do something about it.
Can you reproduce?
No. I've seen it happen several times, and others have reported similar issues. It might not be caused by the queue, but a retry limit can help us investigate the root issue.
I've kind of been following this issue, and I went and re-read the original thread in Slack that spawned this issue.
I think this is chasing a lot of theoretical problems without getting crisp on the specific problem (this seems to be a repeating theme).
The original assumption was:
Many retries on startup are causing performance problems on Android
However, this was just conjecture and only theoretical. Let's start here: if that's the hypothesis, let's test it and confirm that this is indeed the problem. Without proving that it's a problem (and how big it is), chasing after solutions isn't going to benefit anyone.
Going to close this in favor of https://github.com/Expensify/Expensify.cash/issues/4026
It seems like we know there is some kind of regression with the network request queue, but have no clear reproduction. It also seems like the changes here might be a potential solution to that. But we should focus on the issue that has been reported rather than put the solution first. Thanks!
When we added Flipper bridgespy, I noticed that all timers (`setTimeout`/`setInterval`) make an inbound and an outbound call through the React Native bridge each time a timer runs.
The more interval timers we have, the more calls go through the bridge. Ideally we want to reduce calls through the bridge as much as possible, as that's the weak spot of React Native. That might change when JSI becomes the norm and there's no bridge.
Ideally we want to reduce calls through the bridge as much as possible as that's the weak spot for React Native
I've thought about this before as well, but can't really think of what problem the "bridge noise" is contributing to. It does make it kind of hard to spy the bridge traffic since the timers are relentlessly ticking. But what's bad about it?
Imagine it's literally a bottleneck: it allows only as much traffic as the width of its neck. When a lot of items move through the bridge, some get batched and have to wait because there isn't enough bandwidth, and this creates lag. This is because all the calls get serialized and deserialized to be passed to and from the native world. The React Native re-architecture aims to replace the bridge, or at least allow libraries like Reanimated to communicate directly with the JS thread without going through the bridge.
The network queue is not the only interval timer we have; together they take a hit on the bridge bandwidth every second.
I'm not certain how we can measure the effect of this. The theory is that in moments of high intensity these extra calls would negatively affect performance; or maybe they add a small hit every second that isn't noticeable.
One other thing in this regard: there's no idle time; something is always running.
You may have noticed that timers on inactive browser tabs get throttled; I think one of the reasons was significant battery usage due to constantly running code. I remember reading about this somewhere, but so far this is the only thing that comes up: https://stackoverflow.com/questions/11788928/how-much-battery-life-can-setinterval-suck-up
Details
At the moment, the network request queue runs relentlessly using a `setInterval` of 1 second. Every second it checks whether there are any requests and sends them. There are long periods of time (e.g. 10 or more seconds) where no requests are scheduled, but the interval still runs, causing interrupts that affect the main thread: https://github.com/Expensify/Expensify.cash/blob/1af601ee7947ce6b0027ace470cbe9ea406480bb/src/libs/Network.js#L169-L170
There are also cases where a request becomes unresolvable: it fails over and over, and the queue sends it again and again in an endless loop.
Proposal
The code can be refactored so that the queue stops running when it's not necessary.
`post` performs the request; if it fails, it makes the same request again, either after a fixed timeout (`setTimeout`) or after an event that gives the green light. Handling would become simpler as well as more performant. Here's the gist of it.
The point is that, besides making things simpler, we no longer need an interval timer that causes unnecessary interrupts every second; timers only run when we actually need to retry something.
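A sketch of the "only when we actually need to retry" idea, assuming a stand-in `send` transport and a fixed delay: a single `setTimeout` is armed when a request fails and is not re-armed once nothing is waiting, so no timer ticks while the app is idle. `createOnDemandRetrier` is a hypothetical name, and this version retries without a limit.

```javascript
// Sketch only: `createOnDemandRetrier` and `send` are hypothetical.
function createOnDemandRetrier(send, retryDelayMs = 1000) {
    const waiting = []; // requests that failed and wait for the next retry
    let timer = null;

    function scheduleRetry() {
        if (timer !== null) {
            return; // a retry is already scheduled
        }
        timer = setTimeout(() => {
            timer = null;
            // Retry everything that failed since the timer was armed
            waiting.splice(0).forEach(entry => attempt(entry));
        }, retryDelayMs);
    }

    function attempt(entry) {
        send(entry.params).then(entry.resolve, () => {
            waiting.push(entry);
            scheduleRetry();
        });
    }

    // Resolves with the response once some attempt succeeds
    function post(params) {
        return new Promise(resolve => attempt({params, resolve}));
    }

    return {post, isTimerArmed: () => timer !== null};
}
```

Compared with a 1-second `setInterval`, the timer here exists only between a failure and its retry; combining it with a retry limit would also bound the unresolvable case.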
1. Making `waitForRetry` event driven: when we're offline or the queue is paused for some reason, we can capture a deferred promise. Then, instead of retrying after a second, we retry when the deferred promise is resolved. The promise is resolved when we're both back online and not paused.
2. Adding a retry limit: the `post` function can be set up to use a `maxRetries` parameter that stops the recursion if we're getting into an endless retry cycle for some reason. We start with a default value and decrement it on each retry until we reach 0; if for some reason the request won't make it, we stop retrying.
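One way the event-driven `waitForRetry` could look, sketched under assumptions: the gate tracks two hypothetical flags, `isOffline` and `isPaused`, which in the app would be fed by NetInfo and the pause/unpause calls (that wiring is not shown). While either flag is set, callers share one deferred promise; it resolves once we're both back online and not paused. Unlike the fixed-delay `waitForRetry(1000)`, this version takes no delay argument.

```javascript
// Sketch only: `createRetryGate`, `setOffline` and `setPaused` are hypothetical.
function createRetryGate() {
    let isOffline = false;
    let isPaused = false;
    let deferred = null; // {promise, resolve} while the gate is closed

    function update() {
        const canProceed = !isOffline && !isPaused;
        if (canProceed) {
            if (deferred) {
                deferred.resolve(); // wake up everyone waiting to retry
                deferred = null;
            }
        } else if (!deferred) {
            // Gate just closed: create one deferred promise for all waiters
            let resolve;
            const promise = new Promise((res) => { resolve = res; });
            deferred = {promise, resolve};
        }
    }

    return {
        setOffline(value) { isOffline = value; update(); },
        setPaused(value) { isPaused = value; update(); },
        // Resolves immediately when open, otherwise when the gate reopens
        waitForRetry() {
            return deferred ? deferred.promise : Promise.resolve();
        },
    };
}
```

A retry would then read `gate.waitForRetry().then(() => post(params))`, so nothing is re-sent until the gate opens and no timer runs in the meantime.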
Related slack thread: https://expensify.slack.com/archives/C01GTK53T8Q/p1623747094347300
Platform:
Where is this issue occurring?
Web ✔️ iOS ✔️ Android ✔️ Desktop App ✔️ Mobile Web ✔️
Version Number: 1.0.69-0
Logs: https://stackoverflow.com/c/expensify/questions/4856
Notes/Photos/Videos: Any additional supporting documentation
Expensify/Expensify Issue URL: