Open m-natarajan opened 4 weeks ago
Triggered auto assignment to @kadiealexander (Bug), see https://stackoverflow.com/c/expensify/questions/14418 for more details. Please add this bug to a GH project, as outlined in the SO.
Triggered auto assignment to @srikarparsi (AutoAssignerNewDotQuality)
Making this external to see if there's a reliable way to reproduce, the root cause, and proposals to fix.
Job added to Upwork: https://www.upwork.com/jobs/~01f591c76409c7f4d0
Triggered auto assignment to Contributor-plus team member for initial proposal review - @rushatgabhane (External)
This feels extremely easy to reproduce: just close your laptop lid for a few minutes, and reopen.
I just tried this (closing laptop and reopen) and this was my network tab:
2 pings were called which isn't as many as your screenshot here but it's still back to back pings which shouldn't happen.
We already have a check to make sure that we don't send a Ping command when one is pending, and this seems to be working because I don't see `[NetworkConnection] recheck NetInfo` in the console.
So I have two theories:
`useRef` since they remain for the lifetime of the component?
I'm still looking into these but they are my initial thoughts based on the code. cc @roryabraham and @adhorodyski if you have any additional thoughts since you guys worked on these PRs to introduce NetInfo and periodic checks.
@srikarparsi you're correct about the periodic check.
The call itself feels solid: it should bail out as long as the function's early return kicks in (which looks fine from the logs, since there is no subsequent `recheck NetInfo`).
If `hasPendingNetworkCheck` is reliable, this periodic check should cause us no harm (but that's an assumption).
One higher-level problem I see with this implementation is that it's really, really imperative, so it's easy to make a mistake and cause this kind of behaviour over time. Declarative APIs work better, especially in React codebases, and there are open source libraries that solve just that.
I created this PR to check if a network check is pending before starting a new one. Still need to test but I think this would be a quick way to stop repetitive calls. @adhorodyski if you could take a look at it as well that would be appreciated.
I also agree that our current implementation might not be the best way of doing it. NetInfo has parameters that we seem to be implementing in a custom way. For example, NetInfo already has reachabilityShortTimeout and reachabilityLongTimeout which are defaulted to 5s and 60s. So when the internet is not detected, it should be rechecking for connection every 5s. And when it is detected, it should be rechecking every 60s. But we had to re-add the 60s check in this PR so I think there might just be something wrong with our current implementation which we need to fix.
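To make the expected cadence concrete, here is a small sketch of the behaviour described above. It does not call NetInfo; it only mirrors the two option names mentioned in this thread in a local type. The millisecond units and the `nextRecheckDelay` helper are my assumptions for illustration, not the library's code.

```typescript
// Local mirror of the two NetInfo options named in this thread.
// Assumption: values are in milliseconds (5s and 60s defaults).
interface ReachabilityTimeouts {
    reachabilityShortTimeout: number;
    reachabilityLongTimeout: number;
}

const DEFAULT_TIMEOUTS: ReachabilityTimeouts = {
    reachabilityShortTimeout: 5 * 1000,
    reachabilityLongTimeout: 60 * 1000,
};

// Hypothetical helper: how long to wait before the next reachability check.
// Internet not detected -> recheck after the short timeout.
// Internet detected -> recheck after the long timeout.
function nextRecheckDelay(isInternetReachable: boolean, timeouts: ReachabilityTimeouts = DEFAULT_TIMEOUTS): number {
    return isInternetReachable ? timeouts.reachabilityLongTimeout : timeouts.reachabilityShortTimeout;
}
```

If the library behaves this way on its own, a separate 60s interval on our side would only duplicate the long-timeout recheck.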
I wasn’t able to reproduce this issue by closing and opening the laptop lid. Every time I tried, the `Ping` and `ReconnectApp` methods were only called once.
However, based on the code and description provided, it appears that the problem is related to the way the app handles network checks and reconnections. When the app determines that it is online but cannot connect to the server, it initiates multiple `Ping` and `ReconnectApp` requests simultaneously. This leads to a large amount of network traffic and many unfinished commands. The reconnect logic does not control or limit the number of reconnect attempts, which can be problematic on poor networks and can lead to a constant flood of network activity.
Given this, I think adding this additional check makes sense as it ensures that a new network check only starts if there isn't already one in progress.
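For illustration only, a guard of this kind could look roughly like the sketch below. `createGuardedCheck` and its return shape are hypothetical names I made up; only the `hasPendingNetworkCheck` flag name comes from the discussion above, and this is not the actual code from the PR.

```typescript
// Hypothetical stand-in for whatever performs the real network check.
type NetworkCheck = () => Promise<void>;

function createGuardedCheck(check: NetworkCheck): {run: () => boolean; isPending: () => boolean} {
    let hasPendingNetworkCheck = false;

    const reset = (): void => {
        hasPendingNetworkCheck = false;
    };

    // Returns true if a new check was started, false if one was already in flight.
    function run(): boolean {
        if (hasPendingNetworkCheck) {
            return false;
        }
        hasPendingNetworkCheck = true;
        // Wrapping in a Promise executor turns a synchronous throw into a rejection,
        // so the flag is always cleared whether the check succeeds or fails.
        new Promise<void>((resolve) => resolve(check())).then(reset, reset);
        return true;
    }

    return {run, isPending: () => hasPendingNetworkCheck};
}
```

With something like this, the middleware path and the interval path could both call `run()` and only the first caller would trigger a request until it settles.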
The change from this PR seems to cause a regression.
Overall, I agree with Adam that we should adopt a more declarative approach to handling network connections. Currently, we are using an imperative approach, which seems error-prone. For example, the `recheckNetworkConnection` function is used both as middleware and in an interval, which risks errors and duplicate calls.
NetInfo provides built-in settings for re-checking the connection, such as `reachabilityShortTimeout`, which runs every 5 seconds if the Internet is not detected, and `reachabilityLongTimeout`, which runs every 60 seconds when the Internet is connected. These built-in mechanisms are designed to handle network rechecks reliably.
Given the complexity of our custom implementation, it's challenging to determine whether the root cause of this issue is NetInfo or our custom logic. Therefore, maybe we should consider removing the custom `recheckNetworkConnection` solution and relying solely on NetInfo's built-in functionality? This approach would simplify our codebase and leverage the library's tested and optimized features.
To ensure this change meets our needs, I’d suggest double-checking that it provides the required functionality.
Given the difficulty in reproducing the issue, I believe we should conduct thorough testing to ensure that NetInfo's built-in mechanisms handle all necessary scenarios and edge cases.
To achieve this, we need to confirm which specific functionalities we want to test and verify.
Here are some examples:
Therefore, maybe we should consider removing the custom recheckNetworkConnection solution and relying solely on NetInfo's built-in functionality?
I agree with this. And if it doesn't work and we verify that it's not a problem with our implementation, then I think it's better to make the fix upstream in NetInfo.
I think this should be the first step so I'll close this PR. @OlimpiaZurek let me know if I can do anything to help you with this.
Thanks for the update!
Reviewing label has been removed, please complete the "BugZero Checklist".
The solution for this issue has been :rocket: deployed to production :rocket: in version 9.0.7-8 and is now subject to a 7-day regression period :calendar:. Here is the list of pull requests that resolve this issue:
If no regressions arise, payment will be issued on 2024-07-24. :confetti_ball:
For reference, here are some details about the assignees on this issue:
BugZero Checklist: The PR fixing this issue has been merged! The following checklist (instructions) will need to be completed before the issue can be closed:
The PR that introduced the bug has been identified. Link to the PR: N.A. This was always there
The offending PR has been commented on, pointing out the bug it caused and why, so the author and reviewers can learn from the mistake. Link to comment: N.A.
A discussion in #expensify-bugs has been started about whether any other steps should be taken (e.g. updating the PR review checklist) in order to catch this type of bug sooner. Link to discussion: N.A.
Determine if we should create a regression test for this bug. Yes!
If we decide to create a regression test for the bug, please propose the regression test steps to ensure the same bug will not reach production again
1. Go offline
2. Go to network tab in browser
3. Verify that `openApp` isn't repeatedly called
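If we ever want to automate step 3, one option is to capture the outgoing requests and look for back-to-back duplicates. The sketch below assumes a made-up `CapturedRequest` shape and a hypothetical `findRepeatedCalls` helper; it is not an existing test utility in the app.

```typescript
// Hypothetical shape for one captured network request; not a real export format.
interface CapturedRequest {
    command: string;
    startedAt: number; // milliseconds since the capture started
}

// Returns every call to `command` that started less than `windowMs`
// after the previous call to the same command.
function findRepeatedCalls(requests: CapturedRequest[], command: string, windowMs: number): CapturedRequest[] {
    const matching = requests.filter((request) => request.command === command).sort((a, b) => a.startedAt - b.startedAt);

    const repeats: CapturedRequest[] = [];
    let previous: CapturedRequest | undefined;
    for (const request of matching) {
        if (previous !== undefined && request.startedAt - previous.startedAt < windowMs) {
            repeats.push(request);
        }
        previous = request;
    }
    return repeats;
}
```

The regression test would then expect `findRepeatedCalls(captured, 'openApp', someWindow)` to be empty, with the window size chosen by whoever writes the test.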
If you haven’t already, check out our contributing guidelines for onboarding and email contributors@expensify.com to request to join our Slack channel!
Version Number:
Reproducible in staging?: needs reproduction
Reproducible in production?: needs reproduction
If this was caught during regression testing, add the test name, ID and link from TestRail:
Email or phone of affected tester (no customers):
Logs: https://stackoverflow.com/c/expensify/questions/4856
Expensify/Expensify Issue URL:
Issue reported by: @quinthar
Slack conversation: https://expensify.slack.com/archives/C05LX9D6E07/p1719023935665339
Action Performed:
Expected Result:
Shouldn't call `ping` and `reconnectApp` several times
Actual Result:
on a really bad wifi network, where it concludes it's online but for some reason can't contact the server, it just hammers `Ping` and `ReconnectApp` back to back, filling the network queue with tons of parallel unfinished commands.
Workaround:
unknown
Platforms:
Which of our officially supported platforms is this issue occurring on?
Screenshots/Videos
bugd.txt
Current Issue Owner: @kadiealexander