Expensify / App

Welcome to New Expensify: a complete re-imagination of financial collaboration, centered around chat. Help us build the next generation of Expensify by sharing feedback and contributing to the code.
https://new.expensify.com
MIT License

Feature: Display backend unreachability message #38377

Closed tienifr closed 4 days ago

tienifr commented 2 months ago

Details

When the user is offline, we display the offline message ("You appear to be offline."). When the user is online but our backend is unreachable, we display the "We might have a problem. Check out status.expensify.com." message.
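
As a rough illustration of the two cases, the choice of message could be expressed as a small pure function. This is only a sketch: `getIndicatorMessage`, `isOffline`, and `isBackendReachable` are hypothetical names chosen for the example, not necessarily what the PR uses.

```typescript
// Hypothetical sketch: the names here are illustrative, not the PR's actual code.
const OFFLINE_MESSAGE = 'You appear to be offline.';
const BACKEND_UNREACHABLE_MESSAGE = 'We might have a problem. Check out status.expensify.com.';

function getIndicatorMessage(isOffline: boolean, isBackendReachable: boolean): string | null {
    // Being offline takes priority: without a connection we cannot judge the backend.
    if (isOffline) {
        return OFFLINE_MESSAGE;
    }
    if (!isBackendReachable) {
        return BACKEND_UNREACHABLE_MESSAGE;
    }
    // Online and backend reachable: no indicator is shown.
    return null;
}
```

Checking the offline flag first means a device with no connection never blames the backend.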

Fixed Issues

$ https://github.com/Expensify/App/issues/37565 PROPOSAL: https://github.com/Expensify/App/issues/37565#issuecomment-1976463123

Tests

Network Devtools

- **Web**: Open DevTools >> Network
- **Native**: Toggle RN dev menu by `CMD + D` >> Open Element Inspector >> Network

Block network request

- **Chrome**: Open DevTools >> More tools >> Network request blocking >> Enable network request blocking >> Add network request blocking pattern as `https://dev.new.expensify.com:8082/api`
- **Safari**:
  1. Open DevTools >> Sources >> + >> Local Override...
  2. Press + next to Local Overrides >> Select `Block` type and URL as `https://dev.new.expensify.com:8082/api` with Regular Expression enabled
- **Native**: Hard-code the reachability URL [here](https://github.com/Expensify/App/blob/f3be10027c4d786662253794c2cc7831170baf2a/src/libs/NetworkConnection.ts#L88) to an invalid URL.

  1. In Network devtools, verify that Ping command is called every 60 seconds (see Network Devtools)
  2. Block https://dev.new.expensify.com:8082/api request to make backend unreachable (see Block network request)
  3. Verify that after a while, the "We might have a problem. Check out status.expensify.com." message appears and the status page URL can be opened

  4. Go offline
  5. Verify that the "You appear to be offline." message appears
  6. Go online
  7. Verify that offline message disappears
  8. Verify that after a while, the unreachability message appears

  9. Disable network request blocking as in Step 2
  10. Verify that after a while, the unreachability message disappears

Offline tests

Same as Tests

QA Steps

NA

PR Author Checklist

Screenshots/Videos

Android: Native https://github.com/Expensify/App/assets/113963320/d59975d5-4940-453d-b3e7-01e1dde6a50a
Android: mWeb Chrome https://github.com/Expensify/App/assets/113963320/237f3e56-4f0b-49e4-8f9e-bb24edc1a854
iOS: Native https://github.com/Expensify/App/assets/113963320/27b329da-ba37-4b48-8498-de2cc2f09fe0
iOS: mWeb Safari https://github.com/Expensify/App/assets/113963320/ce09149a-4a2b-49c7-8d87-0874fd00ff92
MacOS: Chrome / Safari https://github.com/Expensify/App/assets/113963320/f53f2698-2099-4778-adbd-0bf657381198
MacOS: Desktop https://github.com/Expensify/App/assets/113963320/f53f2698-2099-4778-adbd-0bf657381198
melvin-bot[bot] commented 2 months ago

@cubuspl42 Please copy/paste the Reviewer Checklist from here into a new comment on this PR and complete it. If you have the K2 extension, you can simply click: [this button]

tienifr commented 2 months ago

So far I have found no way to manually block specific network requests on native apps the way we do on web with DevTools. My only solution was to hard-code the reachability URL to make it fail.

cubuspl42 commented 2 months ago

@tienifr If there's no better way, we can test it this way on Native. In rare cases, we test things by applying a small code change. Please specify this technique in the "Tests" steps.

tienifr commented 2 months ago

@cubuspl42 I updated all the comments and replied to your feedback above.

tienifr commented 1 month ago

@cubuspl42 I extracted a local subscribeToBackendReachability and updated the minor comments as suggested.

cubuspl42 commented 1 month ago

I started testing, it's working very well so far 🙂

cubuspl42 commented 1 month ago

Okay, we've got an issue.

Steps:

Expected result (observed on main): "You appear to be offline"

Actual result: "We might have a problem (...)"

cubuspl42 commented 1 month ago

I think that "offline" means for us:

The second check doesn't seem to work now on mobile Native.

cubuspl42 commented 1 month ago

@tienifr Bump!

tienifr commented 1 month ago

Normally net-info should use the default config, as per here, to check for internet connectivity, but I'm not sure why it didn't work. I'm taking a look.

cubuspl42 commented 1 month ago

reachabilityUrl

The URL to call to test if the internet is reachable. Only used on platforms which do not supply internet reachability natively or if useNativeReachability is false.

You could dig more here. Does "native reachability" include any effective Web endpoints, or does it rely on user-triggered Wi-Fi/Cellular switches only? Maybe iOS uses some apple.com endpoint and tries to do some "effective Internet reachability", but it failed in my case? What about Android? Maybe it's an emulator-related thing?

Also, maybe we could change our definition of "online" on mobile Native, so "effectively offline" (being in a train with close-to-zero cellular reception, being connected to a LAN-only Wi-Fi router, etc.) is considered "online" by Expensify. But this would require Slack discussion and I think we should consider it plan B (or plan C).

tienifr commented 1 month ago

This is what I found on Apple Developer here.

Screenshot 2024-03-29 at 17 58 10

That means as long as the device is connected to any network, it's considered "reachable".

I searched the Internet, and the common suggestion is to write a custom "ping" to a high-availability endpoint. Thus I think we should disable the useNativeReachability config so it will always use the lib's default reachabilityUrl.

Give me some time to investigate if disabling it caused any side-effects.

cubuspl42 commented 1 month ago

should disable the useNativeReachability

I'm not excited by that!

You can give it a try, but I'm afraid we'll lose the big benefit of immediate system feedback when we know that the device is offline (e.g. wifi = off, cellular = off).

It appears that NetInfo gives us a choice: either use a built-in endpoint reachability loop or use the built-in native reachability integration.

I believe that the latter (native reachability integration) is a higher-value functionality.

I can see two viable options, I'll leave the decision to you:

When I say "home-made Google endpoint reachability loop", I mean re-using the code we created for the backend endpoint reachability loop.

cubuspl42 commented 1 month ago

Start a Slack discussion, mentioning the pragmatic user-observable behavior difference on Native

But remember that that means we'll blame Expensify in the "offline in a train" scenario!

The user will be effectively offline (very low reception), the system will report "user is online", the backend will be unavailable, so we'll go with "Expensify has a problem" communication.

tienifr commented 1 month ago

@cubuspl42 I spent time digging into the react-native-netinfo source code and tried reproducing the internet reachability issue many times. Here's what I found:

  1. I cannot reproduce the issue. I unplugged the optical cable from my router and kept the Wi-Fi connected, but I always get "You appear to be offline." Also note that you cannot disable the Mac's Wi-Fi to simulate a no-internet condition, because the iOS simulator shares the same network instance/info with your Mac. If you disable the Mac's Wi-Fi, NetInfo's isConnected in the simulator will be false. I was a bit biased when I first received your feedback about the bug, so I did not try to reproduce it at the time.
  2. react-native-netinfo always uses the default NetInfoConfiguration to check for internet reachability when the platform does not support it natively (iOS in this case, and iOS only; Android can check for internet access itself). This is highlighted in the docs here. I also saw the high-availability endpoint being fetched in the Network debugger.
Detailed explanation for 2️⃣: The iOS native call to `SCNetworkReachability` [does not return `isInternetReachability`](https://github.com/react-native-netinfo/react-native-netinfo/blob/a965795e60445d0e9cb7a16cc50547ede1855f53/ios/RNCNetInfo.m#L117-L121) (only `isConnected`). Thus in the logic [here](https://github.com/react-native-netinfo/react-native-netinfo/blob/a965795e60445d0e9cb7a16cc50547ede1855f53/src/internal/internetReachability.ts#L153-L160), `_setExpectsConnection` is called with `isConnected = true`. Later in that function, the check interval is triggered [here](https://github.com/react-native-netinfo/react-native-netinfo/blob/a965795e60445d0e9cb7a16cc50547ede1855f53/src/internal/internetReachability.ts#L56-L64) and [here](https://github.com/react-native-netinfo/react-native-netinfo/blob/a965795e60445d0e9cb7a16cc50547ede1855f53/src/internal/internetReachability.ts#L73) with the [default configuration](https://github.com/react-native-netinfo/react-native-netinfo/blob/a965795e60445d0e9cb7a16cc50547ede1855f53/src/internal/defaultConfiguration.ts) to check for internet reachability.

In conclusion, I have proof that netinfo still runs the reachability check interval on iOS. The issue, if it happens at all, may be because the library's check and our own check use the same interval (i.e. 60 seconds), which may cause some kind of race condition.

So can you please give it another try? If the issue is no longer reproducible, we can continue.

cubuspl42 commented 1 month ago

Thank you for diving deep into this! I'm sorry if I provided some misleading information. I misunderstood the "which do not supply internet reachability natively" part of the NetInfo docs, which might've biased me.

cubuspl42 commented 1 month ago

Android emulator, Native build, after macOS Wi-Fi is turned off:

image

Expected: "You are offline"
Actual: "We might have a problem"

cubuspl42 commented 1 month ago

Please correct me if I'm wrong.

tienifr commented 1 month ago

Check today.

tienifr commented 1 month ago

I tested on an Android device and here's what I found: Android only checks for internet reachability on initiation. That means if the network lost internet access anytime after the connection initialization, we wouldn't know.

I think that we should add another check against a highly available endpoint when the backend is unreachable, OR we can set up another polling interval for that highly available endpoint, as you suggested. Let me do some tests.
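
The first option, checking a highly available endpoint only after the backend check fails, might look roughly like the sketch below. The probes are injected so the decision logic does not depend on any real URL. `checkReachability`, `pingBackend`, and `pingInternet` are hypothetical names, and this is not the PR's implementation.

```typescript
type ReachabilityResult = 'backendReachable' | 'backendUnreachable' | 'internetUnreachable';

// A probe resolves to true when its endpoint answered successfully.
type Probe = () => Promise<boolean>;

async function checkReachability(pingBackend: Probe, pingInternet: Probe): Promise<ReachabilityResult> {
    // Treat a rejected probe (e.g. a network error) the same as a failed one.
    const safely = (probe: Probe): Promise<boolean> => probe().catch(() => false);

    if (await safely(pingBackend)) {
        // A reachable backend implies a reachable Internet, so no second request is made.
        return 'backendReachable';
    }
    // Only when the backend check fails do we spend a request on the high-availability endpoint.
    return (await safely(pingInternet)) ? 'backendUnreachable' : 'internetUnreachable';
}
```

Because the second probe runs only on failure, the healthy path still costs a single request per cycle.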

tienifr commented 1 month ago

@cubuspl42 This is ready for review again.

cubuspl42 commented 1 month ago

Android only checks for internet reachability on initiation. That means if the network lost internet access anytime after the connection initialization, we wouldn't know.

I'm a bit lost... But didn't we depend on NetInfo regular URL endpoint checks in the old solution (the current main), when we provided it with a custom endpoint URL? Also on Android?

tienifr commented 1 month ago

Android supports internet reachability natively, it won't use the URL test. So the issue also happens on main (turning off the Internet does not show You appear to be offline).

cubuspl42 commented 1 month ago

Okay, I gave it a lot of thought. I have some reservations about the current code, but I like the direction we're going in.

I sketched something that I'd consider an improvement over the current code, both in terms of code organization and behavior in corner cases. I haven't tested it though, it's just a dump of my idea.

type Subscription = () => void;

type InternetReachabilityStatus =
    'backendReachable' | // Backend is reachable, which must mean the Internet is reachable
    'backendUnreachable' | // Backend is unreachable, but the Internet is otherwise reachable
    'internetUnreachable'; // Backend is unreachable, neither is the whole Internet

type NetworkConnectivityStatus =
    'disconnected' | // Network is disconnected, which implies the Internet is unreachable
    'connected';  // Network is connected, but the Internet might still be unreachable

type AppNetworkStatus =
    'offline' | // We assume the device is not effectively connected to the Internet
    'backendDown' | // We assume that the device is connected, but the backend is down
    'online'; // We assume that the network connection is healthy and the backend can be reached

// Cross-platform
function subscribeToInternetReachability(
    callback: (status: InternetReachabilityStatus) => void,
): Subscription {
    throw new Error("TODO: Implement using custom loop and one-off requests");
    // Basically what we have now, but let's keep the fallback logic to check high-availability endpoint when backend is
    // unreachable _on all platforms_. I'm suggesting making this the only source of truth about Internet being fully
    // unreachable. This way, we'll have no race conditions between the backend loop and the Internet loop, also
    // we'll minimize the number of unnecessary requests, as backend being reachable implies Internet being reachable.
}

// Web
function subscribeToNetworkConnectivityStatus(
    callback: (status: NetworkConnectivityStatus) => void,
): Subscription {
    // No-op, as we don't have any access to the network connectivity status on the Web
    return () => {
    };
}

// Native
function subscribeToNetworkConnectivityStatus(
    callback: (status: NetworkConnectivityStatus) => void,
): Subscription {
    throw new Error("TODO: Implement based on NetInfo subscriptions / isConnected");
    // Let's drop any NetInfo reasoning about the Internet reachability (as opposed to local network connectivity status,
    // like cellular / Wi-Fi being on/off). Why? Well, it doesn't work well on Android (it seems), and although it might
    // work on iOS, it still won't be performed at the perfect moment. So maybe let's just ignore it, and assume that
    // our source of truth for this information is as good as the one in the system?
}

// Cross-platform
// Subscribe to the real app network status, the offline forcing can be implemented on top of this
function subscribeToAppNetworkStatus(
    callback: (status: AppNetworkStatus) => void,
): Subscription {
    let internetReachabilityStatus: InternetReachabilityStatus = 'backendReachable';
    let networkConnectivityStatus: NetworkConnectivityStatus = 'connected';

    const establishAppNetworkStatus = (): AppNetworkStatus => {
        switch (networkConnectivityStatus) {
            case "disconnected": {
                return 'offline';
            }
            case "connected": {
                switch (internetReachabilityStatus) {
                    case "backendReachable":
                        return 'online';
                    case "backendUnreachable":
                        return 'backendDown';
                    case "internetUnreachable":
                        return 'offline';
                }
            }
        }
    };

    const triggerCallback = () => {
        callback(establishAppNetworkStatus());
    };

    const unsubscribeFromInternetReachability = subscribeToInternetReachability((status) => {
        internetReachabilityStatus = status;

        triggerCallback();
    });

    const unsubscribeFromNetworkConnectivity = subscribeToNetworkConnectivityStatus((status) => {
        networkConnectivityStatus = status;

        triggerCallback();
    });

    return () => {
        unsubscribeFromInternetReachability();
        unsubscribeFromNetworkConnectivity();
    };
}
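
To sanity-check how the two inputs combine, the mapping inside `establishAppNetworkStatus` can be pulled out into a pure function and enumerated. This is a sketch only: `toAppNetworkStatus` is a hypothetical helper, and the type definitions are repeated so the snippet stands alone.

```typescript
type InternetReachabilityStatus = 'backendReachable' | 'backendUnreachable' | 'internetUnreachable';
type NetworkConnectivityStatus = 'disconnected' | 'connected';
type AppNetworkStatus = 'offline' | 'backendDown' | 'online';

// Hypothetical pure helper mirroring the nested switch in the sketch above.
function toAppNetworkStatus(
    connectivity: NetworkConnectivityStatus,
    reachability: InternetReachabilityStatus,
): AppNetworkStatus {
    // A disconnected network is offline regardless of the last reachability result.
    if (connectivity === 'disconnected') {
        return 'offline';
    }
    if (reachability === 'backendReachable') {
        return 'online';
    }
    if (reachability === 'backendUnreachable') {
        return 'backendDown';
    }
    // Connected to a network, but nothing on the Internet answers.
    return 'offline';
}

const connectivities: NetworkConnectivityStatus[] = ['disconnected', 'connected'];
const reachabilities: InternetReachabilityStatus[] = ['backendReachable', 'backendUnreachable', 'internetUnreachable'];

// Print the full table of combinations.
for (const connectivity of connectivities) {
    for (const reachability of reachabilities) {
        console.log(`${connectivity} + ${reachability} -> ${toAppNetworkStatus(connectivity, reachability)}`);
    }
}
```

Note that `connected` combined with `internetUnreachable` maps to `offline`, which is what keeps the low-reception "train" scenario from being reported as a backend problem.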
cubuspl42 commented 1 month ago

What do you think?

cubuspl42 commented 1 month ago

@tienifr Bump! I know it's a tough task, and there are some hairy corner cases, but please provide some update 🙂

tienifr commented 3 weeks ago

subscribeToInternetReachability ... let's keep the fallback logic to check high-availability endpoint when backend is unreachable on all platforms

subscribeToNetworkConnectivityStatus ... Implement based on NetInfo subscriptions / isConnected

By these points, do you mean we implement our own logic for internet, connectivity and backend reachability and ignore netinfo logic?

I personally think the current logic is more readable and easier to understand. It's simple: Backend check >> Internet check (if the backend check failed). We don't make redundant requests: only one Ping every 60 seconds, plus one high-availability fetch if the Ping failed. But I appreciate your suggestion to use an enumeration for the states anyway 👍. I'll update it.

DylanDylann commented 3 weeks ago

Taking over as C+

trjExpensify commented 2 weeks ago

Conflicts to resolve here! Assigned you as the reviewer @DylanDylann.

DylanDylann commented 2 weeks ago

~@aldo-expensify The change looks good to me. Could you help to add "ready to build" label for testing?~

Whoops, I can test on the emulator

DylanDylann commented 2 weeks ago

@tienifr BUG: Flicker when going online. When going back online, the indicator message briefly turns into "We might have a problem. Check out status.expensify.com" before disappearing.

https://github.com/Expensify/App/assets/141406735/d1923f36-d326-408b-8a40-311bcdcfcd91

tienifr commented 1 week ago

Thanks @DylanDylann, I found the root cause and a solution for this, but I need time to retest the flow on all platforms. I'll push the update today.

tienifr commented 1 week ago

@DylanDylann That issue happened when the backend was previously unreachable: when we came back online, the networkStatus was unknown, causing isOffline to be false:

https://github.com/Expensify/App/blob/8383d80013cf61e83fc8bd90eef5f8357f69cf91/src/hooks/useNetwork.ts#L32-L33

Now isOffline = false and isBackendReachable = false, which caused the "We might have a problem." message to show. Later, the logic that sets isBackendReachable to true when we come back online (here) ran, and the message disappeared.

My solution is that if the network status is unknown, we should treat it as if we're online and the backend is reachable (here). That's what we already do with isOffline, as mentioned above.
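
In other words, an unknown network status should not be allowed to surface the backend message. A minimal sketch of that rule, assuming a three-valued status and hypothetical names (`deriveNetworkFlags`, `lastKnownBackendReachable`) rather than the actual `useNetwork` code:

```typescript
type NetworkStatus = 'online' | 'offline' | 'unknown';

interface NetworkFlags {
    isOffline: boolean;
    isBackendReachable: boolean;
}

// Hypothetical helper: derives the flags the UI reads from the raw network state.
function deriveNetworkFlags(networkStatus: NetworkStatus, lastKnownBackendReachable: boolean): NetworkFlags {
    if (networkStatus === 'unknown') {
        // While the status is still being determined, assume the healthy case
        // so a stale "backend unreachable" value cannot flash the warning.
        return {isOffline: false, isBackendReachable: true};
    }
    return {
        isOffline: networkStatus === 'offline',
        isBackendReachable: lastKnownBackendReachable,
    };
}
```

With this rule, the transition offline >> unknown >> online never passes through a state where the device is considered online and the backend unreachable, unless a check has actually failed while online.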

DylanDylann commented 1 week ago

Reviewing today

DylanDylann commented 1 week ago

Reviewer Checklist

Screenshots/Videos

Android: Native https://github.com/Expensify/App/assets/141406735/003154d0-aa70-4021-84d4-841c3531d858
Android: mWeb Chrome https://github.com/Expensify/App/assets/141406735/6430086d-62db-4e43-aeca-a13312af36ff
iOS: Native https://github.com/Expensify/App/assets/141406735/01819e87-2b79-4439-84cd-71702e6a222e
iOS: mWeb Safari https://github.com/Expensify/App/assets/141406735/cd7ee98a-866d-446b-8ff7-52ee5eb3dd9b
MacOS: Chrome / Safari https://github.com/Expensify/App/assets/141406735/e75a8877-222e-41b7-a5f0-b700f0afdb33
MacOS: Desktop https://github.com/Expensify/App/assets/141406735/5396f7d0-07d0-4d1d-bc00-164cae869e8a
tienifr commented 4 days ago

Hi @aldo-expensify, the merge freeze is over, so I think we're good to proceed with this.

OSBotify commented 4 days ago

:hand: This PR was not deployed to staging yet because QA is ongoing. It will be automatically deployed to staging after the next production release.

OSBotify commented 3 days ago

🚀 Deployed to staging by https://github.com/aldo-expensify in version: 1.4.74-0 🚀

| platform | result |
| --- | --- |
| 🤖 android 🤖 | success ✅ |
| 🖥 desktop 🖥 | success ✅ |
| 🍎 iOS 🍎 | success ✅ |
| 🕸 web 🕸 | success ✅ |