Expensify / App

Welcome to New Expensify: a complete re-imagination of financial collaboration, centered around chat. Help us build the next generation of Expensify by sharing feedback and contributing to the code.
https://new.expensify.com
MIT License
3.36k stars 2.78k forks source link

[HOLD for payment 2022-02-02][$8000] - iOS - User stuck on Splash screen when launching the app #5620

Closed isagoico closed 2 years ago

isagoico commented 2 years ago

If you haven’t already, check out our contributing guidelines for onboarding and email contributors@expensify.com to request to join our Slack channel!


Action Performed:

  1. Force close the app or update the app
  2. Open the app

Expected Result:

User should not be stuck on Splash screen

Actual Result:

User is stuck in splash screen

Workaround:

Closing and reopening app fixes the issue.

Platform:

Where is this issue occurring?

Version Number: 1.1.2-0

Reproducible in staging?: Yes Reproducible in production?: Unknown

Logs: https://stackoverflow.com/c/expensify/questions/4856

Notes/Photos/Videos: Any additional supporting documentation There's no solid reproduction steps but from the testers reproductions it has been after updating the app.

Expensify/Expensify Issue URL:

Issue reported by: @puneetlath Slack conversation: https://expensify.slack.com/archives/C01GTK53T8Q/p1632958388281500

View all open jobs on GitHub

MelvinBot commented 2 years ago

Triggered auto assignment to @iwiznia (Engineering), see https://stackoverflow.com/c/expensify/questions/4319 for more details.

MelvinBot commented 2 years ago

Triggered auto assignment to @kadiealexander (External), see https://stackoverflow.com/c/expensify/questions/8582 for more details.

kadiealexander commented 2 years ago

Upwork post: https://www.upwork.com/jobs/~018ca1e70adf6f82c2

NikkiWines commented 2 years ago

Hmm, I think we need more consistent reproduction steps for this one. I haven't been able to reproduce this on my iOS device when following the steps listed above.

I just tested updating the app from v1.1.7-4 to v1.1.7-16 and was directed to my chats page immediately after updating, no hanging on the splash screen.

puneetlath commented 2 years ago

This happens to me almost every time I open the app when it has been a little while since the last time I opened. Let me know if there's something I can do that would be helpful.

mallenexpensify commented 2 years ago

I don't think there's an easy way to capture logs from the app, I asked a while ago and created this SO https://stackoverflow.com/c/expensify/questions/10396

iwiznia commented 2 years ago

It depends on what you need, but some logs are sent to the server and you can see them there. No JS errors I think, but our logs that the app generates.

iwiznia commented 2 years ago

https://stackoverflow.com/c/expensify/a/11099/3

thienlnam commented 2 years ago

I also experience this most of the time I use the app.

I send some texts, leave it running in the background, and then the next time I try to open the app I get stuck on this screen.

I always end up closing the app from the background and then relaunching

Device: iPhone Pro 12

mallenexpensify commented 2 years ago

@NikkiWines @iwiznia @thienlnam are any of you able to look at my logs via the below to find more details on the issue?

If you want the logs generated by our app, you can get them in logSearch by searching blob:"userAgent" AND blob:"New%20Expensify"

Since this is happening to multiple users and it keeps people from successfully entering the app, I think we should prioritize the issue.

I doubled the price of the job

iwiznia commented 2 years ago

I could but I need the email and exact time frame

mallenexpensify commented 2 years ago

@iwiznia I'm hung up on iOS, 1:41 Eastern time, 10/22, mallen@expensify.com . Can you take a look at my logs?

iwiznia commented 2 years ago

Does not seem too revealing 😞

kadiealexander commented 2 years ago

Doubled job price to $1000.

isagoico commented 2 years ago

@joaniew Is experiencing this issue too

Slack thread: https://expensify.slack.com/archives/C01GTK53T8Q/p1635255533107600

my iOS app (latest version) is stuck on the open screen even after multiple exit/opens Image from iOS

Julesssss commented 2 years ago

Linking this existing issue for reference, which has some extra info about which version/devices have seen the issue.

kadiealexander commented 2 years ago

Updated job price to $2000.

sidferreira commented 2 years ago

I also experience this most of the time I use the app.

I send some texts, leave it running in the background, and then the next time I try to open the app I get stuck on this screen.

I always end up closing the app from the background and then relaunching

Device: iPhone Pro 12

Still having this issue?

kadiealexander commented 2 years ago

@thienlnam please see above :)

thienlnam commented 2 years ago

Still having this issue?

Yup, still happening for me - but I've noticed it can happen on both data and wifi

sidferreira commented 2 years ago

@thienlnam I have a branch with bugsnag installed, trying to reproduce it (no success so far). Would you be able to build and run from that branch, so I try to get some extra data?

thienlnam commented 2 years ago

Anything you're looking for specifically? We have crash logs and other logs already. This also doesn't happen every time I launch so I wouldn't necessarily have the logs each time I tried to run the app

sidferreira commented 2 years ago

@thienlnam as a contributor, I don't have access to that logs, also Crashlytics is not as rich as Bugsnag IMHO. Finally, I added some breadcrumbs based on what I believe may be the issue

thienlnam commented 2 years ago

I'd rather not build and run a WIP branch from my personal phone. Are you able to write a proposed solution regarding what you believe is the issue? Then @NikkiWines can work with you to refine / develop the solution and we can test those changes when the PR gets to staging

sidferreira commented 2 years ago

@thienlnam (in order to keep this short) can we discuss it in slack?

thienlnam commented 2 years ago

Feel free to start a conversation in the #expensify-open-source slack channel

Guidlines

sidferreira commented 2 years ago

@thienlnam Let's use https://expensify.slack.com/archives/C01GTK53T8Q/p1635255533107600 that's an already existing thread

thecodingeek commented 2 years ago

Findings and Proposal

Hello everyone,

I tried to look into the causes for this issue. Below I am sharing my findings and a proposal for one reason.

As per Expensify.js file

if (this.getAuthToken() && this.props.initialReportDataLoaded && this.props.isSidebarLoaded) {
    this.hideSplash();
}

I think there are very less chance for not getting auth token, so it can most probably happen because either initialReportDataLoaded is false or isSidebarLoaded is false.

One very interesting thing which I could find was, if in the file SidebarLinks.js in the render function the recentReports comes out to be an empty array in the following code,

const {recentReports} = getSidebarOptions

then the Splash Screen never goes away. Reason is when recentReports is empty then there are no sections generated and <OptionsList is not rendered.

Now this <OptionsList has the following code

<OptionsList
    ....
    onLayout={App.setSidebarLoaded}
/>

This method App.setSidebarLoaded is supposed to do the following,

Onyx.set(ONYXKEYS.IS_SIDEBAR_LOADED, true);

which makes isSidebarLoaded = true.

Now as the OptionsList view did not render so onLayout of the same is never called and isSidebarLoaded never becomes true. Hence it gets stuck at the Splash Screen.

An alternative (or solution) here I think can be to put onLayout={App.setSidebarLoaded} on the parent <View> so that even if the recentReports is empty even then it will hide the Splash Screen and move further. Right now it just stuck there if the recentReports is empty.

If the recentReports is actually coming out to be empty then that can be a separate issue. But I think the above handling is good to have in the code so that it never hangs at the Splash Screen even if such thing happens.

Next, the variable initialReportDataLoaded is set to true in the file Report.js in the method fetchAllReports

Onyx.set(ONYXKEYS.INITIAL_REPORT_DATA_LOADED, true);

So this scenario can happen if this fetchAllReports API fails, but may be that is also less likely to happen. If this is the scenario then we can do error handling there and at least move user forward with a popup or something.

If you can share the user's credentials for which this happens then I can find out the exact reason and fix even that issue as well.

@NikkiWines I would love to know what you think about this. Please check when you're available.

tgolen commented 2 years ago

@thecodingeek That's a good investigation. I think your instincts are right about the problem with isSidebarLoaded. I looked at some of my server logs this morning to see if I could find any issues with fetchAllReports. I found that the first time I opened my app today, the request to fetch reports failed (server logs):

Got an authorization error Expensify\Error\Auth\AuthTokenExpired - 9b953676-e64d-4262-bf74-1bf0ff888e4d ~~ command: 'Get' exceptionJsonCode: '407' exceptionUserMessage: 'Your session has expired. Please sign in again.' exceptionMessage: '' exceptionFile: '/git/releases/expensify.com/d38cb61/lib/Auth.php' exceptionLine: '242' exceptionCode: '666'

image

In the case of a 407, those should be handled in the app here and it should re-authenticate the user.

Now, in this particular case, I THINK the re-authentication code is supposed to keep track of the original promise, so that after reauthentication is successful, the original promise is resolved. It's possible that this is not working as intended though and should be investigated further.

I am almost positive now that this issue is caused by an expired authToken when launching the app. AuthTokens expire after 2 hours, which makes this time-consuming to test, but you could also try hardcoding a 407 response into the code (with something like response.jsonCode = 407 in order to force the condition and test what happens with the code and the promises.

About 6 seconds after the first request for chatList, I do see in the logs that another request for chatList is made that is successful: image

but it seems possible that this second request isn't resolving the promise on the front-end (because I left the app open for about 15 seconds without the splash screen ever going away.


It's also worth noting, that with the expired authToken, the Pusher lib also fails to authenticate (which would help to explain the issue with the notification badges being out-of-date in some cases).

parasharrajat commented 2 years ago

@tgolen Now the question is why does this only affect IOS.

mallenexpensify commented 2 years ago

Doubled price to $4000 https://www.upwork.com/jobs/~018ca1e70adf6f82c2

mallenexpensify commented 2 years ago

We are placing all non-critical issues on hold while we're on a company offsite. The hold is expected to be lifted on 11/19 (but could go longer). For any PRs that are submitted before or during the hold, we will add a $250 bonus payment.

kidroca commented 2 years ago

One more reason to be stuck on the splash screen is a "soft" crash captured by our <ErrorBoundary /> It has been happening to me on several occasions to be stuck at splash because something extraordinary happened - when an exception occurs inside a render method Mostly this have been happening by calls to Str. methods with undefined e.g. take a method like Str.isSMSLogin(credentials.login) if by accident it's called with undefined it will cause a crash

In that case we log an error and render null for the app

https://github.com/Expensify/App/blob/0a0eba37e50220b2b10623ac6c3102c3fdcc8fa1/src/components/ErrorBoundary/BaseErrorBoundary.js#L35-L46

Maybe for future help in distincting these problems e.g. did something just "soft" crashed or we're stuck for some other reason we can hide the splash screen and render something helpful from the ErrorBoundary instead of null

E.g. a brief explanation that "it's very bad but we can fix it" and display a button to either clear store which would take us to login again, or relaunch the app, though relaunching might not be as reliable as clearing storage...

mallenexpensify commented 2 years ago

Here's an interesting one that might be related?

A square version of the splash screen with visible parts of the app viewable above and below.

Not sure what I was doing when it happened, I might have enabled vertical-only orientation view in between opens of the app

kadiealexander commented 2 years ago

Still hunting the right proposal for this one!! Job is currently at $4000 (thanks for updating, Matt!)

kadiealexander commented 2 years ago

Upping the price to $8k!

@NikkiWines would you mind taking a look at the above suggestions (and discussions here and here) to see if there's anything we can work with?

kidroca commented 2 years ago

Hi, I have been tracking this issue for some time and would like to help move it forward

I've investigated everything written so far and I think the most likely cause is a JS error halting code execution from ever reaching BootSplash.hide()

Here are some facts that will help us dismiss some other possibilities

  1. We were logged in and had a functioning app (The bug happens only after a restart / update)
  2. When there's no app update the bug usually happens after 2 hrs of inactivity (expired token)
  3. The splash screen is initially visible and would only hide when we call BootSplash.hide from our code

It's not a problem of Expensify.js componentDidUpdate or fetchAllReports

From fact 1) we know we have enough information in storage to load without network at all. It's only if our storage was somehow corrupted that we can be stuck here. Expand to see more details... https://github.com/Expensify/App/blob/c3ba59bac53b65d00616dc779eea077825026b4d/src/Expensify.js#L113-L115 From fact 1) we know that `this.getAuthToken()` is `ture` and that `props.initialReportDataLoaded` is `true` as well - `initialReportDataLoaded` is set to `true` after login and remains true ever since - nothing would ever set it to `false` `isSidebarLoaded` would be set to `true` very soon, the outcome of `fetchAllReports` doesn't matter - again from fact 1) we have data in on disk so even if `fetchAllReports` fails completely we have something to load in the LHN - LHN uses reports - they are not safe for eviction, nothing would remove them from storage automatically. Only report actions can be auto evicted, but that doesn't concern the LHN and the `isSidebarLoaded` flag Regarding _"the outcome of `fetchAllReports` doesn't matter"_, I've tried - skip calling it - make it do nothing but return a rejected promise - make it receive an empty list reponse - make it receive a `jsonCode` other than 200 The splash screen would still hide Regarding `SidebarLinks` and `App.setSidebarLoaded` not getting called - yes this could happen but only if something wiped out the data in storage - it's unlikely - if full data is wiped we'd be on the Login screen. Could data be partially wiped? I've traced that as well and will post findings separately, overall it doesn't seem likely

It's likely an error halting code execution

Sync Errors that are thrown inside component code like render, componentDidMount would prevent any further code being executed

As of React 16, errors that were not caught by any error boundary will result in unmounting of the whole React component tree. https://reactjs.org/docs/error-boundaries.html#new-behavior-for-uncaught-errors

If such an error happens at launch before we hide the splash screen, our ErrorBoundary component will stop render the app (it's children) and no code will be able to call BootSplash.hide()

https://github.com/Expensify/App/blob/c3ba59bac53b65d00616dc779eea077825026b4d/src/App.js#L35-L37

https://github.com/Expensify/App/blob/c3ba59bac53b65d00616dc779eea077825026b4d/src/components/ErrorBoundary/BaseErrorBoundary.js#L39-L47

If this ever happened the ErrorBoundary component would be sending very distinct logs containing more information. It would be very helpful to post them here so that we can zoom in on something specific Such logs would be prefixed with [alrt] NewExpensify crash caught by error boundary and will contain error - a message, and errorInfo - a component stack trace

In case the Error Boundary theory is wrong

We need to take some measures to pinpoint the problem without having to analyze the whole app

1 Add some code to the <Expensify /> component

    componentDidMount() {
        this.debugBootsplashIssues();
    }

    debugSplashIssues() {
        setTimeout(() => {
            BootSplash.getVisibilityStatus()
                .then((status) => {
                    if (status === 'visible') {
                        Log.alert('[BootSplash] splash screen is still visible a minute after launch', this.props, false);
                    }
                });
        }, 60 * 1000);
    }

Hopefully the logs around this alert here would be able to tell us what's going on

It might be better to have this outside of Component code in case there's an error that's preventing React to initialize - perhaps put it in src/libs/BootSplash/index.native.js

2 Let the <ErrorBoundary /> render something

Make the Error boundary hide the splash and render some fallback content in case it catches an error This would let people know where they've encountered a startup error, or they're stuck due to something else It can be configured to be ENV specific - render more debug information in test flight and dev, and something more user friendly for production

kidroca commented 2 years ago

My proposal is let's add the code above and have more clues about what's causing this - if it turns out it reveals the real problem and we fix it - great, if not I would accept a reduced payment (5-10%) and others would still benefit from the information and might solve the ticket

kidroca commented 2 years ago

Having corrupted data in storage, or partially cleared storage

If for some reason on disk we have authToken or the means to successfully reauthenticate, but we've lost data regarding reports, and the fetchAllReports failed in a way that even the Concierge chat is not created Then yes we could be getting stuck on the splash screen because App.setSidebarLoaded is not getting called

I haven't found anything definitive on the matter. All AsyncStorage issues I've investigated would be wiping out full storage data, which if happened in our app would render us on the Login Screen This issue suggest it could be a problem if we use 2 versions of AsyncStorage at the same time: https://github.com/react-native-async-storage/async-storage/issues/257 A library/dependency we use can be depending on the deprecated AsyncStorage that's still bundled in v0.64.1 of react-native - doesn't seem likely.

We're certainly not deleting reports from storage ourselves. The only time we clear storage is when the user is taken back to the Login Screen

This is pretty far fetched, but I've noticed that when we resume the app with an expired token multiple Authenticate requests are being made Multiple Authenticate and Push Authenticate requests

If one of them fails it would unpause the queue and redirect the user to sign in which clears storage, but in the meantime another Authenticate call could succeed which renders the user logged in again - so now we have cleared storage but have an auth token to be considered logged in. Now if for some reason fetchAllReports or fetchPersonalDetails would fail we would be stuck on the splash. Some more thoughts on the subject: https://expensify.slack.com/archives/C01GTK53T8Q/p1637086224241700


I agree with @thecodingeek that we should move onLayout={App.setSidebarLoaded} to the root View if it's ever the case that the code here would produce an empty Array

https://github.com/Expensify/App/blob/c3ba59bac53b65d00616dc779eea077825026b4d/src/pages/home/sidebar/SidebarLinks.js#L193-L198

If sections[0].data is empty the onLayout callback would not be called

It would be much better to see an empty Sidebar but have access to the Create Menu or settings, than to be stuck on the splash screen

This could happen after login where fetchAllReports fails to deliver

NikkiWines commented 2 years ago

Thanks for the investigation @thecodingeek (for your investigation earlier) and @kidroca. Given the evasive nature of this issue, I think adding logs to componentDidMount() (and componentDidUpdate()) seems like a good idea. A minute might be a bit long though, we really shouldn't be seeing the splash screen for more than a couple of seconds.

I'm not opposed to rendering something for ErrorBoundary, I'm not sure if there's a reason we've opted not to render something before now, but if we configure it to be env specific, maybe we should just render something in testflight + dev only and avoid prod for now.

I agree with @thecodingeek that we should move onLayout={App.setSidebarLoaded} to the root View if it's ever the case that the code here would produce an empty Array

I also think this is a good idea as well, since having it OptionsList now leads to the splash screen staying up if we force an empty array and we can't know for sure that the same behavior isn't happening in prod.

FWIW, re:expired authtokens being the cause of this, I've tried (unsuccessfully) to reproduce this bug with by setting authtokens to expire in 10 seconds (instead of 2 hours) locally. The app on the iOS sim boots up correctly without issue every time during my testing.

NikkiWines commented 2 years ago

Since there's a couple of suggestions in the works here, I've broken down @thecodingeek and @kidroca's suggestions into two separate issues. This issue will be going on hold in the meantime.

The aforementioned issues are here and here and will have associated UpWork jobs momentarily.

@kidroca, @kadiealexander and I discussed with the team about compensation for helping to identify the source of the bug and decided on a base compensation of $250, with first dibs on implementing a solution for this (and getting the full issue payment) should the logs return anything material that helps to point to a solution.

kidroca commented 2 years ago

Any luck finding logs starting with

[alrt] NewExpensify crash caught by error


The reason we render nothing for the error boundary is we didn't know what to render We only added it to capture and log errors but it could be handy at least for QA to let them now the app failed at startup

tgolen commented 2 years ago

I looked at the logs for that, and it looks like there is a fair amount of errors getting caught:

image image

tgolen commented 2 years ago

I wonder if we should be creating BugBot GH issues for these (cc @iwiznia). It seems like any error that would get caught by this is something that could/should be investigated and fixed.

tgolen commented 2 years ago

FWIW, I experience this issue a lot, and when I filtered those logs for only my email address, I didn't get any logs attached to my account in the past 30 days. So, the error boundary is working, but maybe for this problem with the splash screen, it could mean that it's unrelated to an uncaught error (unless the logging is wrong for some reason or there is another reason why my errors aren't logged).

NikkiWines commented 2 years ago

Hahaha @tgolen beat me to it but yes, so there are instances of logs beginning with [alrt] NewExpensify crash caught by error boundary in the last month but I'm not seeing anything tying it directly to this bug.

For one, I experienced this bug this morning and there are no logs beginning with that line associated with my email in the last 30 days. Joanie reported experiencing this bug in mid-october here but I also don't see any of those logs for her email.

image

image

Here's an example of those logs for your email actually @kidroca on 11-17

2021-11-17 13:05:48 117 | expensify@kidroca.com | [info] Making API request ~~ command: 'Push_Authenticate' type: 'post' shouldUseSecure: '' userAgent: 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/95.0.4638.69 Safari/537.36'
2021-11-17 13:05:48 117 | expensify@kidroca.com | [info] Making API request ~~ command: 'Report_GetHistory' type: 'post' shouldUseSecure: '' userAgent: 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/95.0.4638.69 Safari/537.36'
2021-11-17 13:05:48 117 | expensify@kidroca.com | [info] [NetworkConnection] stopListeningForReconnect called ~~ parameters: '' userAgent: 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/95.0.4638.69 Safari/537.36'
2021-11-17 13:05:48 117 | expensify@kidroca.com | [info] [PusherConnectionManager] disconnected event ~~ parameters: '' userAgent: 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/95.0.4638.69 Safari/537.36'
2021-11-17 13:05:48 117 | expensify@kidroca.com | [hmmm] [Pusher] Attempted to unsubscribe or unbind from a channel, but Pusher-JS has no knowledge of it ~~ channelName: 'private-report-reportID-71955477' eventName: '' userAgent: 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/95.0.4638.69 Safari/537.36'
2021-11-17 13:05:48 117 | expensify@kidroca.com | [alXt] NewExpensify crash caught by error boundary ~~ error: 'Some error occurred inside render...' errorInfo: '[componentStack: '
2021-11-17 13:05:48 118 | expensify@kidroca.com | host:8080/app-98732bc83f218e0d0e25.bundle.js:214833:24)
sidferreira commented 2 years ago

@tgolen @NikkiWines Just mentioning my previous effort and interest in this issue. I'm up to pay for bugsnag for a couple of months until the investigation finishes.

Bugsnag breadcrumbs At https://github.com/sidferreira/App/commit/fa0f594545bc228cc833224183b8decf54708af6 can see the breadcrumbs exactly around the areas mentioned by other developers, but a couple of days before.

https://github.com/sidferreira/App/tree/issue_5620 (a fork with bugsnag I did couple of weeks ago)

https://github.com/Expensify/App/issues/6358#issuecomment-973659678

https://github.com/Expensify/App/issues/6357#issuecomment-973660741

https://expensify.slack.com/archives/C01GTK53T8Q/p1635878140359900?thread_ts=1635255533.107600&cid=C01GTK53T8Q

kidroca commented 2 years ago

Thanks @tgolen @NikkiWines Regarding the following

I wonder if we should be creating BugBot GH issues for these (cc @iwiznia). It seems like any error that would get caught by this is something that could/should be investigated and fixed.

Here's an example of those logs for your email actually @kidroca on 11-17

If we have logs that would produce GH issues we should disable sending them from DEV, similarly to how we don't send performance traces from DEV

My log Some error occurred inside render... is an Error I intentionally throw from Report Screen render I'm not sure why the information is cut short here: errorInfo: '[componentStack: ' - I definitely saw the full errorInfo logged from Log.alert locally

In the end it seems we're stuck for something other than a critical error I hope the added logging would review the real reason


@sidferreira bugsnag seems like something that could be useful but is a change that would need a separate discussion and approval from the team. Some points

sidferreira commented 2 years ago

@kidroca I think we can keep both at the same time, and if by any reason some part can't we can give priority to Log for sure. Account holding wise, my initial proposal was to @tgolen use a custom build for a while. If we send to production, I totally agree about transferring the account ownership. In Slack I proposed sharing access to Expensify team members, but the ownership makes more sense indeed.

And the main reason I like Bugsnag is cause we can add a lot of extra information, including navigation, that may affect the problem and helping us to understand the sequence of events.

NikkiWines commented 2 years ago

If we have logs that would produce GH issues we should disable sending them from DEV, similarly to how we don't send performance traces from DEV

BugBot should only create issues from staging or production logs.

And the main reason I like Bugsnag is cause we can add a lot of extra information, including navigation, that may affect the problem and helping us to understand the sequence of events.

While BugSnag does seem helpful, I agree with @kidroca that implementing it would require a larger, separate conversation given that we have an existing logging solution. cc: @tgolen, not sure if you have any strong opinion on this.