Expensify / App

Welcome to New Expensify: a complete re-imagination of financial collaboration, centered around chat. Help us build the next generation of Expensify by sharing feedback and contributing to the code.
https://new.expensify.com
MIT License
3.51k stars 2.86k forks source link

Concierge message was triggered 3 times when creating a new account #4857

Closed isagoico closed 3 years ago

isagoico commented 3 years ago

If you haven’t already, check out our contributing guidelines for onboarding and email contributors@expensify.com to request to join our Slack channel!


Action Performed:

  1. Create a new account
  2. Land in Concierge conversation

Expected Result:

Only one welcome message should be displayed

Actual Result:

3 welcome messages were displayed.

Workaround:

Can the user still use Expensify without this being fixed? Have you informed them of the workaround?

Platform:

Where is this issue occurring?

Version Number: 1.0.86-11 Logs: https://stackoverflow.com/c/expensify/questions/4856

Notes/Photos/Videos: Any additional supporting documentation

Accounts affected: kevin@kevinsullivan.one and kevin@979greenwich.xyz

image

Expensify/Expensify Issue URL:

View all open jobs on GitHub


From @kevinksullivan https://expensify.slack.com/archives/C01GTK53T8Q/p1629909971275600

Signing up with a new account triggered three of the same message from concierge

MelvinBot commented 3 years ago

Triggered auto assignment to @roryabraham (Engineering), see https://stackoverflow.com/c/expensify/questions/4319 for more details.

roryabraham commented 3 years ago

I think this will almost definitely need to be internal. Demoting this to weekly and dropping because I won't have time to look into it right away.

JmillsExpensify commented 3 years ago

Ran into this one in testing internally.

marcaaron commented 3 years ago

Not able to repro this on dev at least but I’ll try to see how it can happen.

@kevinksullivan just out of curiosity is there anyway you can log out and log back in and very that this chat still has the three auto-messages?

marcaaron commented 3 years ago

If this is getting created in the server I have a feeling it's a race condition + multiple requests to get or create the Concierge chat report and then triggering the welcome message right after so something like

  1. Try to get or create chat report - created
  2. Try to get or create chat report - not created but returned
  3. ''
  4. Get the report and try to do auto message - allows it because the report doesn't have comments yet
  5. ''
  6. ''

If those other requests happened after the comment was created then we'd be fine but they must be happening concurrently. We can try to get to the root of why several requests were made, but I'd guess it has to do with the reconnection callbacks triggering for some unexpected reason (normally harmless, but in this case bad).

One extreme solution would be to move this check + auto-message creation into Auth where we can guarantee that at least one reportAction exists. I'm not too sure that's practical though given how much logic exists in PHP for sending the auto-message.

Another idea is to detect if we are trying to create the auto message in a Auth and just prevent it from being created. Probably the easiest way is to add a flag to the reportActions row message JSON (e.g. isAutoMessage). And if we are trying to create this particular message but already have some comments then we should not allow creating it again.

Maybe there is a better way to solve this...

kevinksullivan commented 3 years ago

Still have the three messages @marcaaron

image

marcaaron commented 3 years ago

Cool. So, I haven't been able to repro this + I'm not actually sure the best way to handle it. So gonna leave this unassigned, but hopefully there's enough context for someone to run with this.

marcaaron commented 3 years ago

What I'd probably try next is to simulate a reproduction by:

  1. Creating a new account via NewDot + validating the account locally
  2. Line up 3 fetchAllReports calls instead of just one here https://github.com/Expensify/App/blob/a919ee8008071fb1f3964417047fc82a59d4fc74/src/libs/Navigation/AppNavigator/AuthScreens.js#L167
  3. Sign in
  4. Check to see how many messages were created.
kevinksullivan commented 3 years ago

Adding another scenario that just came up on a new account:

kevin@citybiking.us

image

MelvinBot commented 3 years ago

Uh oh! This issue is overdue by 2 days. Don't forget to update your issues!

marcaaron commented 3 years ago

Alright so I still can't reproduce this but have some new information.

Initially I was sold on the theory that it happens because we are calling fetchAllReports() multiple times. I tested this out by just copying and pasting it a bunch of times and then setting a password on an account that has never signed in via NewDot and was able to emulate what would happen if we did call this method multiple times... and sure enough got multiple comments from Concierge.

However, this doesn't really explain how or why this API is getting called multiple times and I don't think it's actually going down like that.

Looked a bit deeper and noticed that if we have an "inaccessible report" we also try to create the concierge report as a side effect of navigating to it here.

This could potentially cause multiple attempts to create the concierge report in a short period of time (before it exists).

Based on what I'm seeing in logs like this - I'm pretty sure we are dealing with the latter.

marcaaron commented 3 years ago

Haha so this one just keeps getting more fun :) Have been scanning the logs looking for why a report might be inaccessible and then noticed that the particular report Kevin tries and fails to access with kevin@citybiking.us is actually the group DM between myself, Kevin, and Mitch.

Which makes me think that this is very likely caused by old data sticking around in Onyx. Here's how it might go:

  1. When we go to get the initial params for the report page we must be trying to access this old report from some stuck data that did not get cleared properly
  2. We try to load this report here and since it's not accessible we redirect to the Concierge report. But this report doesn't exist yet (when logging in for the first time we've already asked to create it, but this happens in the blink of an eye).
  3. We won't yet have a Concierge report so we end up here
  4. Which will make a request to create the report but not actually fetch the report and then we merge a "partial" report in here and also navigate to that report and wait for it to show up
  5. Setting the partial report should trigger a handleReportChanged call here and an attempt to fetch the "real" Concierge chat report - once it downloads we should see it.

That's just one idea though. This code added here also looks suspect to me since when we navigate to / it's possible we will try to "handle the inaccessible report". Which seems undesirable in cases where we are logging in for the first time.

marcaaron commented 3 years ago

Hmm so I'm able to reproduce this on production (very inconsistently) and verified via the Network tab that we are only calling CreateChatReport once - so I think all my guesses are actually wrong and something else is happening in the API layer.

Looked in the logs and see the CreateChatReport is processing twice. Not totally sure how this can happen but going to ask in infra.

Logs 1: https://www.expensify.com/_devportal/tools/logSearch/#query=request_id:(%2268baec060a4a35f7-LAX%22)+AND+timestamp:[2021-09-08T19:28:13.142Z+TO+2021-09-08T21:28:13.142Z]&index=logs_expensify-012458

Logs 2: https://www.expensify.com/_devportal/tools/logSearch/#query=request_id:(%2268baec05fa3535f7-LAX%22)+AND+timestamp:[2021-09-08T19:28:13.150Z+TO+2021-09-08T21:28:13.150Z]&index=logs_expensify-012458

marcaaron commented 3 years ago

Chatting with @bondydaa about this now.

We took a look at the logs together for a bit and agree it seems like a single http request is somehow getting split into two requests. Still not too sure how this is possible.

An interim solution we discussed is to return whether the chat report was created for the first time via CreateChatReport and then just return early if it's not the first time we are creating the Concierge report.

fukawi2 commented 3 years ago

Adding myself as an assignee to pull CF logs: https://expensify.slack.com/archives/C094TGUTZ/p1631139987025400

marcaaron commented 3 years ago

FWIW, the interim solution is pretty solid when testing locally and blasting the API with a zillion CreateChatReport requests it only ever creates one comment so I think we can maybe pursue that for now while we get to the bottom of how the duplicate request happens.

fukawi2 commented 3 years ago

Well I finally figured out how to pull the logs for the "rayids" (hint: you need to drop the IATA code from the end of the rayID - ie, remove -LAX from these ones).

They don't really reveal much though :/

image

bondydaa commented 3 years ago

if i'm reading this right then that means cloudflare also saw this as multiple requests so that makes me think something on the clients is somehow triggering the request twice but it's still odd that when marc has been able to reproduce dev tools isn't showing multiple requests.

I'm about to merge the changes in auth which will help fix to address this at the db & web-e layers so at least we can try to methodically figure out what the heck is going on while preventing the problem for users.

marcaaron commented 3 years ago

if i'm reading this right then that means cloudflare also saw this as multiple requests

I don't know anything about cloudflare, but curious to understand more about what happens when a client makes a request and how it's intercepted. Why would those logs mean that cloudflare got multiple requests from the client?

marcaaron commented 3 years ago

Are there other proxys that could introduce something like a "duplicate request" or is this not possible at all?

bondydaa commented 3 years ago

My last comment was based on the fact that the the IP is the same, the URI is the same but the EdgeStartTimestamp are different even though close. Couple that with the fact the rayIDs match up to what our logs show as 2 different requests.

marcaaron commented 3 years ago

Is it possible that some other proxy split the client request into multiples before passing to cloudflare?

fukawi2 commented 3 years ago

Unassigning myself - tag me if there's anything else I can do to help.

kevinksullivan commented 3 years ago

Gave this a fresh try on kevin+4@kevinsullivan.one this morning and got two messages.

image

Catching up on your comments from the last few days @marcaaron and this one stood out:

Have been scanning the logs looking for why a report might be inaccessible and then noticed that the particular report Kevin tries and fails to access with kevin@citybiking.us is actually the group DM between myself, Kevin, and Mitch. Which makes me think that this is very likely caused by old data sticking around in Onyx.

I am finding chats from other accounts popping up in my account a lot, which seems closely tied to this report fetching you mention, and probably another reason why I can reliably get this issue to occur, since I am constantly in and out of accounts with existing chats. See this issue for more detail https://github.com/Expensify/App/issues/5034

marcaaron commented 3 years ago

I am finding chats from other accounts popping up in my account a lot, which seems closely tied to this report fetching you mention, and probably another reason why I can reliably get this issue to occur, since I am constantly in and out of accounts with existing chats.

Thanks @kevinksullivan! There could be something there. I was never able to reproduce this issue by injecting report data into my storage while signed out so it was a dead end for me - but there could be more than one reason why these API requests are happening in multiples so I wouldn't rule it out yet.

MelvinBot commented 3 years ago

@marcaaron Uh oh! This issue is overdue by 2 days. Don't forget to update your issues!

MelvinBot commented 3 years ago

@marcaaron Huh... This is 4 days overdue. Who can take care of this?

marcaaron commented 3 years ago

Temporary solution is on staging now. I think once that's done we can drop the priority on this a bit.

marcaaron commented 3 years ago

I think since further improvements here won't materially affect a user's experience we can punt them for now. I never got to the bottom of how multiple requests were being made, but we did stop the creation of duplicate comments so I think we can close this and focus on higher priority things.