Expensify / App

Welcome to New Expensify: a complete re-imagination of financial collaboration, centered around chat. Help us build the next generation of Expensify by sharing feedback and contributing to the code.
https://new.expensify.com
MIT License
3.56k stars 2.9k forks source link

[Tracking] Investigate IOU commands running to same ID collision #13738

Closed mountiny closed 10 months ago

mountiny commented 1 year ago

If you haven’t already, check out our contributing guidelines for onboarding and email contributors@expensify.com to request to join our Slack channel!


Action Performed:

We have seen that for CreateIOUSplit, CreatePolicy, CreateIOUTransaction, SendMoney and SendMoneyWithWallet, we run into error which signals that the same request has been sent multiple times while the first one succeeded and we tried to create a row in the database with the same id, which fails.

Internal issues for the command failures:

Reproduction steps for PayWithWallet: https://github.com/Expensify/App/pull/12646#issuecomment-1359880911

It might need a high-traffic account.

We can expect the reproduction steps are similar to pay with wallet

  1. Force user to be offline
  2. Send money/ Request money from someone you have chatter before (and you have the report loaded)
  3. The IOU Preview should show greyed out automatically
  4. Go online
  5. Confirm the IOU preview is not greyed out and you dont see any error

Expected Result:

Describe what you think should've happened

No error shown if the action succeeded.

Actual Result:

Describe what actually happened

There is an error https://user-images.githubusercontent.com/4741899/208730156-b71e9f0e-082c-42d5-9415-43cbd86dfe01.png, but it is not right, after refresh the message is shown

Workaround:

Can the user still use Expensify without this being fixed? Have you informed them of the workaround?

Everything works fine after refresh, the error is confusing UI though.

Platforms:

Which of our officially supported platforms is this issue occurring on?

Version Number: Reproducible in staging?: Yes Reproducible in production?: Yes Email or phone of affected tester (no customers): Logs: https://stackoverflow.com/c/expensify/questions/4856 Notes/Photos/Videos: Any additional supporting documentation Expensify/Expensify Issue URL: https://user-images.githubusercontent.com/4741899/208730156-b71e9f0e-082c-42d5-9415-43cbd86dfe01.png Issue reported by: @yuwenmemon Slack conversation: here https://expensify.slack.com/archives/C03TQ48KC/p1671559412497409 and here https://expensify.slack.com/archives/C03TQ48KC/p1671549525089169

View all open jobs on GitHub

Upwork Automation - Do Not Edit
  • Upwork Job URL: https://www.upwork.com/jobs/~017cdd21c4f565ed9a
  • Upwork Job ID: 1605311650796871680
  • Last Price Increase: 2022-12-20
melvin-bot[bot] commented 1 year ago

Triggered auto assignment to @alexpensify (Bug), see https://stackoverflow.com/c/expensify/questions/14418 for more details.

melvin-bot[bot] commented 1 year ago

Job added to Upwork: https://www.upwork.com/jobs/~017cdd21c4f565ed9a

mountiny commented 1 year ago

Hey @thesahindia sorry nothing to do right here, this will require internal investigation.

Keeping you assigned @alexpensify as we might need someone to pay out down the line

mountiny commented 1 year ago

havent had time to look today

alexpensify commented 1 year ago

@mountiny - I'm going OOO until January 3 and will catch up then to identify if there is any movement here. If you need to issue a payment next week, please reassign the issue to another member on Bug Zero who can help with the Upwork process. Chat with you in 2023!

mountiny commented 1 year ago

Havent been able to reproduce reliably locally but we know this is happening in staging/production so it can be a variation of the network state/ size of a policy/chat potentially.

alexpensify commented 1 year ago

I'm still OOO and plan to review next week when I'm back online.

mountiny commented 1 year ago

No problem, there was no traction on this one, I have also been ooo

mountiny commented 1 year ago

Other priorities

alexpensify commented 1 year ago

Thanks for the update @mountiny-- safe travels!

mountiny commented 1 year ago

Not super high priority not causing issues to clients.

cead22 commented 1 year ago

@mountiny I looked at the logs for https://github.com/Expensify/Expensify/issues/248926 and found the following

The call to the CreateIOUSplit auth command is passing "iouReportID":"8824324229232110"

Bedrock\Client - Starting a request ~~ command: 'CreateIOUSplit' clusterName: 'auth' headers: '[authToken: '' chatReportID: '2441593069105568' amount: '500' splits: '[{"email":"jatinsonidev+010@icloud.com","amount":250},{"email":"jatinsonidev@icloud.com","amount":250,"iouReportID":"8824324229232110","chatReportID":"7391743059865632","transactionID":"3795588431536333239","reportActionID":"2152671604058566207","clientID":"1674226326525232"}]' currency: 'INR' comment: '' transactionID: '5881579255060231882' reportActionID: '2626203818684704724' createdReportActionID: '0' clientID: '1674226326522371' isFromGroupDM: '1' idempotent: '' logParam: 'jatinsonidev+010@icloud.com' commitCount: '18324566281' requestID: '78c8a1d12ad63386-LAX' lastIP: '157.38.110.253' writeConsistency: 'ASYNC' priority: '500' timeout: '290000']'

We log the following before the failure

Creating original transaction for IOU Split with ID 5881579255060231882 Creating initial IOU split action for reportID 2441593069105568 CreateReport::process reportID 8824324229232110

And then we see the error, notice the same reportID 8824324229232110

{SQLITE} Code: 1555, Message: abort at 25 in [INSERT INTO reports ( reportID, created, state, accountID, reportName, currency ) VALUES ( 8824324229232110, ...

So I think what's happening is:

We the log the following, and call reportID = auth.report.createWithAction passing newReportID=8824324229232110

Creating a new IOU report for 14251014 and shared with 12924254

Can you double check if I traced this correctly? if I'm right this isn't a collision, it's a bug

cead22 commented 1 year ago

I tried splitting a bill with @mountiny and couldn't reproduce

cead22 commented 1 year ago

I submitted a PR to add some logging that I think could be helpful

mountiny commented 1 year ago

@cead22 Thank you for looking into this. I do follow your flow and seems correct to me.

However, I am not sure if I follow where there is a bug? Can you elaborate on that please? In either case more logs will be helpful to figure this out.

To be clear I dont think we are actually getting collisions, I think there is some bug, but I dont think I understood where you are pointing at here.

cead22 commented 1 year ago

In short, my theory is that we're calling CreateIOUSplit with iouReportID set to an existing reportID, which we end up passing down all the way to CreateReport::process which fails because that reportID already exists in the database -- it's not one that was generated in the client but that hasn't been saved to the database yet.

That said, I tried reproducing and couldn't. I'm not sure if it was because I couldn't trigger the API call SplitBill, I was only able to trigger SplitBillAndOpenReport (I don't know if they're different)

mountiny commented 1 year ago

SplitBill

You can trigger split bill if you SplitBill from the existing group chat (not from the Green + button)

I think I might now understand what you mean, I think we had a bug which was similar to this in payWithWallet just now, where if you Paid money request, we did not set the iouReportId to null for the other clients, if that makes sense. Here is a PR.

So then if the other user would want to request money we would reuse the report ID, I think.

We should look if this is covered correctly in the other flows

mountiny commented 1 year ago

Updated the issue so this reflects that this is more a tracking issue for all the collision IDs. Still investigating.

alexpensify commented 1 year ago

Thanks for the update here @mountiny

cead22 commented 1 year ago

@mountiny here are logs for requests that failed with UNIQUE constraint failed: reports.reportID. None of them show the new log line that we added Generated random reportID:, and for all of them, if you search for the repotID the error line 'read/write transaction', query failed with error #19 (UNIQUE constraint failed:..., you'll see that the reportID was passed as a param to each of the API commands. So I'm more convinced that we're just holding it wrong in these flows

mountiny commented 1 year ago

Thanks for getting back to this! I think the solution to this might be in the request retry logic:

cead22 commented 1 year ago

Why do you think it's the retry logic? I'm not saying it isn't but I can't think how this would happen, because we should only retry if a request fails, but in this case the first request wouldn't fail, so why would we replay those requests?

mountiny commented 1 year ago

Ah I dont really know what specifically is broken in the logic, what I meant is that the only way this is happening is that the client send a request with the same parameter multiple times, even if the first one succeeded, right? Thats is what we are seeing, we have already processed the request, in database we saved the action with particular reportActionID or report with reportID, but a request with the same params seems to come through causing the collision.

mountiny commented 1 year ago

Still on a backburner

cead22 commented 1 year ago

No updates from me either

alexpensify commented 1 year ago

Any updates here or still a lower priority?

mountiny commented 1 year ago

We are also finishing the IOU migration to reportAction ids now so maybe we will see some improvement because of that

alexpensify commented 1 year ago

Thank you for the update!

mountiny commented 1 year ago

no updates from me

alexpensify commented 1 year ago

Thank you for the update here

alexpensify commented 1 year ago

No update

alexpensify commented 1 year ago

ECX has been the focus for the full team-- no update

alexpensify commented 1 year ago

@mountiny and @cead22 - any update here?

mountiny commented 1 year ago

Havent had time to investigate, its a lower priority now

JmillsExpensify commented 1 year ago

Should we open this back into the pool? The main reason I ask is that it appears like this is still happening a decent amount via the linked BugBot reports, right?

JmillsExpensify commented 1 year ago

Or actually, should we still hold anymore as we're about to finish refactoring IOUs anyway as part of Manual Requests?

mountiny commented 1 year ago

I dont think the manual requests will change in a big way if this is happening. I still believe this is mostly related to the queue and retry logic. I am happy to leave this for someone else to do a deep dive, I have not had time lately.

Maybe @cead22 will want to continue exploring

JmillsExpensify commented 1 year ago

Ok cool, thanks! Let's see what @cead22 thinks and has bandwidth for.

mountiny commented 1 year ago

N7 priority

MelvinBot commented 1 year ago

⚠️ Looks like this issue was linked to a Deploy Blocker here

If you are the assigned CME please investigate whether the linked PR caused a regression and leave a comment with the results.

If a regression has occurred and you are the assigned CM follow the instructions here.

If this regression could have been avoided please consider also proposing a recommendation to the PR checklist so that we can avoid it in the future.

mountiny commented 1 year ago

No update from my side

alexpensify commented 1 year ago

@mountiny - I hope you have been enjoying the SF office, checking in if there is an update here. I'm pretty sure there are other priorities but checking in.

mountiny commented 1 year ago

Not much progress its not blocking users now so not a priority but I can see we are still hitting these errors from time to time

alexpensify commented 1 year ago

I think Lounge and other GHs have been a priority, but correct me if I'm wrong.

mountiny commented 1 year ago

Definitely I think we can make this monthly with Tasks, Threads and Manual requests right now, this does not seems to be causing issues to customers right now.

alexpensify commented 1 year ago

Perfect, thank you for updating to a monthly.

alexpensify commented 1 year ago

No update here, but flagging that I'm OOO until next Wednesday.

mountiny commented 1 year ago

Thanks, I think this one is still in a backlog, its not causing issues afaik

alexpensify commented 1 year ago

Thank you for the update @mountiny

alexpensify commented 1 year ago

@mountiny and @cead22 - any news here?