Open lanitochka17 opened 1 month ago
Triggered auto assignment to @RachCHopkins (Bug
), see https://stackoverflow.com/c/expensify/questions/14418 for more details. Please add this bug to a GH project, as outlined in the SO.
I don't understand what's happening in that video. I can't replicate this.
I'm on 2 bars of 4G.
Is the person reporting it saying that the two users other than themselves are, in fact, the same user?
Please clarify the "actual result"
Waiting on a response/clarification from QA team.
team is still able to reproduce
https://github.com/user-attachments/assets/44e432c6-24e5-4553-abba-0aab81c6dcb9
Ok, to sum up, it can be reproduced and we don't know what/who the third user is.
Job added to Upwork: https://www.upwork.com/jobs/~021843049310841478139
Triggered auto assignment to Contributor-plus team member for initial proposal review - @parasharrajat (External
)
Edited by proposal-police: This proposal was edited at 2024-10-07T12:12:00Z.
After sending the message, the chat turns into a group with duplicate users. It only corrects itself back to a single chat after switching to another chat and then returning to it
When OpenReport
API is called, the success data is added to queuedOnyxUpdates
and waited for all write requests to be complete before merging it to Onyx
We've had the logic to flush the queue after all requests are complete here
But in this case after AddComment
API is called, the queue is paused
After GetMissingOnyxMessages
API, the queue is active again but the queue is empty then the queuedOnyxUpdates
isn't flush which causes the participant of the DM to have both optimistic and current participants of a user.
unpause
function, we should flush the queuedOnyxUpdates
if there is no request . We should update after this line. I notice that we should only flush the
queuedOnyxUpdates
if onlynumberOfPersistedRequests
is0
because if we merge these data to Onyx when there are some pending requests in the queue that can cause the race condition since other requests can update the data for the same key of the previous request.
if (numberOfPersistedRequests === 0) {
flushOnyxUpdatesQueue();
}
GetMissingOnyxMessages
API is called I think maybe it's expected from BE. But after all, the frontend problem can happen in other cases and we need to fix this and it also fixes this issue.
On slow networks, if we create a report and comment on it, a group with a duplicate chat is created.
For write requests, we do not write the response onyxData and the corresponding successData/failureData/finallyData to Onyx directly. Instead, we write them to queuedOnyxUpdates to prevent the replay effect. https://github.com/Expensify/App/blob/78e484f41e5fb5fa3f8827d2b4aea0ba55b581e4/src/libs/actions/OnyxUpdates.ts#L30-L32 These updates are flushed only after all the PersistedRequests are cleared. https://github.com/Expensify/App/blob/78e484f41e5fb5fa3f8827d2b4aea0ba55b581e4/src/libs/Network/SequentialQueue.ts#L173-L175 We remove a request from persistent requests only after the response is received and processed with all middlewares. https://github.com/Expensify/App/blob/78e484f41e5fb5fa3f8827d2b4aea0ba55b581e4/src/libs/Network/SequentialQueue.ts#L90-L99 So, on slow networks, when we call OpenReport, its response and corresponding request’s successData, etc., are added to queuedOnyxUpdates. This cannot be flushed immediately because there was already an AddComment request added to the PersistedRequests. For some reason that needs to be fixed, the backend sends a previousUpdateID for AddComment that is larger than the previous lastUpdateID/lastClientUpdateID. This makes doesClientNeedToBeUpdated true. https://github.com/Expensify/App/blob/78e484f41e5fb5fa3f8827d2b4aea0ba55b581e4/src/libs/Middleware/SaveResponseInOnyx.ts#L34 As a result, the AddComment response is not processed directly with OnyxUpdates.apply. It sets this AddComment response to saveUpdateInformation, and the queue is paused. When the queue is paused, the updates waiting in queuedUpdates are not flushed, so OpenReport’s successData waits in the queuedUpdates, and the optimistic user detail is not removed, making the report appear like a group report.
DeferredOnyxUpdates.getMissingOnyxUpdatesQueryPromise()?.finally(finalizeUpdatesAndResumeQueue).then(() => QueuedOnyxUpdates.flushQueue());
@parasharrajat, @RachCHopkins Whoops! This issue is 2 days overdue. Let's get this updated quick!
@parasharrajat do you like any of the proposals here?
📣 It's been a week! Do we have any satisfactory proposals yet? Do we need to adjust the bounty for this issue? 💸
@RachCHopkins I won't be available from 16 Oct for a few days, please reassign.
I can take over this issue as C+
@parasharrajat @RachCHopkins this issue was created 2 weeks ago. Are we close to approving a proposal? If not, what's blocking us from getting this issue assigned? Don't hesitate to create a thread in #expensify-open-source to align faster in real time. Thanks!
📣 @DylanDylann 🎉 An offer has been automatically sent to your Upwork account for the Contributor role 🎉 Thanks for contributing to the Expensify app!
Offer link Upwork job Please accept the offer and leave a comment on the Github issue letting us know when we can expect a PR to be ready for review 🧑💻 Keep in mind: Code of Conduct | Contributing 📖
Oops. I'm not sure if that matters 😬
@DylanDylann how do you feel about the proposals here?
on my list today
It will take me a bit of time to dive deep into this issue because this seems be complicated
First, it needs to be checked why the previousUpdateID for the AddComment response is sent as a higher value than the OpenReport call’s lastUpdatedID/lastClientUpdateID. Fixing this issue should resolve the problem.
@c3024 Could you help to detail this point?
@DylanDylann
Sure!
https://github.com/user-attachments/assets/637e66ac-85b8-4a36-b6c3-f8fc690d061f
https://github.com/user-attachments/assets/36aa8a34-43f8-4351-9be5-9c3476b29250
AddComment
requests have the correct ids. The lastUpdateID of the first comment matches the previousUpdateID of the next comment, so no GetOnyxMessages call is made because the client understands that the chain of updates is intact and nothing is missing.There is a fix required on the backend for the first AddComment response after the OpenReport response. However, there is also a generic problem in the frontend codebase, as mentioned in the RCA and the second point of my suggested solution. The issue is that the last write response (in this case, AddComment, along with the updates waiting in queuedOnyxUpdates) which triggered the GetMissingOnyxMessages call, is not immediately written after the missing updates are fetched though the SequentialQueue is paused after fetching these missed updates. This happens because there is no code to flush the queuedUpdates unless another response is received from the backend, as outlined in the second point of my solution.
When the next write request gets completed and the persistent requests array is empty, all queuedOnyxMessages get updated. We can see this in the present bug as well: if we send another message, everything updates correctly (the backend also does not send an incorrect previousUpdateID) because the queue is unpaused. After processing this request, when the persisted requests are empty, the queuedOnyxUpdates (which include the previous OpenReport and AddComment responses’ onyxData, successData, etc.) are flushed.
Feel free to ask if there is still something unclear. As you mentioned, this is a fairly complex issue, and I may have missed explaining some parts fully in an effort to keep the comments concise.
@c3024 Thanks for a deep explanation
@DylanDylann My solution here can fix both issues. @nkdengineer your updated solution is the same as mine.
@nkdengineer
For the reason why GetMissingOnyxMessages API is called I think maybe it's expected from BE.
Why do you think it is expected?
Hi @mkzie2, Thanks for your contribution, your solution and @nkdengineer's solution have the same idea. But the @nkdengineer proposed it first in https://github.com/Expensify/App/issues/46393#issuecomment-2255598546
Why do you think it is expected?
I'm not sure about the logic of it, that's my guess.
Thanks everyone for your inputs
I agree with @c3024 about your RCA
previousUpdateID
on the AddComment API is weird (more detail here). But I am not sure about the BE logic. Let's wait for an internal engineer to verify that
We don't have the logic to flush OnyxUpdateQueue when it is unpaused. With this problem, I lean toward @nkdengineer's proposal.
🎀 👀 🎀 C+ reviewed
Triggered auto assignment to @aldo-expensify, see https://stackoverflow.com/c/expensify/questions/7972 for more details.
@c3024 Please give your concern if there is any problem. I really appreciate your investigation with details of RCA
That is fine. It is necessary for PersistedRequests to be empty before flushing the queue. While I explained it in the next AddComment
part, I missed considering it in my solution.
hmm, thanks @c3024 for that detailed explanation: https://github.com/Expensify/App/issues/49931#issuecomment-2422259161
I gave it a first read, I didn't understand 100% because I'm not very familiar with all that, but I think I understood that there is a backend bug related to the onyx updates ids being passed around. If this is true, I'd prefer to prioritize and fix that first, and then see again where we are at with this bug.
I'll give it sometime later today to try to reproduce your detailed steps and see if I can understand/see the bug.
@RachCHopkins, @aldo-expensify, @DylanDylann Uh oh! This issue is overdue by 2 days. Don't forget to update your issues!
I'll give it sometime later today to try to reproduce your detailed steps and see if I can understand/see the bug.
I fail that day and then I forgot to come back. I'll try to take a better look at this asap
not overdue, waiting for @aldo-expensify
@RachCHopkins @aldo-expensify @DylanDylann this issue is now 4 weeks old, please consider:
Thanks!
Sorry, still haven't had time for this yet.
Waiting on Aldo. Not overdue.
- For all write requests, the backend sends a previousUpdateID and a lastUpdateID. In the above video, the lastUpdateID sent back in the OpenReport response is 2468230213. This means the backend considers this ID to be the last time the client device was updated.
- Then, when we made an AddComment request, the server’s response returned a previousUpdateID of 2468230219. This means the backend is informing the client that the freshest update it sent previously to all devices is 2468230219. This is larger than the lastUpdateID (2468230213) saved to the device. The backend is suggesting to the client that there are updates between the OpenReport response (2468230213) and the AddComment response (2468230219). However, this is incorrect because it is impossible for there to be any updates in between, as this is the only client making changes to this account. The backend is sending an incorrect previousUpdateID of 2468230219 when it should be sending the lastUpdateID from the OpenReport response, which is 2468230213.
I wasn't been able to reproduce this with a slow connection:
but then with these steps I was able to reproduce in dev:
lastUpdateID
and the AddComment's previousUpdateID
I'll investigate what are the updates in between and why these are not matching now.
Update: In the end, I was able to reproduce without having to go offline, I think the happens sometimes and sometimes not.
Hmm... I think the lastUpdateID
and previousUpdateID
mismatch come from the OpenReport
command forwarding a CreateReportAction
command to create the whisper "Hi there! This is your first time chatting with..."
. Sometimes the CreateReportAction
can finish fast enough so its updates are considered in the previousUpdateID
returned by OpenReport
and sometimes not.
Changing this behaviour in the backend sounds tricky to do. We forward the command to avoid making OpenReport
faster so there is less DB conflicts.
Assigning @nkdengineer since the proposal to fix the frontend makes sense to me too.
📣 @nkdengineer 🎉 An offer has been automatically sent to your Upwork account for the Contributor role 🎉 Thanks for contributing to the Expensify app!
Offer link Upwork job Please accept the offer and leave a comment on the Github issue letting us know when we can expect a PR to be ready for review 🧑💻 Keep in mind: Code of Conduct | Contributing 📖
⚠️ Looks like this issue was linked to a Deploy Blocker here
If you are the assigned CME please investigate whether the linked PR caused a regression and leave a comment with the results.
If a regression has occurred and you are the assigned CM follow the instructions here.
If this regression could have been avoided please consider also proposing a recommendation to the PR checklist so that we can avoid it in the future.
@aldo-expensify, Could you help to check this comment? I believe that we need to get the PR merged as soon as possible because there are many similar bugs in other places that also be fixed by our PR
If you haven’t already, check out our contributing guidelines for onboarding and email contributors@expensify.com to request to join our Slack channel!
Version Number: 9.0.41-2 Reproducible in staging?: Y Reproducible in production?: Y If this was caught during regression testing, add the test name, ID and link from TestRail: N/A Email or phone of affected tester (no customers): biruknew45+1195@gmail.com Issue reported by: Applause - Internal Team
Action Performed:
Expected Result:
The chat with the selected user should remain a 1-on-1 chat. The chat should not change into a group with duplicate users
Actual Result:
After sending the message, the chat turns into a group with duplicate users. It only corrects itself back to a single chat after switching to another chat and then returning to it
Workaround:
Unknown
Platforms:
Which of our officially supported platforms is this issue occurring on?
Screenshots/Videos
Add any screenshot/video evidence
https://github.com/user-attachments/assets/7ed99952-55d7-44f6-a72f-26f59e044154
View all open jobs on GitHub
Upwork Automation - Do Not Edit
Issue Owner
Current Issue Owner: @nkdengineer