facebook / lexical

Lexical is an extensible text editor framework that provides excellent reliability, accessibility and performance.
https://lexical.dev
MIT License
19.67k stars 1.67k forks source link

Bug: undo causes unexpected paragraph deletion in collaborative editor #6614

Open mshafer opened 2 months ago

mshafer commented 2 months ago

When using CollaborationPlugin and two users are editing the same paragraph, an undo from one user can cause Lexical and the Yjs doc to get out of sync, leading to the whole paragraph getting cleared on page refresh.

Lexical version: 0.17.1

In this screen recording, observe the behaviour after User 1 (left) clicks the undo button: it unexpectedly deletes a character from User 2's word, then after a refresh of the page the whole paragraph is cleared:

https://github.com/user-attachments/assets/3f631b41-99a2-4cd3-8402-2a9334e06877

When we replicate the issue in our own application we have detection code that shows Lexical editor state and the Yjs document are getting out of sync immediately after the user presses undo. Lexical thinks the second paragraph has the text node with User 2's text, while the Yjs doc thinks the second paragraph node is empty.

Steps To Reproduce

  1. Open two windows to https://playground.lexical.dev/?isCollab=true
  2. On the left, put the cursor at the start of an empty second paragraph. (The issue still presents in the first paragraph, but this demonstrates it's not unique to the start of the document.)
  3. On the left, type: This is a, <backspace>, a test.
  4. On the right, type: Word
  5. On the left, click the "Undo" button until all of This is a test. is removed. Sometimes this happens after one undo, sometimes multiple undo's.
  6. Refresh the page

The current behavior

The main unexpected behaviour is that after the undo operation, a refresh of the page causes the whole paragraph to be cleared, including the text from User 2.

In the specific case captured in the screen recording, it was also unexpected that User 1's undo operation deleted the W of Word from User 2.

The expected behavior

User 1's undo operation should undo only the text they typed, and a refresh of the page after the undo should not lead to the paragraph content getting cleared.

Impact of fix

This bug can cause loss/corruption of data for all users of the Lexical collaboration plugin.

Note that although the data corruption behaviour is usually the deleted paragraph as shown in the screen recording, in some cases we were able to reproduce the type of duplicated and garbled text seen in https://github.com/facebook/lexical/pull/6523 and https://github.com/facebook/lexical/pull/6374. We can't reliably reproduce this now, but it required both User 1 and User 2 performing edits on the affected paragraph right after pressing undo but before refreshing the page. We know at that point Lexical and Yjs are out of sync, so some combination of edits leads to further corruption.

mshafer commented 1 month ago

Hi @ivailop7 @etrepum @StyleT @fantactuka, including you here as I saw you have recently worked on and/or reviewed PRs related to the collaboration plugin, but apologies if this is not relevant to you.

Wanted to check if you had any pointers on how we could look further into this issue? We're keen to put some effort into debugging and fixing but it stretches our understanding of the Yjs document data structures, syncLexicalUpdateToYjs, the Yjs UndoManager, etc. Any guidance to help accelerate us would be much appreciated!

etrepum commented 1 month ago

The first thing I would check is to see if this happens if shouldBootstrap={false}. It requires a bit more orchestration to bootstrap the document server-side, but there are just inherent race conditions if two clients both think they can bootstrap the document when it is empty. I do have a pretty good understanding of CRDTs and distributed systems in general but the lexical/yjs implementation details are not something I'm deeply familiar with because I haven't used any of it in an application.

mshafer commented 1 month ago

Thanks @etrepum, we have shouldBootstrap={false} in our own application and we can reproduce the same behaviour. I'll try get a repro using the Lexical playground, too.

mike-atticus commented 1 month ago

Hi @etrepum just following up with a repro in split-screen mode of the playground where based on my understanding of the code, shouldBootstrap is true only for left-hand-side client. Slightly different from getting the server to bootstrap, but I think this should be sufficient to avoid any race conditions caused by bootstrapping?

In this split-screen approach we just reload one of the windows instead of refreshing the whole page:

https://github.com/user-attachments/assets/9e1b9f5c-9296-4c71-9dd2-c20f3764c825

ivailop7 commented 1 month ago

We've also started seeing document corruption after the user does an 'undo' event and then the editor crashes following any input, I wonder if this is the same issue in question.

mike-atticus commented 1 month ago

That's interesting @ivailop7 – never seen the editor crashing after the 'undo' event, but we have seen that further edits can lead to very garbled and duplicated text.

In case it's helpful for others to repro, I've added a test (PR) that runs through the steps in my screen recordings above.

ivailop7 commented 1 month ago

Fair enough. Thanks for the PR!

mike-atticus commented 1 month ago

Hi @ivailop7 @etrepum, checking in on this one – just wondering if you know if/when Lexical maintainers might be able to look more deeply into this bug? It'd be helpful to get any indication as we're aiming to roll out our collaborative editor for production usage, and this bug is introducing a doubt on readiness. Thanks in advance.

(We're also planning to dive fully into debugging, we just know for us it will require a bit of extra time to get up to speed on the Yjs data structures.)

acemtp commented 3 weeks ago

In rare case when we do collaboration editing, we have some very strange behavior like the one on the screenshot where a line of text is duplicated tons of time.

image

Could it be related to this undo bug?

We cannot reproduce it but we put tons of log on server and everything seems good on the YJS side.

It seems that the undo break the lexical tree and in some cases, if I write a new letter, it creates a client side error 94.

Because splice function cannot find the node in the tree and the fallback make things copy/paste some paragraph instead adding the letter I typed.

image
mike-atticus commented 2 weeks ago

@acemtp that's interesting to see! Does this happen with Lexical 0.17.1 and above?

acemtp commented 2 weeks ago

We use 0.18.0.

ebengtso commented 5 days ago

Same bug (multiplication of texts and text loss in neighbouring paragraphs) as Vianney Lecroart happens to me on version 0.19.0. I can reproduce it quite often.

ivailop7 commented 4 days ago

https://github.com/facebook/lexical/pull/6670 got merged yesterday. You can install the nightly version to check if things get better.