Open mshafer opened 2 months ago
Hi @ivailop7 @etrepum @StyleT @fantactuka, including you here as I saw you have recently worked on and/or reviewed PRs related to the collaboration plugin, but apologies if this is not relevant to you.
Wanted to check if you had any pointers on how we could look further into this issue? We're keen to put some effort into debugging and fixing but it stretches our understanding of the Yjs document data structures, syncLexicalUpdateToYjs
, the Yjs UndoManager
, etc. Any guidance to help accelerate us would be much appreciated!
The first thing I would check is to see if this happens if shouldBootstrap={false}
. It requires a bit more orchestration to bootstrap the document server-side, but there are just inherent race conditions if two clients both think they can bootstrap the document when it is empty. I do have a pretty good understanding of CRDTs and distributed systems in general but the lexical/yjs implementation details are not something I'm deeply familiar with because I haven't used any of it in an application.
Thanks @etrepum, we have shouldBootstrap={false}
in our own application and we can reproduce the same behaviour. I'll try get a repro using the Lexical playground, too.
Hi @etrepum just following up with a repro in split-screen mode of the playground where based on my understanding of the code, shouldBootstrap
is true
only for left-hand-side client. Slightly different from getting the server to bootstrap, but I think this should be sufficient to avoid any race conditions caused by bootstrapping?
In this split-screen approach we just reload one of the windows instead of refreshing the whole page:
https://github.com/user-attachments/assets/9e1b9f5c-9296-4c71-9dd2-c20f3764c825
We've also started seeing document corruption after the user does an 'undo' event and then the editor crashes following any input, I wonder if this is the same issue in question.
That's interesting @ivailop7 – never seen the editor crashing after the 'undo' event, but we have seen that further edits can lead to very garbled and duplicated text.
In case it's helpful for others to repro, I've added a test (PR) that runs through the steps in my screen recordings above.
Fair enough. Thanks for the PR!
Hi @ivailop7 @etrepum, checking in on this one – just wondering if you know if/when Lexical maintainers might be able to look more deeply into this bug? It'd be helpful to get any indication as we're aiming to roll out our collaborative editor for production usage, and this bug is introducing a doubt on readiness. Thanks in advance.
(We're also planning to dive fully into debugging, we just know for us it will require a bit of extra time to get up to speed on the Yjs data structures.)
In rare case when we do collaboration editing, we have some very strange behavior like the one on the screenshot where a line of text is duplicated tons of time.
Could it be related to this undo bug?
We cannot reproduce it but we put tons of log on server and everything seems good on the YJS side.
It seems that the undo break the lexical tree and in some cases, if I write a new letter, it creates a client side error 94.
Because splice function cannot find the node in the tree and the fallback make things copy/paste some paragraph instead adding the letter I typed.
@acemtp that's interesting to see! Does this happen with Lexical 0.17.1 and above?
We use 0.18.0.
Same bug (multiplication of texts and text loss in neighbouring paragraphs) as Vianney Lecroart happens to me on version 0.19.0. I can reproduce it quite often.
https://github.com/facebook/lexical/pull/6670 got merged yesterday. You can install the nightly version to check if things get better.
Sadly, problem is reproduced again under the scenario with an unique user accessing the editor, and collaboration active.
I will monitor closely the issue and try to collect more information.
@ebengtso that's unfortunate, thanks for confirming. It'd be great to hear more about the conditions / repro steps where you can trigger this problem. Is it only after using undo? And does the text look like that immediately after the undo command, or only after further edits following a problematic undo?
When using
CollaborationPlugin
and two users are editing the same paragraph, an undo from one user can cause Lexical and the Yjs doc to get out of sync, leading to the whole paragraph getting cleared on page refresh.Lexical version: 0.17.1
In this screen recording, observe the behaviour after User 1 (left) clicks the undo button: it unexpectedly deletes a character from User 2's word, then after a refresh of the page the whole paragraph is cleared:
https://github.com/user-attachments/assets/3f631b41-99a2-4cd3-8402-2a9334e06877
When we replicate the issue in our own application we have detection code that shows Lexical editor state and the Yjs document are getting out of sync immediately after the user presses undo. Lexical thinks the second paragraph has the text node with User 2's text, while the Yjs doc thinks the second paragraph node is empty.
Steps To Reproduce
This is a
,<backspace>
,a test.
Word
This is a test.
is removed. Sometimes this happens after one undo, sometimes multiple undo's.The current behavior
The main unexpected behaviour is that after the undo operation, a refresh of the page causes the whole paragraph to be cleared, including the text from User 2.
In the specific case captured in the screen recording, it was also unexpected that User 1's undo operation deleted the
W
ofWord
from User 2.The expected behavior
User 1's undo operation should undo only the text they typed, and a refresh of the page after the undo should not lead to the paragraph content getting cleared.
Impact of fix
This bug can cause loss/corruption of data for all users of the Lexical collaboration plugin.
Note that although the data corruption behaviour is usually the deleted paragraph as shown in the screen recording, in some cases we were able to reproduce the type of duplicated and garbled text seen in https://github.com/facebook/lexical/pull/6523 and https://github.com/facebook/lexical/pull/6374. We can't reliably reproduce this now, but it required both User 1 and User 2 performing edits on the affected paragraph right after pressing undo but before refreshing the page. We know at that point Lexical and Yjs are out of sync, so some combination of edits leads to further corruption.