ether / etherpad-lite

Etherpad: A modern really-real-time collaborative document editor.
http://docs.etherpad.org/
Apache License 2.0

Uncaught Error: Failed assertion: Invalid changeset (checkRep failed) #2107

Closed SkoricIT closed 4 years ago

SkoricIT commented 10 years ago

Hey guys. We are using the stable release and have a problem: some pads randomly stop working and throw an uncaught error in the console.

Uncaught Error: Failed assertion: Invalid changeset (checkRep failed) 

Example:

https://etherpad.tugraz.at/p/l3tsbet

When this happens, the "loading" overlay blocks any action. It's unlikely to be a copy & paste issue, because it sometimes happens to pads that were typed entirely by hand.

Interestingly, the timeslider (opened by appending /timeslider to the URL) always works without problems.

https://etherpad.tugraz.at/p/l3tsbet/timeslider

Right now we are fixing the pads manually by exporting them as HTML and re-importing them (losing all changesets). Any idea what's wrong?

akosiaris commented 6 years ago

> The error can easily be reproduced by creating a new pad with a single emoji (e.g. panda_face) and restarting Etherpad; see also #3340.

I cannot reproduce this doing exactly what is described above. See https://etherpad.wikimedia.org/p/ohmy for an example (yes, I've restarted Etherpad multiple times already).

RalfJung commented 6 years ago

We just had a pad break with this error as well. Curiously, checkPad.js does not find any problem, and repairPad.js runs to completion without fixing it. Is there any way to determine which revision is at fault?

EDIT: Ah, I found https://gist.github.com/marcelklehr/a78d293571e7f06e3cf9, which pointed me in the right direction. Any chance this could be included in Etherpad itself? It has been infinitely helpful right now, thanks a ton! (However, I had to replace console.log with console.error to see any revision numbers at all. I have no Node.js experience whatsoever and couldn't figure out another way to see all of the logging.)

RalfJung commented 6 years ago

Indeed, replacing ???? with ?? helped here as well. :) It seems the last changeset was someone inserting an emoji (it ended in $????).
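To see why swapping `????` for `??` can make such a pad load again, here is a minimal sketch of the length bookkeeping in an Easysync changeset. It assumes the serialized form `Z:<oldLen><'>' or '<'><lenDiff><ops>$<charBank>` with base-36 numbers; `insertedLength` and `charBankMatches` are hypothetical helpers, not Etherpad's API, and Etherpad's real checkRep is stricter than this. The sketch only checks that the char bank is exactly as long, in UTF-16 code units, as the total of the `+` operations.

```javascript
// Hypothetical helpers, not Etherpad's API. Assumed changeset layout:
//   "Z:" oldLen (">" | "<") lenDiff ops "$" charBank   (numbers in base 36)
function insertedLength(changeset) {
  const header = /^Z:([0-9a-z]+)([><])([0-9a-z]+)/.exec(changeset);
  const dollar = changeset.indexOf('$');
  if (!header || dollar < 0) throw new Error(`not a changeset: ${changeset}`);
  const ops = changeset.slice(header[0].length, dollar);
  // Each op: optional attribute refs (*n), optional newline count (|n), opcode, char count.
  const opRe = /(?:\*[0-9a-z]+)*(?:\|[0-9a-z]+)?([-+=])([0-9a-z]+)/g;
  let inserted = 0;
  for (const m of ops.matchAll(opRe)) {
    if (m[1] === '+') inserted += parseInt(m[2], 36);
  }
  return inserted;
}

// The char bank must supply exactly the characters the "+" ops insert.
// String length is measured in UTF-16 code units, so one emoji counts as 2.
function charBankMatches(changeset) {
  const charBank = changeset.slice(changeset.indexOf('$') + 1);
  return charBank.length === insertedLength(changeset);
}

console.log(charBankMatches('Z:1>2*0+2$🐼'));   // true: the emoji is 2 code units
console.log(charBankMatches('Z:1>2*0+2$????')); // false: 4 characters for a 2-character insert
console.log(charBankMatches('Z:1>2*0+2$??'));   // true again
```

Note that the repair only restores consistent lengths: the emoji itself is gone, and the char bank now holds two literal question marks.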

However, I do not understand why this is classified as a "minor bug". This bug leads to total loss of a pad (until someone notices the /timeslider thing, which took a week in our case, and even then history is lost).

marcelklehr commented 6 years ago

Unassigned myself, as it's unlikely I'll get to fixing this. FWIW, this bug appears to be due to a limitation of the easysync library, which I'm speculating does not support all of UTF-8. (UTF-8 may encode one character as multiple bytes, which each add to the length of a string in JavaScript, even though it's just one character.)

marcelklehr commented 6 years ago

-- nevermind -- :D

RalfJung commented 6 years ago

> FWIW, this bug appears to be due to a limitation of the easysync library, which I'm speculating does not support all of UTF-8. (UTF-8 may encode one character as multiple bytes, which each add to the length of a string in JavaScript, even though it's just one character.)

Actually, we have umlauts (äöü) in our pads all the time, and those are also multi-byte in UTF-8. Based on what has been said above, I think the issue is really about UTF-16, which, when originally designed, was intended to use exactly 2 bytes per character (code point, really). Now that there are more than 2^16 code points, some of them, like emojis, need 4 bytes (a surrogate pair). So a string's .length in JavaScript no longer matches its number of code points, and everything gets confused.
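The distinction between UTF-8 and UTF-16 lengths can be checked directly in Node.js. The illustrative `measure` helper below compares JavaScript's `.length` (UTF-16 code units) with the number of code points and the size of the UTF-8 encoding: an umlaut is multi-byte in UTF-8 but a single UTF-16 unit, while an emoji outside the BMP is a surrogate pair of two UTF-16 units.

```javascript
// Illustrative helper: three different notions of "string length" (Node.js).
function measure(s) {
  return {
    utf16Units: s.length,                    // what JavaScript's .length reports
    codePoints: [...s].length,               // the string iterator walks code points
    utf8Bytes: Buffer.byteLength(s, 'utf8'), // size of the UTF-8 encoding
  };
}

console.log(measure('ä'));  // { utf16Units: 1, codePoints: 1, utf8Bytes: 2 }
console.log(measure('🐼')); // { utf16Units: 2, codePoints: 1, utf8Bytes: 4 }
```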

So maybe a better fix is to outright reject any surrogate pairs (code points that need 4 bytes in UTF-16)? That would make it impossible to use Etherpad with characters from the supplementary planes, but it seems that's likely broken anyway. And it should protect the DB. There seem to be ways to test for surrogate pairs in JS (but I have zero experience with modern JavaScript).
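One way to implement the proposed rejection is to look for UTF-16 surrogate code units. This is a minimal sketch, assuming the text is validated as a plain string before it reaches the changeset layer; `containsSurrogates` and `rejectNonBmp` are hypothetical names, not Etherpad functions. Matching the whole surrogate range (U+D800 to U+DFFF) catches complete pairs (emoji and other supplementary-plane characters) as well as lone halves left over from a split pair.

```javascript
// Without the `u` flag, a regex matches individual UTF-16 code units, so this
// range catches both halves of a surrogate pair as well as unpaired surrogates.
const SURROGATE = /[\uD800-\uDFFF]/;

function containsSurrogates(text) {
  return SURROGATE.test(text);
}

// Hypothetical guard: refuse the edit instead of letting it reach the database.
function rejectNonBmp(text) {
  if (containsSurrogates(text)) {
    throw new RangeError('characters outside the Basic Multilingual Plane are not supported');
  }
  return text;
}

console.log(containsSurrogates('Grüße äöü')); // false
console.log(containsSurrogates('cake 🍰'));   // true
console.log(containsSurrogates('\uD83C'));    // true (lone high surrogate)
```

The trade-off is the one stated above: emoji and other supplementary-plane characters become unusable, in exchange for never storing them.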

RalfJung commented 5 years ago

Why did this get closed? To my knowledge, Etherpad still chokes on characters outside the BMP. I recently again had to manually repair a pad that got broken this way.

SkoricIT commented 5 years ago

I closed it because I opened the issue in 2014 and am no longer interested in it.

RalfJung commented 5 years ago

Well, it is still an open problem for others, so I'd appreciate it if you could reopen it.

RalfJung commented 5 years ago

Thanks! :)

caugner commented 5 years ago

Does anybody have an example of a character (sequence) that reliably breaks a pad? That would make debugging easier, I guess.

muxator commented 5 years ago

The Easysync library describes text (and its length) in terms of "characters", but it was a minimum viable product from 10 years ago. Nowadays we should probably think in terms of NFC-normalized UTF-8 code points.
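For comparison with Easysync's UTF-16-based counting, here is a minimal sketch of what measuring text in NFC-normalized code points would look like in JavaScript; `codePointLength` is an illustrative helper, not part of Easysync. Normalization matters because the same visible character can be stored either precomposed or as a base letter followed by a combining mark.

```javascript
// Count code points after NFC normalization (illustrative, not Easysync's model).
function codePointLength(text) {
  return [...text.normalize('NFC')].length;
}

const precomposed = '\u00E9'; // "é" as one code point
const decomposed = 'e\u0301'; // "e" + COMBINING ACUTE ACCENT

console.log(precomposed === decomposed);                     // false
console.log(decomposed.length, codePointLength(decomposed)); // 2 1
console.log('🍰'.length, codePointLength('🍰'));             // 2 1
```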

caugner commented 5 years ago

Just wondering, might we be able to solve the problem by storing the ueberdb values as binary blobs rather than in a collated text column?

Currently, if we try to put a byte sequence that is not valid utf8mb4 (think: a changeset that contains part of a multibyte character) into a utf8mb4 column, there are only two possible outcomes: either the database refuses the input, or the client (or server) has to remove (think: replace with "?") the invalid "characters" or bytes beforehand.

By using a binary blob column, the database would no longer care whether the byte sequence is valid utf8mb4, so we might avoid the character replacement. If easysync is as encoding-agnostic as I understand, this could work (as long as two users don't concurrently insert multibyte characters AB and CD at the same position and these end up as individual changesets A, C, B, D, in that order, rendering the merged result invalid utf8mb4).

PS: I just tested that inserting a 4-byte UTF-8 character like 🍰 is not a problem in itself (although I haven't restarted yet, which may be the explanation), so I assume the bug either requires concurrency (leading to the character being split up into two or more changesets that are invalid on their own) or requires a client emitting a changeset that removes part of such a character.
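The idea of a character "split up into two or more changesets" can be illustrated in Node.js: slicing a string by UTF-16 index can cut an emoji's surrogate pair in half. This sketch assumes each half is encoded to UTF-8 on its own (for example, because it travels or is stored as a separate piece). Each lone surrogate then becomes U+FFFD, the replacement character, and concatenating the encoded pieces afterwards does not bring the emoji back.

```javascript
const cake = '🍰';             // U+1F370, stored in JavaScript as two UTF-16 units
const high = cake.slice(0, 1); // '\uD83C', a lone high surrogate
const low = cake.slice(1);     // '\uDF70', a lone low surrogate

// Encoding the intact string yields the 4-byte UTF-8 sequence.
console.log(Buffer.from(cake, 'utf8')); // <Buffer f0 9f 8d b0>

// Encoding each half separately replaces it with U+FFFD (bytes ef bf bd).
const pieces = Buffer.concat([Buffer.from(high, 'utf8'), Buffer.from(low, 'utf8')]);
console.log(pieces);                  // <Buffer ef bf bd ef bf bd>
console.log(pieces.toString('utf8')); // prints two U+FFFD replacement characters
```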

chronikum commented 5 years ago

Hi, we are also experiencing this problem on a lot of pads.

JohnMcLear commented 4 years ago

I'm trying everything and just can't replicate this with 🍰. I tried restarts and different database backends (properly configured)...

Can anyone provide steps to replicate with our more modern code base?

Hitting backspace on 🍰 does replace the item with � which is obviously sucky.

RalfJung commented 4 years ago

For me, replace(value,'????','??') has always worked so far. It hasn't happened for a few months, though.

JohnMcLear commented 4 years ago

I included an updated version of Check Pad Deltas that works, if people can give that a try to see if it helps when experiencing this problem I'd appreciate it.

muxator commented 4 years ago

I still think the basic problem is that Etherpad data model thinks in terms of "characters" and not normalized UTF-8 code points.

Unless we rework the core library, this will never really be solved. Obviously, any mitigation is useful. I'm just saying that, in my opinion, there are no easy solutions that are guaranteed to be 100% correct.

JohnMcLear commented 4 years ago

You'd be surprised just how many editors (including very popular ones among developers) behave similarly to Etherpad, though. Playing around today, I had some crazy experiences.

muxator commented 4 years ago

> I included an updated version of Check Pad Deltas that works, if people can give that a try to see if it helps when experiencing this problem I'd appreciate it.

Merged into the master branch with #3717 (14ae2ee95094).

gnd commented 4 years ago

Hi, we are having a similar issue with one of our Pads. @JohnMcLear unfortunately the latest version of checkPadDeltas did not help :/

JohnMcLear commented 4 years ago

@gnd do you have a public instance?

Can you hit the padId/export/etherpad url and get the .etherpad file?

Are you running latest develop?

What's your database backend?

So many questions; please provide as much detail as possible.

gnd commented 4 years ago

@JohnMcLear Yes, it's a public instance (https://pad.xpub.nl/p/CareCircle). Unfortunately, I get a 502 Bad Gateway error when trying to get the .etherpad file. We are running the latest develop (git pull origin) on Node.js 12.16.3-1nodesource1, with 10.3.22-MariaDB-0+deb10u1 as the database backend.

I'm available today to help you with any sort of debugging you might want to do. I have already tried the latest version of checkPadDeltas, but it just hangs for hours after starting. This is the only output it gives:

All relative paths will be interpreted relative to the identified Etherpad base dir: /opt/etherpad
[2020-05-05 00:04:12.330] [DEBUG] AbsolutePaths - Relative path "settings.json" can be rewritten to "/opt/etherpad/settings.json"
[2020-05-05 00:04:12.346] [DEBUG] AbsolutePaths - Relative path "credentials.json" can be rewritten to "/opt/etherpad/credentials.json"
settings loaded from: /opt/etherpad/settings.json
No credentials file found in /opt/etherpad/credentials.json. Ignoring.
[2020-05-05 00:04:12.369] [INFO] console - Using skin "no-skin" in dir: /opt/etherpad/src/static/skins/no-skin
[2020-05-05 00:04:12.371] [INFO] console - Session key loaded from: /opt/etherpad/SESSIONKEY.txt
[2020-05-05 00:04:12.541] [ERROR] console - table is not configured with charset utf8 -- This may lead to crashes when certain characters are pasted in pads
[2020-05-05 00:04:12.543] [INFO] console - RowDataPacket { character_set_name: 'utf8mb4' } utf8

JohnMcLear commented 4 years ago

Dude, the error is in your log!

[2020-05-05 00:04:12.541] [ERROR] console - table is not configured with charset utf8 -- This may lead to crashes when certain characters are pasted in pads
[2020-05-05 00:04:12.543] [INFO] console - RowDataPacket { character_set_name: 'utf8mb4' } utf8

See: https://github.com/ether/etherpad-lite/issues/3959

gnd commented 4 years ago

@JohnMcLear our db has

+----------------------------+------------------------+
| DEFAULT_CHARACTER_SET_NAME | DEFAULT_COLLATION_NAME |
+----------------------------+------------------------+
| utf8                       | utf8_general_ci        |
+----------------------------+------------------------+

While the store table has

+--------------------+
| character_set_name |
+--------------------+
| utf8mb4            |
+--------------------+

So should I convert it using the following?

  ALTER DATABASE `etherpad_lite_db` CHARACTER SET utf8mb4 COLLATE utf8mb4_bin;

gnd commented 4 years ago

@JohnMcLear

The misconfiguration was twofold: the database was using utf8 and utf8_general_ci, and in settings.json the database charset was also set to "utf8". Changing all of that to utf8mb4 still didn't help: the pad in question doesn't load, and checkPadDeltas still hangs:

All relative paths will be interpreted relative to the identified Etherpad base dir: /opt/etherpad
[2020-05-05 13:17:43.443] [DEBUG] AbsolutePaths - Relative path "settings.json" can be rewritten to "/opt/etherpad/settings.json"
[2020-05-05 13:17:43.444] [DEBUG] AbsolutePaths - Relative path "credentials.json" can be rewritten to "/opt/etherpad/credentials.json"
settings loaded from: /opt/etherpad/settings.json
No credentials file found in /opt/etherpad/credentials.json. Ignoring.
[2020-05-05 13:17:43.463] [INFO] console - Using skin "no-skin" in dir: /opt/etherpad/src/static/skins/no-skin
[2020-05-05 13:17:43.464] [INFO] console - Session key loaded from: /opt/etherpad/SESSIONKEY.txt
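Since gnd's fix touched three separate settings, here is a hypothetical sketch (not the actual startup check in Etherpad or ueberDB) of a helper that lists every layer not set to `utf8mb4`. It is fed with the charset names from the query output and the settings.json value quoted in this thread; the function and field names are illustrative.

```javascript
// Hypothetical check: every layer that influences how text is stored should use utf8mb4.
function charsetWarnings({ databaseDefault, storeTable, settingsCharset }) {
  const layers = [
    ['database default charset', databaseDefault],
    ['store table charset', storeTable],
    ['settings.json dbSettings.charset', settingsCharset],
  ];
  return layers
    .filter(([, charset]) => charset !== 'utf8mb4')
    .map(([layer, charset]) => `${layer} is "${charset}", expected "utf8mb4"`);
}

// The values gnd reported before the fix:
console.log(charsetWarnings({ databaseDefault: 'utf8', storeTable: 'utf8mb4', settingsCharset: 'utf8' }));
```

A clean configuration only protects future edits; revisions that were already stored with mangled characters stay broken.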

JohnMcLear commented 4 years ago

@gnd It's a GIGO (garbage in, garbage out) problem. Once you have garbage in, it can't be changed. Now at least you know the problem won't appear in the future!

caugner commented 4 years ago

> @gnd It's a GiGo problem. Once you have garbage in, it can't be changed. Now all you know is the problem wont appear in the future!

Wouldn't repairPad.js be able to fix these broken pads?

JohnMcLear commented 4 years ago

Oh hi @caugner - sadly no, repairPad.js generally sucks and doesn't really work. https://github.com/ether/etherpad-lite/blob/develop/bin/repairPad.js#L48

The best thing I can suggest is to pull the atext/text out of the pad and bring it into a new pad.

@gnd I can write you a script to try to get the text out, if you want?

bin/extractPadData.js with a change to output to stdout might be sufficient here... Give me 2 minutes and I will create an extractPadText.js.

gnd commented 4 years ago

@JohnMcLear that would be quite helpful indeed )

JohnMcLear commented 4 years ago

Extracting

Run

  node bin/extractPadData.js $padid

then

  cat $padid.db | grep \"text\" | grep revNum | tail -1

The text is the val.atext.text item; you could parse the JSON on the command line. I will do that next if you need it. For now, run these commands, making sure you replace $padid with your pad ID.

Parsing

Install jq with

  sudo apt-get install jq

then run

  cat $padid.db | grep \"text\" | grep revNum | tail -1 | jq -r .val.atext.text

to see just the text.

To write the pad text to a text file:

  cat $padid.db | grep \"text\" | grep revNum | tail -1 | jq -r .val.atext.text > $padid.txt

Now that you have the pad text in a text file, you can import it, use the setText API, or whatever...

Lemme know if extraction fails and I will consider another approach.
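For anyone without jq, the same selection can be done in Node.js. This minimal sketch mirrors the shell pipeline above and makes the same assumptions: the `$padid.db` file written by `bin/extractPadData.js` has one JSON record per line, and the last line mentioning both `"text"` and `revNum` holds the current pad text at `val.atext.text`. `latestPadText` is a hypothetical helper.

```javascript
// Mirrors: grep '"text"' | grep revNum | tail -1 | jq -r .val.atext.text
function latestPadText(dump) {
  const candidates = dump
    .split('\n')
    .filter((line) => line.includes('"text"') && line.includes('revNum'));
  if (candidates.length === 0) return null;
  const record = JSON.parse(candidates[candidates.length - 1]);
  return record.val.atext.text;
}

// Example usage (the pad ID is a placeholder):
// const fs = require('fs');
// const padId = 'CareCircle';
// fs.writeFileSync(`${padId}.txt`, latestPadText(fs.readFileSync(`${padId}.db`, 'utf8')));
```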

gnd commented 4 years ago

The extraction is running; however, it is quite slow. In the file CareCircle.db I see the latest line at revs:80, while the script has already been running for 20 minutes. The pad in question has over 12k revisions...

JohnMcLear commented 4 years ago

Oh man, that sucks... I guess it can't build the pad object after 80 revisions. The script should only take 30 seconds or so to run.

JohnMcLear commented 4 years ago

The last suggestion would be a big one: dump the entire DB and send it to me, and then I can write a script to parse out what you need. Alternatively, I can try to write a script here, but there might be some back and forth to get it working that way.

gnd commented 4 years ago

Hi @JohnMcLear, the script has finally finished. I have no idea why it took so long (almost 40 hours). Anyway, looking into it, it seems to me that the whole exercise can be done by selecting the highest revision divisible by 100 from the store table and extracting the text from it? In the future I'll do this by hand :) Thanks a lot for your help.
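The "highest revision divisible by 100" shortcut can be sketched as follows. This assumes revision records are stored under keys of the form `pad:<padId>:revs:<n>` and that every revision whose number is divisible by 100 carries a full snapshot of the text at `meta.atext.text`. The `rows` input stands for hypothetical `key`/`value` pairs fetched from the `store` table, and `latestKeyFrameText` is an illustrative name.

```javascript
// rows: [{ key, value }] where value is the JSON string stored in the `store` table.
function latestKeyFrameText(rows) {
  let best = null;
  for (const { key, value } of rows) {
    const match = /:revs:(\d+)$/.exec(key);
    if (!match) continue;          // not a revision record
    const rev = Number(match[1]);
    if (rev % 100 !== 0) continue; // only every 100th revision is assumed to hold a snapshot
    if (best === null || rev > best.rev) best = { rev, value };
  }
  if (best === null) return null;
  return { rev: best.rev, text: JSON.parse(best.value).meta.atext.text };
}
```

Anything typed after that snapshot (up to 99 revisions) is not included in the recovered text.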

JohnMcLear commented 4 years ago

Exactly, but I often get told off by our users when I assume they can perform database queries, so I try to avoid it. I think I know why it took so long, by the way: are you using MySQL with Etherpad 1.8.3?

gnd commented 4 years ago

I'm using the latest master from git (not sure which version that is)

JohnMcLear commented 4 years ago

Assuming MySQL, it's a known bug, and the patch is due to land today.

gnd commented 4 years ago

Yes, sorry, it's the latest MariaDB (10.3.22-MariaDB).

gnd commented 4 years ago

@JohnMcLear I'm sorry to spam this ticket, but do you have an issue open for the MySQL patch you mentioned? I want to see whether it might resolve our performance troubles with Etherpad. Thanks!

JohnMcLear commented 4 years ago

No, but just run npm install ueberdb@0.4.9 to fix it.

JohnMcLear commented 4 years ago

By the way, the new logic for storing additional atext is in, so this should be closed. If people experience an issue, please create a new issue and refer to this one. I want to deal with each individual cause case by case, with the main goal of creating automated logic that restores a pad in real time when corruption is detected. That's the dream, as corruption is inevitable.

pedro-nonfree commented 4 years ago

This is a message for people getting to this recently (when upgrading from older versions of etherpad).

Today I upgraded an etherpad service from 1.6.3 to 1.8.6 (what a change!!!!! congratulations to all developers)

I had problems with one pad; the checkers (checkPad, checkAllPads, etc.) failed to detect it (or I don't know how to run Node properly, anyway).

I verified that the charset is utf8mb4 in my settings.json (I checked the latest version in settings.json.template):

  "dbType" : "mysql",
  "dbSettings" : {
    "user":     "etherpaduser",
    "host":     "localhost",
    "port":     3306,
    "password": "PASSWORD",
    "database": "etherpad_lite_db",
    "charset":  "utf8mb4"
  },

For the pad https://pad.example.com/p/my-broken-pad I did:

  mysql
  update `store` set `value` = replace(`value`,'????','??') where `key` like "pad:my-broken-pad";

and it worked again :tada: :unicorn: :sparkles:

This solution was given above (I put a +1 on the earlier messages with the solution to help people find it), but I wanted to make it clearer.

JohnMcLear commented 4 years ago

I guess one thing we could do here is check for ???? in pad contents and provide a warning that includes a suggested solution. @pedro-nonfree, could you please submit a patch to checkPad.js or similar? Then I'd happily merge it :)
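A minimal sketch of that check, assuming the caller has already loaded each revision's changeset string. The `{ rev, changeset }` input shape and `findSuspectRevisions` are hypothetical, not checkPad.js's actual code. It only looks at the char bank (the part after the first `$`), where the `????` sequences reported in this thread appeared. A user who literally typed "????" would also be flagged, so the result is a hint, not proof of corruption.

```javascript
// Return the numbers of revisions whose char bank (text after the first "$") contains "????".
function findSuspectRevisions(revisions) {
  const suspects = [];
  for (const { rev, changeset } of revisions) {
    const dollar = changeset.indexOf('$');
    if (dollar < 0) continue; // not a serialized changeset
    if (changeset.slice(dollar + 1).includes('????')) suspects.push(rev);
  }
  return suspects;
}

const suspects = findSuspectRevisions([
  { rev: 41, changeset: 'Z:5>1=4+1$a' },
  { rev: 42, changeset: 'Z:6>2=5*0+2$????' },
]);
// Print a warning that points at the replace('????','??') workaround from this thread.
for (const rev of suspects) {
  console.warn(`revision ${rev}: char bank contains "????"; see the replace('????','??') workaround`);
}
```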

InterFelix commented 3 years ago

This error occurred with one single pad on an instance that was never upgraded and has been pinned to version 1.8.6 since its initial deployment today. I fixed the issue, but I don't know what actually helped. First I tried the SQL query, which didn't seem to help. Then I set the charset as an environment variable on my Kubernetes deployment, which redeployed the pod. I can't say whether it was the charset or the SQL query in combination with the redeploy, but it's fixed now.