foodcoops / foodsoft

Web-based software to manage a non-profit food coop (product catalog, ordering, accounting, job scheduling).
https://foodcoops.net/
Other
318 stars 146 forks source link

Migration for release 4.8.0 fails for unicode characters #1047

Open mortbauer opened 5 months ago

mortbauer commented 5 months ago

The release 4.8.0 includes pull request Introduce Message Formatting with Actiontext Trix Editor #1013

the migration 20230215085312_migrate_message_body_to_action_text.rb will fail if the message includes unicode characters like emojis.

for example the following text:

"Lieber Oli, \\n\\nvielen Dank! \\n\\nAndi benötigt keinen Account. Wir bestellen gemeinsam, da wir zusammenleben. \\nSollen wir ein Beitrittsformular für ihn ausfüllen? \\n\\nWenn ja, könntest Du mir noch ein leeres zukommen lassen? \\n\\nDanke schön! 😊 \\n\\nLiebe Grüße\\nJosef"
lentschi commented 5 months ago

@mortbauer I haven't been able to reproduce this error so far. Where exactly does it occur - directly on migration or in some view after the migration has been executed? What's the error's stack trace?

How exactly do you set the string before running the migration? (I tried setting it manually in mariadb, through the rails console and even by rolling back, sending a message containing emojis and re-migrating, but the error never occurred. I also tried various levels of escape characters, but to no avail.)

mortbauer commented 5 months ago

Happens directly in the migrations; well just happened for 3 foodcoops out of 83 so it really because of special characters like in the example posted above. To set this easiest is to directly alter it in the database, you could also send a message with that text ;)

lentschi commented 4 months ago

To set this easiest is to directly alter it in the database, you could also send a message with that text

Like I said, I did both (set it directly in the db and send a message with that text), but running the migration afterwards did not throw an error for me. I tested on f4a4528fdc86908243849cb3bf92d9c95f1106a1 with the local docker dev setup.

mortbauer commented 4 months ago

No idea. To be honest I do not wanna waste more time on this, I had this error in production and I fixed it so I wanted to share. Will not go deeper here and try to recreate a problem I don't have anymore (since I never will have those migrations again)

yksflip commented 4 months ago

Hm I cannot reproduce the error either. I can also understand that you @mortbauer don't want to spend more time on this. I'm not sure how to proceed, there might be a edge case (maybe database encoding?) that leads to this error. I'm glad you shared your fix, as far as I understand it renders chars that are not in ascii as its ordinal number, right? I wouldn't like that to happen in this migration very much, as it actually can handle unicode (most of the time... :D)

Would it be okay to close this issue then until someone else encounters this error again or you find time & motivation to provide more information how we might reproduce the error?

mortbauer commented 4 months ago

Hm, understood, I simply close this issuel, we still have it documented then in case sombody else stumbles over it.

mortbauer commented 4 months ago

Ok so this was actually not only affecting migrations but also creating new messages. And they reason is that the characterset was not utf8mb but utf8mb3. I changed this now all over the place, set it for the rails database connection encoding but I also had to manually run ALTER TABLE action_text_rich_texts CONVERT TO CHARACTER SET 'utf8mb4'; for every table in every db, see https://stackoverflow.com/a/1294156 for more background.

mortbauer commented 4 months ago

I think we should mention this at least in known issues or something, because it is nasty!

lentschi commented 4 months ago

@mortbauer Nice find :+1:

I investigated a little too - here's what I found out:

  1. If I bring up a local docker env in the version before this migration has even been executed ...
  2. then run alter table messages convert to character set utf8mb3 collate utf8mb3_unicode_ci; in the mariadb container ...
  3. and finally try to send the text impossible without utf8mb4 🙂 through MessagesController#create

-> I immediately get the following error: Mysql2::Error: Incorrect string value: '\xF0\x9F\x99\x82\x0D\x0A...' for column development.messages.body at row 1

So the data couldn't even be stored in the db (If I then upgrade to the latest master, all migrations run without errors even if the database's default collation is utf8mb3.)

So the only way this can occur is, if messages.body contains unicode characters encoded as HTML entities (which is probably something the client decides - older clients surely have done that; newer ones still might do so.) The current version of simple_format then converts them to real emojis as it assumes all systems can handle those by now leading to the error in utf8mb3. The foodsoft prior to this change wasn't even able to display the representation of these HTML entities though - they were always just displayed as plain text.

I think we should mention this at least in known issues or something, because it is nasty!

I agree with that - it's a nasty pitfall. We may also consider 'fixing' it by adding :collation => :utf8mb4_general_ci to the CreateActionTextTables migration for action_text_rich_texts...? Or is that to database specific a thing to have it in a rails migration?

yksflip commented 4 months ago

wow thanks, great investigation! I think with the rails7 upgrade https://github.com/foodcoops/foodsoft/commit/3d81dd6b57d7bd10b626e00ac36775122bb2d363 the schema.rb was changed to contain charset and collation. I think it's not to database specific if we rely on the db to handle unicode. Would it make sense to have a Migration for all tables to the correct collation? Maybe this could be combined with your cleanup migration @lentschi ?

I'll put a note to the release notes, definitely a nasty pitfall!