Open mortbauer opened 5 months ago
@mortbauer I haven't been able to reproduce this error so far. Where exactly does it occur - directly on migration or in some view after the migration has been executed? What's the error's stack trace?
How exactly do you set the string before running the migration? (I tried setting it manually in mariadb, through the rails console and even by rolling back, sending a message containing emojis and re-migrating, but the error never occurred. I also tried various levels of escape characters, but to no avail.)
Happens directly in the migrations; well just happened for 3 foodcoops out of 83 so it really because of special characters like in the example posted above. To set this easiest is to directly alter it in the database, you could also send a message with that text ;)
To set this easiest is to directly alter it in the database, you could also send a message with that text
Like I said, I did both (set it directly in the db and send a message with that text), but running the migration afterwards did not throw an error for me. I tested on f4a4528fdc86908243849cb3bf92d9c95f1106a1 with the local docker dev setup.
No idea. To be honest I do not wanna waste more time on this, I had this error in production and I fixed it so I wanted to share. Will not go deeper here and try to recreate a problem I don't have anymore (since I never will have those migrations again)
Hm I cannot reproduce the error either. I can also understand that you @mortbauer don't want to spend more time on this. I'm not sure how to proceed, there might be a edge case (maybe database encoding?) that leads to this error. I'm glad you shared your fix, as far as I understand it renders chars that are not in ascii as its ordinal number, right? I wouldn't like that to happen in this migration very much, as it actually can handle unicode (most of the time... :D)
Would it be okay to close this issue then until someone else encounters this error again or you find time & motivation to provide more information how we might reproduce the error?
Hm, understood, I simply close this issuel, we still have it documented then in case sombody else stumbles over it.
Ok so this was actually not only affecting migrations but also creating new messages. And they reason is that the characterset was not utf8mb but utf8mb3. I changed this now all over the place, set it for the rails database connection encoding but I also had to manually run ALTER TABLE action_text_rich_texts CONVERT TO CHARACTER SET 'utf8mb4';
for every table in every db, see https://stackoverflow.com/a/1294156 for more background.
I think we should mention this at least in known issues or something, because it is nasty!
@mortbauer Nice find :+1:
I investigated a little too - here's what I found out:
alter table messages convert to character set utf8mb3 collate utf8mb3_unicode_ci;
in the mariadb container ...impossible without utf8mb4 🙂
through MessagesController#create
-> I immediately get the following error:
Mysql2::Error: Incorrect string value: '\xF0\x9F\x99\x82\x0D\x0A...' for column development.messages.body at row 1
So the data couldn't even be stored in the db (If I then upgrade to the latest master, all migrations run without errors even if the database's default collation is utf8mb3.)
So the only way this can occur is, if messages.body
contains unicode characters encoded as HTML entities (which is probably something the client decides - older clients surely have done that; newer ones still might do so.) The current version of simple_format
then converts them to real emojis as it assumes all systems can handle those by now leading to the error in utf8mb3.
The foodsoft prior to this change wasn't even able to display the representation of these HTML entities though - they were always just displayed as plain text.
I think we should mention this at least in known issues or something, because it is nasty!
I agree with that - it's a nasty pitfall.
We may also consider 'fixing' it by adding :collation => :utf8mb4_general_ci
to the CreateActionTextTables
migration for action_text_rich_texts
...? Or is that to database specific a thing to have it in a rails migration?
wow thanks, great investigation! I think with the rails7 upgrade https://github.com/foodcoops/foodsoft/commit/3d81dd6b57d7bd10b626e00ac36775122bb2d363 the schema.rb was changed to contain charset and collation. I think it's not to database specific if we rely on the db to handle unicode. Would it make sense to have a Migration for all tables to the correct collation? Maybe this could be combined with your cleanup migration @lentschi ?
I'll put a note to the release notes, definitely a nasty pitfall!
The release 4.8.0 includes pull request Introduce Message Formatting with Actiontext Trix Editor #1013
the migration 20230215085312_migrate_message_body_to_action_text.rb will fail if the message includes unicode characters like emojis.
for example the following text: