klonos opened this issue 3 years ago
'\xF0\x9F\x91\x8B</...'
EMOJIS!
Your database isn't using utf8mb4. If you can't get that working, you'll need validation to prevent users from inserting emojis.
\xF0\x9F\x91\x8B is "waving hand" - :wave:
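Those four bytes are a single character, and decoding them shows why a 3-byte "utf8" MySQL table rejects it. A short illustration in Python (used here only for illustration; Backdrop itself is PHP):

```python
# The byte sequence from the truncated error message at the top of this issue.
raw = b"\xF0\x9F\x91\x8B"
char = raw.decode("utf-8")

# One character, outside the Basic Multilingual Plane (above U+FFFF).
print(len(char), f"U+{ord(char):04X}")

# Bytes needed per character in UTF-8. MySQL's legacy "utf8" charset (utf8mb3)
# stores at most 3 bytes per character; utf8mb4 stores up to 4.
for c in ["a", "\u00e9", "\u20ac", char]:  # a, e-acute, euro sign, waving hand
    print(f"U+{ord(c):04X}: {len(c.encode('utf-8'))} bytes")
```

It prints `1 U+1F44B`, followed by byte counts of 1, 2, 3 and 4: only the emoji exceeds the 3-byte limit.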
I'm able to reproduce this on a vanilla 1.18.1 installation on:
Not able to reproduce it on our Tugboat demo sandboxes (Apache2.4.38 / php 7.2.34 / 5.5.5-10.5.8-MariaDB-1:10.5.8+maria~focal)
Right 🤦
@klonos I've seen Exceptions like that many times before. Check your status page(s) for "MySQL Database 4-byte UTF-8 support"
Yeah, that's it 😅
...still, we should be handling this gracefully. No?
When I'm in this situation (no utf8mb4 in the available db) and I can't upgrade or do anything else about it, I create a small custom module that replaces the emojis in hook_entity_presave() or similar, depending on the available fields.
Not sure if something like that would be feasible in core.
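A minimal sketch of that presave idea, assuming the goal is simply to drop every character that needs 4 bytes in UTF-8 (code points U+10000 and above). It is written in Python for illustration; `presave_strip()` and the dict-shaped entity are hypothetical stand-ins for a PHP hook_entity_presave() implementation and a Backdrop entity:

```python
import re

# Any code point from U+10000 to U+10FFFF needs 4 bytes in UTF-8; this range
# covers most emojis (e.g. U+1F44B, waving hand).
FOUR_BYTE_CHARS = re.compile("[\U00010000-\U0010FFFF]")

def strip_4byte_chars(text: str, replacement: str = "") -> str:
    """Replace every 4-byte UTF-8 character in `text` with `replacement`."""
    return FOUR_BYTE_CHARS.sub(replacement, text)

def presave_strip(entity: dict) -> dict:
    """Hypothetical presave step: clean every string-valued field, leave the rest."""
    return {
        key: strip_4byte_chars(value) if isinstance(value, str) else value
        for key, value in entity.items()
    }

print(presave_strip({"title": "Hello \U0001F44B", "nid": 1}))
```

This prints `{'title': 'Hello ', 'nid': 1}`. Passing a visible `replacement` such as `"?"` instead of the empty string would at least make the removal noticeable to the editor.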
I think that we should be catching this and either:
Silently stripping might not be the most polite approach. In my own use cases I have always been able to explain to the users why this happens, but we have no way to explain that to every possible Backdrop editor. Or do we?
On the other hand, letting form validation fail might be frustrating for users. Maybe they're not familiar with the term "emoji" and don't know what they have to do.
But we can probably find a good compromise on the UX.
From a technical point of view: emojis can be added to any textfield or textarea, which covers a lot of different form elements. Can we catch the problem in a generic way for all of them?
Another thought: we don't have metrics, so we have to guess how often the problem actually occurs. Potentially this is a contrib candidate?
Throwing a meaningful warning/error instead of a cryptic exception (which only developers can understand) is a core issue. Handling emojis when there is no support for them in the db seems like contrib (if at all possible).
Here's the contrib module, probably as generic as it can be. It adds an #element_validate callback to the core text form elements ('textfield', 'textarea'). Or did I miss anything?
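To apply such a check generically to every textfield and textarea in a form, one option is a single recursive walk: in Backdrop's Form API, keys starting with `#` are properties and all other keys are child elements. A Python sketch over a form represented as a nested dict; `offending_elements()` is hypothetical and not the contrib module's code:

```python
from typing import Iterator, Tuple

TEXT_TYPES = ("textfield", "textarea")

def has_4byte_char(value: str) -> bool:
    # Code points above U+FFFF need 4 bytes in UTF-8.
    return any(ord(ch) > 0xFFFF for ch in value)

def offending_elements(form: dict, parents: Tuple[str, ...] = ()) -> Iterator[Tuple[str, ...]]:
    """Yield the key path of every textfield/textarea whose '#value' contains
    a 4-byte character. Keys starting with '#' are properties; all other
    dict-valued keys are treated as child elements."""
    if form.get("#type") in TEXT_TYPES and has_4byte_char(form.get("#value", "")):
        yield parents
    for key, child in form.items():
        if not key.startswith("#") and isinstance(child, dict):
            yield from offending_elements(child, parents + (key,))

form = {
    "title": {"#type": "textfield", "#value": "Hi \U0001F44B"},
    "body": {
        "#type": "fieldset",
        "summary": {"#type": "textarea", "#value": "plain text"},
        "value": {"#type": "textarea", "#value": "Nice \U0001F642"},
    },
    "#submit": ["node_form_submit"],
}
print(list(offending_elements(form)))
```

This prints `[('title',), ('body', 'value')]`. In Backdrop the submitted values would come from the processed form and `$form_state`; the dict here only illustrates the traversal.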
Throwing a meaningful warning/error instead of a cryptic exception ... is a core issue.
I've no clue yet where to "attack". 😉 Any idea?
🤔 ...I'm thinking perhaps try _form_validate() or backdrop_validate_form().
Hm. The actual problem arises in insert/merge/update queries to the database, so form validation functions won't actually help in this case, I suppose. We don't have any validation for 4-byte characters in core, and I think we shouldn't, because the "80% case" is that utf8mb4 is enabled and works correctly - at least, that's what I suppose.
Changing how exceptions get handled might require far too big a change in the database abstraction layer.
By default, exceptions are thrown (see the public function query() in core/includes/database/database.inc), and that covers all sorts of exceptions, not only the "1366 Incorrect string value" one.
...the "80% case" is that utf8mb4 is enabled and works correctly - at least, that's what I suppose.
Another issue that would benefit from usage metrics, then.
OK, I've done some research, and as expected, this problem also applies to Drupal core. Here's a list of relevant issues in d.org, along with a summary for each one, so that you don't waste time reading through them:
https://www.drupal.org/project/drupal/issues/2681073 initially raised under the Webform module queue, then moved to Drupal core 7.x, eventually closed as "fixed" (not really though):
The error is still a problem: Inserting a smiley in a webform (i.e. with iPhone) makes the webform to crash...
This problem is in Drupal core, so it is not a Webform issue.
Steps to reproduce: Create a new piece of content and set its title to an emoticon, such as the one mentioned here: http://apps.timwhitlock.info/unicode/inspect?s=%F0%9F%98%89
Enabling multi-byte UTF-8 support would probably fix it for your site. However when this is not enabled, Drupal should not crash.
I tried to put an example emoticon into this comment, but it caused the error on drupal.org.
Personal comment: 🤣 🤣 🤣 🤣 🤣 ^^
The solution is to update your database server and set your character sets appropriately.
Please follow the steps to enable multibyte UTF-8 support in your database at https://www.drupal.org/node/2754539
Personal comment: a real/ideal solution would be to throw a proper human-readable/understandable error, with a link to any documentation that people should follow. The fact that people still have this problem and keep raising issues in the queue means that we (Drupal/Backdrop) are not doing a very good job in this situation.
https://www.drupal.org/project/webform/issues/2375541 against the Webform 7.x issue queue, closed as "won't fix":
This is not limited to just webforms. Try putting an emoji into a node title or body and see if you get an error. I did.
Links to the same issue reported in other projects (linked to from comments in this issue):
For other googlers out there, this module offers a quick fix: https://www.drupal.org/project/strip_utf8mb4
https://www.drupal.org/project/drupal/issues/2002100 initially raised against D8 core, now still active in the D7 queue (pending backport/solution).
pasting texts with 4 byte UTF-8 characters in a field leads to lost text and a broken screen (see screenshot) on installs using mysql. Instead the character could be escaped or skipped or a error message could be given to the user. ... Reason for not saving full unicode is due to a bug in mysql, see #1314214: MySQL driver does not support full UTF-8 (emojis, asian symbols, mathematical symbols) Similar ticket Wordpress: http://core.trac.wordpress.org/ticket/13590 Inserting a 4-byte UTF-8 character truncates data
The database layer triggers an exception, that's all that it does. Catching it and processing it belongs in the upper layers. If they don't do that correctly, it's a bug there, not in the database layer.
An attempt was made to add a try/catch in NodeFormController->save(), and there was some back and forth about moving the issue from the field system component to the entity system instead:
This needs to happen in the entity storage controller somewhere, probably DatabaseStorageControllerNG or so. This is not entirely the field system alone, as this can happen to titles as well, which are part of the entity system, so moving the component too.
...
DatabaseStorageControllerNG::save() is what is throwing the EntityStorageException in the first place. (Relevant) Call stack is:
DatabaseStorageControllerNG::save()
Entity::save()
NodeFormController::save()
Entity object doesn't know about context (form submission), but we need to inform a user that the exception occurred, rather than just WSOD. That leaves NodeFormController ... and so either all forms should handle exceptions when saving entities, or we just don't throw the exception in the first place (which feels wrong). I could be missing an obvious solution here.
The try/catch belongs in NodeFormController, nowhere else.
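The layering argued for there (the storage layer throws; the form layer, which knows a user is waiting, catches and reports) can be sketched like this. Everything below is a hypothetical Python stand-in for DatabaseStorageControllerNG::save() and NodeFormController::save(); 1366 is MySQL's "Incorrect string value" error code, and only that error is turned into a message while any other exception keeps propagating:

```python
from typing import List

INCORRECT_STRING_VALUE = 1366  # MySQL error: "Incorrect string value"

class EntityStorageError(Exception):
    """Hypothetical storage exception carrying the database error code."""
    def __init__(self, message: str, code: int):
        super().__init__(message)
        self.code = code

def storage_save(entity: dict) -> None:
    """Hypothetical storage layer on a 3-byte utf8 table: it raises, nothing more."""
    for value in entity.values():
        if isinstance(value, str) and any(ord(ch) > 0xFFFF for ch in value):
            raise EntityStorageError("Incorrect string value", INCORRECT_STRING_VALUE)

def node_form_save(entity: dict, messages: List[str]) -> bool:
    """Form layer: reports the problem to the user instead of crashing."""
    try:
        storage_save(entity)
    except EntityStorageError as error:
        if error.code != INCORRECT_STRING_VALUE:
            raise  # unrelated database errors are not hidden
        messages.append("The content could not be saved because it contains "
                        "characters (such as emojis) that this site's database "
                        "cannot store.")
        return False
    messages.append("Saved.")
    return True
```

The drawback raised in the quoted comment still applies: every form that saves entities would need the same try/except.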
I think this should be major priority since it can lead to a PDOException based on user input.
There are some contrib modules like https://www.drupal.org/project/strip_utf8mb4 and https://www.drupal.org/project/unicode which address this too, but it's really something we should try to fix in core.
Personal comment: I agree with the last comment ^^
Just saying, the same error can be triggered by a site search. Pick your favorite popular D7 site and open "/search/site/%F0%9F%98%89". If the site does not have utf8mb4, you get an error page. Probably the same can happen with watchdog entries or anything else.
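The search path in that comment shows how such a character arrives without any form at all: the percent-encoded segment decodes to a single 4-byte character. A short Python illustration:

```python
from urllib.parse import unquote

path = "/search/site/%F0%9F%98%89"
keywords = unquote(path.rsplit("/", 1)[-1])  # percent-decoding uses UTF-8 by default

print(len(keywords), f"U+{ord(keywords):04X}", len(keywords.encode("utf-8")))
```

It prints `1 U+1F609 4`: one character (winking face) that needs four bytes, which is exactly what a 3-byte utf8 column cannot store.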
So imo every site should be encouraged to enable utf8mb4, by something stronger than the current pleasant green notice in the status report. ... I found exactly one popular site with this problem. It was the first one I tested, so I thought it would be more common.
Many thanks for doing the research (of course Drupal is also affected; my custom module mentioned earlier in this thread is for Drupal). But one important question remains: is this still a significant problem? The linked Drupal issues were opened many years ago. How is the situation today?
I'd assume that presenting a more user-friendly message instead of the exception requires us to add an #element_validate callback to the affected form element types in core, possibly based on the state of "database_utf8mb4_active". Then we show a message ... and still cannot save the value to the database. How can we handle that in a clean way?
A helpful message (instead of the Exception) might not be a real solution, BTW. On shared webhosting, users have no influence on the database version or setup. Presenting a link to (technical) documentation might be better than just the Exception, but if they're not able (permitted) to do any of the recommended things... That's not so helpful either.
The new contrib module has no release yet by intention. It's also a proof of concept for this discussion.
To my understanding the essential question here will be: how much of the solution should happen in core?
On shared webhosting, users have no influence on the database version or setup.
Correct. As also mentioned in several of those issues in the Drupal queue, this also happens when people try to enter emojis via submitted webforms, the site contact form, or even the site search (although who would do that last one, right?). Anyway, this means that the recipient of the warning will be the site visitor, rather than content editors or the site admin. So more meaningful validation errors will be required, like this:
The "Subject" field contains the following characters, which are not supported/allowed on this form:
- 👋
- 🙂
Please remove these characters, or replace them with something else, and then try again.
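Building that message is straightforward once the offending characters are known. A Python sketch; the function name is hypothetical, and in Backdrop the label and characters would be passed through t() with @ placeholders so they get escaped:

```python
from typing import List

def unsupported_chars_message(label: str, chars: List[str]) -> str:
    """Compose the visitor-facing error proposed above for a single field."""
    lines = [f'The "{label}" field contains the following characters, '
             "which are not supported/allowed on this form:"]
    lines += [f"- {ch}" for ch in chars]
    lines.append("Please remove these characters, or replace them with "
                 "something else, and then try again.")
    return "\n".join(lines)
```

Listing the actual characters matters here because the reader is a site visitor who may not know the word "emoji".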
this means that the recipient of the warning will be the site visitor, rather than content editors or the site admin
That's right. So displaying the problematic characters would be the most helpful thing. Hm... and/or we could log the problem with more technical info. But then, what if the admin is aware of the problem anyway but can't do anything about it yet?
Something more visual - two screenshots from the contrib module:
If the setting is "Prevent form submission":
The setting page:
So: is this, or are parts of this, qualified to go into core, or should this stay a contrib solution?
I think that preventing the form submission (which means no exception thrown) + showing a friendly message for the user to remove any "unsupported" characters belongs in core.
The settings to either prevent submission, or automatically clean up and submit + the allow list should stay in contrib.
FTR: @docwilmot raised a concern in the related Zulip chat stream that the module's approach might be overkill. Personally, I don't think so, but it's certainly something to consider.
No, that wasn't what I said. I said "we could validate all text input; but that may be overkill since some text entry doesn't hit the database". As noted there, I was just ruminating on our total options for handling that. Your module doesn't validate all input: there is a whitelist for forms not saving to the DB.
I personally think we should handle this in core. It's not reasonable to get a nasty error after typing a little smiley. Handling this on validation is also friendlier IMHO. But I haven't read deeply into the other options.
No, that wasn't what I said.
Sorry for the misunderstanding. 😉
It's not reasonable to get a nasty error after typing a little smiley.
So, let's get a subset of this into core. Validation only runs if the state "database_utf8mb4_active" isn't set. It prevents form submission, with a helpful message. There are no settings; the allow-list is hardcoded to (core) forms that we know don't save anything to the database. Anything beyond that is a task for contrib.
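A minimal sketch of that core subset, in Python for illustration. The `state` dict stands in for Backdrop's state storage and the "database_utf8mb4_active" flag; `SKIP_FORM_IDS` is a hypothetical allow-list (views_exposed_form is the example given later in this thread); only textfield and textarea elements are checked:

```python
from typing import Dict, List

# Hypothetical allow-list of form IDs whose text input never reaches the database.
SKIP_FORM_IDS = {"views_exposed_form"}
CHECKED_TYPES = ("textfield", "textarea")

def utf8mb4_form_errors(form_id: str, elements: Dict[str, dict],
                        state: Dict[str, bool]) -> Dict[str, List[str]]:
    """Return element name -> distinct 4-byte characters found in its value.
    An empty dict means the form may be submitted."""
    if state.get("database_utf8mb4_active", False) or form_id in SKIP_FORM_IDS:
        return {}
    errors: Dict[str, List[str]] = {}
    for name, element in elements.items():
        if element.get("#type") not in CHECKED_TYPES:
            continue
        found: List[str] = []
        for ch in element.get("#value", ""):
            if ord(ch) > 0xFFFF and ch not in found:  # needs 4 bytes in UTF-8
                found.append(ch)
        if found:
            errors[name] = found
    return errors
```

Returning the characters per element makes it easy to feed them into a visitor-facing message like the one proposed earlier in the thread.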
Did I summarize it correctly?
I think mostly yes 👍 ...the only thing I'm unsure of is this:
the allow-list hardcoded to (core) forms
Ideally, we'd add the validation function to all Form API elements that may be affected; that'd be textfields and text areas. Any others?
Textfield and textarea are the only affected element types. The search input type isn't affected, at least in core. But... some textfield items are not affected; for instance, views_exposed_form textfields don't save anything to the database. That's why these forms can safely be skipped.
Why is the allowlist based on form_id, not on element id?
Mostly for convenience. Providing a detailed allowlist for each and every single textfield item just doesn't seem feasible.
@klonos does this explain the used approach a bit better?
I got this error when saving a page:
Notes: I was editing a node-block, so the workflow was: