backdrop / backdrop-issues

Issue tracker for Backdrop core.
144 stars 40 forks source link

[UX] Prevent "EntityStorageException" exception when trying to save emojis without UTF8-mb4 enabled in the database, and show a friendly message instead #4927

Open klonos opened 3 years ago

klonos commented 3 years ago

I got his error when saving a page:

EntityStorageException: SQLSTATE[22007]: Invalid datetime format: 1366 Incorrect string value:
'\xF0\x9F\x91\x8B</...' for column `backdrop`.`field_data_body`.`body_value` at row 1
in NodeStorageController->save() (line 510 of /app/docroot/core/modules/node/node.entity.inc).
SQLSTATE[22007]: Invalid datetime format: 1366 Incorrect string value:
'\xF0\x9F\x91\x8B</...' for column `backdrop`.`field_data_body`.`body_value` at row 1

Notes: I was editing a node-block, so the workflow was:

image

indigoxela commented 3 years ago

'\xF0\x9F\x91\x8B</...' EMOJIS!

Your database isn't utf8-mb4. If you can't get that working, you'll need validation to prevent users from inserting emojis.

\xF0\x9F\x91\x8B is "waving hand" - :wave:

klonos commented 3 years ago

I'm able to reproduce this on 1.18.1 vanilla installation on:

Not able to reproduce it on our Tugboat demo sandboxes (Apache2.4.38 / php 7.2.34 / 5.5.5-10.5.8-MariaDB-1:10.5.8+maria~focal)

klonos commented 3 years ago

'\xF0\x9F\x91\x8B</...' EMOJIS!.

Right 🤦

indigoxela commented 3 years ago

@klonos I've seen Exceptions like that many times before. Check your status page(s) for "MySQL Database 4-byte UTF-8 support"

klonos commented 3 years ago

Yeah, that's it 😅

klonos commented 3 years ago

...still, we should be handling this gracefully. No?

indigoxela commented 3 years ago

we should be handling this gracefully. No?

When I'm in this situation (no utf8-mb4 in the available db) and I can't upgrade or do anything else about it, I create a small custom module, that replaces the emojis in hook_entity_presave() or the like - depending on available fields.

Not sure, if something like that would be feasible in core.

klonos commented 3 years ago

I think that we should be catching this and either:

indigoxela commented 3 years ago

Silently stripping might not be the most polite way. In my own use-cases I have always been able to explain to the users, why this happens - we have no opportunity to explain that to all possible Backdrop editors. Or do we?

On the other hand - let the form validation fail might be frustrating for users. Maybe they're not familiar with the name "emoji" and they don't know what they have to do.

But we probably can find a good compromise re UX.

From a technical point of view: emojis can get added to every textfield or textarea - that are quite a lot different form elements. Are we able to catch the problem in a generic way for all of them?

Another thought: we don't have metrics, so we have to guess, how often the problem may actually occur. Potentially this is a contrib candidate?

klonos commented 3 years ago

Potentially this is a contrib candidate?

Throwing a meaningful warning/error instead of a cryptic exception (which only developers can understand) is a core issue. Handling emojis when there is no support for them in the db seems like contrib (if at all possible).

indigoxela commented 3 years ago

Here's the contrib module - probably as generic as can be. It enhances #element_validate for core text form elements ('textfield', 'textarea'). Or did I miss anything?

Throwing a meaningful warning/error instead of a cryptic exception ... is a core issue.

I've no clue yet, where to "attack". :wink: Any idea?

klonos commented 3 years ago

I've no clue yet, where to "attack". 😉 Any idea?

🤔 ...I'm thinking perhaps try _form_validate() or backdrop_validate_form().

indigoxela commented 3 years ago

...I'm thinking perhaps try _form_validate() or backdrop_validate_form().

Hm. The actual problem arises in insert/merge/update queries to the database, so form validation functions won't actually help in this case, I suppose. We don't have any validation for 4-byte characters in core, and I think we shouldn't, because the "80% case" is that utf8mb4 is enabled and works correctly - at least, that's what I suppose.

Changing the behavior, how Exceptions get handled, might need a way too big change in the db abstraction layer.

Per default Exceptions are thrown (see public function query in core/includes/database/database.inc). All sorts of Exceptions, not only the "1366 Incorrect string value" one.

klonos commented 3 years ago

...the "80% case" is that utf8mb4 is enabled and works correctly - at least, that's what I suppose.

Another issue that would benefit from usage metric then.

klonos commented 3 years ago

OK, I've done some research, and as expected, this problem also applies to Drupal core. Here's a list of relevant issues in d.org, along with a summary for each one, so that you don't waste time reading through them:

indigoxela commented 3 years ago

Many thanks for doing a research (of course, Drupal is also affected, my custom module mentioned earlier in this thread is for Drupal). But one important question remains: is this still a considerable problem? The linked Drupal issues have been opened many years ago. How is the situation today?

I'd assume, that presenting a more user-friendly message instead of the Exception requires us to add an #element_validate to affected form field types to core. Possibly based on the state of "database_utf8mb4_active". Then we show a message ... and still can not save the value to the database - how can we handle that in a clean way?

A helpful message (instead of the Exception) might not be a real solution, BTW. On shared webhosting, users have no influence on the database version or setup. Presenting a link to (technical) documentation might be better than just the Exception, but if they're not able (permitted) to do any of the recommended things... That's not so helpful either.

The new contrib module has no release yet by intention. It's also a proof of concept for this discussion.

To my understanding the essential question here will be: how much of the solution should happen in core?

klonos commented 3 years ago

On shared webhosting, users have no influence on the database version or setup.

Correct. As also mentioned on various of those issues in the Drupal queue, this also happens when people try to enter emojis via submitted webforms, the site contact form, or even when using the site search (although who would do that last one - right?). Anyway, this means that the recipient of the warning will be the site visitor, rather content editors or the site admin. So more meaningful validation errors will be required, like this:

The "Subject" field contains the following characters, which are not supported/allowed on this form:

  • 👋
  • 🙂

Please remove these characters, or replace them with something else, and then try again.

indigoxela commented 3 years ago

this means that the recipient of the warning will be the site visitor, rather content editors or the site admin

That's right. So displaying the problematic characters would be the most helpful thing. Hm... and/or we could log the problem, with more technical infos. But then - what if the admin is aware of the problem anyway, but can't do anything about it yet?

indigoxela commented 3 years ago

Something more visual - two screenshots from the contrib module:

If the setting is "Prevent form submission": form-error-4-byte

The setting page: utf8mb4-module-settings

So: is this, or are parts of this, qualified to go into core, or should this stay a contrib solution?

klonos commented 3 years ago

I think that preventing the form submission (which means no exception thrown) + showing a friendly message for the user to remove any "unsupported" characters belongs in core.

The settings to either prevent submission, or automatically clean up and submit + the allow list should stay in contrib.

indigoxela commented 3 years ago

FTR: @docwilmot raised a concern in the related Zulip chat stream, that the module's approach might be overkill. Personally, I don't think so, but it's for sure something to consider.

docwilmot commented 3 years ago

No, that wasnt what I said. I said "we could validate all text input; but that may be overkill since some text entry doesnt hit the database". As noted there, again, was just ruminating on our total options for handling that. Your module doesn't validate all input: there is a whitelist for forms not saving to the DB.

docwilmot commented 3 years ago

I personally think we should handle this in core. Its not reasonable to get a nasty error after typing a little smiley. Handling this on validation is also more friendly IMHO. But haven't read deeply into the idea of other options.

indigoxela commented 3 years ago

No, that wasnt what I said.

Sorry for my misconception. :wink:

Its not reasonable to get a nasty error after typing a little smiley.

So, lets get a subset of this into core. Validation is only done, if state "database_utf8mb4_active" isn't set. It prevents form submission - with a helpful message. With no settings, the allow-list hardcoded to (core) forms, of which we know, they don't save anything to the database. Anything beyond that is a task for contrib.

Did I summarize it correctly?

klonos commented 3 years ago

Did I summarize it correctly?

I think mostly yes 👍 ...the only thing I'm unsure of is this:

the allow-list hardcoded to (core) forms

Ideally, we'd add the validation function to all Form API elements that may be affected; that'd be textfields and text areas. Any others?

indigoxela commented 3 years ago

Ideally, we'd add the validation function to all Form API elements that may be affected; that'd be textfields and text areas. Any others?

Textfield and texarea are the only affected field types. The search input type isn't affected, at least in core. But... some of the textfield items are not affected - for instance views_exposed_form textfields don't save anything to the database. That's why these forms can safely get skipped.

Why is the allowlist based on form_id, not on element id?

Mostly for convenience. Providing a detailed allowlist for each and every single textfield item just doesn't seem feasible.

@klonos does this explain the used approach a bit better?