WordPress / gutenberg

The Block Editor project for WordPress and beyond. Plugin is available from the official repository.
https://wordpress.org/gutenberg/

HTML characters should not convert to symbol when editing as HTML #22337

Open mapk opened 4 years ago

mapk commented 4 years ago

Describe the bug

Carrying over an issue from here: https://wordpress.org/support/topic/editor-entities-in-text-mode-copy-paste-in-visual-mode/. This issue focused on the Classic Editor, but it seems to be the case with Gutenberg as well.

When editing as HTML and typing the HTML entity for a symbol (e.g. &mdash;), it gets converted to the symbol. However, when typing &amp;, that does not get converted. We should not convert any of them while editing as HTML.

To reproduce

Steps to reproduce the behavior:

  1. Create a Paragraph block. Type some text.
  2. Select the "Edit as HTML" option from the ellipses icon in the toolbar.
  3. Add "&amp;" to the text. Notice that it does not convert.
  4. Now type "&mdash;" and notice that this automatically converts (it should not).

Expected behavior

While editing as HTML, the characters should not convert.
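
The difference between the two entities in the steps above can be reproduced outside the editor. This is a minimal Python sketch that uses the standard library's html module as a stand-in; Gutenberg itself is JavaScript, and the sample string and variable names are made up for illustration. It only mimics the reported result and says nothing about where the conversion happens in the editor.

```python
import html

typed = "Fish &amp; chips &mdash; to go"

# Full decoding, as a browser would render it: both entities become characters.
rendered = html.unescape(typed)

# The behavior reported in the issue: only the em dash entity is replaced,
# while "&amp;" survives. Mimicked by hand here; not Gutenberg's actual logic.
observed = typed.replace("&mdash;", html.unescape("&mdash;"))

# The expected behavior: the HTML view keeps exactly what was typed.
expected = typed

print(repr(rendered))
print(repr(observed))
print(repr(expected))
```

The third value is the one the issue asks for: whatever is typed in the HTML view comes back unchanged.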

Possibly related to: https://github.com/WordPress/gutenberg/issues/13860

joyously commented 4 years ago

While editing as HTML, the characters should not convert.

The entities should not ever be converted to characters. The database should contain entities. The browser will show the entities correctly when showing as HTML.

azaozz commented 4 years ago

This is not a simple fix. All HTML entities are also UTF-8 characters (see https://dev.w3.org/html5/html-author/charref), but most are usable only in a web browser. However, the post content (or the editor "output") may be used in other places, such as RSS feeds and emails. The "htmlspecialchars" entities are the only ones required for XML/HTML and are generally understood everywhere.

Storing other entities in the DB would probably cause some backwards compatibility issues and affect several other WP components: Formatting, Charset, perhaps Database, and possibly others.

pipfrosch commented 4 years ago

Hi, I would like to express the issue from an accessibility point of view. I have epilepsy and have hit my head a lot. As a result, the pathways from my brain to my fingers do weird things when typing. I frequently type words that are similar to but different from what is in my brain. I think what is happening is that the incorrect muscle memory gets triggered and sent to the fingers, but I'm not sure.

What does that have to do with entities? Well, I have trouble visually distinguishing left/right single/double quotes, em/en dashes, etc., so I type the entity (I use the numbered entities, as I do a lot in XML, where HTML entities aren't defined) because when proofreading, it is easier for me to distinguish &#8216; from &#8217; than it is for me to visually distinguish ‘ from ’.

But in WordPress they get converted so I have trouble when proofreading determining if the wrong combination came out of my fingers.
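
For proofreading outside the editor, numbered entities such as &#8216; and &#8217; can be regenerated from already-converted text. This is a small Python sketch that relies on the standard "xmlcharrefreplace" error handler; to_numeric_entities is a hypothetical helper name, and it is a workaround for checking text, not a fix for the editor behavior.

```python
def to_numeric_entities(text: str) -> str:
    """Replace every non-ASCII character with its decimal character reference."""
    # Encoding to ASCII with "xmlcharrefreplace" emits &#NNNN; for each
    # character that ASCII cannot represent; plain ASCII passes through.
    return text.encode("ascii", "xmlcharrefreplace").decode("ascii")


print(to_numeric_entities("‘quoted’"))
```

Note that this leaves ASCII characters such as & and < untouched, so it does not escape markup.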

lathanh commented 5 months ago

This issue is also preventing me from using non-BMP Unicode (including emoji) at all.

If I try to save a draft with such a character, I get the error "Updating failed. Could not update post in the database." I believe this is because my MySQL database[1] uses utf8mb3, which can only store BMP characters (which excludes many characters, such as “🛈”, and most emoji).

So, I tried to enter the HTML entity instead (as demonstrated by the OP), but the editor automatically replaces it with the Unicode character, thwarting my attempt to use entities as a workaround (and I haven't found any other workaround).
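
The workaround being attempted can be sketched as follows: replace only the characters outside the BMP (code points above U+FFFF, which need four bytes in UTF-8) with numeric entities and leave everything else alone. escape_non_bmp is a hypothetical helper, and the sketch assumes the result would be stored as-is, which is exactly what the editor's automatic conversion currently prevents.

```python
def escape_non_bmp(text: str) -> str:
    """Replace characters above U+FFFF with hexadecimal character references."""
    # Code points above U+FFFF take 4 bytes in UTF-8, which a utf8mb3
    # column cannot store; BMP characters are passed through unchanged.
    return "".join(
        f"&#x{ord(ch):X};" if ord(ch) > 0xFFFF else ch
        for ch in text
    )


print(escape_non_bmp("info 🛈"))
```

Accented letters and typographic punctuation stay as characters, since they sit inside the BMP and a utf8mb3 column can hold them.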

[1] I'm on a hosted solution (EasyWP) where I'm not sure that I can change the database/table/column character sets. Even if I could, I still think this behavior in WP should be addressed.