ajhsu / blog

The external storage of my brain.

Unescaping HTML entities #78

Open ajhsu opened 6 years ago

ajhsu commented 6 years ago

Unescaping HTML entities

Understanding HTML entities

The definition of character entity from Wikipedia:

In SGML, HTML and XML documents, the logical constructs known as character data and attribute values consist of sequences of characters, in which each character can manifest directly (representing itself), or can be represented by a series of characters called a character reference, of which there are two types: a numeric character reference (&#nnnn; or &#xhhhh;) and a character entity reference (&name;).

Lists of HTML entities

Related tools

Potential unescaping solutions

Unescaping numeric HTML entities

Numeric HTML entities are relatively easy to unescape: pass the character code to the String.fromCharCode() method to get the raw character back.
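A minimal sketch of this idea follows. The function name unescapeNumericEntities and the regular expression are illustrative, not taken from any library. One deliberate swap: it calls String.fromCodePoint() rather than String.fromCharCode(), because the latter only handles 16-bit code units and would mangle references above 0xFFFF such as emoji.

```javascript
// Matches decimal (&#169;) and hexadecimal (&#xA9;) character references.
const NUMERIC_REFERENCE = /&#(?:[xX]([0-9a-fA-F]+)|([0-9]+));/g;

// Hypothetical helper: replaces numeric references only, leaves named ones alone.
function unescapeNumericEntities(input) {
  return input.replace(NUMERIC_REFERENCE, (match, hex, dec) => {
    const codePoint = hex !== undefined ? parseInt(hex, 16) : parseInt(dec, 10);
    // Values beyond the Unicode range are left as-is instead of throwing.
    if (codePoint > 0x10ffff) return match;
    return String.fromCodePoint(codePoint);
  });
}

console.log(unescapeNumericEntities('&#169; &#xA9; &amp;')); // → "© © &amp;"
```

Named references such as &amp; pass through unchanged, which is why this approach alone is not enough for general input.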

Comprehensively unescaping HTML entities

To unescape HTML entities comprehensively, covering both numeric and named character references, we need solutions beyond the String.fromCharCode() method.

.innerHTML
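In a browser, the HTML parser itself can do the decoding: assign the escaped string to an element's innerHTML and read the text back. The sketch below is one common way to do that, not necessarily the exact snippet this post originally showed. The function name unescapeWithDom and the doc parameter are illustrative; a textarea is used so that any markup in the input is kept as text rather than parsed into elements.

```javascript
// Hypothetical helper: lets the browser's HTML parser decode the entities.
// `doc` defaults to the global document; pass one explicitly outside a browser.
function unescapeWithDom(html, doc = globalThis.document) {
  if (!doc) {
    throw new Error('unescapeWithDom needs a DOM document');
  }
  const textarea = doc.createElement('textarea');
  // The parser decodes numeric and named references while setting the content.
  // Caveat: input containing a closing textarea tag would be cut off there.
  textarea.innerHTML = html;
  return textarea.value;
}

// In a browser console:
// unescapeWithDom('&lt;b&gt; &amp; &copy;')
```

This relies on a DOM being present, so it does not work in plain Node.js without a DOM implementation.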

The he package
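The he package provides a decode() function that handles both numeric and named references without needing a DOM. A short sketch, assuming he has been installed with npm install he; the wrapper name unescapeWithHe is illustrative, and the require call sits inside the function only so the snippet can be loaded before the dependency is installed.

```javascript
// Hypothetical wrapper around he.decode().
function unescapeWithHe(html) {
  const he = require('he'); // third-party dependency: npm install he
  return he.decode(html);
}

// With he installed:
// unescapeWithHe('foo &copy; bar &ne; baz &#x1D306; qux')
// → 'foo © bar ≠ baz 𝌆 qux'
```

Because it does not depend on a DOM, this option works the same way in Node.js and in the browser.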