WebAssembly / stringref


Update code unit definition in Overview.md #61

Open dcodeIO opened 1 year ago

dcodeIO commented 1 year ago

As per the overview, a code unit is defined as "an indivisible unit of an encoded unicode scalar value". While individual 8-bit, 16-bit, or 32-bit code units are indivisible bit combinations, individual code units do not necessarily represent scalar values. In 16-bit strings, for example, a surrogate pair is a divisible combination of two indivisible code units, both surrogates, neither of which is a scalar value. In 8-bit strings, the individual code units of a multi-byte sequence map to neither scalar values nor code points.
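To make the distinction concrete, here is a minimal Python sketch that splits one scalar value into its code units under each encoding form. The helper names `code_units` and `is_surrogate` are made up for this illustration and are not part of the proposal; the sample character U+1F600 is an arbitrary choice of a scalar value outside the Basic Multilingual Plane.

```python
import struct
from typing import List


def code_units(text: str, encoding: str, width: int) -> List[int]:
    """Return the code units of `text`, where each unit is `width` bytes wide.

    `encoding` must be a little-endian codec without a BOM (or "utf-8"),
    so that the byte stream can be split into fixed-width units.
    """
    fmt = {1: "B", 2: "H", 4: "I"}[width]
    data = text.encode(encoding)
    return [unit for (unit,) in struct.iter_unpack("<" + fmt, data)]


def is_surrogate(unit: int) -> bool:
    """True if `unit` lies in the surrogate range, i.e. is not a scalar value."""
    return 0xD800 <= unit <= 0xDFFF


s = "\U0001F600"  # a single scalar value outside the BMP

utf8 = code_units(s, "utf-8", 1)       # four 8-bit code units
utf16 = code_units(s, "utf-16-le", 2)  # two 16-bit code units (a surrogate pair)
utf32 = code_units(s, "utf-32-le", 4)  # one 32-bit code unit

print([hex(u) for u in utf8])
print([hex(u) for u in utf16], [is_surrogate(u) for u in utf16])
print([hex(u) for u in utf32])
```

Only in the 32-bit form does the single code unit coincide with the scalar value. Both 16-bit units are surrogates, and none of the 8-bit units on its own identifies the character.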

As per the Unicode Glossary:

Code Unit. The minimal bit combination that can represent a unit of encoded text for processing or interchange. The Unicode Standard uses 8-bit code units in the UTF-8 encoding form, 16-bit code units in the UTF-16 encoding form, and 32-bit code units in the UTF-32 encoding form. (See definition D77 in Section 3.9, Unicode Encoding Forms.)

I suggest basically copying Unicode's definition and dropping the reference to scalar values.