-
1. `utf8code` should reject code points above U+10FFFF (RFC 3629).
2. `String#chr` should reject code points between `0xD800` and `0xDFFF` (RFC 3629).
3. `String#ord` should raise an error against ill-fo…
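
The first two checks in the list above can be sketched as a small encoder. This is only an illustration under assumptions: `encode_scalar` is a hypothetical name, not the `utf8code` or `String#chr` implementation the report refers to, and it only shows where the RFC 3629 range checks would sit.

```python
def encode_scalar(cp: int) -> bytes:
    """Encode one Unicode scalar value as UTF-8 (hypothetical helper).

    Rejects the values RFC 3629 forbids: anything above U+10FFFF and the
    surrogate range U+D800..U+DFFF.
    """
    if cp < 0 or cp > 0x10FFFF:
        raise ValueError(f"code point out of range: {cp:#x}")
    if 0xD800 <= cp <= 0xDFFF:
        raise ValueError(f"surrogate code point: U+{cp:04X}")
    if cp < 0x80:
        # 1 byte: 0xxxxxxx
        return bytes([cp])
    if cp < 0x800:
        # 2 bytes: 110xxxxx 10xxxxxx
        return bytes([0xC0 | (cp >> 6), 0x80 | (cp & 0x3F)])
    if cp < 0x10000:
        # 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
        return bytes([
            0xE0 | (cp >> 12),
            0x80 | ((cp >> 6) & 0x3F),
            0x80 | (cp & 0x3F),
        ])
    # 4 bytes: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
    return bytes([
        0xF0 | (cp >> 18),
        0x80 | ((cp >> 12) & 0x3F),
        0x80 | ((cp >> 6) & 0x3F),
        0x80 | (cp & 0x3F),
    ])
```

Rejecting both ranges before any bit manipulation keeps the encoder from ever emitting a sequence that a strict decoder would refuse.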
-
[DigitalTrustCenter/sectxt](https://github.com/DigitalTrustCenter/sectxt) released 0.9.0, which has quite a few parser improvements, especially on PGP.
The only one I'm not sure about is the strippin…
-
### Proposal Details
I find myself in need of such a method to determine how many bytes are in a UTF-8 string when iterating over bytes. Following [RFC 3629](https://datatracker.ietf.org/doc/html/rfc36…
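
As a rough sketch of what such a method might look like, the function below maps a lead byte to the length of the sequence it starts, using the byte ranges from RFC 3629. The name `utf8_sequence_length` and the choice to return 0 for invalid lead bytes are assumptions made for illustration, not the API being proposed.

```python
def utf8_sequence_length(lead: int) -> int:
    """Return the sequence length implied by a lead byte (0..255).

    Returns 0 when the byte cannot start a well-formed sequence.
    Hypothetical helper; it does not validate the continuation bytes.
    """
    if lead < 0x80:
        return 1  # ASCII
    if 0xC2 <= lead <= 0xDF:
        return 2
    if 0xE0 <= lead <= 0xEF:
        return 3
    if 0xF0 <= lead <= 0xF4:
        return 4
    # Continuation byte (0x80..0xBF), overlong lead (0xC0, 0xC1),
    # or a lead that would encode a value above U+10FFFF (0xF5..0xFF).
    return 0


def sequence_lengths(data: bytes) -> list:
    """Walk a byte string and collect the length of each sequence."""
    lengths = []
    i = 0
    while i < len(data):
        n = utf8_sequence_length(data[i])
        if n == 0:
            raise ValueError(f"invalid lead byte {data[i]:#04x} at offset {i}")
        lengths.append(n)
        i += n
    return lengths
```

A caller iterating over bytes can use the returned length to skip to the next lead byte; full validation would still need to check each continuation byte.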
-
In utf8_unicode_inplace_ex(), we have the following code:
```
c = *utf;
/* If first byte begins with binary 0 it is single byte encoding */
if ((c & 0x80) == 0) {
/* single byte unicode (7 bi…
-
Per ISOBMFF (ISO/IEC 14496-12:2020) § 4.2.1, fields of type `utf8string` are
> UTF-8 string as defined in IETF RFC 3629, null-terminated.
Currently this requirement is not enforced in the parsing …
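
To illustrate what enforcing this requirement could involve, here is a minimal sketch of a reader for a null-terminated UTF-8 field. `read_utf8string` is a hypothetical function, not part of the parser under discussion; it relies on Python's strict `utf-8` codec to reject ill-formed input.

```python
from typing import Tuple


def read_utf8string(buf: bytes, offset: int = 0) -> Tuple[str, int]:
    """Read a null-terminated UTF-8 string starting at `offset`.

    Returns the decoded text and the offset just past the terminator.
    Raises ValueError if the terminator is missing or the bytes are not
    well-formed UTF-8 (UnicodeDecodeError is a ValueError subclass).
    """
    end = buf.find(b"\x00", offset)
    if end == -1:
        raise ValueError("utf8string is not null-terminated")
    text = buf[offset:end].decode("utf-8")  # strict decoding by default
    return text, end + 1
```

Returning the next offset lets a caller continue parsing the fields that follow the string.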
-
The parsing seems to keep starting over and over.
Perhaps US-ASCII isn't detected/used correctly.
"US-ASCII is upwards-compatible with UTF-8 (an US-ASCII string is also a UTF-8 string, see [RFC 36…
-
## UTF-8
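**UTF-8 (8-bit Unicode Transformation Format)** is a variable-length character encoding for Unicode and also a prefix code. It can represent any character in the Unicode standard, and the first byte of its encoding remains compatible with ASCII, so software written to handle ASCII characters can keep working with little or no modification.
UTF-8 encodes each character using one to six bytes (although, 2…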
-
# Deserializing Panic with UTF-8 BOM (Byte Order Mark) Content
I encounter an issue when attempting to deserialize a string encoded in UTF-8 with a Byte Order Mark (BOM). The deserializer throws th…
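
One common workaround is to strip a leading BOM before handing the bytes to the deserializer. The sketch below assumes JSON input and uses Python's standard `json` module purely as a stand-in for the deserializer in the report; `strip_utf8_bom` and `load_json_tolerating_bom` are hypothetical names.

```python
import codecs
import json


def strip_utf8_bom(data: bytes) -> bytes:
    """Drop a single leading UTF-8 BOM (EF BB BF) if one is present."""
    if data.startswith(codecs.BOM_UTF8):
        return data[len(codecs.BOM_UTF8):]
    return data


def load_json_tolerating_bom(data: bytes):
    """Decode as strict UTF-8 after removing any BOM, then parse as JSON."""
    return json.loads(strip_utf8_bom(data).decode("utf-8"))
```

Only a BOM at the very start of the input is removed; U+FEFF appearing later in the content is left untouched.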
-
* I use a `read_file` block to print a file whose contents may contain non-ASCII characters encoded in UTF-8.
* I set `Max_characters` to 120.
* It looks like `print_file_contents()` doesn't attempt…
-
According to RFC 3629, encoding or decoding unmatched surrogates should be disallowed:
"The definition of UTF-8 prohibits encoding character numbers between U+D800 and U+DFFF, which are reserved for use …