Perl 5.30.2 replaces some invalid UTF-8 byte sequences in a way that is inconsistent with current best practices.
The Unicode specification says:
An increasing number of implementations are adopting the handling of
ill-formed subsequences as specified in the W3C standard for encoding
to achieve consistent U+FFFD replacements.
See:
For example, decoding the hex byte sequence:
<e0 80 7f>
produces (shown re-encoded as UTF-8):
<ef bf bd 7f>
instead of:
<ef bf bd ef bf bd 7f>
Here are a few more examples:
Perl decode: e0 80 80
  expected: ef bf bd ef bf bd ef bf bd
  got:      ef bf bd
Perl decode: f0 80 80 80
  expected: ef bf bd ef bf bd ef bf bd ef bf bd
  got:      ef bf bd
Perl decode: ed ae 80 ed b0 80
  expected: ef bf bd ef bf bd ef bf bd ef bf bd ef bf bd ef bf bd
  got:      ef bf bd ef bf bd
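For comparison, the expected values above can be checked against another decoder. The sketch below uses Python 3's built-in UTF-8 codec with errors="replace", which as far as I know follows the recommended replacement practice. The helper name decode_hex and the CASES table are my own, added for illustration only; they are not part of Perl or of the test repository.

```python
# Sketch: decode each ill-formed sequence from this report with Python 3's
# built-in UTF-8 codec and compare against the "expected" values listed above.
# decode_hex and CASES are hypothetical names used only for this illustration.

def decode_hex(hex_bytes: str) -> str:
    """Decode space-separated hex bytes as UTF-8, replacing ill-formed
    subsequences with U+FFFD, and return the result re-encoded as hex."""
    raw = bytes.fromhex(hex_bytes)
    text = raw.decode("utf-8", errors="replace")
    return " ".join(f"{b:02x}" for b in text.encode("utf-8"))

# (input, expected) pairs copied from the examples in this report.
CASES = [
    ("e0 80 7f", "ef bf bd ef bf bd 7f"),
    ("e0 80 80", "ef bf bd ef bf bd ef bf bd"),
    ("f0 80 80 80", "ef bf bd ef bf bd ef bf bd ef bf bd"),
    ("ed ae 80 ed b0 80",
     "ef bf bd ef bf bd ef bf bd ef bf bd ef bf bd ef bf bd"),
]

for hex_in, expected in CASES:
    got = decode_hex(hex_in)
    status = "ok" if got == expected else "MISMATCH"
    print(f"{hex_in}: {got} [{status}]")
```

Each byte that cannot start or continue a well-formed sequence gets its own U+FFFD here, which is why the expected output has one replacement character per invalid byte in these cases.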
See https://github.com/flenniken/utf8tests for more information.