Perl 5.30.2 replaces some invalid UTF-8 byte sequences with U+FFFD in a way that is inconsistent with current best practices.

The Unicode specification says:

An increasing number of implementations are adopting the handling of ill-formed subsequences as specified in the W3C standard for encoding to achieve consistent U+FFFD replacements.

See:

Unicode 14.0 -- Unicode 14.0 Specification -- Conformance, page 126, section 3.9.
w3.org Encoding -- w3.org encoding
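The W3C handling referred to above is the UTF-8 decoder algorithm of the Encoding Standard. Each lead byte narrows the allowed range of the next byte (E0 requires A0..BF, ED requires 80..9F, F0 requires 90..BF, F4 requires 80..8F). When a byte falls outside that range, the decoder emits a single U+FFFD for the bytes consumed so far and then reprocesses the offending byte. The Python sketch below is my own transcription of that algorithm for illustration, not Perl's or Encode's code. The function name `w3c_utf8_decode` is hypothetical.

```python
REPLACEMENT = "\ufffd"


def w3c_utf8_decode(data: bytes) -> str:
    """Hypothetical helper: decode UTF-8 the way the W3C/WHATWG Encoding
    Standard's UTF-8 decoder does, emitting one U+FFFD per ill-formed
    subsequence (the "maximal subpart" practice)."""
    out = []
    needed = seen = cp = 0        # bytes still needed, bytes seen, partial code point
    lower, upper = 0x80, 0xBF     # allowed range for the next continuation byte
    i = 0
    while i < len(data):
        byte = data[i]
        if needed == 0:
            # Expecting a lead byte.
            if byte <= 0x7F:
                out.append(chr(byte))
            elif 0xC2 <= byte <= 0xDF:
                needed, cp = 1, byte & 0x1F
            elif 0xE0 <= byte <= 0xEF:
                if byte == 0xE0:
                    lower = 0xA0      # reject overlong 3-byte forms
                elif byte == 0xED:
                    upper = 0x9F      # reject encoded surrogates
                needed, cp = 2, byte & 0x0F
            elif 0xF0 <= byte <= 0xF4:
                if byte == 0xF0:
                    lower = 0x90      # reject overlong 4-byte forms
                elif byte == 0xF4:
                    upper = 0x8F      # reject code points above U+10FFFF
                needed, cp = 3, byte & 0x07
            else:
                out.append(REPLACEMENT)   # stray continuation or invalid lead byte
            i += 1
            continue
        if not lower <= byte <= upper:
            # Ill-formed: one U+FFFD for the partial sequence, then
            # reprocess this same byte as a potential lead byte (i not advanced).
            needed = seen = cp = 0
            lower, upper = 0x80, 0xBF
            out.append(REPLACEMENT)
            continue
        lower, upper = 0x80, 0xBF
        cp = (cp << 6) | (byte & 0x3F)
        seen += 1
        i += 1
        if seen == needed:
            out.append(chr(cp))
            needed = seen = cp = 0
    if needed:
        out.append(REPLACEMENT)   # truncated sequence at end of input
    return "".join(out)
```

Under this algorithm `w3c_utf8_decode(b"\xe0\x80\x80")` yields three U+FFFD characters. E0 is rejected on its own because 80 is below A0, and each 80 is then a stray continuation byte. This matches the "expected" column in the examples below.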

For example, the hex byte sequence:

gets encoded as:

instead of:

Here are a few more examples:

Perl decode: e0 80 80 expected: ef bf bd ef bf bd ef bf bd got: ef bf bd

Perl decode: f0 80 80 80 expected: ef bf bd ef bf bd ef bf bd ef bf bd got: ef bf bd

Perl decode: ed ae 80 ed b0 80 expected: ef bf bd ef bf bd ef bf bd ef bf bd ef bf bd ef bf bd got: ef bf bd ef bf bd
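To reproduce the "expected" column above without Perl, one option is CPython 3's built-in decoder with `errors="replace"`. This is my own choice of reference, not necessarily what the utf8tests project uses. As far as I know, that decoder also emits one U+FFFD per maximal ill-formed subpart. A minimal sketch follows. The names `to_replaced_hex` and `CASES` are hypothetical.

```python
def to_replaced_hex(hex_input: str) -> str:
    """Hypothetical helper: decode space-separated hex bytes as UTF-8,
    replacing ill-formed subsequences with U+FFFD, and return the
    result re-encoded as UTF-8 in the same hex format."""
    decoded = bytes.fromhex(hex_input).decode("utf-8", errors="replace")
    return " ".join(f"{b:02x}" for b in decoded.encode("utf-8"))


# (input, expected) pairs copied from the examples above.
CASES = [
    ("e0 80 80", "ef bf bd ef bf bd ef bf bd"),
    ("f0 80 80 80", "ef bf bd ef bf bd ef bf bd ef bf bd"),
    ("ed ae 80 ed b0 80",
     "ef bf bd ef bf bd ef bf bd ef bf bd ef bf bd ef bf bd"),
]

if __name__ == "__main__":
    for hex_input, expected in CASES:
        got = to_replaced_hex(hex_input)
        status = "ok" if got == expected else "MISMATCH"
        print(f"decode: {hex_input}  expected: {expected}  got: {got}  [{status}]")
```

Run with Python 3. Each line should end in `[ok]`, which shows the expected values above agree with at least one other widely used decoder.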

See https://github.com/flenniken/utf8tests for more information.

dankogai / p5-encode, issue #166