Raku / old-issue-tracker

Tickets from RT
https://github.com/Raku/old-issue-tracker/issues

utf8-c8 confuses Str.perl #5410

Open p6rt opened 8 years ago

p6rt commented 8 years ago

Migrated from rt.perl.org#128513 (status was 'new')

Searchable as RT128513$

p6rt commented 8 years ago

From zefram@fysh.org

Str.perl fails to represent whatever is being used by the utf8-c8 encoding to represent a non-UTF-8 octet. Thus .perl.EVAL fails to round-trip the Str that arises in the middle of a utf8-c8 decode-then-encode round-tripping of an octet string.

> Blob[uint8].new(233, 1).decode("utf8-c8").encode("utf8-c8").perl
Blob[uint8].new(233,1)
> Blob[uint8].new(233, 1).decode("utf8-c8").perl.EVAL.encode("utf8-c8").perl
Blob[uint8].new(244,143,191,189,120,69,57,1)

If that mangled octet string is then used as a second input value, the two Str values arising from decoding these two octet strings compare !eq (as they should), but their .perl representations compare eq. This shows that the problem is on the .perl side, rather than the .EVAL side.

-zefram
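
The mangled octets in the report above can be inspected outside Rakudo to see what they spell. The sketch below uses Python only as a neutral byte inspector; the variable names are illustrative and nothing here comes from the Rakudo source. It decodes the eight reported octets as ordinary UTF-8 and checks whether the characters after the first code point spell the original octet 233 in hexadecimal. If they do, that suggests (as an inference, not something the ticket states) that the internal marker for a non-UTF-8 octet is being written out as plain text instead of being turned back into the octet.

```python
# Octets copied from the ticket: the original input and the mangled result.
original = bytes([233, 1])
mangled = bytes([244, 143, 191, 189, 120, 69, 57, 1])

# The mangled octets are valid UTF-8, so a strict decode succeeds.
text = mangled.decode("utf-8")
codepoints = [f"U+{ord(ch):04X}" for ch in text]
print(codepoints)
# ['U+10FFFD', 'U+0078', 'U+0045', 'U+0039', 'U+0001']

# The three ASCII characters after U+10FFFD read "xE9".
marker_text = text[1:4]
# Interpret the two hex digits after the "x" as an octet value.
recovered = int(marker_text[1:], 16)
print(marker_text, recovered == original[0])
# xE9 True
```

So the eight mangled octets are U+10FFFD, the text "xE9", and the untouched trailing octet 1, where 0xE9 is 233, the octet that was not valid UTF-8 in the original input.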

p6rt commented 8 years ago

From zefram@fysh.org

Additional: an apparently-null string operation, such as substituting a substring (appearing in the Str) with itself, can mangle the string in the same manner as .perl.EVAL:

> Blob[uint8].new(233, 1).decode("utf8-c8").subst("\x[1]", "\x[1]").encode("utf8-c8").perl
Blob[uint8].new(244,143,191,189,120,69,57,1)

-zefram

p6rt commented 7 years ago

From zefram@fysh.org

This problem still occurs with the rewritten UTF8-C8 implementation.

-zefram