Closed hiratara closed 1 month ago
You assert that the problem began in https://github.com/Perl/perl5/commit/a460925186154b270b7a647a4f30b2f01fd97c4b (@khwilliamson):
commit a460925186154b270b7a647a4f30b2f01fd97c4b
Author: Karl Williamson <khw@cpan.org>
AuthorDate: Thu May 13 10:50:23 2021
Commit: Karl Williamson <khw@cpan.org>
CommitDate: Tue Feb 1 22:33:48 2022
Refactor utf8 to code point conversion
I wanted to explore this argument in a test-driven manner. So I checked out the commit immediately before that one:
commit c0e63b1341c14ebd84a23fdbba9759d3fd686498
Author: Karl Williamson <khw@cpan.org>
AuthorDate: Tue Feb 1 20:58:08 2022
Commit: Karl Williamson <khw@cpan.org>
CommitDate: Tue Feb 1 22:01:13 2022
builtin.c: Fix up 730f927
That commit got pushed without a needed fix.
Then configured (-des -Dusedevel
) and built an ordinary perl
at that point, then added the two unit tests from your pull request to t/op/lex.t
.
$ git show | cat
commit 8fbf143586f593a5f599f1b633dc558b7269a3a7
Author: James E Keenan <jkeenan@cpan.org>
Date: Sun Sep 15 12:51:20 2024 -0400
Add 2 unit tests from pull request GH #22597
Added to the commit immediately preceding the commit which the poster
states broke utf8n_to_uvchr, without adding code changes from that p.r.
diff --git a/t/op/lex.t b/t/op/lex.t
index e78fad2c42..a5703fc53e 100644
--- a/t/op/lex.t
+++ b/t/op/lex.t
@@ -7,7 +7,8 @@ use warnings;
BEGIN { chdir 't' if -d 't'; require './test.pl'; }
-plan(tests => 36);
+#plan(tests => 36);
+plan(tests => 38);
{
print <<''; # Yow!
@@ -284,3 +285,20 @@ EOM
fresh_perl_like('flock _$', qr/Not enough arguments for flock/, {stderr => 1},
"[perl #129190] intuit_method() invalidates PL_bufptr");
+
+# test-driven development; first, add the tests from GH #22597
+
+
+fresh_perl_like(
+ qq(use utf8; \xC2\xE3\x81\x82),
+ qr/^Malformed UTF-8 character:/,
+ {stderr => 1},
+ 'Error handling for invalid UTF-8 sequences starting with leading bytes',
+);
+
+fresh_perl_like(
+ qq(use utf8; \xFF\xE3\x81\x82),
+ qr/^Malformed UTF-8 character:/,
+ {stderr => 1},
+ 'Error handling for invalid UTF-8 sequences starting with unassigned bytes',
+);
I then ran that test program through the harness:
$ cd t;./perl harness -v op/lex.t; cd -
ok 1
ok 2
...
ok 37 - Error handling for invalid UTF-8 sequences starting with leading bytes
ok 38 - Error handling for invalid UTF-8 sequences starting with unassigned bytes
ok
All tests successful.
Files=1, Tests=38, 0 wallclock secs ( 0.02 usr 0.00 sys + 0.04 cusr 0.09 csys = 0.15 CPU)
Result: PASS
/home/jkeenan/gitwork/perl
I then cherry-picked a460925186154b270b7a647a4f30b2f01fd97c4b into that branch, rebuilt and re-tested.
$ cd t;./perl harness -v op/lex.t; cd -
ok 1
ok 2
...
ok 37 - Error handling for invalid UTF-8 sequences starting with leading bytes
not ok 38 - Error handling for invalid UTF-8 sequences starting with unassigned bytes
# Failed test 38 - Error handling for invalid UTF-8 sequences starting with unassigned bytes at ./test.pl line 1088
# got 'panic: _force_out_malformed_utf8_message should be called only when there are errors found at - line 1.'
# expected /(?^:^Malformed UTF-8 character:)/
# PROG:
# use utf8; �あ
# STATUS: 65280
Failed 1/38 subtests
Test Summary Report
-------------------
op/lex.t (Wstat: 0 Tests: 38 Failed: 1)
Failed test: 38
Files=1, Tests=38, 0 wallclock secs ( 0.02 usr 0.00 sys + 0.05 cusr 0.08 csys = 0.15 CPU)
Result: FAIL
So it appears that one of the two unit tests you are proposing to add would have PASSed both before the breaking commit and at the breaking commit. The breaking commit (a460925186) appears to have broken only the code in your second unit test.
Can you clarify?
(My diagnostic branch can be found here.)
NOTE: If this pull request is accepted, the changes may be backported to maintenance releases for perl-5.36, perl-5.38 and perl-5.40.
@hiratara, if there is something specifically wrong with Encode, you should file a bug ticket in that upstream distribution's bug tracker: https://rt.cpan.org/Dist/Display.html?Name=Encode. That would enable @dankogai to make changes for Encode against any version of perl.
The tests belong instead in ext/XS-APItest/t/utf8.t. There already is a function accessible from that file test_utf8n_to_uvchr_error()
that allows for arbitrary strings to be tested. If you don't feel like doing this, I will do so, so just pull out that portion of your commits.
I see the tests for translating to code point are minimal.. The functions that check if a string is well-formed UTF-8 have extensive tests, and they don't have this bug. I would adapt some of those tests to work on this.
@jkeenan, I've updated the commit messages and removed the changes from t/op/lex.t
. I attempted to add tests to ext/XS-APItest/t/utf8.t
, but found it too complicated for me to handle. Could you please take care of adding the test cases instead?
I've force-pushed the updated commits to this branch and backed up the original branch here: https://github.com/hiratara/perl5/tree/tmp/panic-with-invalid-utf8-BK
To be clear, this PR adds tests for this https://github.com/Perl/perl5/pull/22646
Hello,
Since commit a460925186154b270b7a647a4f30b2f01fd97c4b,
utf8n_to_uvchr
has been broken, causing some peculiar behavior when handling invalid UTF-8 sequences.For example, invalid UTF-8 sequences cause a panic message:
This bug also affects
Encode.pm
:This PR fixes those problems.