Closed FGasper closed 4 years ago
prove
correctly handles this:
> prove -v test_that_outputs_bad_utf8.pl
test_that_outputs_bad_utf8.pl ..
# valid note [“”]
valid stdout [“”]
ok 1
1..1
# valid diag [“”]
valid stderr [“”]
ok
All tests successful.
Files=1, Tests=1, 0 wallclock secs ( 0.02 usr 0.01 sys + 0.04 cusr 0.00 csys = 0.07 CPU)
Result: PASS
The Test2::Harness::Collector process does this:
[pid 30833] read(12</tmp/yath-test-30826-OrN6_LIG/1ACDB63C-4D8C-11EA-9FCA-5E4AF8A863B7/1AD06378-4D8C-11EA-9FCA-5E4AF8A863B7+0/events/events~30838~0.jsonl>, "{\"event_id\":\"1B8142C4-4D8C-11EA-8AC0-6A4AF8A863B7\",\"stream_id\":1,\"times\":[0.08,0.02,0.0,0.0],\"facet_data\":{\"info\":[{\"tag\":\"NOTE\",\"debug\":0,\"details\":\"valid note [\303\242\302\200\302\234\303\242\302\200\302\235]\"}]
Note how “valid note” is extra-encoded … but that’s OK since when it’s read the JSON decoder will decode that and get us back to the original (UTF-8 bytes) code points.
But look at how that process outputs that same string:
[pid 30833] write(4<pipe:[47556230]>, "{\"stamp\":1581507474.04386,\"assert_count\":0,\"times\":[0.08,0.02,0.0,0.0],\"stream_id\":1,\"tid\":0,\"job_id\":\"1AD06378-4D8C-11EA-9FCA-5E4AF8A863B7\",\"event_id\":\"1B8142C4-4D8C-11EA-8AC0-6A4AF8A863B7\",\"job_try\":0,\"run_id\":\"1ACDB63C-4D8C-11EA-9FCA-5E4AF8A863B7\",\"facet_data\":{\"hubs\":[{\"tid\":0,\"details\":\"Test2::Hub\",\"uuid\":\"1B7F05FE-4D8C-11EA-8AC0-6A4AF8A863B7\",\"ipc\":0,\"pid\":30838,\"nested\":0,\"buffered\":0,\"hid\":\"30838~0~1581507474~2\"}],\"trace\":{\"buffered\":0,\"huuid\":\"1B7F05FE-4D8C-11EA-8AC0-6A4AF8A863B7\",\"hid\":\"30838~0~1581507474~2\",\"uuid\":\"1B813BF8-4D8C-11EA-8AC0-6A4AF8A863B7\",\"pid\":30838,\"nested\":0,\"cid\":\"30838~0~1581507474~5\",\"tid\":0,\"frame\":[\"main\",\"test_that_outputs_raw_utf8.pl\",8,\"Test::More::note\"]},\"harness\":{\"run_id\":\"1ACDB63C-4D8C-11EA-9FCA-5E4AF8A863B7\",\"job_try\":0,\"event_id\":\"1B8142C4-4D8C-11EA-8AC0-6A4AF8A863B7\",\"job_id\":\"1AD06378-4D8C-11EA-9FCA-5E4AF8A863B7\"},\"info\":[{\"debug\":0,\"tag\":\"NOTE\",\"details\":\"valid note [\342\200\234\342\200\235]\"}]
Here it’s not UTF-8 encoded; now it’s literal bytes. So something is doing an extra decode.
Hm, apparently the Test2::Harness that we have installed isn’t on GitHub?
Anyway, Test2::Harness::Collector::JobDir appears to be opening files '<:utf8'
, which will cause the extra decode.
Changing JobDir::events_files() to open files :raw rather than :utf8 fixes the issue.
I’m guessing that, depending on the encoding() setting, one of these IPC pipes needs the same :utf8 that the events-reader file handle has.
2 questions: 1) have you tried the major_refactor branch? That branch is practically a rewrite, and will become the new master shortly. Lots of encoding fixes have happened there.
2) what happens if you drop a use utf8
at the top of the test file? The character in your prints may be utf8, but you never told perl the source was in utf8.
@FGasper the main Cpanel rpms are currently using the major_refactor branch.
Answers to @exodist:
1) Yes, the problem happens on both major_refactor and upstream.
2) The problem worsens if I drop use utf8
at the top of the test file: it produces a second wide character warning, from the test’s own print statements.
The problem is that a utf8 filehandle is receiving already-encoded UTF-8.
When I run this test on the latest major_refactor code I get this output:
perl -Ilib scripts/yath test xxx.t
Wide character in print at /home/exodist/projects/Test2/Test2-Harness/lib/Test2/Formatter/Test2.pm line 280, <__ANONIO__> line 5.
( DIAG ) job 1 valid diag [“”]
( STDERR ) job 1 valid stderr [“”]
( PASSED ) job 1 xxx.t
Yath Result Summary
-----------------------------------------------------------------------------------
File Count: 1
Assertion Count: 1
Wall Time: 0.19 seconds
CPU Time: 0.30 seconds (usr: 0.09s | sys: 0.00s | cusr: 0.18s | csys: 0.03s)
CPU Usage: 155%
--> Result: PASSED <--
Still have a single wide character error, but not a flood of them
Hmm, I can no longer reproduce what I posted above. I must have had a partial change that got wiped out.
Running the test as-is I get the 2 warnings. When I add Test2::Plugin::UTF8 that fixes the issue and I get no warnings.
That said I am thinking "use the utf8 plugin" is not necessarily the right solution, I am going to look deeper into all this.
For posterity: https://github.com/FGasper/Test2-Harness/tree/issue_149_straighten_out_utf8_knock_on_wood ^^ My attempt to fix the problem.
A test of various encoding states:
#!perl
use strict;
use warnings;
use Test::More;
note "valid note [“”\xff\xff]";
note "valid note [“”]";
diag "valid diag [“”\xff\xff]";
diag "valid diag [“”]";
print "valid stdout [“”\xff\xff]\n";
print "valid stdout [“”]\n";
print STDERR "valid stderr [“”\xff\xff]\n";
print STDERR "valid stderr [“”]\n";
my $e_acute_latin1 = "\xe9";
note $e_acute_latin1;
diag $e_acute_latin1;
print "$e_acute_latin1\n";
print STDERR "$e_acute_latin1\n";
my $e_acute = $e_acute_latin1;
utf8::encode($e_acute);
my $double_encoded_e_acute = $e_acute;
utf8::encode($double_encoded_e_acute);
note $e_acute;
diag $e_acute;
print "$e_acute\n";
print STDERR "$e_acute\n";
note $double_encoded_e_acute;
diag $double_encoded_e_acute;
print "$double_encoded_e_acute\n";
print STDERR "$double_encoded_e_acute\n";
ok 1;
done_testing();
Given:
… yath does:
Test2::Harness info: