Test-More / Test2-Harness

Alternative to Test::Harness
Other
23 stars 26 forks source link

yath warns about wide chars when test notes/diags UTF-8 #149

Closed FGasper closed 4 years ago

FGasper commented 4 years ago

Given:

#!perl

use strict;
use warnings;

use Test::More;

note "valid note [“”]";

diag "valid diag [“”]";

print "valid stdout [“”]\n";

print STDERR "valid stderr [“”]\n";

ok 1;

done_testing();

… yath does:

> yath -v test_that_outputs_bad_utf8.pl

** Defaulting to the 'test' command **

( LAUNCH )  job  1    test_that_outputs_bad_utf8.pl
Wide character in print at /opt/local/lib/perl5/site_perl/5.26/Test2/Formatter/Test2.pm line 276, <$fh> line 6.
(  NOTE  )  job  1    valid note [“”]
Wide character in print at /opt/local/lib/perl5/site_perl/5.26/Test2/Formatter/Test2.pm line 276, <$fh> line 6.
(  DIAG  )  job  1    valid diag [“”]
( STDOUT )  job  1    valid stdout [“”]
( STDERR )  job  1    valid stderr [“”]
[  PASS  ]  job  1  + <UNNAMED ASSERTION>
[  PLAN  ]  job  1    Expected assertions: 1
( PASSED )  job  1    test_that_outputs_bad_utf8.pl
(  TIME  )  job  1    ----- START -----
Startup: 0.04427s | Events: 0.00090s | Cleanup: 0.02388s | Total: 0.06905s
(  TIME  )  job  1    ------ END ------

================================================================================

Run ID: C370ACA0-4D8B-11EA-ACC7-7F58D885BA62

All tests were successful!

0.31001s on wallclock (0.13 usr 0.02 sys + 0.15 cusr 0.03 csys = 0.33 CPU)

Test2::Harness info:

Yath version: 0.001099

Other version info:
+----------------+----------+
| COMPONENT      | VERSION  |
+----------------+----------+
| perl           | v5.26.3  |
| App::Yath      | 0.001099 |
| Test2::Harness | 0.001099 |
| Test2::Suite   | 0.000127 |
| Test2::API     | 1.302170 |
| Test::Builder  | 1.302170 |
+----------------+----------+
FGasper commented 4 years ago

prove correctly handles this:

> prove -v test_that_outputs_bad_utf8.pl
test_that_outputs_bad_utf8.pl ..
# valid note [“”]
valid stdout [“”]
ok 1
1..1
# valid diag [“”]
valid stderr [“”]
ok
All tests successful.
Files=1, Tests=1,  0 wallclock secs ( 0.02 usr  0.01 sys +  0.04 cusr  0.00 csys =  0.07 CPU)
Result: PASS
FGasper commented 4 years ago

The Test2::Harness::Collector process does this:

[pid 30833] read(12</tmp/yath-test-30826-OrN6_LIG/1ACDB63C-4D8C-11EA-9FCA-5E4AF8A863B7/1AD06378-4D8C-11EA-9FCA-5E4AF8A863B7+0/events/events~30838~0.jsonl>, "{\"event_id\":\"1B8142C4-4D8C-11EA-8AC0-6A4AF8A863B7\",\"stream_id\":1,\"times\":[0.08,0.02,0.0,0.0],\"facet_data\":{\"info\":[{\"tag\":\"NOTE\",\"debug\":0,\"details\":\"valid note [\303\242\302\200\302\234\303\242\302\200\302\235]\"}]

Note how “valid note” is extra-encoded … but that’s OK since when it’s read the JSON decoder will decode that and get us back to the original (UTF-8 bytes) code points.


But look at how that process outputs that same string:

[pid 30833] write(4<pipe:[47556230]>, "{\"stamp\":1581507474.04386,\"assert_count\":0,\"times\":[0.08,0.02,0.0,0.0],\"stream_id\":1,\"tid\":0,\"job_id\":\"1AD06378-4D8C-11EA-9FCA-5E4AF8A863B7\",\"event_id\":\"1B8142C4-4D8C-11EA-8AC0-6A4AF8A863B7\",\"job_try\":0,\"run_id\":\"1ACDB63C-4D8C-11EA-9FCA-5E4AF8A863B7\",\"facet_data\":{\"hubs\":[{\"tid\":0,\"details\":\"Test2::Hub\",\"uuid\":\"1B7F05FE-4D8C-11EA-8AC0-6A4AF8A863B7\",\"ipc\":0,\"pid\":30838,\"nested\":0,\"buffered\":0,\"hid\":\"30838~0~1581507474~2\"}],\"trace\":{\"buffered\":0,\"huuid\":\"1B7F05FE-4D8C-11EA-8AC0-6A4AF8A863B7\",\"hid\":\"30838~0~1581507474~2\",\"uuid\":\"1B813BF8-4D8C-11EA-8AC0-6A4AF8A863B7\",\"pid\":30838,\"nested\":0,\"cid\":\"30838~0~1581507474~5\",\"tid\":0,\"frame\":[\"main\",\"test_that_outputs_raw_utf8.pl\",8,\"Test::More::note\"]},\"harness\":{\"run_id\":\"1ACDB63C-4D8C-11EA-9FCA-5E4AF8A863B7\",\"job_try\":0,\"event_id\":\"1B8142C4-4D8C-11EA-8AC0-6A4AF8A863B7\",\"job_id\":\"1AD06378-4D8C-11EA-9FCA-5E4AF8A863B7\"},\"info\":[{\"debug\":0,\"tag\":\"NOTE\",\"details\":\"valid note [\342\200\234\342\200\235]\"}]

Here it’s not UTF-8 encoded; now it’s literal bytes. So something is doing an extra decode.

FGasper commented 4 years ago

Hm, apparently the Test2::Harness that we have installed isn’t on GitHub?

Anyway, Test2::Harness::Collector::JobDir appears to be opening files '<:utf8', which will cause the extra decode.

FGasper commented 4 years ago

Changing JobDir::events_files() to open files :raw rather than :utf8 fixes the issue.

FGasper commented 4 years ago

I’m guessing that, depending on the encoding() setting, one of these IPC pipes needs the same :utf8 that the events-reader file handle has.

exodist commented 4 years ago

2 questions: 1) have you tried the major_refactor branch? That branch is practically a rewrite, and will become the new master shortly. Lots of encoding fixes have happened there.

2) what happens if you drop a use utf8 at the top of the test file? The character in your prints may be utf8, but you never told perl the source was in utf8.

toddr commented 4 years ago

@FGasper the main Cpanel rpms are currently using the major_refactor branch.

FGasper commented 4 years ago

Answers to @exodist: 1) Yes, the problem happens on both major_refactor and upstream. 2) The problem worsens if I drop use utf8 at the top of the test file: it produces a second wide character warning, from the test’s own print statements.

The problem is that a utf8 filehandle is receiving already-encoded UTF-8.

exodist commented 4 years ago

When I run this test on the latest major_refactor code I get this output:

perl -Ilib scripts/yath test xxx.t
Wide character in print at /home/exodist/projects/Test2/Test2-Harness/lib/Test2/Formatter/Test2.pm line 280, <__ANONIO__> line 5.
(  DIAG  )  job  1    valid diag [“”]
( STDERR )  job  1    valid stderr [“”]
( PASSED )  job  1    xxx.t

                                Yath Result Summary
-----------------------------------------------------------------------------------
     File Count: 1
Assertion Count: 1
      Wall Time: 0.19 seconds
       CPU Time: 0.30 seconds (usr: 0.09s | sys: 0.00s | cusr: 0.18s | csys: 0.03s)
      CPU Usage: 155%
    -->  Result: PASSED  <--

Still have a single wide character error, but not a flood of them

exodist commented 4 years ago

Hmm, I can no longer reproduce what I posted above. I must have had a partial change that got wiped out.

Running the test as-is I get the 2 warnings. When I add Test2::Plugin::UTF8 that fixes the issue and I get no warnings.

That said I am thinking "use the utf8 plugin" is not necessarily the right solution, I am going to look deeper into all this.

FGasper commented 4 years ago

For posterity: https://github.com/FGasper/Test2-Harness/tree/issue_149_straighten_out_utf8_knock_on_wood ^^ My attempt to fix the problem.


A test of various encoding states:

#!perl

use strict;
use warnings;

use Test::More;

note "valid note [“”\xff\xff]";
note "valid note [“”]";

diag "valid diag [“”\xff\xff]";
diag "valid diag [“”]";

print "valid stdout [“”\xff\xff]\n";
print "valid stdout [“”]\n";

print STDERR "valid stderr [“”\xff\xff]\n";
print STDERR "valid stderr [“”]\n";

my $e_acute_latin1 = "\xe9";

note $e_acute_latin1;
diag $e_acute_latin1;
print "$e_acute_latin1\n";
print STDERR "$e_acute_latin1\n";

my $e_acute = $e_acute_latin1;
utf8::encode($e_acute);

my $double_encoded_e_acute = $e_acute;
utf8::encode($double_encoded_e_acute);

note $e_acute;
diag $e_acute;
print "$e_acute\n";
print STDERR "$e_acute\n";

note $double_encoded_e_acute;
diag $double_encoded_e_acute;
print "$double_encoded_e_acute\n";
print STDERR "$double_encoded_e_acute\n";

ok 1;

done_testing();