Test-More / Test2-Harness

Alternative to Test::Harness
Other
23 stars 26 forks source link

Preload is useless with tainting #262

Open JRaspass opened 1 year ago

JRaspass commented 1 year ago

Given the following stripped down example:

lib/MyExpensive.pm:

package MyExpensive;

sleep 1;

1;

lib/MyPreload.pm:

package MyPreload;

use Test2::Harness::Runner::Preload;

stage DEFAULT => sub { default; preload 'MyExpensive' };

1;

t/foo.t:

use Test2::V0 -target => 'MyExpensive';

ok $ENV{T2_HARNESS_PRELOAD};

done_testing;

Running the tests with preload shows everything passes and the extra second isn't attributed to the test's "startup" cost:

$ yath test -TP MyPreload
( PASSED )  job  1    t/foo.t
(  TIME  )  job  1    Startup: 0.04258s | Events: 0.00018s | Cleanup: 0.00407s | Total: 0.04683s

However if we add a tainting shebang to the test file:

#!perl -T

use Test2::V0 -target => 'MyExpensive';

ok $ENV{T2_HARNESS_PRELOAD};

done_testing;

Then the test fails and takes longer:

$ yath test -TP MyPreload
[  FAIL  ]  job  1  + <UNNAMED ASSERTION>
(  DIAG  )  job  1    Failed test at t/foo.t line 5.
(  DIAG  )  job  1    Seeded srand with seed '20230215' from local date.
( FAILED )  job  1    t/foo.t
(  TIME  )  job  1    Startup: 1.08933s | Events: 0.00030s | Cleanup: 0.00322s | Total: 1.09285s

Now imagine that you have hundreds of tests, all with tainting enabled and lots of expensive libraries that take seconds to load, suddenly you're in my situation where $WORK's codebase takes tens of minutes to test :-(

I thought this was a regression since I'm sure this used to work but I've yet to find a version where it works correctly.

I appreciate this is hard to make work as you can only adjust the taint mode at perl startup time and the worker you're forking doesn't have it enabled so can we either have multiple workers based on the set of interpreter flags we see in test files? Or an ugly hack to run all of yath under taint worker and all.

exodist commented 1 year ago

Hmm. It may be possible to add some logic to preload tools that acounts for this.

Maybe something like this: taint_stage FOO => sub { ... }

This can add a stage that need sot have taint checking turned on. When the runner is initializing stages it can launch a new perl interpreter to start the stage, instead of a plain fork. The one it starts can have taint enabled at the start.

This would only really work for top-level stages, as the new process will not be able to inherit anything preloaded in earlier stages. But child-stages of the taint one(s) would inherit taint checking.

I am not sure how soon I could get to this. But Using the specification above, if you wanted to look into writing a PR I should be able to review it and/or give pointers and release it once it is working.

JRaspass commented 1 year ago

So after bashing my head against this for a few hours I'm not really sure how to achieve what you suggest and could do with a few pointers.

So I found where you fork in Test2::Harness::Runner::Preloader::launch_stage() and while I could easily exec a new perl interpreter with tainting enabled, I don't see how to make said interpreter have the correct state to continue execution and go on to call start_stage. Damn this would be so much easier if tainting could be toggled at runtime.

I also played with a far easier but much hackier approach where the yath script would re-exec itself at the start if you passed a taint flag or it read it from the config file, this way the whole thing runs under tainting. I had to sprinkle a few untainting regexes in places like App::Yath::load_options() that do runtime requires with user supplied data to make it work. This avoids having to remember state as I just get it to start again from the start with the same command line options and only exec if ${^TAINT} != taint flag/option. So this would lead to a slower start for people requesting a tainted yath but they're probably in the minority anyways.

exodist commented 1 year ago

If you want to add one or more taint options to yath itself so that the entire thing runs with taint, I would review and probably merge that.

One thing to note is that yath itself does not need to start with taint, only the yath-runner process (and processes forked from it) need the taint mode enabled for the tests to run in taint. the ::test or ::start command(s) launch yath-runner by executing yath runner ... so you can find that and just make sure it is started with taint as needed based on command line arguments.

JRaspass commented 1 year ago

So I expanded the test cases to try and understand the runner logic better, is my understanding correct? Each test is always executed from a new process, those processes can be forks of perl which in turn can have libraries preloaded into them. The shebangs in test files influence which tests share a preloaded fork.

I expanded the test file to look like this:

#!perl

use Test2::V0 -target => 'MyExpensive';

diag "pid:     $$";
diag "ppid:    ${\getppid}";
diag "preload: @{[ $ENV{T2_HARNESS_PRELOAD} // 0 ]}";
diag "taint:   ${^TAINT}";
diag "warning: $^W";

pass;

done_testing;

Turned it into three files which vary in shebang:

$ head -n1 t/perl*.t
==> t/perl.t <==
#!perl

==> t/perl-T.t <==
#!perl -T

==> t/perl-w.t <==
#!perl -w

Added a bit more debug to the "expensive" library:

package MyExpensive;

warn "$$ is loading me...\n";
sleep 3;

1;

Added a shell test for good measure:

#!/bin/sh

echo '#' preload: $T2_HARNESS_PRELOAD >&2

echo ok 1
echo 1..1

And this is the output I get:

$ yath test -TP MyPreload
(INTERNAL)     54623 is loading me...
( STDERR )  job  1    54655 is loading me...
(  DIAG  )  job  1    pid:     54655
(  DIAG  )  job  1    ppid:    54623
(  DIAG  )  job  1    preload: 0
(  DIAG  )  job  1    taint:   1
(  DIAG  )  job  1    warning: 0
( PASSED )  job  1    t/perl-T.t
(  TIME  )  job  1    Startup: 3.10599s | Events: 0.00168s | Cleanup: 0.00284s | Total: 3.11051s
(  DIAG  )  job  2    pid:     54656
(  DIAG  )  job  2    ppid:    54623
(  DIAG  )  job  2    preload: 1
(  DIAG  )  job  2    taint:   0
(  DIAG  )  job  2    warning: 1
( PASSED )  job  2    t/perl-w.t
(  TIME  )  job  2    Startup: 0.03992s | Events: 0.00146s | Cleanup: 0.00410s | Total: 0.04548s
(  DIAG  )  job  3    pid:     54657
(  DIAG  )  job  3    ppid:    54623
(  DIAG  )  job  3    preload: 1
(  DIAG  )  job  3    taint:   0
(  DIAG  )  job  3    warning: 0
( PASSED )  job  3    t/perl.t
(  TIME  )  job  3    Startup: 0.04139s | Events: 0.00156s | Cleanup: 0.00426s | Total: 0.04721s
(  DIAG  )  job  4    preload:
( PASSED )  job  4    t/sh.t
(  TIME  )  job  4    Startup: 0.00789s | Events: 0.00000s | Cleanup: -0.00252s | Total: 0.00537s

So a few observations (feel free to correct any of them).

JRaspass commented 1 year ago

Oh it's worth noting that I couldn't get any of the above to work without applying my hack from https://github.com/Test-More/Test2-Harness/issues/211#issuecomment-909765020. I kinda get the impression that tainting isn't a very popular use case?

exodist commented 1 year ago

shbang processing is mostly done here: https://github.com/Test-More/Test2-Harness/blob/master/lib/Test2/Harness/TestFile.pm

However the decision to skip the preload for taint happens here: https://github.com/Test-More/Test2-Harness/blob/master/lib/Test2/Harness/Runner/Job.pm#L419 Basically it refuses to do preload+fork for any flags other than -w as it is the only one that can be enabled at runtime.

I am not sure about the popularity of taint mode. I have never worked anywhere that uses it.

JRaspass commented 1 year ago

Thanks for the info. As for potential fixes I think regardless of whether it happens in the Runner or the Preloader somewhere a fork need to be replaced with a fork + exec perl -T which means working out some way to convert the state of the currently running fork into some kind of IPC process to pass to the newly formed interpreter, that's the bit I'm really struggling with. That's what motivated my approach to just say screw it and run the whole process tree under a tainted perl instead.

If anyone has any bright ideas on how to do the IPC I'm all ears because at present I'm pretty stalled on this issue.

exodist commented 1 year ago

I think the best option is to add a taint mode option. First you need to add the option in https://github.com/Test-More/Test2-Harness/blob/master/lib/App/Yath/Options/Runner.pm#L18 Then you need to modify https://github.com/Test-More/Test2-Harness/blob/master/lib/App/Yath/Command/test.pm#L890 to add the taint flag (This starts the yath-runner process, and it does not fork to do it, completely new instance of perl) Then finally update the logic at https://github.com/Test-More/Test2-Harness/blob/master/lib/Test2/Harness/Runner/Job.pm#L419 so that it does allow fork if the flag is -T and the runner has taint mode enabled.

Then clean up any places where data in the runner is currently considered tainted, like your hack up above.

JRaspass commented 1 year ago

Thanks, I'll give this approach a go.

JRaspass commented 1 year ago

Quick update, this approach does seem to work. I have a very rough around the edges PoC commit that seems to DTRT. ATM it hardcodes tainting on and disables the shebang flag checking but they're all fixable.

The initial work is in the draft PR #264. Not wild about the sprinkling of untaint() everywhere. Open to better ideas.

JRaspass commented 1 year ago

Sorry for the radio silence, other stuff came up and I haven't been able to touch this much, this is what's currently known left to do (there's probably more!):

  1. I've described the flag as best I can in Options::Runner but it might also need POD?
  2. I really really hate the logic on working out if we can fork:
    
    # This approach won't scale if we allow even more swiches.
    my @allowed_switches = '-w';

Allow taint and taint + warnings if we're a tainted runner.

push @allowed_switches => qw/-T -wT -Tw/ if ${^TAINT};

my $allowed_switches = join '|', map { quotemeta } @allowed_switches; my $allowed_switches_re = qr/\s(?:$allowed_switches)\s/;

return $self->{+USEFORK} = 0 if grep { $ !~ $allowed_switches_re } $self->switches;

We're running under the taint but the test hasn't requested taint.

return $self->{+USE_FORK} = 0 if ${^TAINT} && !grep { /\s-w?Tw?\s/ } $self->switches;


The complication mainly stems from parsing the shebang with regexes and accounting for argument bundling. I'm hesitant to start actually parsing the shebang properly as we'd have to ensure we parse it the same way perl does. Suggestions on how to improve this logic are VERY welcome!
3. I'm really struggling on how to add a test for this new taint flag. I agree with you that without it it'll be very possible to add a regression in this functionality. Any pointers here would be welcome too.

So mainly I'm blocked on points 2 and 3. In terms of good news though, the patch works! I've been trying it out at $WORK and it really shaves heaps of time off our test suite run, so that's good :tada: (99% of our tests enable tainting since our product also does).