Do not delete job count environment variables

jjatria commented 2 years ago

The environment variables related to yath's job count are marked as clean_env_vars, which deletes them on startup. This was added on https://github.com/Test-More/Test2-Harness/commit/5ce0d08b471beeba5e958a8e79607ce4670fcda1 which states this was to avoid "race conditions".

If this must stay this way, documenting that these variables are not available during tests would be useful (it would have saved me some spelunking at least).

But this makes it impossible to introspect the number of jobs that were set for a particular job run.

Do you remember what those race conditions were? If we know that, then we can explore other possible solutions and leave these variables in place if they were set: deleting a user-set variable seems like a strange thing to do in general.

exodist commented 2 years ago

I do not remember any race conditions, but it could have been a brain fart. The problem is that there are a few places, some where I have worked, and some inside yaths own test suite where yath launches a nested yath. This nested yath would pick up the environment variable and suddenly the number of processes is compounded, IE yath with 8 procs could run 8 tests that all call yath with 8 procs, making 64 procs.

The environment variable was not intended for general use, it was added because I could not find any other way to tell DZIL that yath should be called with a specific concurrency value when I mint releases. And once consumed it should not be passed on to any yath's that get started under other tests or shells, so yath deletes the variable before launching tests.

The variable is also not useful inside tests because while it can be used to set the concurrency level, the more typical and common ways of setting it (-j# arg) do not set the env var. So looking at the env var is not useful/reliable even if yath did not clean it up.

If having yath's concurrency value, which is intended to put a specific limit on the number of jobs yath will spawn, is somehow useful in the test, then we may be able to come up with a reliable way to pass it on. However using that value inside tests sounds like a bad idea, it would also mean the tests will not work right in prove doesn't it?

What problem is being solved by having access to the concurrency value, there might be a better way to solve it, and I would be happy to help with that.

jjatria commented 2 years ago

I see. That's tricky. I imagine that in that case (and assuming we cannot locally unset those env vars) the best we can do is to document that this is the case. It did catch me by surprise: I don't think I ever expected a program to unset my env vars. :)

More specifically about the problem that lead me to this:

When we run our service tests at $work, we bring up a number of database instances, and we use the number of jobs to implement a rudimentary load balancer of sorts that will make sure any single DB is only ever being used by one test at a time. The number of jobs was used to determine how large that pool of DB instances needed to be.

The number itself was not being used in a test file, but in one of the preload classes: when a test started we would cycle one instance out of the available pool, and when it ended we would cycle it back in.

Our test suite was implemented using Test2::Harness 0.001099, and the reason I bumped into this is because I'm trying to update it so it can use the new more stable API.

But one of the things I've noticed is that if we want to get more in-depth coverage, and we do, then we need to use --load-import and --no-use-fork (like what the --cover option does, although we do not use Devel::Cover). And if we do that, then we no longer do preloading, so it might be that the whole approach needs to be re-designed.

If I had a way to run code before and after a test run, ideally even when not forking, then I could re-implement this behaviour on my side.

exodist commented 2 years ago

There are 2 things you will want to look at:

The first is a Resource class: https://metacpan.org/pod/Test2::Harness::Runner::Resource This is what I use to manage database initialization (pre-testing) and assignment (before each test). I recommend this if you are creating N databases and want them assigned to tests in such a way as to avoid 2 tests accidentally using the same one. It has access to all the settings including the -J value

The other is writing an App::Yath::Plugin, see https://metacpan.org/pod/App::Yath::Plugin and its parent class https://metacpan.org/pod/Test2::Harness::Plugin#$plugin-%3Esetup($settings) This also gives you hooks for doing things pre-testing, and once again has access to all the settings. This is less complex, but also less capable when compared to resource classes.

exodist commented 2 years ago

I should also point out that the things I referenced above can do things before all testing, before each test, after each test, and after all testing.

jjatria commented 2 years ago

Oh, that resource class looks very promising. Thanks for the heads up! I had already tried using a plugin, but looking at the methods in that class, and the ones in the parent class, I did not immediately get which methods would allow me to hook before / after a single test run.

I'll take a look at the resource class and see where I get to, but FWIW, I still think that if yath is keeping this behaviour of unsetting environment variables (which I think is not very standard) we should at least document it.

exodist commented 2 years ago

I do agree, it should be documented.

Test-More / Test2-Harness

Do not delete job count environment variables #250