MethodsAndPractices / vsteam

PowerShell module for accessing Azure DevOps Services and Azure DevOps Server (formerly VSTS or TFS)
https://methodsandpractices.github.io/vsteam-docs/
MIT License

Switch to Pester 5 for Unit Tests #312

Closed SebastianSchuetze closed 4 years ago

SebastianSchuetze commented 4 years ago

We could switch to Pester 5 for the unit tests. From what I can read on the latest release it could make the unit tests more reliable and performant in the future.

Also, on Twitter, the creator said that he would concentrate on code coverage performance in version 5.1:

I plan to focus a lot on CC performance in 5.1 release. :) https://twitter.com/nohwnd/status/1260810281485287424

SebastianSchuetze commented 4 years ago

Possibly PR #282 (with issue #262) could be relevant for performance, since it was deactivated for PRs due to running too long (13-20 minutes per run).

DarqueWarrior commented 4 years ago

I started looking at this today. Other good resources: the Pester docs on "Discovery and Script Setup" and "Importing your PS1 files".

DarqueWarrior commented 4 years ago

I plan to have this done before Monday.

SebastianSchuetze commented 4 years ago

For sure. Because then you need to Build! 🤣

jhoneill commented 4 years ago

Just be aware of something about dot-sourcing files out of the source tree.

Imagine this simple function

function Get-Thing {
    param(
        [ThingNameValidator()]
        $ThingName
    )
    # etc.
}

You write your unit test and at the start you have

$here = Split-Path -Parent $MyInvocation.MyCommand.Path
$sut = (Split-Path -Leaf $MyInvocation.MyCommand.Path).Replace(".Tests.", ".")

. "$here/../../Source/Public/$sut"

(Which is now broken in Pester 5.) This has multiple issues.

  1. It depends on the test and the file being tested being in expected locations
  2. It depends on the name of the test file and the source file matching (case sensitively if running with a case sensitive file system, i.e. on Linux)
  3. It doesn't (yet) load any dependencies.

Because the function uses a validator class, the last line becomes two lines.

. "$here/../../Source/Public/$sut"
. "$here/../../Source/Classes/ThingNameValidator.ps1"

The filename/path objections still apply, with the additional risk that the class is loaded in the unit test but not included in the build process, so the test passes but the built module does not work. But at this point the test runs and all is well.
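Under Pester 5, where the script is discovered before the tests execute, that setup has to move into a BeforeAll block. Here is a sketch reusing the hypothetical paths from above (whether dot-sourced classes resolve correctly under Pester 5's scoping is exactly the issue discussed further down in this thread):

```powershell
BeforeAll {
    # $PSCommandPath / $PSScriptRoot replace the old $MyInvocation boilerplate
    $sut = (Split-Path -Leaf $PSCommandPath).Replace('.Tests.', '.')

    # the validator class must be loaded before the function that uses it
    . "$PSScriptRoot/../../Source/Classes/ThingNameValidator.ps1"
    . "$PSScriptRoot/../../Source/Public/$sut"
}

Describe 'Get-Thing' {
    It 'accepts a valid thing name' {
        { Get-Thing -ThingName 'Widget' } | Should -Not -Throw
    }
}
```

The filename, path, and dependency objections listed above apply just as much to this version; moving the code into BeforeAll only fixes the Pester 5 breakage.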

Then the developer of the name-validator decides to add caching support, using another class named "ThingCache", and anything which validates the name now has an indirect dependency on "ThingCache", requiring all their tests to load an additional file. The developer probably will not (should not?) change the tests which validate the work of others.

So working code now has test failures (and a change in one place appears, falsely, to be causing errors in other places), because those tests relied on particular details of how it was implemented. Changing from using Invoke-VSTeamRequest to using _CallAPI or from directly referencing a static object (like [VsteamVersions]) to using _getApiVersion also changes the dependencies needed without changing the results of running the code.

As a general guiding principle, unit tests should be insensitive to implementation details in dependencies of the code being tested. Or, put another way, if running two versions of code with the same functional-test inputs produces the same outputs, a purely functional test using those inputs must give the same result for both versions.

A future decision to split the classes directory into "Object Classes" and "Parameter Classes", for example, would break many tests. I'm not suggesting such a split, but it's not good when the tests in place mean that files can't be moved, renamed, split, or merged.

Therefore can I suggest:

  1. Dependencies should be delivered by loading the module before running tests.
  2. It is preferable for the code under test to be loaded from the module. For new code which is not yet merged into the module, with new tests, the code and any dependencies not available from the module should be loaded before running the test - not as part of the test (this is an exception to point 1, which is a should, not a must). Note (see #313) that a function loaded outside the module needs any classes to be in the module's PSM1 file and the module to be loaded with `using module`, not `Import-Module`.
  3. Because, by default, mocks apply to the scope of the Pester script, (1) requires the use of InModuleScope in any test which mocks something inside the module.
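To illustrate point 3, here is a sketch of the two ways Pester 5 can place a mock inside the module's scope. Get-VSTeamExample is a hypothetical public function; _callAPI and the VSTeam module name come from the discussion, and the mock bodies are illustrative:

```powershell
# Option 1: drop into the module's scope inside the test
Describe 'Get-VSTeamExample' {
    It 'calls the API once' {
        InModuleScope VSTeam {
            Mock _callAPI { @{ value = @() } }
            Get-VSTeamExample
            Should -Invoke _callAPI -Exactly -Times 1
        }
    }
}

# Option 2: keep the test outside and point the mock at the module
Describe 'Get-VSTeamExample' {
    It 'calls the API once' {
        Mock _callAPI { @{ value = @() } } -ModuleName VSTeam
        Get-VSTeamExample
        Should -Invoke _callAPI -ModuleName VSTeam -Exactly -Times 1
    }
}
```

Option 2 keeps the test body itself in the normal scope, which matters for the debate later in this thread about whether tests should see the module's internals at all.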

DarqueWarrior commented 4 years ago

Can you take a look at the PR for this? I have all 779 tests ported and running correctly. They all complete a minute faster than before. Every unit and integration test was re-written to match Pester 5 requirements. I even found a few Pester 5 bugs while doing so. But at this point everything is working as expected.

DarqueWarrior commented 4 years ago

I would encourage contributions that also add integration tests. I don't want to load the entire module to run unit tests. I have already gotten all the tests running even those with completers and validators. I want each contributor to really think about the tests they write and what classes and functions are being tested. Loading the entire module feels like a shortcut. I want to test the code in isolation.

I run the integration tests before we release a new version, and that does load the module. We need to do a better job of adding the required tests to catch the issues you raise with integration tests.

DarqueWarrior commented 4 years ago

As a general guiding principle, unit tests should be insensitive to implementation details in dependencies of the code being tested.

This is true, and this is why we Mock. I think many of the original tests Mock way too close to the wire. I think mocking the call to _callAPI and making sure all the parameters are set correctly should be as low as a new function should have to go. Mocking Invoke-WebRequest is too low. That was done originally to make sure the URLs were being formed correctly. However, that should be tested in unit tests for _callAPI and not in every function that calls it.
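A sketch of what mocking at the _callAPI level looks like in Pester 5. The function under test is hypothetical, and the parameter names in the filter ($Resource, $Id) are assumptions about _callAPI's signature:

```powershell
Describe 'Get-VSTeamExampleItem' {   # hypothetical caller of _callAPI
    It 'asks _callAPI for the right resource' {
        # stub the API layer; URL construction is _callAPI's own concern
        Mock _callAPI { @{ value = @() } } -ModuleName VSTeam

        Get-VSTeamExampleItem -Id 42

        # assert on what was requested, not on the URL that was built
        Should -Invoke _callAPI -ModuleName VSTeam -Exactly -Times 1 -ParameterFilter {
            $Resource -eq 'exampleitems' -and $Id -eq 42
        }
    }
}
```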

DarqueWarrior commented 4 years ago

InModuleScope caused a lot of pain in the past, which is why I worked so hard recently to remove all references to it. Using InModuleScope gives the tests access to the inner workings of the module and really opens you up to contributors being exposed to all the implementation details of the module. Not using InModuleScope forces the tests to test the public interface.

DarqueWarrior commented 4 years ago

Then the developer of the name-validator decides to add caching support, using another class named "ThingCache", and anything which validates the name now has an indirect dependency on "ThingCache", requiring all their tests to load an additional file. The developer probably will not (should not?) change the tests which validate the work of others.

What happens is the tests fail with an error stating it could not find type "ThingCache". I saw tons of that when I was refactoring all the tests. This is not something you can forget, because the tests will fail. Not to mention, to make sure your tests run as expected you need to Mock your "ThingCache", otherwise your tests will have side effects. By mocking it you control whether there should be anything returned from the cache or not. If you don't mock it you run the risk of flaky tests depending on the order they run and what is or is not in the cache at the time the tests run.
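A sketch of that pattern: Pester cannot mock a class method directly, so the cache's expiry check is wrapped in a helper function the test can mock, making the cache state deterministic. The helper name _hasThingCacheExpired is invented to match the hypothetical "ThingCache" above, and the bodies are illustrative:

```powershell
Describe 'Get-Thing' {
    It 'fetches fresh data when the cache has expired' {
        # force a deterministic cache state instead of depending on run order
        Mock _hasThingCacheExpired { $true } -ModuleName VSTeam
        Mock _callAPI { @{ value = @(@{ name = 'Widget' }) } } -ModuleName VSTeam

        Get-Thing -ThingName 'Widget'

        # an expired cache means exactly one trip through the API layer
        Should -Invoke _callAPI -ModuleName VSTeam -Exactly -Times 1
    }
}
```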

DarqueWarrior commented 4 years ago

As for the future: the next move will be turning this entire module into a binary module written in C#; Pester will not be used for unit tests then. Not sure when this is going to happen, but I have been giving it a lot of thought.

DarqueWarrior commented 4 years ago

We can continue to discuss but I am closing this because I have already ported all the tests.

jhoneill commented 4 years ago

Can you take a look at the PR for this? I have all 779 tests ported and running correctly.

Maybe I raised this a little too late. There must be 200 commits in that PR :-)

I don't want to load the entire module to run unit tests. I have already gotten all the tests running even those with completers and validators. I want each contributor to really think about the tests they write and what classes and functions are being tested. Loading the entire module feels like a shortcut. I want to test the code in isolation.

That's a valid thing to want. But when code has dependencies, it moves a web of complexity into the tests. Each thing being tested has to load its dependencies, and their dependencies, and their dependencies. So A, B and C might depend on X. The contributor who provided X shouldn't need to concern themselves with who / what depends on X. The contributors of A, B, and C shouldn't be concerned with how X is implemented. But as above, a change to X's implementation breaks the tests for A, B and C. Does X's author change everyone else's tests? When I went through and "fixed" the tests so they would work with the first attempt at the completers and validators, @SebastianSchuetze came back and asked me to please leave the existing tests alone. And he's also said he really doesn't like PRs submitted which cause tests to fail. Which combine to mean not changing anything that anything else depends on.

Loading the dependencies in the same way that they will be loaded in real use doesn't make the test any less meaningful, and reduces work; "shortcut" implies that it is cheating to get rid of that work. I don't think it is.

As a general guiding principle, unit tests should be insensitive to implementation details in dependencies of the code being tested.

This is true, and this is why we Mock. I think many of the original tests Mock way too close to the wire. I think mocking the call to _callAPI and making sure all the parameters are set correctly should be as low as a new function should have to go. Mocking Invoke-WebRequest is too low. That was done originally to make sure the URLs were being formed correctly. However, that should be tested in unit tests for _callAPI and not in every function that calls it.

Mocking makes a test dependent on the implementation. For example, if you call Get-VSTeamProcess in your function and your test mocks Get-VSTeamProcess - great. If the test mocks _callAPI to spoof a response to Get-VSTeamProcess (which they did), then changes in the dependency might break your test - which I think is what you're saying. It's OK for a test to rely on things being a given way in the code-under-test, but not in the dependencies.

Using InModuleScope gives the tests access to the inner workings of the module and really opens you up to contributors being exposed to all the implementation details of the module. Not use InModuleScope forces the tests to test the public interface.

Except the tests don't stick to public interfaces - they load (and mock) non-public parts that they and their dependencies depend on.

Then the developer of the name-validator decides to add caching support, using another class named "ThingCache", and anything which validates the name now has an indirect dependency on "ThingCache", requiring all their tests to load an additional file. The developer probably will not (should not?) change the tests which validate the work of others.

What happens is the tests fail with an error stating it could not find type "ThingCache". I saw tons of that when I was refactoring all the tests. This is not something you can forget, because the tests will fail. Not to mention, to make sure your tests run as expected you need to Mock your "ThingCache", otherwise your tests will have side effects. By mocking it you control whether there should be anything returned from the cache or not. If you don't mock it you run the risk of flaky tests depending on the order they run and what is or is not in the cache at the time the tests run.

And my point is you shouldn't have seen tons of it. Loads of tests fail, but does the dependency author change how tests work for things which depend on their code? After being told not to change tests so my code passes, I'm probably going to leave those tests broken. I'll admit I haven't mocked classes up to this point, but I've seen you've taken code out of a cache class to put it in a function which can then be loaded from common.ps1 and mocked in a test. Instead of the test resetting the cache, it mocks _hasProcessTemplateCacheExpired, which is a dependency of VSTeamProcessCache, which is a dependency of ProcessValidateAttribute, which is a dependency of two dozen other functions each with their own tests. (Incidentally, that mock means the data served by the cache is always 'server-fresh'; if the cache serves fresh data fine but corrupts what it caches for next time, the mock prevents that being found.)
So someone who writes something which has a ProcessTemplate parameter can't just copy the parameter declaration from another function: in order to write a unit test they need to know where the source code for the attribute class is, that the attribute class uses a cache class, that the cache class calls private functions, and where those functions are. But the whole point of a class is that one should be able to use it without knowing how it was implemented!

smurawski commented 4 years ago

There are a couple of things going on here:

The most consistent way to address that was to reduce the number of loaded dependencies in each test and remove InModuleScope.

Regardless of theory or perceived best practice - practicality determined the current approach with tests. Unfortunately, modules and InModuleScope were very leaky abstractions when mixed with classes.

Pester 5 may improve that state somewhat, but there is a balance between the effort to restructure towards an ideal and maintaining velocity on a community project. Whether or not an approach is "better" depends on the weighing of risks vs rewards. Would the time and effort to restructure the tests improve the state and quality of the module? Would it improve the stability of the pipeline? And would it ease the ability to contribute to the module? I'm not sold that it would move any of those goals by an appreciable amount.

The approach to testing here is not unheard of and is very common in Ruby projects (which shares similar runtime interpretation). Tests load minimal dependencies and parts of modules or projects to test just the behaviors in the classes and functions defined in the focus of the test. It can be fragile when making large changes to dependent code, but running the full test suite will expose where those connections are.

There is a gap exposed by the current testing structure, as mentioned by James: there is no guarantee that all the dependencies will be present or included in the module, and each test is responsible for maintaining its own dependencies. It is an intentional choice, and I think it is a reasonable one based on other projects I have participated in.

Functional and integration tests can help mitigate that exposure, exercising the functions in a controlled fashion - ensuring successful module load, command resolution, and command behavior is as expected.

I can't talk to anyone's code reviews of PRs and what type of test modification was recommended or discouraged, so that's for others to weigh in on.

DarqueWarrior commented 4 years ago

I have been reviewing the integration tests today and have already started adding more to address the concern about the module loading correctly. I also changed the integration tests so they are no longer destructive and can be run safely on anyone's account.

I do believe many of the existing tests mock at a level lower than is desired. I will work to up-level the mocks to _callAPI. I will also increase the code coverage of _callAPI, which is probably the most important function of the entire module. With tests that can guarantee a high level of confidence in _callAPI, contributors only need to make sure they pass the correct parameters to _callAPI and do not have to mock Invoke-RestMethod.

jhoneill commented 4 years ago

@smurawski thanks for the insight. It may be that what's desirable simply isn't possible with Pester and what InModuleScope is able to do as it stands. I have said multiple times that the way this module has historically loaded its classes can be problematic, and I wonder if that is contributing to the problem you see with Pester and classes.

I can't get past roughly 40 years of believing that the writer of a subroutine, function, class or any other self-contained piece of code (from assembly upwards) should not need to know who/what uses their code, and a user should not need to know how the code is written. Both know only what results should come out for given inputs. That also means a test which gives different results for two sets of code that produce the same output for the same input is not valid as a functional test (it might be a performance or other non-functional one).

Here, someone who writes a test for X must know all of X's dependencies, and their child dependencies, and so on. And if the author of a dependency modifies it, every test which calls that dependency may need to be changed. User and writer can no longer work independently, and two implementations of the same function which produce the same output result in working or broken tests. Implementation changes break tests that hard-code things like "only makes one call to Invoke-RestMethod", which reflects a combination of what has been mocked and the internals of the implementation. (For example, 'always say the cache has expired' as a mock causes extra fetches of data, breaking the count.)

Changes one might make (for example, moving private functions into or out of common.ps1) are rendered impractical by the test breakage which would result. The many private "underscore" functions which have been created for mocking mean one cannot read a function from top to bottom and know what it does, and trying to unpick problems with the tests, or things done because of the way the tests work, takes time, quite a lot of time. Tests which produce "false failures" soon stop being used to help code quality. And to go back to where I started, I think that a shift in the approach to tests would make everything wonderful :-) Just because I believe that, with the limited facts I have, doesn't automatically make it true.

smurawski commented 4 years ago

This is one of those situations where there are multiple, diverging "good" routes. Unfortunately, abstractions and implementations are often less than perfection.

I'll admit the class behavior in PowerShell does not make this easier (one reason I'm not a fan). Outside of the class-loading scenario, most of the dependency management for a feature will be loading common.ps1 (which wraps most of the shared behavior) and maybe one or two other files.

Could we use some helpers to ease the dependency loading for classes in the tests? Maybe. It adds another point of maintenance.

Should it be that all tests require the module to be compiled before each test? I don't think so, and that pattern isn't definitive across languages. Two that I've worked in a reasonable amount, Ruby and Rust, have patterns in their communities around testing that require explicit loading of dependencies per test file. That doesn't make it the gold standard, but it does mean this project isn't doing something untoward.

What makes this all work a bit easier is focusing on making small changes. Small changes allow you to move the project in bits that are easier to comprehend and rationalize and quicker to merge with fewer potential test impacts.

I'll resist trying to do a point by point argument, because I think we both have valid considerations. At the end of the day, it's Donovan's project and he gets to set the direction to what he feels best achieves the project goals.

jhoneill commented 4 years ago

This is one of those situations where there are multiple, diverging "good" routes. Unfortunately, abstractions and implementations are often less than perfection.

Very true. I'm also fond of saying that the best position from which to criticize what has been done to date is one of ignorance. I have stumbled into contributing to this project, and know very little about what has been done and why, and it's very easy to rock up and say "You should be doing this in such-and-such a way and it would fix X, which I see as a problem", to which a valid reply is, "Yes, that does fix X, but it has side effects Y and Z, and we ended up doing it the way that we are because it is the least bad compromise". What is actually done is always a compromise, and one needs to tread carefully when criticizing imperfections.

I'll admit the classes behavior in PowerShell does not make this easier (one reason I'm not a fan). Outside of the class-loading scenario, most of the dependency management for a feature will be loading common.ps1 (which wraps most of the shared behavior) and maybe one or two other files.

It should be common + direct dependencies; the issues come when loading one thing brings in a complex and shifting web of dependencies (which is often the case with classes).

Could we use some helpers to ease the dependency loading for classes in the tests? Maybe. It adds another point of maintenance.

I was thinking about that. If every PS1 file loaded its direct dependencies, then the test would only load the file under test (which would then load the rest). "Compiling" the module would strip these loads out (much as the "using namespace" commands are stripped and merged now). That might be better, but I haven't thought through all the details. It would address my strongest objection: it means the user says "I use this class / function" and doesn't need to know what it uses, and the author of the class/function says "I use these other things" and changing their code doesn't require changes to other people's tests.
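A sketch of that idea for function dependencies. The guard variable, paths, and helper name are invented; note that validator and completer classes would still need to be resolvable at parse time, so this alone doesn't cover the class case:

```powershell
# Source/Public/Get-Thing.ps1 (hypothetical layout)
# Each file loads its own direct dependencies; the build step would strip
# these lines when merging everything into the PSM1, much as the
# 'using namespace' lines are stripped and merged today.
if (-not $script:VSTeamModuleBuilt) {
    . "$PSScriptRoot/../Private/common.ps1"
    . "$PSScriptRoot/../Private/_getThingDetail.ps1"
}

function Get-Thing {
    param($ThingName)
    _getThingDetail -Name $ThingName   # provided by the file loaded above
}
```

A test would then dot-source only Get-Thing.ps1, and a change to _getThingDetail's own dependencies would be absorbed inside _getThingDetail.ps1 rather than in every downstream test.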

Should it be that all tests require the module to be compiled before each test? I don't think so, and that pattern isn't definitive across languages. Two that I've worked in a reasonable amount - Ruby and Rust, have patterns in their communities around testing that requires explicit loading of dependencies per test file. That doesn't make it the gold standard, but it does mean this project is doing something untoward.

My first reaction was "yes it should..." and I think that instinct is wrong. Digging in, I think I view loading the module as the only way to get the dependencies in a way that is consistent, reliable and simple.
If function "A" calls function "B", it is OK to say "before testing A, load B".
What isn't OK in my view is to say, "B now has an argument completer, X, so before testing A load B and B's completer; X calls function C, so load that too, and C also has validators and completers Y and Z, so load them too, etc." Now A's test author has to load A, B & C and X, Y and Z. Then I make a change to C's validator "Y" to use class Q. I wrote the validator for C, so I will ensure that it passes its tests. Everything upstream from A is working, but A's and B's tests are not, so who fixes the tests?

What makes this all work a bit easier is focusing on making small changes. Small changes allow you to move the project in bits that are easier to comprehend and rationalize and quicker to merge with fewer potential test impacts.

See above. My frustration is that I make a small change and potentially break many tests. As a rough order-of-magnitude thing, one hour's dev work entails a day spent repairing tests.

I'll resist trying to do a point by point argument, because I think we both have valid considerations. At the end of the day, it's Donovan's project and he gets to set the direction to what he feels best achieves the project goals.

Sorry for going point-by-point :-) I'm aware that I don't have a monopoly on the right way to do stuff, and I do change my mind on which way is preferable in the light of new facts. One of the things that comes with owning a project is that these things are your call, and anyone might think they have a better idea - you probably want to hear those ideas, even though many will need you to explain why the status quo is better. I have a healthy respect for those who have put a lot of time into this project before I showed up; their attitude to me as a contributor has been great, so I hope that respect does come through even when I am making the case for doing things differently.

jhoneill commented 4 years ago

One last thing on Mocks

@DarqueWarrior said

I do believe many of the existing tests mock at a level lower than is desired. I will work to up level the mocks to _callAPI. I will also increase the code coverage of _callAPI which is probably the most important function of the entire module. With tests that can guarantee a high level of confidence in _callAPI contributors only need to make sure they pass the correct parameters to _callAPI and not have to mock invoke-restmethod.

I've lost the last two hours to chasing down a problem which looks like a Pester bug. If you have

function Get-VSTeamProcess {
   [CmdletBinding(DefaultParameterSetName = 'List')]
   [OutputType([vsteamprocess])]
   param()
}

It appears that trying to mock the command fails because of a scope issue: the mock is trying to work one scope up from where the test runs and loads the class. The result is Pester throws an error which reads something like "Can't convert Get-VSTeamProcess to a command ... can't find [VSTeamProcess]".

However, this poses a question: what should be mocked? Should we:

  1. Replace the parsed JSON coming back from the server at the Invoke-RestMethod level or...
  2. Replace the objects which come back from _callAPI or...
  3. Replace the Processes which come back from Get-VSTeamProcess?

(1) replaces the call to the server, which is the thing we can't do in a test, so there is a case for mocking IRM; but I agree with what was said above, the case against doing that is stronger. The person calling _callAPI should not worry about the URL which gets built or the request body sent to the server; they request this area, that resource, etc., and what goes on a level below that should be of no interest. The scope issue pushes towards "no" for (3). However, (2) requires the person calling Get-VSTeamProcess to know what requests it makes, which is generally a bad thing, and if there is a change to the internals of how that function works (which is what I'm doing), a mock for _callAPI with the parameters Get-VSTeamProcess originally passed no longer works. And we end up back at my original problem: a change in one place breaks an indirectly related test.

So something I would put forward as an ideal, a guideline, not a hard-and-fast law: mocks should be for things called directly by the code being tested, not for things which are called by the called code. So yes, mock _callAPI for the calls in the code being tested, but not for calls to it in the dependencies. (The specific case above gets an exception!) Mocking the whole dependency is probably a better unit test approach anyway. I don't think there is really an option to do a wholesale change to this, but it's something to try to do as tests develop.
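Expressed as code, the guideline might look like this. Add-VSTeamExampleThing is a hypothetical function whose only direct dependency is Get-VSTeamProcess; the point is which layer the mock targets:

```powershell
Describe 'Add-VSTeamExampleThing' {
    It 'resolves the process template by name' {
        # mock the direct dependency the code under test calls ...
        Mock Get-VSTeamProcess { [pscustomobject]@{ Name = 'Agile' } } -ModuleName VSTeam
        # ... but do NOT mock the _callAPI that Get-VSTeamProcess uses
        # internally; exercising that belongs in Get-VSTeamProcess's own tests

        Add-VSTeamExampleThing -ProcessTemplate 'Agile'

        Should -Invoke Get-VSTeamProcess -ModuleName VSTeam -Exactly -Times 1
    }
}
```

Under this guideline, a change to how Get-VSTeamProcess talks to the server never touches this test, which is exactly the insulation argued for above.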

Opinions very welcome :-)