UnkindPartition / tasty

Modern and extensible testing framework for Haskell
638 stars 108 forks source link

Add exact dependency matching #343

Closed martijnbastiaan closed 1 year ago

martijnbastiaan commented 2 years ago

Tasty 1.2 introduced a way for tests to specify dependencies. That is, what tests should run before themselves. These dependencies are specified using an AWK-like expression annotated to a TestTree. These expressions can match against any test in the full TestTree. This approach has a few rough edges:

This commit introduces the ability to specify dependencies by using a combinator taking a list of TestTrees. This way of specifying dependencies removes the quadratic complexity in favor of a linear one, while eliminating the ability to accidentally introduce cycles or unintended sequentiality.


A bit of history for this PR as it has undergone several rewrites thanks to a few great reviews I've gotten:

martijnbastiaan commented 2 years ago

@Bodigrim @VictorCMiraldo I'm eager to get this PR merged, my aim is to fix all of the gotchas listed in Tasty's README regarding dependencies - these affect the main project I'm working on:

  1. If Test B depends on Test A, remember that either of them may be filtered out using the --pattern option. Collecting the dependency info happens after filtering. Therefore, if Test A is filtered out, Test B will run unconditionally, and if Test B is filtered out, it simply won't run.
  2. Tasty does not currently check whether the pattern in a dependency matches anything at all, so make sure your patterns are correct and do not contain typos. Fortunately, misspecified dependencies usually lead to test failures and so can be detected that way.
  3. Dependencies shouldn't form a cycle, otherwise Tasty with fail with the message "Test dependencies have cycles." A common cause of this is a test matching its own dependency pattern.
  4. Using dependencies may introduce quadratic complexity. Specifically, resolving dependencies is O(number_of_tests × number_of_dependencies), since each pattern has to be matched against each test name. As a guideline, if you have up to 1000 tests, the overhead will be negligible, but if you have thousands of tests or more, then you probably shouldn't have more than a few dependencies.

    Additionally, it is recommended that the dependencies follow the natural order of tests, i.e. that the later tests in the test tree depend on the earlier ones and not vice versa. If the execution order mandated by the dependencies is sufficiently different from the natural order of tests in the test tree, searching for the next test to execute may also have an overhead quadratic in the number of tests.

This PR together with a followup PR should achieve that. I know it's quite a bit to review, so I've tried to add enough tests to hopefully convince you that it's at least somewhat sound. If you'd like a video call to talk this through, I'm open for that too! If there's anything else I can do, ask away.

Bodigrim commented 2 years ago

@martijnbastiaan sorry, I don't have bandwidth for this right now. Could you possibly summarise all breaking changes here (both API and behaviour)?

CC @VictorCMiraldo @adamgundry @andreasabel as other maintainers.

martijnbastiaan commented 2 years ago

No worries @Bodigrim, thanks for the reply.

The API changes are as follows:

There are no behavioral changes to existing programs.

VictorCMiraldo commented 2 years ago

Thanks for the PR @martijnbastiaan! That is a large one! :) Unfortunately, I don't have the bandwidth nor the familiarity with the mechanisms surrounding tasty's test dependencies. Hence, I'm not sure I can give a good review on this anytime soon.

martijnbastiaan commented 2 years ago

Understood!

Hmm. So to give some background: I wrote this patch to resolve some problems we face with our 4K large testsuite over at clash-lang/clash-compiler, a project me and my colleagues at QBayLogic maintain.

What I could do is ask a colleague for a review on this patch (and the follow-up one). Would that be enough -together with the listed API changes I posted earlier- to accept this patch? I'm of course willing to maintain the feature too, and deal with the fallout if any.

Bodigrim commented 2 years ago

TreeFold.foldSingle takes an extra argument ExactPath.

Can this breaking change be avoided by introducing a new fold and expressing existing foldSingle via it?

martijnbastiaan commented 2 years ago

Yes, we could add foldSingleExact that will (by default) simply call foldSingle.

Bodigrim commented 2 years ago

@martijnbastiaan could you please share what kind of dependency graph is typical for Clash test suite? Is it chains of depending tests?

martijnbastiaan commented 2 years ago

Sure. To give a bit of context: Clash is a Haskell to silicon compiler - meaning Haskell can be used to build chips that could be synthesized by say a TSMC (or more realistically; it's used to program a reconfigurable chip). Silicon is typically written in a Hardware Description Language (HDL) which is what Clash targets as a compiler output. There are currently three major HDLs: Verilog, SystemVerilog, and VHDL. Each of these languages have various simulators that can be used to predict chip behavior before actually burning it into silicon. Clash itself can also be simulate (simply by leveraging GHC!).

So, one test looks something like:

  graph TD;
      FooTest-->VHDL;
      FooTest-->Verilog;
      FooTest-->SystemVerilog;
      VHDL-->Clash_VHDL;
      VHDL-->Clash_Sim_VHDL;
      Verilog-->Clash_Verilog;
      Verilog-->Clash_Sim_Verilog;
      SystemVerilog-->Clash_SV;
      SystemVerilog-->Clash_Sim_SV;
      Clash_SV-->ModelSim;
      Clash_SV-->Verilator;
      Clash_VHDL-->GHDL;
      Clash_VHDL-->Vivado_VDHL;
      Clash_Verilog-->Vivado_Verilog;
      Clash_Verilog-->IcarusVerilog;

I'm glossing over some things, but each new (regression) test looks a bit like this. This makes the number of dependencies grow linearly with the number of tests, hence us running into this caveat mentioned in the README:

Using dependencies may introduce quadratic complexity. Specifically, resolving dependencies is O(number_of_tests × number_of_dependencies), since each pattern has to be matched against each test name. As a guideline, if you have up to 1000 tests, the overhead will be negligible, but if you have thousands of tests or more, then you probably shouldn't have more than a few dependencies.

With regards to:

Changing TreeFold is likely to break all third-party tasty ingredients.

I'm willing to help out patching them, of course.

Bodigrim commented 2 years ago

I agree that the infrastructure for dependencies between tests is wanting, but I feel that your approach covers only a fraction of cases, but required changes are quite profound. My use case is dependencies between disjoint, potentially remote trees, see https://hackage.haskell.org/package/tasty-bench-0.3.2/candidate/docs/Test-Tasty-Bench.html#g:4

If the shape of your dependency graphs is relatively stable, might your use case be better served with a custom IsTest provider?

martijnbastiaan commented 2 years ago

The only real limitation is that you can't make a DAG, you can only make a tree. I've checked out the example of bcompare and the examples it links (chimera, fast-digits, unicode-data). Correct me if I'm wrong, but it seems like none of those examples need the TestTree to be a DAG. I think all of these examples just want to run a single test as a baseline (left part of afterTree) and compare it against a bunch of other tests (right part of afterTree). I'll respond to the other points too, but I'd like to understand the need for a DAG better before I do.

Bodigrim commented 2 years ago

Correct, none of my examples needs a DAG. However, the shape of TestTree is not necessarily in correspondence with a dependency tree.

martijnbastiaan commented 2 years ago

Sure, that's true. I'll be honest, it's kinda hard to respond to this without sounding very negative/argumentative. It seems to me that you're paying a pretty high price for the ability to do so. The examples need to construct patterns such as:

"$NF == \"" ++ show n ++ "\" && $(NF-1) == \"Chimera\""

..which would break if there's ever another test group called Chimera anywhere in the tree, quite seriously contradicting the composability of TestTrees. It has all the downsides mentioned in the README. Last but not least, for a casual reader it's pretty hard to even read the code - given that you need to do the same process Tasty does when finding a dependency: scan the whole tree.

So all in all, it seems somewhat "obvious" to me that you'd want something like afterTree. And as far as the complexity goes.. well that's kinda imposed to us by the existence of after itself. If we only had afterTree, there would have been no need for Trie or DependencyTree (from the follow-up PR). All would naturally follow from the trees construction and traversal - the code would be even simple than what's now in main.

So, in a way I feel that this remark:

but I feel that your approach covers only a fraction of cases

turns everything on its head. It seems to me that after should try and justify its costs. Is it really worth having this pattern matching when it messes with composability, computational complexity, correct-by-constructionness, and code complexity?

I guess what I'm asking is: is there any chance of this getting merged? If not, do you see an alternative way of tackling the issues in the README, or will Tasty be forever stuck with them?

Bodigrim commented 2 years ago

@martijnbastiaan to be very clear, I agree that the dependency design is inconvenient and I also agree that if we were doing it from the scratch and with our current experience, then AfterTree would be a saner option. My problem is that we already have a mechanism (constructor After) for this pretty niche feature, and adding another one (constructor AfterTree) for even more niche use cases does not strike me as the most desirable outcome. I'm happy to be overruled by other maintainers, my cautiousness is because I'm a caretaker only.

So, answering your question directly, yes, there is a chance, but we must consider the design and long-term consequences. This can take more time than one might like, sorry for that. I do appreciate your work, your efforts and this particular contribution.

What is the prior art for dependencies between tests in other frameworks?


Could you define something like

data ClashTest = ClashTest 
  { ghdlTest :: Assertion
  , vivadoVdhlTest :: Assertion
  , vivadoVerilogTest :: Maybe Assertion
  , icarusVerilogTest :: [Assertion]
  , ...
  }

and provide instance IsTest ClashTest to manage dependency chain? More generally, you can define newtype AssertionChain = AssertionChain [Assertion] with instance IsTest AssertionChain executing tests in order.

martijnbastiaan commented 2 years ago

Understood. If necessary, I'd be willing to maintain it. My ideal would be to deprecate After in the very long term, but I understand that takes a lot of time, effort, and commitment - hence not always worth it.

I'll have a look at your proposal and dig into the prior art, hopefully this weekend.

martijnbastiaan commented 2 years ago

What is the prior art for dependencies between tests in other frameworks?

I've looked into this:

To me it looks like sydtest is very close to doing TheRightThing(tm).

Could you define something like

This can work, but it has quite a few drawbacks by virtue of the tests being opaque to Tasty:

So all in all this would be a pretty big step back IMO.


I could do two things to make this series of patches more acceptable:

-- | An algebra for folding a `TestTree`.
--
-- Instead of constructing fresh records, build upon `trivialFold`
-- instead. This way your code won't break when new nodes/fields are
-- indroduced.
data TreeFold b = TreeFold
  { foldSingle :: forall t . IsTest t => OptionSet -> TestName -> t -> b
  , foldGroup :: OptionSet -> TestName -> b -> b
  , foldResource :: forall a . OptionSet -> ResourceSpec a -> (IO a -> b) -> b
  , foldAfter :: OptionSet -> DependencyType -> Expr -> b -> b

    -- "Exact" versions of functions defined above. In 'trivialFold', these simply
    -- call the non-exact versions. These functions are here for backwards compatiblity
    -- reasons.
    --
    -- Note: instead of taking a `TreeFold`, these functions could take their non-exact
    -- function directly, but this makes the type signatures quite long.
  , exactFoldSingle :: forall t . IsTest t => TreeFold b -> ExactPath -> OptionSet -> TestName -> t -> b
  , exactFoldGroup :: TreeFold b -> ExactPath -> OptionSet -> TestName -> b -> b
  , exactFoldResource :: forall a . TreeFold b -> ExactPath -> OptionSet -> ResourceSpec a -> (IO a -> b) -> b
  , exactFoldAfter :: TreeFold b -> ExactPath -> OptionSet -> DependencyType -> Expr -> b -> b
  }

I personally don't think it's worth it, but I'm happy to do it if that makes things more acceptable/reviewable.


I do appreciate your work, your efforts and this particular contribution.

Thanks btw, happy to work with you guys.

Bodigrim commented 2 years ago

To me it looks like sydtest is very close to doing TheRightThing(tm).

Would sequential combinator on its own be sufficient to match your requirements?

martijnbastiaan commented 2 years ago

For my specific use case it would be both a step forward and backward at the same time.

The positives:

The neutrals:

The negatives:


I'm somewhat biased as I already put in the work of course, but it strikes me as a band-aid instead of a proper solution. Do you think it would be easier to implement?

Bodigrim commented 2 years ago

Well, adding TestTreeSequential constructor or maybe extending existing TextTree with a switch Parallel | Sequential seems to me a better option. You do not lose ability to run certain parts in parallel, because TestTreeSequential can affect only one level of tree.

martijnbastiaan commented 2 years ago

Sure, I can amend the patch to do this.

Bodigrim commented 2 years ago

@martijnbastiaan before you do more work, I'd like to hear from other maintainers.

@VictorCMiraldo @andreasabel Existing TestGroup constructor groups subtrees for (potentially) parallel execution. The proposal is either to add TestGroupSequential constructor, or to extend TestGroup with a field data Parallelism = Parallel | Sequential, which mandates parallel execution of subtrees. How do you feel about this? Currently a similar (but not quite the same) effect can be achieved with a sequence of After constructors, which is very tedious to write and unbearably slow to execute.

Edited: TestTree TestGroup.

VictorCMiraldo commented 2 years ago

@Bodigrim and @martijnbastiaan, first of all, thanks for the great work both making the PR and the thoughtful comments! :)

Currently a similar (but not quite the same) effect can be achieved with a sequence of After constructors, which is very tedious to write and unbearably slow to execute.

Yeah, that doesn't sound like a very good solution and I can see how running tests sequentially is an interesting feature. I do tend to agree with @Bodigrim, however. Adding a AfterTree constructor feels weird given that we already have an After. Considering the semantic interplay between them seems like a potential nightmare for me. Can we have weird deadlocks with things like (in pseudo-tasty) t = AfterTree (After "/a/" $ singleTest "b") (singleTest "a")? IIUC, plain tasty already has this type of problem with regular TestTrees.

It does feel more natural to to add a TestTreeSequential constructor instead of using the dependency mechanism to enforce sequential execution. Also, it doesn't introduce new quirks w.r.t. dependencies such as t above: a TestTree u has a deadlock iff TestTreeSequential u has a deadlock.

Would a TestTreeSequential need this change? If not, then it would be fully backwards compatible, which is a huge plus IMO

Bodigrim commented 2 years ago

Potentially we can leave TreeFold as is, with foldGroup processing both TestGroup and TestGroupSequential. Adding a new constructor to TestTree is breaking any way, but potentially with less impact on clients.

martijnbastiaan commented 2 years ago

Fundamentally, TestTreeSequential will differ very little from AfterTree. Instead of having two TestTree fields, it will simply take a list of them. I do think it's the better option: I only ended up using sequentialTestGroup (a function this PR introduces) in practice. However, it will still need the changes to TreeFold, unless I find a better way to implement it.

The core of the problem is that this variable:

https://github.com/UnkindPartition/tasty/blob/df7ccab50d361c7d1a3c00960faac883d98c2530/core/Test/Tasty/Run.hs#L327-L332

needs to take TestTreeSequential (or AfterTree) into account. While patterns need to scan the whole tree there, AfterTree/TestTreeSequential can define much more efficient lookup strategies.

martijnbastiaan commented 2 years ago

Hmm.. An alternative implementation could implement a custom folder that propagates dependencies up while it is folding. I.e., it would not use foldTestTree to run the tests. This would avoid the complexities of introducing the Trie and would also make for a natural solution to the very first caveat mentioned in the README. I'm not sure how this would fit into the rest of the ecosystem though.

Bodigrim commented 2 years ago

I would do this (please find a better name for TestGroup'):

data ExecutionOrder = Concurrent | Sequential

data TestTree
  = forall t . IsTest t => SingleTest TestName t
  | TestGroup' ExecutionOrder TestName [TestTree]
  | PlusTestOptions (OptionSet -> OptionSet) TestTree
  | forall a . WithResource (ResourceSpec a) (IO a -> TestTree)
  | AskOptions (OptionSet -> TestTree)
  | After DependencyType Expr TestTree

pattern TestGroup = TestGroup' Concurrent

This is a breaking change, but a very minor one. And you can leave TreeFold as is, ignoring ExecutionOrder, so there is no breakage here.

Indeed, createTestActions can fold TestTree by its own means, not necessarily via foldTestTree.

VictorCMiraldo commented 2 years ago

FWIW, I like to compromise suggested by @Bodigrim above. Seems like a good option to minimize breakage while enabling @martijnbastiaan to have the necessary feature that motivated the PR in the first place.

martijnbastiaan commented 2 years ago

Great, thank you both. I've got a patch now that implements the suggestions (adding TestGroup field + custom traversal). While it doesn't try to fix any of the issues raised in the README for pattern dependencies, it does solve them for the new way of specifying deps in a fairly straightforward way.

I'm pretty happy with the patch, but I'm still working on writing a proper testsuite. I'm spending only a few hours on it every weekend so it'll probably take a while..

martijnbastiaan commented 2 years ago

I've just pushed an alternative implementation based on the comments above. This one doesn't try to fix any issues with dependencies specified using patterns, but does solve them all for dependencies specified using sequentialTestGroup. TestGroup is now the primitive setting parallelization/sequentiality. Two backwards incompatible changes have been applied:


1. TreeFold no longer has a field foldResource. Its type signature had to change in order for the fold to account for dependencies while filtering. However, it doesn't seem likely downstream will be affected at all:

  1. Searching for foldResource on all of GitHub only yields tasty using it.
  2. Searching through all libraries with tasty in their name on Hackage does not yield any relevant usages of TreeFold either.

If this change proves controversial, I suggest we move the second commit in this PR to a separate PR and discuss there.


2. TestGroup has an additional field ExecutionMode. @Bodigrim, you suggested making a pattern synonym like this:

pattern TestGroup = TestGroup' Concurrent

But, libraries using this pattern will silently fail:

case tree of
  TestGroup _ -> ...
  _           -> ...

Libraries using the following pattern will fail with just a GHC warning:

case tree of
  TestGroup _ -> ..
  [.. exhaustive list of constructors ..]

To get a feel for the breakage, I've cloned all the libraries published on Hackage related to tasty and searched for TestGroup. Of all these libraries, 7 use TestGroup directly: tasty-bench, tasty-expected-failure, tasty-silver, tasty-inspection-testing, tasty-focus, tasty-focus, and tasty-bdd. Of these, 6 would fail silently as their usage falls in one of the two patterns mentioned. Only tasty-inspection-testing would continue to be correct.

I'm proposing to let static typing do its thing to prevent silent breakage.

martijnbastiaan commented 2 years ago

@Bodigrim I've applied your suggestions in separate commits. I'll squash them into the first two if you're happy with them.

martijnbastiaan commented 1 year ago

@Bodigrim Again thanks for all the comments, and sorry for the delay on this one. There are two unresolved comments now. I've been unable to come up with a strategy to join foldTestTree and the fold in createTestActions - I couldn't find an implementation/type for foldResource that worked out. As discussed before, I don't think anyone but Tasty itself uses this field, so I don't think this is a major issue.

martijnbastiaan commented 1 year ago

@VictorCMiraldo @Bodigrim Anything I could do to get this PR going again? :)

Bodigrim commented 1 year ago

@martijnbastiaan I'm sorry, I've lost all the context since August 2022 and I don't have time to restore it at the moment. I'll try to review in upcoming weeks.

martijnbastiaan commented 1 year ago

Yeah no worries, my fault for being a slowpoke. I'll brush up the cover letter with an overview of the history of this PR today.

Bodigrim commented 1 year ago

@martijnbastiaan could you please rebase?

martijnbastiaan commented 1 year ago

I managed to make it work without the custom traversal in Run.sh and with only one very minor API change. I need some more time to brush it up for review. I hope to publish that this weekend.

martijnbastiaan commented 1 year ago

Thanks @Bodigrim. Regarding the commit:

Refactor 'createTestActions'

I found the stack of Traversal/Reader/Writer/IO + monoidal combination to be very cumbersome to work with. Given that it also had some complexity issue I decided to refactor it into the current structure (see the commit message for more information). Overall I'd say the current approach is an improvement, but if there's disagreement I'll try my best to make it work in the original structure.

If it helps; we could also split this PR up into multiple (cleanup + foldGroup API change + refactor createTestActions + this).

Bodigrim commented 1 year ago

Let's pull out refactoring to its own PR please.

martijnbastiaan commented 1 year ago

Okay, done: https://github.com/UnkindPartition/tasty/pull/364.

martijnbastiaan commented 1 year ago

@Bodigrim All done :)

martijnbastiaan commented 1 year ago

@martijnbastiaan I'm truly sorry that it takes so long, I'm very much overloaded recently.

No worries, I appreciate the feedback. If it would be appreciated: I could help out reviewing PRs for this project and doing some general maintenance.

Bodigrim commented 1 year ago

Thanks for persistence, @martijnbastiaan :)

tasty is a fairly stable package, so before we cut a new major release, I'd like to resolve as many other issues as possible. #341 and #311 are probably closest to completion, and #231 and #342 seem actionable too.

Bodigrim commented 1 year ago

If it would be appreciated: I could help out reviewing PRs for this project and doing some general maintenance.

@martijnbastiaan much appreciated! Sorry, I don't have admin rights to grant you formal privileges (and I'm reluctant to distract @UnkindPartition by asking), but you are most welcome.

martijnbastiaan commented 1 year ago

I read about his situation, best not to distract him yeah. I'll see what I can do for the listed issues/PRs.