czan / stateful-check

Stateful generative testing in clojure
MIT License
117 stars 11 forks source link

Emacs Mode #14

Open r0man opened 1 year ago

r0man commented 1 year ago

Hi @czan,

I'm experimenting with an Emacs Mode for stateful-check.

Right now I have the following features planned:

It's all work in progress, but this is where I am right now:

https://github.com/r0man/stateful-check/tree/nrepl https://github.com/r0man/cider/tree/stateful-check https://github.com/r0man/cider-nrepl/tree/stateful-check

Since you also seem to use know a bit about Emacs and NREPL middleware (I just discovered wingman), do you have any other ideas?

Roman

czan commented 1 year ago

Hi Roman, this sounds like a great idea!

Initial thoughts: having to maintain things over multiple repositories will be a bit of a pain. Having a single .el file in the stateful-check repo, and then patching the Clojure-side nrepl bits would be my preference. These must be optional, so that stateful-check continues to function without CIDER (particularly because keeping it up-to-date with CIDER changes might be pretty annoying, and we want things to remain minimally functional at all times).

I would focus on one thing at a time. I think a good place to start is to aim for "render the command trace Emacs-side, using CIDER to print objects". This will require (at least):

I think this is solving the core pieces of the problem (i.e. moving data between both sides, and rendering it).

There are some parts to this that not obvious. For instance, your change in https://github.com/r0man/stateful-check/commit/ee44cc06070196256e5d4b3f549f71e1fdefe283 is tricky, because if a mutation is detected the object will be returned as a string (with a warning) rather than the object itself. If we want the command/trace objects to be inspectable in Emacs, we need to decide how this interacts with mutable objects in the test, and what that experience is for a user.

r0man commented 1 year ago

Hi @czan,

thanks for your suggestion. I keep them in mind going further.

About the tricky change you pointed out. Maybe we could keep both the string and the (possibly mutated) object in the report? The conversion to a string and comparing it is done to detect such a mutation. Is that right?

czan commented 1 year ago

Yeah, the runner converts objects to strings during the test run, then uses those strings to attempt to detect objects which are mutated.

This behaviour actually isn't ideal, because it takes time in the stateful-check framework which might affect how likely we are to provoke race conditions when using multiple threads. I guess it would be nice to be able to opt-out of the mutation detection when you know it's not an issue, in which case we'd want to keep the object around so we can pr-str it at the end.

Changing things so the failure exception contains the actual objects alongside the pr-str representation of them seems fine to me, in principle. stateful-check.core/report-result is possibly the only existing function that would care about the change.

r0man commented 1 year ago

Hi @czan,

I have another question about this mutation detection and making test results inspectable. The check-postcondition function returns a string at the moment, with the actual and expected values printed. I would like to get at that original data, not the printed, to show it in the Inspector.

Is this also because of the mutation detection or could we return pure data structures there, and print this instead in the reporting function?

Wdyt?

czan commented 1 year ago

Is this also because of the mutation detection or could we return pure data structures there, and print this instead in the reporting function?

It's not for mutation detection reasons. Postconditions are only checked after all the commands are executed (which doc/specification.org mentions), so they run too late to detect any mutations.

It would definitely be possible to return a data structure from check-postcondition. Looking through the code, it looks like the result of that (eventually) gets passed to stateful-check.core/print-sequence. Moving the formatting code into there (specifically, the doseq at the end of the function) might be enough to maintain the current behaviour, but give you richer data in the failure exception.

Let me know how you go!

r0man commented 1 year ago

Hi @czan,

thanks for the explanation. This is super helpful! I'm capturing the test events now as pure data and return a map from the postcondition. This data gets passed to the Clojure Test report machinery via the stateful-check failure exception. On the Cider NREPL side I grab the report from the current-report atom of the Cider Test middleware, transform and analyze it a bit and save it in the current session so the inspector can access the data.

For this to work, I made the following changes to stateful-check: https://github.com/r0man/stateful-check/commit/b6ef44368b6de82f6d1d90c12d2512b39c436b4b

Apart from this change I made some tweaks in my nrepl branch: https://github.com/r0man/stateful-check/commits/nrepl I updated test.check and added a :stateful-check key to the reported event (ideally all data would be captured in the stateful-check failure exception though).

What I have now looks like this:

Alternate image text

When I run a test with Cider, first the normal Cider Test report buffer shows up, which just displays the original test report from stateful-check as pure text. If you then move to a failing test case (with TAB) and press "c" the Stateful Check Emacs Mode kicks in. It asks the NREPL middleware to lookup the failing test report, analyzes and saves it in the NREPL session, and returns this data to the client.

In the Stateful Check Emacs Mode, command arguments, results, as well as the actual and expected values from failing test events are inspect-able. In this simple example it is a bit pointless, because you can see the values immediately, but I think it becomes more useful when the data structures are larger/nested.

There are some limitations for now:

Some related notes:

This stuff lives still in 3 repositories (minus my fork of stateful check) and I haven't done anything about it yet. My initial plan was to contribute it to Cider so it is available to users without configuring middleware. If you are interested in collaborating on the middleware and/or Emacs mode (I would love to, you seem to know some Emacs/NREPL tricks), and you would like to move this to another repository I am open to any suggestions. For the changes to stateful-check, I'm not sure how we want to proceed. Eventually I would love to integrate this into your repository, but since this is all in flux I guess I continue in my branch for now.

As a next step, I'm trying to make the command handles (or the state of the model after each command) inspectable. I'm just not sure yet where to grab that state from. From the runners? Or similar to what the failure-message does, step though each command and capture the state, with setup and teardown methods run. Do you have any hints?

Apart from that, it would be cool if the failure exception would contain as much data as possible. Things I would like to add there are:

Thanks for your help so far on this.

Wdyt?

czan commented 1 year ago

That all sounds great! It might be worth spending some time defining the shape of the data that stateful-check is returning. If we're going to have multiple consumers (i.e. stateful-check itself, and the CIDER nREPL middleware), then it would be nice to know what we're committing to.

This is especially valuable if the versions for these things are not required to be the same, so we know what it means to maintain compatibility with other versions.

Responding to some other things you mentioned:

What I have now looks like this: ...

Neat!

When I run a test with Cider, first the normal Cider Test report buffer shows up, which just displays the original test report from stateful-check as pure text. If you then move to a failing test case (with TAB) and press "c" the Stateful Check Emacs Mode kicks in. ...

This is a different interaction mode than I had imagined. I thought about it as running the test through a special stateful-check CIDER command, but if you can get it working through the test report that's pretty cool.

Any differences in the output when rendered in the CIDER test report, compared to the stateful-check Emacs mode, could be a bit jarring to users, so we should try to keep them in sync.

  • I noticed that the actual/expected values differ between test runs. Sometimes they are forms containing the expressions that failed, sometimes they are just the plain values. Not sure where this difference is coming from and I haven't investigated this yet.

I'm not really sure what you mean by this. Can you provide a concrete example of what you mean?

  • Another thing I noticed, but did not measure yet, is that I have the feeling stateful-check got a bit slower between the version I was using and the new released one. It's just a feeling, but something to keep in mind.

I'm interested in this. If you're able to demonstrate a reproducible slowdown then I'd be happy to investigate what's causing it.

My initial plan was to contribute it to Cider so it is available to users without configuring middleware.

That would be nice, but it's also not too hard for an Emacs extension to register middleware in CIDER. I think adding to the Emacs variable cider-jack-in-nrepl-middlewares should be enough to do it. You can even provide a :predicate which ensures the middleware is only added when stateful-check is on the classpath.

If you are interested in collaborating on the middleware and/or Emacs mode (I would love to, you seem to know some Emacs/NREPL tricks), and you would like to move this to another repository I am open to any suggestions.

I don't have a lot of time to spend on this at the moment, but I'm definitely happy to advise on things.

My ideal situation would be to have all the Emacs and CIDER integration code live in the stateful-check repository. I think this makes managing the contracts between the different pieces a bit easier.

While you're still working things out feel free to put the code wherever is easiest for you.

As a next step, I'm trying to make the command handles (or the state of the model after each command) inspectable. I'm just not sure yet where to grab that state from. From the runners? Or similar to what the failure-message does, step though each command and capture the state, with setup and teardown methods run. Do you have any hints?

At the moment, the states are never printed, so we don't bother exposing them. Printing the trace only uses the handles, command information, and result values.

If we want a richer representation, then I would change the failure-message method into something like full-trace which returns a map with handle, before-state, after-state, command, arguments, result, and (optionally) failure-details. (I might be wrong about all the information needed here.)

This would also require changing failure-messages to detect failures in the new structure, and changing the trace printing code to use the new structures.

  • The specification itself (I add it to the test report at the moment, but I think it can be easily added to the exception)

Sure. I don't think there's a problem doing this. How do you imagine this being useful to display in the context of a specific test failure?

  • The command frequencies (this seems right now only be available in the test report and not at the point where the failure exception is thrown)

I don't see this as being particularly useful. The command frequencies table is primarily intended as a way to see whether your passing test is exercising the commands you expect it to be. This is to avoid the case where you think a test is passing, but you're actually not testing a command because it never passes its requires or precondition checks.

I don't think there's much value attaching this a failure exception, because failures change the distribution of commands (particularly because of how shrinking goes). I'm happy for you to push back on that, though.

  • The state of each command execution and the result as a value

I covered this in my comments about failure-message, above.

r0man commented 1 year ago

Hi @czan,

thanks for your comments. Again, very helpful.

(deftest ^:interactive failed-assertion-is-printed
  (is (specification-correct?
       {:commands {:cmd {:command       (constantly true)
                         :postcondition (fn [_ _ _ _]
                                          (is (= 1 0)))}}}
       {:run {:seed 0}})))

The first time I run it the result is reported like this:

Fail in failed-assertion-is-printed
Sequential prefix:
  #<1> = (:cmd)  = true
    expected: (= 1 0)
      actual: (not (= 1 0))

Seed: 0

And the second time I run it the result it is reported like that (and from then on always):

Sequential prefix:
  #<1> = (:cmd)  = true
    expected: 1
      actual: 0

Seed: 0

I have the suspicion that this might actually be an issue with Cider. I tried to reproduce it with only "lein test" but couldn't. I take a look at that another time.

czan commented 1 year ago
  • About the difference of :actual/:expected. I observed when I run the following test for example: ...

I have the suspicion that this might actually be an issue with Cider. I tried to reproduce it with only "lein test" but couldn't. I take a look at that another time.

I tried running this test, too. The first time, the emitted events were

[{:file test.clj, :line 10, :type :fail, :expected (= 1 0), 
:actual (not (= 1 0)), :message nil}]

whereas on subsequent runs it looks like

[{:file test.clj, :line 10, :expected 1, :actual 0, :message 
nil, :type :fail, :diffs ([0 [1 0 nil]])}]

Looking into CIDER, it looks like this is added in cider.nrepl.middleware.test.extensions: https://github.com/clojure-emacs/cider-nrepl/blob/a53ab41c7ae56fda9f4a7b5916c5bd1f8fa60d95/src/cider/nrepl/middleware/test/extensions.clj#L37-L38

The CIDER code to print test results looks like it's in cider-test.el: https://github.com/clojure-emacs/cider/blob/ec97c88c46759afd929a415604311a3b2aecb338/cider-test.el#L391-L444

This seems like a good argument to keep the test events as data, and pass them through in their entirely. Then we can get CIDER to render them using the same code that renders assertions elsewhere.

  • Ok, I will move the NREPL middleware and the Emacs Mode to the stateful-check repository soon. Do you have a specific directory layout in mind? Should it be 3 sub projects? Like what you did in wingman? The core stateful-check project, the debugger, and a NREPL code (which just calls the debugger)?

I think it makes sense to keep things as a single project. I'd expect a stateful-check.nrepl namespace which contains the nREPL middleware, and a stateful-check.el file in the root of the repository. I'm not entirely clear on where the debugger might live - maybe also in stateful-check.nrepl namespace, if it's nREPL specific? Or stateful-check.cider if it's CIDER specific?

r0man commented 1 year ago

Good idea, using the Cider print machinery. I will look into it.

r0man commented 1 year ago

Alternate image text

Hi @czan,

it's been a bit busy these days. Here a quick update on where I am:

That's mostly it. The screenshot above shows on the left a "enhanced" CIDER test report with the debugger integrated (instead of just showing plain text). On the right side the Stateful Check debugger in it's dedicated buffer. In both screens the arguments and results are inspect-able. Both windows are actually evaluating a failing case, which is indicated by bold/comment faces on the command handles and execution results (probably not obvious on the image). The inspector shows a command argument "object" that has some meta data and the "real" and "symbolic" values.

It's still a bit rough around the edges, and some things probably missing, buggy or even questionable, but I'm actually close to how I actually imagined this :)

czan commented 11 months ago

Hi r0man,

[... lots of cool stuff ...]

That sounds awesome! It sounds like you're building a bespoke stateful-check interaction mode for Emacs/CIDER, rather than just enhancing the inbuilt CIDER test runner. I imagine that's a lot of work, and it might make it harder to discover, but it sounds like you're getting a pretty natural interaction and debugging experience out of it. That's very exciting!

It's still a bit rough around the edges, and some things probably missing, buggy or even questionable, but I'm actually close to how I actually imagined this :)

I haven't had a chance to look at any of your code. If you think it's at a stage where it's worth me doing so I'd be happy to make some time in the next week or so. Just point me to the relevant commits/branches. 🙂

r0man commented 11 months ago

Hi @czan, ok, sounds good. I will ping you when I'm ready. Might take a while because I'm going on holiday soon. Just wanted to post a little update.