finale-lua / lua-scripts

A central repository for all Lua scripts for Finale.
https://finalelua.com

Snapshot testing #255

Open Nick-Mazuk opened 2 years ago

Nick-Mazuk commented 2 years ago

The issue

Currently, we have a fragile repository.

Our function library is great for creating shared logic and dramatically improves code reuse, developer ergonomics, and the speed at which we can write new scripts. It makes our code more robust, since common operations are encapsulated in a single, well-tested function, and it lets us fix future bugs by updating a single function rather than dozens of scripts.

However, this also produces a fragile ecosystem. The more we rely on these functions, the harder it is to understand the impact of a change to any single one of them. For instance, as I was redoing our bundler yesterday, I noticed that importing a single library function can pull in several indirect dependencies: library.mixin_helper depends on library.mixin, which itself depends on library.general_library. With that many layers of indirection, if someone updates the general library, there's no way to ensure every script still works without manually tracking every dependency and checking every script against every edge case.
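
To illustrate the indirection, here's a tiny, purely illustrative sketch of how one require fans out into indirect dependencies (the dependency table is a toy stand-in; the real library is much larger):

```lua
-- Toy dependency table, purely for illustration (not the real library).
local depends_on = {
    ["library.mixin_helper"]    = { "library.mixin" },
    ["library.mixin"]           = { "library.general_library" },
    ["library.general_library"] = {},
}

-- Collect everything a module pulls in, directly or indirectly.
local function all_dependencies(name, seen)
    seen = seen or {}
    for _, dep in ipairs(depends_on[name] or {}) do
        if not seen[dep] then
            seen[dep] = true
            all_dependencies(dep, seen)
        end
    end
    return seen
end

-- A script that requires only mixin_helper still pulls in both
-- mixin and general_library.
for dep in pairs(all_dependencies("library.mixin_helper")) do
    print(dep)
end
```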

While this hasn't been an issue as far as I know, an adjacent one occurred with the recent bundling PR #240, which had to be rolled back in #245. In that instance, the third-party bundler only bundled the first library script in each file and did not bundle indirect dependencies. When I tested the resulting bundles, everything worked for me only because I happened to pick scripts with a single dependency and no indirect dependencies.

Proposed solution

The industry-standard way of fixing this issue is automated testing. For instance, you can see automated tests in our new bundling code (merged in PR #253):

https://github.com/finale-lua/lua-scripts/tree/master/.github/actions/bundle/src

These tests are unit tests. They ensure that even as I fix bugs in the bundler, it continues to work as expected. They also ensure that once a bug is fixed (and a test added), that bug can never resurface.
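
For anyone unfamiliar with unit tests, here's a tiny, purely illustrative example in plain Lua. This is not the real bundler code, just the shape of a test: a small pure function plus asserts that run anywhere Lua runs, no Finale needed.

```lua
-- A tiny pure function like the ones a bundler is built from
-- (purely illustrative; not the actual bundler code).
local function get_require_path(line)
    return line:match('require%s*%(?%s*["\']([^"\']+)["\']')
end

-- Unit tests: plain asserts that fail loudly if behavior changes.
assert(get_require_path('local m = require("library.mixin")') == "library.mixin")
assert(get_require_path('local m = require "library.mixin"') == "library.mixin")
assert(get_require_path('local x = 1') == nil)

print("all bundler helper tests passed")
```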

However, unit tests likely do not work for most of our use case, since the vast majority of our code has side effects inside a Finale file. Additionally, while the bundler has complete unit test coverage, those tests still cannot guarantee the bundled scripts will run: they only guarantee the bundler behaves as expected, not that the expected output is valid RGP Lua code.

But there's also snapshot testing. With the new "Save document as text file…" script, we have the potential to introduce it into the repository. The general workflow would be as follows:

  1. A developer writes the script as normal
  2. The developer creates a file (or multiple files) representing the desired "before" state for the script. This set of files should cover all conceivable edge cases.
  3. The developer may also need to write some code to tell the testing tool how to run the script (e.g., select these measures on these staves, press these dialog buttons)
  4. Our snapshot testing tool would then run these files through the script and save the resulting output as a .txt file
  5. Finally, the snapshot testing tool compares the resulting output with the output from the previous run (usually called a "golden"). If they match, the test passes! If they differ and the change is intended, the developer updates the golden so future runs pass. If they differ unexpectedly, the test fails. (A sketch of this comparison step appears below.)

On every PR, the tests for the entire repository would run.
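
To make step 5 concrete, here's a minimal sketch of the golden-comparison step in plain Lua. Everything here is hypothetical (function names, paths, the update flag); in the real tool, the "actual" file would be produced by running the script inside Finale and exporting the document with "Save document as text file…".

```lua
-- Read a whole file, or nil if it doesn't exist.
local function read_file(path)
    local file = io.open(path, "r")
    if not file then return nil end
    local contents = file:read("*a")
    file:close()
    return contents
end

-- Compare a test run's output against the stored golden.
local function check_snapshot(actual_path, golden_path, update_golden)
    local actual = read_file(actual_path)
    if not actual then return false, "missing test output: " .. actual_path end
    local golden = read_file(golden_path)
    if actual == golden then
        return true, "snapshot matches golden"
    end
    if update_golden then
        -- The developer confirmed the change is intended, so rewrite the
        -- golden; future runs will pass.
        local file = assert(io.open(golden_path, "w"))
        file:write(actual)
        file:close()
        return true, "golden updated"
    end
    return false, "snapshot differs from golden: " .. golden_path
end

-- Example usage (paths are hypothetical):
-- local ok, msg = check_snapshot("output/hairpin_create.txt", "goldens/hairpin_create.txt", false)
```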

Benefits

  1. We can ensure all changes to our library (and bundler) affect scripts only in desired ways (e.g., no unexpected bugs are introduced), leading to increased stability
  2. Future versions of RGP Lua can then be tested against our scripts to ensure there aren't any regressions in its behavior
  3. We can ensure scripts work across multiple versions of Finale and operating systems, including future versions of Finale, macOS, and Windows, by running the tests on each Finale/OS pair.

Downsides

  1. Setting this up could take quite a bit of engineering time since, as far as I can tell, the entire toolchain would effectively need to be custom-built.
  2. We'd need to find a way to run Finale on GitHub Actions. This presents not only technical hurdles but also potential licensing issues.
  3. These kinds of tests could take quite a bit of time to run (perhaps 20–30 minutes if we have hundreds of tests?). This might not be a huge deal, as review times for this repo are usually measured in hours or days. However, it could frustrate someone making a PR who thinks they're done, only to learn half an hour later that they need to make changes. Being able to run the tests locally would help, since our computers are likely much more powerful than GitHub Actions runners, leading to faster tests.

Conclusion

Ultimately, I don't see a huge reason to implement this now, as the benefits probably don't outweigh the costs just yet. However, as the number of scripts grows, the library grows, more first-time coders join, and more Finale professionals come to rely on these scripts, there will be a growing need to ensure stability and reliability.

So right now I'm mainly throwing this idea out there to see what everyone thinks.

cc @rpatters1 @cv-on-hub @ThistleSifter

rpatters1 commented 2 years ago

The more automated testing, the better, as far as I'm concerned. I agree that we may not have reached the point where it's a priority. One caveat about the Save As Text File is that it doesn't see differences in layout, only differences in notation that would directly affect how the piece is performed. This is by design. (More info in the notes.)

I've been expanding and using the jwluatesttools repo that Jari initially created to regression test RGP Lua. A tool we could use for testing is Lua itself, in the form of test scripts against test docs. That way we wouldn't necessarily be limited to what the Save As Text File script does.
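
To give a flavor of what that could look like, here's a rough sketch (not how jwluatesttoolkit is actually organized) of a plain Lua test script that could be run by RGP Lua against a known test document. The count_measures helper and the FCMeasures call are assumptions standing in for whatever document checks we'd actually want:

```lua
-- Collect failures instead of stopping at the first one.
local failures = {}

local function expect(condition, description)
    if not condition then
        table.insert(failures, description)
    end
end

-- Hypothetical check; FCMeasures is assumed here as the PDK Framework
-- measure collection, but any document property could be asserted this way.
local function count_measures()
    local measures = finale.FCMeasures()
    measures:LoadAll()
    return measures.Count
end

expect(count_measures() == 32, "test document should contain 32 measures")

if #failures == 0 then
    print("all document tests passed")
else
    print(#failures .. " failure(s):\n" .. table.concat(failures, "\n"))
end
```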

Nick-Mazuk commented 2 years ago

I assume you're referring to https://github.com/jariw2/jwluatesttoolkit?

That could be quite useful. I've not used it and am not too familiar with it. Does it require Finale to run tests? I assume so, but want to double-check.

One caveat about the Save As Text File is that it doesn't see differences in layout, only differences in notation that would directly affect how the piece is performed. This is by design. (More info in the notes.)

Ah, good to know. Should have read the description more carefully.

rpatters1 commented 2 years ago

The current version is a fork of that one: https://github.com/finale-lua/jwluatesttoolkit

It requires Finale to run, and so far I run it manually, but I always do so before every release (at a minimum). I'm not necessarily suggesting we use it (though I'm not opposed). It is currently mainly for regression testing the PDK Framework and RGP Lua, but we could either copy its approach or expand it to add tests for our library.

Nick-Mazuk commented 2 years ago

Ok, I see. I think more thought is needed, at least on my end.