sibwaf opened this issue 3 years ago
Hi @dya-tel
We actually have a lot of smoke tests up and running! You can find the dashboard right here. They run once a day, so they don't slow down our integration process, but still give us a heads-up pretty quickly if something goes wrong. For most of these we just run this script, which more or less just builds the project, runs the tests, then builds a Spoon model, pretty-prints the model and runs the tests again. So while not quite as intricate as what you suggest, it's pretty decent, and tips us off about major problems that do crop up from time to time.
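For concreteness, the "build a Spoon model and print it back" part boils down to something like this (a minimal sketch; the input path and output directory are placeholders, and the project's own test suite would then be run against the printed sources):

```java
import spoon.Launcher;

public class SmokeTestSketch {
    public static void main(String[] args) {
        Launcher launcher = new Launcher();
        // Placeholder path: the sources of the project under test
        launcher.addInputResource("path/to/project/src/main/java");
        // Keep the printed sources readable
        launcher.getEnvironment().setAutoImports(true);
        // Where the pretty-printed sources are written
        launcher.setSourceOutputDirectory("target/spooned");
        // Builds the model and pretty-prints it back to the output directory
        launcher.run();
    }
}
```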
For some projects that use Spoon, we simply run the project's test suite with the latest snapshot version of Spoon. This gives us many of these "heavy operations" you refer to. If Kawa's test suite is robust and uses Spoon in interesting ways, it's definitely a candidate for being added as this kind of smoke testing.
Huh, had no idea.
Running tests from other projects which use Spoon is cool and all (as I understood it, only the usual unit tests are being run; I might be wrong), but that's probably not enough - I'm talking about making a separate module or something like that for "trying to destroy everything". I mean not relying on others' mostly synthetic unit tests, but "bombarding" the real models by ourselves. Just like with the current model build testing, but taking it a bit further. And maybe adding more projects to test on while we're at it?
Sadly, Kawa is not even close to using anything "interesting" - only simple tree walking for now. Maybe overload and inheritance checking someday, and that would probably be it.
but "bombarding" the real models by ourselves
Do you mean that we'd have some generic operations applicable to any project, or tailored operations for each considered project? If it's the former, do you have any ideas on how to make that happen? Would there be an oracle or would it just be "it doesn't crash" tests? If it's the latter, I'm afraid it may be too much work to maintain. Even maintaining the current smoke tests, which are mostly run by that single script I linked earlier, is quite a lot of work from time to time.
That being said, the current smoke tests are fairly decent. We build the models of the entire projects and then pretty-print them back and ensure that the test suites still pass. This is a relatively strong indication that the model is healthy. For the specialized tests where we inject the latest version of Spoon, we get a relatively strong indication that Spoon is working as these projects expect.
If we can find a generic one-script-fits-all solution to your proposed bombardment, it might be worthwhile. Otherwise, I just don't think we have the manpower to maintain it.
I think the cheapest way for us to smoke test is simply to keep adding projects that use Spoon in intricate ways, and run their test suites with the latest version of Spoon. For example, I'm adding the Spork structured merge tool today, which has hundreds of tests that are extremely sensitive to changes in the metamodel.
Sadly, Kawa is not even close to using anything "interesting" - only simple tree walking for now. Maybe overload and inheritance checking someday, and that would probably be it.
I see, we have plenty of tree walking in the smoke tests already. Does it run in classpath or noclasspath mode? Does it make any use of types (resolving types is always kind of finicky in noclasspath mode)?
Do you mean that we'd have some generic operations applicable to any project [...] would it just be "it doesn't crash" tests?
This variant. Just a bunch of simple generic operations - something like "get all the methods in the model and run an overload check for each pair, get declarations for everything, etc.; if it doesn't crash, these operations are probably safe enough". It shouldn't need any maintenance (aside from updating the test projects from time to time and maybe adding more checks), but it would probably be pretty slow.
My intention with this idea is, for the most part, not to check whether the model is being built correctly, but to check that working with it doesn't destroy it in the long run (the same way getExecutableDeclaration does, which results in a crash later; see the first linked issue in the initial post).
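For concreteness, I imagine something along these lines (a minimal sketch; the project path is a placeholder and the operations are just examples of generic "crunch" work, with "no exception" as the only oracle):

```java
import spoon.Launcher;
import spoon.reflect.CtModel;
import spoon.reflect.declaration.CtMethod;
import spoon.reflect.reference.CtExecutableReference;
import spoon.reflect.reference.CtTypeReference;
import spoon.reflect.visitor.filter.TypeFilter;

public class CrunchOperations {
    public static void main(String[] args) {
        Launcher launcher = new Launcher();
        launcher.addInputResource("path/to/project/src/main/java"); // placeholder
        CtModel model = launcher.buildModel();

        // Resolve the declaration behind every executable reference in the model
        for (CtExecutableReference<?> ref : model.getElements(new TypeFilter<CtExecutableReference<?>>(CtExecutableReference.class))) {
            ref.getExecutableDeclaration();
        }

        // Resolve every type reference
        for (CtTypeReference<?> ref : model.getElements(new TypeFilter<CtTypeReference<?>>(CtTypeReference.class))) {
            ref.getTypeDeclaration();
        }

        // Touch every method signature (a stand-in for a full overload check over all pairs)
        for (CtMethod<?> method : model.getElements(new TypeFilter<CtMethod<?>>(CtMethod.class))) {
            method.getSignature();
        }
    }
}
```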
Does it run in classpath or noclasspath mode? Does it make any use of types (resolving types is always kind of finicky in noclasspath mode)?
Well, I made the "launcher" (which builds the model) mostly as a proof of concept, so I didn't really pay too much attention - it runs in the default mode (classpath, probably?) without me attaching anything apart from the sources. So there are probably a bunch of shadow types. The only thing I do with types for now is checking whether a type is final so I can use interprocedural analysis. Everything else just walks the "real" model elements which have declarations.
My intention with this idea is, for the most part, not to check whether the model is being built correctly, but to check that working with it doesn't destroy it in the long run (the same way getExecutableDeclaration does, which results in a crash later; see the first linked issue in the initial post).
If we could find a way to define generic tests to run these "crunch" operations on arbitrary (real) projects, then I think it's a great idea. I'm sure it's possible to do somewhat generically, e.g. "find a method with at least one use and rename it", or something like that. Or even, "rename every method". We have the computing power for running such tests, we just don't have the manpower to maintain yet another test suite.
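For instance, a naive "rename every method" pass might look like this (a sketch that only renames the declarations via setSimpleName; a real test would also have to update call sites, or settle for checking that the mutated model can still be printed without crashing):

```java
import spoon.Launcher;
import spoon.reflect.CtModel;
import spoon.reflect.declaration.CtMethod;
import spoon.reflect.visitor.filter.TypeFilter;

public class RenameEverything {
    public static void main(String[] args) {
        Launcher launcher = new Launcher();
        launcher.addInputResource("path/to/project/src/main/java"); // placeholder
        CtModel model = launcher.buildModel();

        // Rename every method declaration; call sites are NOT updated here
        for (CtMethod<?> method : model.getElements(new TypeFilter<CtMethod<?>>(CtMethod.class))) {
            method.setSimpleName(method.getSimpleName() + "Renamed");
        }

        // Re-print the mutated model; a crash here would indicate a broken model
        launcher.prettyprint();
    }
}
```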
it runs in the default mode (classpath, probably?)
By default, Spoon runs in noclasspath mode. In classpath mode, it requires all types to be either in the source or on the classpath (the program essentially needs to compile), which is a bit inconvenient for people who are just trying Spoon out. But let's hold off on that then, perhaps it won't gain us much over the other projects we already test at this point. Let me know if you feel like that changes.
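For reference, switching between the two modes is just an environment flag (a minimal sketch; the classpath entry is a placeholder):

```java
import spoon.Launcher;

public class ClasspathModes {
    public static void main(String[] args) {
        Launcher launcher = new Launcher();
        launcher.addInputResource("path/to/project/src/main/java"); // placeholder
        // Default: noclasspath mode, missing types are tolerated
        launcher.getEnvironment().setNoClasspath(true);
        // Classpath mode instead: everything must resolve, so dependencies have to be provided:
        // launcher.getEnvironment().setNoClasspath(false);
        // launcher.getEnvironment().setSourceClasspath(new String[] { "path/to/dependency.jar" }); // placeholder
        launcher.buildModel();
    }
}
```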
Not long ago I was working at a company, developing a static analyzer which used Spoon. We had a utility called "self-tester", which ran our analyzer on a bunch of open-source projects and calculated the diff between the analyzer reports. The last time we decided to update Spoon, we ran into a bunch of problems while running this utility and couldn't update. For some of these problems I created issues (#3733, #3734, #3740), and some were probably left undiscovered.
I no longer work for that company, but the "self-tester" concept feels like a very important addition to Spoon's test suite to me.
So, the idea is to add a separate testing step which builds models for a bunch of open-source projects (on "known-good" commits) and runs a bunch of code on these models, checking for crashes - traversing the tree in any way, calculating method overloads (as one of the "heavy operations"), anything goes. This will surely slow down the tests by a lot, but it will allow us to track regressions and unexpected bugs not yet discovered by the usual unit tests. As a bonus, it will ensure that Spoon is able to handle the open-source madness. A simple example of this testing step can be found in my dataflow analyzer prototype (it's located in the "Launcher" module).
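Very roughly, such a step could be a small harness like this (a minimal sketch, not the prototype from the "Launcher" module; the project paths stand for known-good checkouts, and the checks are just examples):

```java
import java.util.Arrays;
import java.util.List;

import spoon.Launcher;
import spoon.reflect.CtModel;
import spoon.reflect.declaration.CtMethod;
import spoon.reflect.reference.CtExecutableReference;
import spoon.reflect.visitor.filter.TypeFilter;

public class SelfTesterHarness {
    // Placeholder checkouts pinned to known-good commits
    private static final List<String> PROJECTS = Arrays.asList(
            "checkouts/project-a/src/main/java",
            "checkouts/project-b/src/main/java");

    public static void main(String[] args) {
        for (String sources : PROJECTS) {
            Launcher launcher = new Launcher();
            launcher.addInputResource(sources);
            CtModel model = launcher.buildModel();

            // Generic operations; the only oracle is that nothing throws
            for (CtExecutableReference<?> ref : model.getElements(new TypeFilter<CtExecutableReference<?>>(CtExecutableReference.class))) {
                ref.getExecutableDeclaration();
            }
            for (CtMethod<?> method : model.getElements(new TypeFilter<CtMethod<?>>(CtMethod.class))) {
                method.getSignature();
            }
        }
    }
}
```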
I could try making something myself, but I'm not sure when I'll have free time, so I'd be happy if anyone likes the idea and decides to implement it themselves.
Any thoughts?