Closed — mbland closed this issue 9 years ago
I like the breakdown here: the three different levels make sense, but I'd suggest that the terminology could be better. Small/Medium/Large could be conflated with the size or complexity of the tests themselves. What about Low(-Level), Mid(-Level), High(-Level) (or simply Unit/Integration/System)? Because what is meant by "Large" here is really not that the test is large, or even that the tested components are necessarily large, but that it tests at a bird's eye view.
The different-sized bricks in the pyramid are an apt visual, but the down-arrow on the left seems to imply falling confidence, rather than rising, as intended. I'm at a loss for how else to represent it, though. Would it be worth enlisting a visual designer to help (cc @jehlers @ericadeahl @ericronne)?
Should database access be mentioned in the discussion of touched-upon systems in the test levels, or has that intentionally been left out? It can be a point of contention amongst testers. Perhaps we should at least answer the question of whether a test accessing the database automatically places it at the level of an integration test.
I love these sentences in the closing paragraph.
The goal is to automate at the lowest level possible, but no lower. Shoot for an appropriate balance, not an ideal one.
Overall, looks good! I'll submit some minor wording edits in a PR.
Should database access be mentioned in the discussion of touched-upon systems in the test levels, or has that intentionally been left out?
I wouldn't sweat it. I feel like databases are just like any other dependency. The tester can choose to mock/fake or not depending on what makes sense in context.
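The "mock/fake or not, depending on context" choice can be sketched in a few lines. This is a minimal illustration, not anything from the playbook itself; `UserStore` and `greeting` are hypothetical names invented for the example.

```python
# A minimal sketch of treating the database like any other dependency.
# UserStore and greeting are hypothetical, invented for illustration.
from unittest import mock

class UserStore:
    """Hypothetical wrapper around a real database connection."""
    def fetch_name(self, user_id):
        raise NotImplementedError("talks to a real database in production")

def greeting(store, user_id):
    # Code under test depends on the store interface, not the database.
    return "Hello, %s!" % store.fetch_name(user_id)

# In a small test, swap in a fake; in a medium test, you might use the
# real store against a local database instead. The tester chooses.
fake_store = mock.Mock(spec=UserStore)
fake_store.fetch_name.return_value = "Ada"
assert greeting(fake_store, 42) == "Hello, Ada!"
```

Whether using the real `UserStore` bumps the test up a level is then just a question of what processes the test actually touches, not a special database rule.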
Should we mention somewhere in here that it also helps if your tooling allows one to easily run the various test tiers separately? All the balanced test writing in the world doesn't help if it's hard to run the subset of quick tests alone quickly.
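One way to make the quick tier runnable on its own, sketched here as an assumption rather than 18F practice: tag each test with a size label and build a suite from only the requested tier. The `size` decorator and `suite_for` helper are hypothetical names for illustration.

```python
# A minimal sketch of running one test tier by itself, assuming a
# hypothetical convention where each test carries a "size" tag.
import unittest

def size(label):
    """Hypothetical decorator tagging a test with a size tier."""
    def mark(fn):
        fn.size = label
        return fn
    return mark

class ExampleTests(unittest.TestCase):
    @size("small")
    def test_add(self):
        self.assertEqual(1 + 1, 2)

    @size("large")
    def test_end_to_end(self):
        self.assertTrue(True)  # stand-in for a slow, expensive system test

def suite_for(label):
    # Collect only the tests whose tag matches the requested tier.
    suite = unittest.TestSuite()
    for name in unittest.defaultTestLoader.getTestCaseNames(ExampleTests):
        if getattr(getattr(ExampleTests, name), "size", None) == label:
            suite.addTest(ExampleTests(name))
    return suite

# Running only the "small" tier keeps the fast feedback loop fast.
result = unittest.TextTestRunner(verbosity=0).run(suite_for("small"))
```

Most real runners offer the same idea out of the box (e.g. marker or tag filtering), so the point is less the mechanism than making tier selection a one-command operation.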
Thanks for the feedback, @arowla and @cpapazian! A few points:
I developed a highly sensitive allergy to naming debates, and the Small/Medium/Large battle of 2006 was one such debate. (The "Test Certified" debate of the same year was even worse!) If you can come up with an unequivocally superior naming scheme that nearly everyone can agree is better without debate, I'd be happy to adopt it. :-)
The S/M/L names came about because, at the time inside Google, a "unit" test was any test that ran in 5-10min and didn't rely too much on NFS or production. A "regression" test was everything else. We picked the new terms to cut ties with the misunderstood terms from the past, because they made some intuitive sense, and because they gave us the chance to rigorously define exactly what made a test "small", "medium", or "large".
Actually, a "Large" test is large, as are the components under test, because it tests at a bird's eye view. Sure, depending on the project it may still run in a second or less, as it's more a conceptual size than one derived from quantifiable factors; but usually, "Large" tests are pretty slow and expensive.
Yeah, it's unfortunate that there's a "down" arrow to indicate "increased" confidence in an individual change, but I'm also open to alternate representational ideas. That said, this arrow's been pointing the same way for about ten years. ;-)
Funny; I was always thinking of databases of one form or another when talking about "separate processes" or "services in a datacenter", but didn't spell them out. I can if folks think it would really make explanation more accessible.
About the tooling: Yes! And we did use the test sizes in our tooling and reporting systems. I'll add a bit about that.
Sorry to stir up your naming debate allergy, @mbland! Unless anyone else speaks up with an opinion, I'm happy to leave as-is.
On the database issue, I'm used to it being such a hot topic in testing that I felt it maybe deserved honorable mention, if only to stress being pragmatic about it. Or perhaps to stress not stressing about it, @cpapazian!
@mbland, I was thinking about this a bit more on the bus ride in ... specifically, the large/system-level tests. What role do they play in projects at 18F? It seems most of the things we are working on are covered pretty well by small/unit and medium/integration tests ... or am I missing something?
@cpapazian At 18F, there doesn't seem to be a huge market/need for the larger-sized tests--yet. I certainly wouldn't sweat adding them if current test suites appear to cover most of the bases. However, it's still useful to include them in the discussion, so that avoiding them is a conscious choice rather than an accident of ignorance.
Also, given that the hope is that this playbook will prove relevant far beyond 18F itself, there are certainly other projects that may benefit from adding large tests, or from realizing that their existing tests fall into the "large" category specifically. There may be folks who are blissfully unaware that other test sizes are both possible and desirable.
I'm going to close this one and open a new issue for the next section. If there's any desire for further discussion, feel free to continue it on this issue.
Requesting editorial feedback on the Small/Medium/Large Test Size Pyramid section.
This is the first of a progression of issues I'll open on each section of the playbook. Eventually I'd like to ensure that the entire thing has had the benefit of review and input from as many interested members of the team as possible. I also plan to use this as the first tangible artifact co-owned and maintained by 18F Practices and the soon-to-be-formed 18F Testing Grouplet (think s/working group/grouplet/).

cc: a few folks who might be specifically interested based on recent conversations and Slack chatter: @afeld @RobertLRead @adelevie @arowla @theresaanna @shawnbot @cpapazian @gboone @monfresh @micahsaul @msecret