Branch coverage measure exception

david-a-wheeler commented 1 year ago

In some systems, many branches exist that cannot be executed by any test you could write. E.g., if you're writing defensive code of the form:

if (cannot_happen) then raise_exception

... then if the system uses such code constructs often, it's impossible to meet the 80% branch coverage criterion as written (a gold requirement). Such constructs are in use. In particular, some programs use this as a defensive design approach to detect when program state or control flow has gone wrong.

After some discussion, I've come to the conclusion that it doesn't make sense to require tests that can't be written. You could argue that a different implementation approach might be better (e.g., the use of assertions), but as a measurement of test quality, we can't argue that there should be tests for something when it's not possible to write such a test.

I suggest modifying the details text to make it clear that for branch coverage, it's okay to ignore branches where it can determined that no test could be created that could execute the branch.

One problem with this is that the obvious way to determine each case is entirely manual. But this modification would at least make it possible to achieve the gold badge in cases where it should be possible.

This would involve a change to config/locales/en.yml. Current text for criterion test_branch_coverage80:

      test_branch_coverage80:
        description: >-
          The project MUST have FLOSS automated test suite(s)
          that provide at least 80% branch coverage if there is
          at least one FLOSS tool that can measure this criterion
          in the selected language.

I propose adding something like this:

        details: >-
          The branch coverage measurement MAY ignore any branch
          where it's not possible to write a test to execute it.
          Such cases SHOULD be clearly marked
          at each location they occur, e.g., with a comment or program convention.

Also, in dynamic_analysis details, the text:

          automated test suite with at least 80% branch coverage.

Should be followed with:

          The branch coverage measurement MAY ignore any branch
          where it's not possible to write a test to execute it.

This has only started to get raised recently. This problem only affects some programs, and I suspect it's a small minority of programs. Many programs simply don't test for things that can't happen, which is simpler :-). An alternative approach is to use assertions, e.g., assert(must_always_be_true); assertion constructs can easily be automatically determined as not a branch (in the usual sense) and can be easily ignored by coverage tools.

I think this problem doesn't happen often in part because the use of branch coverage as a way to measure test quality is old & well-understood; it at least goes back to 1975 ("An Approach to Program Testing", J. C. Huang, ACM Computing Surveys (CSUR), Volume 7 Issue 3, Sept. 1975, Pages 113-128). Support for branch coverage measurement is also old; the gcc compiler includes "gcov" which can measure branch & statement coverage, and while I don't know how old gcov is; this page: http://nixdoc.net/man-pages/NetBSD/man1/gcov.1.html says gcov appeared in NetBSD 1.4, which was released on May 12, 1999 https://www.netbsd.org/releases/formal-1.4/ and I suspect gcov existed long before that. Whether or not "impossible" branches in gcov are included in coverage depends on configuration, as discussed in https://github.com/gcovr/gcovr/issues/324 . gov's option --exclude-unreachable-branches excludes cases it can automatically determine, code can be marked as excluded using GCOVR_EXCL_LINE, and the command line option --exclude-lines-by-pattern '.assert.' (or similar) easily handles assertion statements.

kwwall commented 1 year ago

Since no one has yet seemed to comment on this, I'll step out and share my ignorance first. Or rather my forgetfulness. (It's been a really long time since I took any formal software testing course--probably since my masters in CIS at CWRU--so my apologies for likely having forgotten almost all of what I learned.)

Anyway, don't we need to be more precise in terms of what we mean by "branch coverage" here? For example, if you encounter something like this in C/C++/Java/C#/etc... any language that has short-circuit logic operations such as '&&' and '||' may complicate things:

if ( conditionA && conditionB ) {
   doBranch1();
} else {
   doBranch2();
}

seems to make "branch coverage" a bit ambiguous. Is it sufficient to simply execute both doBranch1() and doBranch2() or must you alsofully test the test expression 'conditionA && conditionB' for all variations of truth tables, e.g., cases for where both 'conditionA' and 'conditionB' are separately false? If these are simple side-effect free boolean expressions, it may not matter, but some languages like C and C++ (and probably others) allow side-effects within the sub-expressions themselves.

I vaguely seem to recall reading either an ACM or IEEE paper (which I sadly cannot find) by Elaine J. Weyuker where she showed that there were actually a few different variations of branch coverage, one of which I think she referred to as "all paths coverage". I bring this up to only say that different tools may be measuring different aspects of "branch coverage" so it would be good if we could define precisely what we mean by "branch coverage" before using it as completion criteria.

david-a-wheeler commented 1 year ago

Anyway, don't we need to be more precise in terms of what we mean by "branch coverage" here? For example, if you encounter something like this in C/C++/Java/C#/etc... any language that has short-circuit logic operations such as '&&' and '||' may complicate things:
if ( conditionA && conditionB ) {

No, that's exactly the difference between a branch and a condition. An if-then-else construct has exactly 2 branches, the "true" branch and the "false" branch. Branch coverage is also called "decision coverage"; see NIST's "decision or branch coverage".

If you wanted to cover changing the various conditions within that "if" statement, as shown above, that would be a condition coverage metric. In general that's even more work to meet, and there are far fewer tools that support measuring it. You can have 100% condition coverage without 100% branch coverage (aka decision coverage), interestingly enough.

There's a variation called "Modified condition decision coverage (MCDC)"; this is a strong coverage criterion that is required by the US Federal Aviation Administration for Level A (catastrophic failure consequence). That's hard to meet, and projects that need to meet it already know that :-).

For more info, see NISTIR 7878.

I'm all for clarifying. A lot of the formal documents I found assume this as base knowledge and don't provide more precise definitions. Suggestions wanted :-).

coreinfrastructure / best-practices-badge

Branch coverage measure exception #1864