BoiseState-AdaptLab / adapt-lidar-tools

Contains code/project notes/ and Data for GEO+CS lidar data processing
GNU General Public License v3.0
8 stars 2 forks source link

Research principles of testability #306

Closed SpencerFleming closed 4 years ago

SpencerFleming commented 4 years ago

The goal is to do research into techniques for improving the testability of code. Right now there are some segments of our code that can be difficult or overly complicated to test. I want to minimize this in the future so testing is less expensive, and code is more maintainable.

SpencerFleming commented 4 years ago

Looking up unit testability, I found this article: https://rollout.io/blog/unit-testing-external-dependencies/

The first thing mentioned was Command-Query separation.

Command-Query separation

It states that every method should either be a command that performs an action, or a query that returns data to the caller, but not both. In other words, Asking a question should not change the answer.[1] More formally, methods should return a value only if they are referentially transparent and hence possess no side effects. -- https://en.wikipedia.org/wiki/Command%E2%80%93query_separation

Referential transparency

An expression is called referentially transparent if it can be replaced with its corresponding value without changing the program's behavior.[1] This requires that the expression be pure, that is to say the expression value must be the same for the same inputs and its evaluation must have no side effects. An expression that is not referentially transparent is called referentially opaque. -- https://en.wikipedia.org/wiki/Referential_transparency

Side effects

In computer science, an operation, function or expression is said to have a side effect if it modifies some state variable value(s) outside its local environment, that is to say has an observable effect besides returning a value (the main effect) to the invoker of the operation. State data updated "outside" of the operation may be maintained "inside" a stateful object or a wider stateful system within which the operation is performed. Example side effects include modifying a non-local variable, modifying a static local variable, modifying a mutable argument passed by reference, performing I/O or calling other side-effect functions.[1] In the presence of side effects, a program's behaviour may depend on history; that is, the order of evaluation matters. Understanding and debugging a function with side effects requires knowledge about the context and its possible histories -- https://en.wikipedia.org/wiki/Side_effect_(computer_science)

It's clear how this affects testability; queries will be easier to test, so we should keep them separate, thus reducing the amount of testing required for the more complex commands.

SpencerFleming commented 4 years ago

The next thing mentioned in the above article is separation of concerns, which pairs really well with the above.

Seperation of concerns

When concerns are well-separated, there are higher degrees of freedom for module reuse as well as independent development and upgrade. Hiding the implementation details of modules behind an interface enables improving or modifying a single concern's section of code without having to know the details of other sections, and without having to make corresponding changes to those sections. Modules can also expose different versions of an interface, which increases the freedom to upgrade a complex system in piecemeal fashion without interim loss of functionality

Separation of concerns is a form of abstraction. As with most abstractions, interfaces must be added and there is generally more net code to be executed. So despite the many benefits of well separated concerns, there is often an associated execution penalty. -- https://en.wikipedia.org/wiki/Separation_of_concerns

This is a common principle taught in early CS courses. As the article notes, however, these is an execution time penalty associated with abstraction. Fitting, for instance, is better done in a larger function for the sake of preventing that run-time bottleneck from worsening. CmdLine, by contrast, could benefit from separation of concerns just fine without causing a significant run-time impact.

SpencerFleming commented 4 years ago

The last thing mentioned in the article above is dependency injection-- Something we have actually already been employing to some extent.

From Wikipedia:

Dependency Injection solves problems such as:

  • How can an application or class be independent of how its objects are created?
  • How can the way objects are created be specified in separate configuration files?
  • How can an application support different configurations?

Creating objects directly within the class that requires the objects is inflexible because it commits the class to particular objects and makes it impossible to change the instantiation later independently from (without having to change) the class. It stops the class from being reusable if other objects are required, and it makes the class hard to test because real objects can't be replaced with mock objects.

Dependency Injection UML diagram

There are at least three ways a client object can receive a reference to an external module:

  • constructor injection: the dependencies are provided through a client's class constructor.
  • setter injection: the client exposes a setter method that the injector uses to inject the dependency.
  • interface injection: the dependency's interface provides an injector method that will inject the dependency into any client passed to it. Clients must implement an interface that exposes a setter method that accepts the dependency

Really the only novel thing here is how dependency injection encourages use of interfaces. Interfaces make it extremely easy to inject fake dependencies during testing, and make a lot of previously un-unit-testable code testable.

The down side, once again, is that there is a high potential for run-time slowdowns.

EDIT: There is a bit about 'other types' of dependency injection in Wikipedia, where it is mentioned that some test suites have ways of making and injecting mock dependencies into code that isn't made with it in mind. I looked into GTest's documentation and sure enough-- There is a library inside it, and inherently in our deps folder, called 'GoogleMock'.

This page describes essentially what it does and how. For a moment I was worried while I read this, because the recommended way to use it is still to have an base class (acting as a c++ interface) involved, so the mock dep and real dep can be children of the interface base class. However, I discovered that you can technically mock a non-polymorphic class with no virtual functions as well. This could change how we test things, and let us unit test properly without having to abstract things in a way that might introduce unwanted overhead.

TL;DR: Many of our classes explicitly depend on specific other classes in order to function, and this is hard to test. This is typically fixed in C++ by having the dependencies inherit from a base class full of virtual functions. The class we want to test would expect the base class, so it can accept any children of said class, and then during the test, we give it a 'mock class' that inherits from the base class, but 'implements' the virtual functions with some predictable, hard coded value. We can't afford to do that in a few places, and in those places, we can just trick the compiler to replace our real class with another one, thanks to GoogleMock. I think. That's why I have to experiment with it.

SpencerFleming commented 4 years ago

In this article, a software developer with over 30 years of experience addresses misconceptions about unit tests. Some points I think are worth mentioning:

Another article says "Aim at covering all paths through the unit. Pay particular attention to loop conditions." This is a very intuitive phrasing, I think.

SpencerFleming commented 4 years ago

Take-aways

These are the things I think would be worth considering for the code reorganization.

SpencerFleming commented 4 years ago

I think this should be enough for now, in terms of testability. I spent a lot more time scouring and reading up on this than I thought I would-- It was hard to stop myself at some point. That said, a significant portion of the research actually helped me with other areas I wanted to research, so hopefully that will boost my productivity with those tasks.