I'm making this as an umbrella issue. It may be hard to call this 'solved', so we should close it if it the presence of this issue ever stops feeling helpful

The problem

The platform is currently quite hard to test. It has been hard for me to exactly express why but I think I can do it now.

Clean Architecture

I'd like to introduce this notion of "clean architecture". Here are some materials on it. In the diagram below, the key idea is that "the dependency arrows only ever point inwards". For example, a consequence of this approach is that your business logic doesn't depend on a specific database implementation. Therefore, the business logic can be tested without needing a database.

We already have some of these ideas present in the code base, we're just not using them as well as we could. See "A motivating example" below.

To whet your appetite, here is an excerpt from the first link:

Though these architectures all vary somewhat in their details, they are very similar. They all have the same objective, which is the separation of concerns. They all achieve this separation by dividing the software into layers.

[snip] ... In fact your business rules simply don’t know anything at all about the outside world.

The overriding rule that makes this architecture work is The Dependency Rule. This rule says that source code dependencies can only point inwards. Nothing in an inner circle can know anything at all about something in an outer circle. In particular, the name of something declared in an outer circle must not be mentioned by the code in the an inner circle.

How does this translate to our code? I'll let you think about that for a moment ...

A motivating example

Imagine for a moment that a user submits a build job.

Reco zips up their tarball
Uploads it by doing a HTTP request PUT /builds/:id/input.

We now enter this code path:

https://github.com/ReconfigureIO/platform/blob/ffb20d436f641693fc973bad0f977339fac60e96/handlers/api/build.go#L181-L216

Now let's say, we're working on pull request #270, which makes the callback URL more configurable. We want to write a test which checks:

Given:
- this database state
- this configuration
- this HTTP request
Then:
- The callback URL has <this expected value> in b.AWS.RunBuild(build, callbackURL, reportsURL).

Right now, it is very difficult to arrange this. We have to actually make a database and populate it with state, for example. To do this, our test has to have access to a database. It has to import GORM. Now our tests are slow, because they use a real database, their reliability is impacted (#223), and the tests can't run in parallel.

So what is the solution to this?

Well, we're already part of the way there! During the upload, we don't have to have S3 present, for example, because S3 is accessed through a service which is dependency injected onto the Build. So we can use a fake.

However, we can't do the same with the build model. Why? Because we directly reach into the database from our handler code, by using gorm functions:

https://github.com/ReconfigureIO/platform/blob/ffb20d436f641693fc973bad0f977339fac60e96/handlers/api/build.go#L53-L65

Our business logic (the HTTP request handler) has become entangled with the database, and now it is hard to test. We can't mock gorm - in part because the way it works is by chaining function calls which return *gorm.DB, and that is not something that we can mock.

What next?

Well, it turns out, someone already had the right idea:

https://github.com/ReconfigureIO/platform/blob/ffb20d436f641693fc973bad0f977339fac60e96/models/build.go#L12-L18

The BuildRepo interface is an abstract interface which can be mocked. If our Input(c *gin.Context) handler only retrieved builds through the BuildRepo interface, then we would be able to mock all of its dependencies, and it would be quite easy to write a test that we had wired up our callback URL correctly.

Essentially, any time that the handlers package wants to do a database query, that logic should live behind a method on the "data source provider" (in this case the BuildRepo).

So our 'entities' (inner most ring in the above diagram) would be our models, such as builds, and our application logic (inner rings) would be the HTTP handler. Our application logic would essentially have no idea that it was talking to a database (outer ring), since it just does it through these Repos which are essentially "database controllers" which live in the outer rings.

When are we 'done'?

The end goal is reached when our handlers package have no dependencies on concrete implementations (e.g, S3, AWS Batch or a database). It would be a good sign if we didn't import gorm, for instance.

At the moment, the models package contains both the concrete implementations of the data providers, and the definitions of the interfaces. Ideally, it would just contain the interfaces, and then only the main package would pull in any dependencies on a real database (and gorm) through another package. The tests wouldn't even have an indirect dependency on the database, for example.

This might seem like a huge insurmountable task. The way we overcome that is that we should do it in small pieces. I believe it should be possible to eliminate Build's dependency on GORM and add a few tests with something like a couple of hours effort. From there, it should be possible to do it to other packages (one at a time), too.

Please comment with an emoji of your choosing if you read this far! (so that I get the notification and know we're talking about the same thing :D)

ReconfigureIO / platform

Move towards "clean architecture" #290

The problem

Clean Architecture

A motivating example

What next?

When are we 'done'?

Progress