Cases of accidental complexity #12

The most important thing when managing a multi-year software project is to avoid getting crushed under the weight of its complexity. The first step is learning to differentiate between accidental complexity and essential complexity, so that you know broadly where the limits of what can be simplified are. The next is learning about the nature of accidental complexity: how it arises, how it multiplies, and how to keep it in check.

Accidental complexity is inevitable when you don't control every layer of your software stack, because anything that is missing or done badly at a lower level is a complexity multiplier for the layers above it. Let's look at some examples of this, at various levels of the stack, before we get into practical solutions.

The Internet itself

Because IPv4 addresses are 32-bit, there aren't enough of them to uniquely identify everybody on the Internet at the same time. Because of this limitation, a hack called NAT was invented on top in order to get everybody on the Internet fast and cheap (cheap because NAT saves routers a lot of memory). This happened so fast that there was no time or incentive to change the protocol. From there the situation only got worse. Because NAT didn't actually solve the problem of two random people connecting to each other directly (notably, the telephone system doesn't have this problem), the entire communication industry had to adapt to this limitation, to the point that today we can't even imagine how this could be different.

So today, instead of having, say, a chat program that everybody in the world can download for free and use to communicate privately and securely with everybody else directly, what we think is normal is giant corporations relaying all communication between everybody, scraping it for keywords to increase the value of their targeted ads, and then fixing that with Senate hearings and bullshit cookie laws, all of which could've been easily avoided for the cost of reserving another 32 bits in the IP header before it was too late.
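
The arithmetic behind the shortage fits in three lines (a back-of-the-envelope sketch; the population figure is a rough assumption):

```ts
// 32-bit addresses give 2^32 possible values -- not even one per person,
// let alone one per device. (8 billion is an assumed round figure.)
const ipv4Space = 2 ** 32;        // 4,294,967,296
const people = 8_000_000_000;
console.log(ipv4Space < people);  // true -- hence NAT: many devices share one address
```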

The Web

Because the web didn't have a sane layout model for the first 15 years of its life, an enormous amount of knowledge was created in the form of frontend web frameworks, libraries, tutorials, blog articles etc., all of which was thrown into the bin as soon as flexbox and CSS grid became available. The irony is that there were probably many programmers who could have implemented a layout model/algorithm in all that time if given the chance, but the web as a platform is just not open in that way (the way the desktop as a platform is, for instance), so for all the important stuff everybody has to wait for a small group of people at Google and Mozilla to implement these things in their slow, bureaucratic way.

The situation is similar when it comes to the programming capabilities of the platform. You have to wait years to get trivial things like a hash map into the platform, because you can't just plug in your own implementation, or even a blessed one from the browser vendor itself. You can have the browser render your own fonts (even that took years), but not execute your (or even their) code for a hash map; the platform is just not open in that way. All of this results in a lot of accidental complexity in the JavaScript libraries, frameworks and web applications that sit on top.
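
To make the hash-map example concrete: until ES2015 finally shipped `Map`, JavaScript code had to simulate one with a plain object, prototype pitfalls included. A sketch of the before and after:

```ts
// The old workaround: abuse a plain object as a string-keyed hash map.
// Object.create(null) avoids collisions between user keys like "toString"
// and properties inherited from Object.prototype.
const counts: Record<string, number> = Object.create(null);
counts["toString"] = 1;

// What the platform eventually shipped (ES2015): a real hash map,
// with arbitrary key types and no prototype landmines.
const m = new Map<string, number>();
m.set("toString", 1);
console.log(m.get("toString")); // 1
```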

Linux Distros

Linux distributions are a horrific example of the effects of accidental complexity, simply because of the millions of man-hours wasted every year on a task that should be completely automated: building and updating software packages. Did you notice how on Windows, if you want to get a program on your computer, you just go to the company or person's website, download it, run it, and it will just work almost all the time? Microsoft is not involved in any of that. They spend no time and no money on that problem at all. On Linux, if you want to put your program out there for people to use, you have to either learn how to build, package and publish it for every Linux distro that you care about, ask the distro maintainers to do it for you, or publish the source code with a makefile and dump it all on the user. It's ironic that Linux runs on the app store model that open-source people hate so much, while Windows, of all things, is the only platform that puts users and developers in direct contact with each other without the middleman, and it's been doing that since 1995.

QUIC, HTTP/2, HTTP/3

HTTP/2, i.e. multiplexing streams over TCP: what could go wrong? OpenVPN programmers might know something about that :) Google programmers apparently didn't. The catch is head-of-line blocking: TCP delivers bytes strictly in order, so a single lost packet stalls every multiplexed stream behind it, not just the one it belongs to. OK, so that was an engineering mistake, but the protocol was motivated by a limitation at a lower level of the stack, namely TCP; they just didn't realize it at the time. After they had their d'oh moment with HTTP/2, the solution they came up with, namely HTTP/3, is basically HTTP/2 over QUIC over UDP, so in the end the problem was solved at the right level of the stack by replacing TCP with QUIC. But the point is this: if UDP (which is a thin abstraction over how the Internet actually works) had not been available to applications, allowing them to reimplement TCP on top of it without depending on OS vendors (which is what you'd have to do if you wanted to evolve TCP itself), fixing this at the HTTP level would probably have been much more complex and inefficient, or even impossible.
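
To see why multiplexing over TCP runs into this, here is a minimal sketch of the framing idea (the frame layout is made up for illustration, not the actual HTTP/2 wire format):

```ts
import { Socket } from "node:net";

// Hypothetical frame layout: [ streamId: 4 bytes | length: 4 bytes | payload ].
// Frames from ALL streams are interleaved over the same TCP connection.
function sendFrame(sock: Socket, streamId: number, payload: Buffer): void {
  const header = Buffer.alloc(8);
  header.writeUInt32BE(streamId, 0);
  header.writeUInt32BE(payload.length, 4);
  sock.write(Buffer.concat([header, payload]));
}

// TCP hands bytes to the application strictly in order, so if the segment
// carrying stream 7's frame is lost, frames for every other stream queued
// behind it sit in the kernel until the retransmit arrives -- that's
// head-of-line blocking, and no application code can route around it.
```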

Async APIs

Because the X Window System API is asynchronous, there are UI behavior patterns that are simply impossible to implement reliably, for instance "sticky child windows", i.e. having one window follow another when the latter is moved.
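
A sketch of why this fails; the two calls below are hypothetical stand-ins for async windowing requests, not real X11 bindings:

```ts
// Hypothetical async windowing API: requests are queued, replies arrive later.
declare function getGeometry(win: number): Promise<{ x: number; y: number }>;
declare function moveWindow(win: number, x: number, y: number): Promise<void>;

async function stickToParent(parent: number, child: number): Promise<void> {
  const pos = await getGeometry(parent);      // a snapshot, stale on arrival
  await moveWindow(child, pos.x, pos.y + 20); // parent may have moved again
  // No client-side ordering can make the read and the move atomic, so the
  // child always lags and jitters behind the parent instead of sticking to it.
}
```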

In general, an async API is a complexity multiplier, and sometimes, as above, it makes things outright impossible.

The Solution

The solution to all of this should be apparent by now: control as many layers of your software stack as possible, so that you can solve your problems at the right level of abstraction. Stop using application frameworks and create your own abstraction layers instead, tailored to the needs of your application. The more of the stack you control, the fewer opportunities for accidental complexity to develop.

You don't have to go all the way down for this to work -- at the lowest levels, the knowledge required becomes too specialized. You probably wouldn't write a new filesystem for your OS or add a new optimization pass to your compiler even if you had the code to those things and could actually deploy them. But things are more or less settled at that level (don't get me wrong -- there's still a lot to do there), plus your OS can't afford to get too much in the way of the hardware because that would slow it down, so people are kinda forced to do the right thing at that level. On the other hand, creating a new DSL, or even an entire programming language, to solve a problem is something to consider.
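
As one small illustration of what "your own abstraction layer" can mean in practice (a sketch; the interface and the backing store are made up for the example):

```ts
// The application codes against YOUR interface, not the platform's.
interface KV {
  get(key: string): string | null;
  set(key: string, value: string): void;
}

// Today it's backed by the browser's localStorage; tomorrow it can be
// IndexedDB, a server, or an in-memory map for tests -- swapped in one place,
// with zero changes to the application code above it.
const storage: KV = {
  get: (k) => localStorage.getItem(k),
  set: (k, v) => localStorage.setItem(k, v),
};
```

The layer costs a few lines now, and it buys you the option to fix lower-level problems at this level later, which is the whole point.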