carbon-language / carbon-lang

Carbon Language's main repository: documents, design, implementation, and related tools. (NOTE: Carbon Language is experimental; see README)
http://docs.carbon-lang.dev/

Please reconsider having Undefined Behaviour #1956

Open moshev opened 2 years ago

moshev commented 2 years ago

Hello,

I am a long-time C++ developer and would like to propose that the Carbon project reconsider its handling of undefined behaviour. Specifically, I think it would be best if what is currently undefined behaviour were demoted to platform-specific behaviour. If optimisations based on those semantics are desired for certain hot spots in a program, the programmer should be required to first declare which semantics she wants, and then add debug-time assertions on the values of variables to guarantee that undefined behaviour is not triggered. First, I would like to give an example of the kind of problem undefined behaviour causes in large codebases.

Undefined behaviour is problematic, because optimisations based on the semantics of UB can have surprising non-local effects that are hard to reason about for both humans and tools performing static code analysis. As one example, consider the following C++ pseudo-code:


enum { LOG_ERROR = 1 };  // log levels (declared here so the snippet is self-contained)

void logMessage(int level, const char *fmt, ...);

class Foo {
public:
  void doSomething();
};

/// Almost never fails. Returns nullptr on failure.
Foo *createFoo();

/// @param x The Foo (may be nullptr).
void disposeFoo(Foo *x);

inline void doWithFoo(Foo *x) {
  if (x == nullptr) {
    logMessage(LOG_ERROR, "Logic error: null argument");
  } else {
    x->doSomething();
  }
}

// ================================================================
// in another translation unit
// ================================================================

void veryLongFunction() {
  Foo *widget = createFoo();
  ...
  doWithFoo(widget);
  ...
  disposeFoo(widget);
}

This is completely safe. However, some time later, a developer changes the parameter of disposeFoo from a pointer to a reference and mechanically (via her IDE's refactor feature) replaces all callsites to dereference their argument.

/// @param x The Foo
void disposeFoo(Foo &x);

void veryLongFunction() {
  ...
  disposeFoo(*widget);
}

Now the compiler does the following optimisations:

  1. doWithFoo is inlined in veryLongFunction
  2. Since widget is dereferenced unconditionally, all branches where widget is nullptr can be deleted.

This results in a catastrophe. The compiler has removed the safety check that would produce an error log in the very rare case where createFoo returns nullptr. The result is a bug report with a backtrace showing a segmentation fault in Foo::doSomething (since this is nullptr), which is impossible to reproduce in a debug build and which appears logically impossible from reading the code, because doSomething is obviously only called when x is not nullptr. After several hours of lost productivity, the developer finally realises what is happening when she reads the disassembly of the optimised build.

In a large codebase with more than one developer such cases lurk everywhere. Similar scenarios happen with integer overflow, strict aliasing, unaligned reads, etc. Just the fact that the most popular C++ compilers have flags to define the most impactful cases of undefined behaviour should be indication enough that having undefined behaviour is more trouble than it's worth.
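For reference, the flags alluded to above do exist in both GCC and Clang; a build that opts out of the most impactful UB-based optimisations might look like the following (main.cpp is a placeholder name):

```shell
# Define away the most impactful undefined behaviour (GCC and Clang):
#   -fwrapv                          signed integer overflow wraps (two's complement)
#   -fno-strict-aliasing             disable type-based alias analysis
#   -fno-delete-null-pointer-checks  keep null checks even after a dereference
g++ -O2 -fwrapv -fno-strict-aliasing -fno-delete-null-pointer-checks main.cpp -o app
```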

In general, my proposal is that the compiler must not perform value-dependent optimisations unless it can prove them safe via static analysis, or unless there is a (debug-build) assertion marking them as safe. Operations on values that are currently undefined behaviour should be loosely defined per platform as whatever is most performant on that platform, or even specified as one of several possibilities, with each implementation choosing what is most suitable for it. For sections of code that would benefit from value-dependent optimisation, there should be an annotation to request them and some form of assert to check said values in a debug (and maybe release) build.

I would like to finish with a link to Dennis Ritchie's letter to the comp.lang.c newsgroup from 1988 regarding a proposed "noalias" keyword in C (quite similar to __restrict, but stricter), foreseeing much the same problems we are seeing today with undefined behaviour: https://www.lysator.liu.se/c/dmr-on-noalias.html .

C wasn't meant to have undefined behaviour. I think with Carbon we can remedy this historical mistake and have a reasonable, fast, high-level, close-to-the-metal, systems programming language.

OlaFosheimGrostad commented 2 years ago

Undefined behaviour is problematic, because optimisations based on the semantics of UB can have surprising non-local effects that are hard to reason about for both humans and tools performing static code analysis.

My 5 cents:

This is a difficult topic. The D programming language made signed int a modular type that wraps in order to avoid undefined behaviour. I view this as a design mistake, because it makes static analysis more difficult: you can no longer reason about monotonic recursive functions, or anything else that is increasing in value, since an increment can legally produce a smaller value.

"Undefined behaviour" should really just be things that are too expensive to type check (at compile time or runtime). The question is not if you will have "undefined behaviour" for a C-like language, but how the optimizer propagate assumptions.

For instance, you could turn asserts into assumptions, since breaking an assert is defined to be incorrect for that program. But that would be really dangerous, as the user most likely wrote the assert precisely because s/he expected the library user to make that mistake. You could still do it, but then you would at least need to make sure that the assumption does not propagate outside that function or above the assert. How do you limit propagation of assumptions when you have inlining, common subexpression elimination, etc.? It becomes very difficult to write static analysis if you also need to localize some facts (assumptions) and anything derived from them.

I somehow doubt that you would be able to use existing backends if you really try hard to avoid propagation of facts in ways that can be problematic.

You certainly can limit what the optimizer treats as a fact, but are you willing to prevent the optimizer from assuming that "i+1 > i" for ints? Because that is really what you are asking for.

asoffer commented 2 years ago

I personally find myself advocating on the other side: anything remotely nonsensical (integer overflow regardless of signedness, division by zero, null dereference, etc.) should be undefined, because it helps catch bugs. It also has optimization benefits, but its primary value is that it can catch bugs.

This is a bit strange, but the logic is that if UB ever occurs, you know it's definitely a bug, so you can have build modes that catch it. If you define the behavior, you are promising it behaves in a particular way, and it's no longer clear whether the code as written is a bug. Maybe you meant to divide by zero to force an exception to be thrown?

This is predicated on the idea that we have a build mode that catches as much UB as possible. It's also predicated on having decent test coverage to exercise potential bugs. But I think we can, and are planning to, have a UB-catching build mode, and designing under the assumption of decent test coverage is the only reasonable path forward: if you don't have good test coverage, UB is the least of your worries.
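For reference, one existing realisation of such a UB-catching build mode is UndefinedBehaviorSanitizer in Clang and GCC (overflow.cpp is a placeholder name):

```shell
# Build with UBSan: UB such as signed overflow, invalid shifts, and null or
# misaligned dereferences is reported at runtime instead of silently exploited.
clang++ -O1 -g -fsanitize=undefined overflow.cpp -o overflow
./overflow
```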

tkoeppe commented 2 years ago

Just to echo @asoffer's point: I too find UB to be rather valuable. The less a particular construction is specified to do, the less a reader needs to consider what it's doing, and the clearer it is to tell intention from accident. Code is primarily about documenting intent to other humans (and only secondarily for the machine), and reading code is always an act of forensically reconstructing the author's intention from the lossy and imperfect medium. The easier the medium makes it to tell intent from accident, the more likely that reconstruction is to succeed.

In other words, a construction that's only specified to do one thing can be assumed to want to do that one thing. By contrast, a construction that has all behaviours specified must be assumed to want to do any of those behaviours, and the reader needs to figure out whether the behaviour was intended or a) is a bug, and b) needs to be preserved or can be changed.

It's still valuable of course to have some way to detect undefined behaviour, but detecting it should always point to a bug, and never result from an intentional expression. That is, it would be nice if the impact of UB could be bounded (at least in some build modes), but such bounds should not be part of any particular construction's specification.

(The same argument applies to "always initialize variables" or "implicitly initialize variables to some special value".)

OlaFosheimGrostad commented 2 years ago

"Undefined behaviour" isn't valuable. No proper high level language should have "undefined behaviour". It is just unavoidable if you want the same performance as C/C++ in system level programming. It is a consequence of not being willing to inject runtime type-checks.

moshev commented 2 years ago

I can see your points, still I would argue against adopting C's undefined behaviour as it currently is.

My argument stems from experience and is based on two main points.

1) Having undefined behaviour makes the program impossible to reason about. That is, a programmer cannot reason about the correctness of her code, because doing so amounts to solving the halting problem. Most of us can fully predict the behaviour of small chunks of code, but not of an entire program. This means that a language with undefined behaviour is unsuitable for writing a program larger than what one programmer can fit in her head.

2) In a language that compiles to machine code and executes on bare metal, it makes no sense for the semantics of operations to differ from the semantics of the machine code produced when compiling those operations. For example, the semantics of the processor's "add" instruction are completely defined; having the semantics of the language's "+" operator differ from them results in a leaky abstraction and leads to all the problems outlined above. Furthermore, having the semantics of the language differ from the semantics of the instruction set means that the latter are not accessible to the programmer.

I obviously don't want to completely ban optimisations dependent on undefined behaviour. Those are extremely valuable for the hot loops in one's code. I am arguing simply that the default behaviour should be fully defined and there should be no way to turn on data-range-dependent optimisations globally, but only for select blocks.

As a last point, if Carbon could alleviate the pain of having to wrangle C's undefined behaviour, while delivering 99% of the performance, that would be a killer feature spurring its adoption. If it furthermore allowed for the specification of preconditions (as asserts) that enable a rich set of optimisations that currently rely on undefined behaviour, it would let us reap even higher performance in the hottest parts of our code, while remaining sure that the semantics of the program match what we understand them to be.

geoffromer commented 1 year ago

Having undefined behaviour makes the program unreasonable. That is, a programmer cannot reason about the correctness of her code, because that amounts to solving the halting problem.

I actually see undefined behavior as a valuable tool for reasoning about code, rather than an obstacle to reasoning about it. When I'm reading C++ code and I see int x = arr[i];, I can immediately assume that x will hold the value of some element of arr, or else the program has a bug, but that's because arr[i] has undefined behavior when i is less than 0 or greater than or equal to the array size. If that indexing operation had "platform-specific behaviour" instead, I would have to consider a third possibility: that the code is intentionally and correctly relying on whatever the "platform-specific behavior" is. That same problem affects every operation that has a precondition, and this creates a combinatorial explosion of possibilities as you try to reason about larger bodies of code.

On the other hand, I agree that undefined behavior can sometimes make it harder to reason about program behavior in the presence of bugs, as in your original example. I wonder if it's possible to address problems like that, while still retaining the benefits of undefined behavior, for example by trying to specify it in a way that rules out that sort of "time travel".

github-actions[bot] commented 1 year ago

We triage inactive PRs and issues in order to make it easier to find active work. If this issue should remain active or becomes active again, please comment or remove the inactive label. The long term label can also be added for issues which are expected to take time. This issue is labeled inactive because the last activity was over 90 days ago.