Supporting initializers that can throw errors

It's been Chapel's intention to support initializers that can throw errors, yet we've never implemented them. While reviewing library modules recently, we've found ourselves wanting the feature. This issue is intended to capture that desire.

One of the main challenges that we've discussed in the past is keeping track of where in the chain of field initializations the throw occurs so that those field initializers can be undone, but not the ones that haven't executed yet. For example, given:

proc init() throws {
  this.field1 = x;
  this.field2 = y;
  if (...) then
    throw new Error(...);
  this.field3 = z;
}

the concept would be that if the error was thrown, the initializations of field1 and field2 would need to be undone, but not field3 because it hasn't been initialized yet.

Recently we've been discussing a few simplifying cases that we might consider before tackling the above general case, such as:

what if the init() only threw after the explicit or implicit this.complete() had occurred?
what if any throwing were to be done in the postinit() method instead? (though that would cause it to apply to all initializer on the type) The idea in both cases being that we'd know that all fields had been at least default-initialized, so there'd be no need to keep track of what work needed to be undone.

With a quick check, a week or two ago, I found that if the compiler's error about initailizers not supporting throws is removed that init() can be made to throw; and that with no compiler changes, postinit() can throw. ~What I didn't realize until today is that (it seems) such errors are always reported as being uncaught, even if the call to new is within a try...catch block. My guess is that the compiler introduces a helper function to invoke the initializer and postinit, and that this function isn't properly propagating the error back to the user code. I.e., that such a helper function ought to be marked throws if either the init or postinit throws (assuming my guess is correct).~ [edit: with another check today, the previous sentence didn't seem to be true, so I must've messed something up before? Unless I did today].

I'm also guessing that even though such initializers can throw, they probably don't do any cleanup (e.g., delete the class instance for a class; undo the setting of the fields if anything is needed there).

Thinking about this a bit more, though, I also find myself wrestling with semantic questions, such as:

what does it mean to undo a field initialization given that we don't have a general recipe for reversing the consequences of an arbitrary initialization expression?
even with the throw occuring after all the fields are initialized, that doesn't seem to necessarily mean that deinit() can be called, since there may be additional semantic computation in the init() or postinit() that is required for the deinit() to be valid? Or, as the class author, is it my job to track such information in the object and write my deinit() to be cognizant of whether or not the initializer completed?

There are almost certainly lessons to be learned here from other languages—I haven't gone looking to try to learn about it yet, though, and mostly wanted to capture my experiences with experimenting with throwing initializers.

First: dynamic exception handling is a seriously bad unsustainable idea and everything to get rid of it is a good idea. Many languages have this and it is rubbish in almost all of them. C's machinery is correct. No exceptions other than program termination. Felix also has no exceptions.

The theoretical reason for this rubbish is that subroutines are the wrong idea too. The correct unit of modularity is the routine, NOT the subroutine. Routines never return anything. A routine can be passed a continuation by the caller, which it invokes, and this then makes it a subroutine. If it has two possible outcomes, it must be passed two continuations, and invokes the appropriate one. This means "normal" and "abnormal" exits are handled exactly the same way and there is no need for exception handling rubbish. In functional systems including C, another method is to return a variant. This is suboptimal because it only delays the decision of what to do, and forces a complex passing back of information up the call chain until the variant can be handled correctly. This is bad in C, because the information can be ignored. In most FPL such as Ocaml and Felix, return values cannot be implicitly ignored.

Second: Undoing partially constructed objects is a special case of unwinding the machine stack during execution of a sequence of variable initialisations. In general this undoing can itself cause an exception to the thrown which is known as a double fault. At this point, program termination is the only option. That this is possible is another argument that exception handling is rubbish.

Third: In C++ the class name is used for the constructor (which is extremely bad), and you can overload constructors. But this is very poor because there can be more than one way to construct an object, which has the same arguments. The obvious example is complex numbers, which could have two real arguments either cartesian or polar. Its bad design. As is Chapel for the same reason. Constructors should have names assigned by the user.

Fourth: in C++, there are equivalents of init, postinit, deinit but also postdeinit which Chapel doesn't have. These are as follows:

init corresponds to field initialisation in constructor member initialisation syntax
postinit corresponds to initialisation in the constructor body
deinit in C++ has two components: 3a. User code in the destructor body 3b. Field destruction

C++ tracks field initialisation as if the fields were just being pushed onto the stack so it knows which fields have been initialised and which have not been. If an exception is thrown initialising a field, it is assumed that field is not initialised. However C++ allows a try catch for constructors, here is a reference:

https://en.cppreference.com/w/cpp/language/function-try-block

As I understand it this does NOT disturb the undoing of the partial initialisation, but allows for a different exception than thrown to be propagated instead of the original one.

Fourth: one of the key problems early in C++ was the problem of initialising a local variable with a pointer to heap object, only for an exception to the thrown a bit later. In this case unwinding the machine stack just ignores the pointer variable because it has no destructor. Therefore it causes a memory leak. At the time, there was no way around this in the proposed Standard and Australia .. that's me .. said we would veto the Standard unless the Standard Library contained a work around. I proposed along with a friend a simple solution, auto_ptr which is a class that accepts a pointer and on destruction deletes it, thereby cleaning up the machine stack. [The committee subsequently made a complete mess of the design and eventually removed it but that's another story] Although this did not prevent memory leaks it did provide a way for the user to ensure they didn't happen.

In C++ there is very strong support for stack protocol, and an idiom called RAII, Resource Acquisition is Initialisation in which constructors are used to grab resources: not only memory but file handles, locks, etc. In this case, destructors release the resources, and exceptions must ensure correct ordering of destructor elaboration because of coupling of resource acquisitions, eg acquire a file handle, lock it, do stuff, release the lock, then close the file. Unfortunately not all resource controls follow stack protocol but quite a lot do. I personally think this is a very bad idea. The ONLY resource constructors and destructors should manage is memory IMHO, but than, I do a lot of working in FPLs with garbage collectors and the sequencing and timing of collection of garbage is not determinate, and the only resource where this doesn't matter is memory. In addition stack protocol is basically archaic and of little interest in modern programming, and Chapel is a good example, where massively concurrent operations on distributed systems occurs. The stack model has little place in such an environment (Hoare's Communicating Sequential Processes is a much better model).

In any case some answers to Brad's question:

You always have a way to deconstruct anything that needed active construction no matter how complex it is. Deconstruction is broken into two parts: (a) the part the system knows about and (b) the bit you know about because you wrote the code in the constructor (in either init or postinit.) Construction always reduces to a sequence of operations which can be undone in reverse order. Therefore, in case of partial construction and error, undoing is always possible. The principle difficulty already mentioned is that the undoing operations can also trigger an exception.
If deinit is called, the author can and must assume the whole object has been fully constructed which means postinit is done and all the magic needed for the construction attempt to return control and the object pointer to the user. You don't have to track where the initialisation is up to, and you cannot anyhow.

We already have #6145 and #8692 about the need to have initializers that can throw, so I'm linking them here.

Thinking about this a bit more, though, I also find myself wrestling with semantic questions, such as:

what does it mean to undo a field initialization given that we don't have a general recipe for reversing the consequences of an arbitrary initialization expression?

I don't think we are un-doing the field initialization; we are balancing it with field deinitialization. In the same way that if you have this in some function:

  {
     var x = complicatedFunctionWithSideEffectsCreatingRecord();
     throw new Error("some error");
  }

We don't try to un-do complicatedFunctionWithSideEffectsCreatingRecord. We simply deinit the record.

even with the throw occuring after all the fields are initialized, that doesn't seem to necessarily mean that deinit() can be called, since there may be additional semantic computation in the init() or postinit() that is required for the deinit() to be valid? Or, as the class author, is it my job to track such information in the object and write my deinit() to be cognizant of whether or not the initializer completed?

I think that the record/class author needs to understand that having throwing initializers can lead to partially constructed objects being deinited. However they have some control over this if how to handle the situation is not obvious (e.g. by creating a field that is set first to indicate the instance is partially constructed and then set after this.complete or in postinit to indicate construction was successful). I think that most of the types I have written would have an obvious way to handle this (without adding new fields) and it wouldn't be a big deal.

We already have https://github.com/chapel-lang/chapel/issues/6145 and https://github.com/chapel-lang/chapel/issues/8692 about the need to have initializers that can throw, so I'm linking them here.

Shoot, why am I so bad at searching on existing GitHub issues? I searched on a variety of "initializers" vs "inits" + "throw" vs. "throws" to make sure this didn't already exist and somehow missed both of those. :'(

First: dynamic exception handling is a seriously bad unsustainable idea and everything to get rid of it is a good idea.

@skaller: I'm not sure how to say this constructively, but here's my best attempt: Jumping onto an issue about a given feature and starting with a comment says "You should throw this feature out and start again the way I would do it" is not particularly constructive. It's definitely the quickest way to get me to stop reading and skip on to the next comment. Chapel does have a concept of throwing errors today, it was vigorously requested by the user community, and extending the feature to support contexts that it doesn't currently is appropriate when it's desired by users and developers (as in this case).

If you'd like to make a constructive proposal along the lines of "Chapel should get rid of throwing errors and do [something else] instead, since it provides similar benefits yet is better in the following ways" you're welcome to do so, but please do so using a fresh issue rather than hijacking an issue that happens to mention the feature yet is focused on something more specific. Jumping onto every issue entitled "How should we extend x to do y?" in order to say "Just get rid of x everywhere!" isn't particularly effective or constructive. Specifically, it seems oblivious to the fact that Chapel is past the point of throwing out core language features on a whim and starting from scratch due to having existing users who have code that needs to continue working. We're not in the early days of blank slate design anymore, and most of these issues aren't meant to be academic debates about the merits of a given feature set, but rather focused discussions on solving a specific problem.

First: dynamic exception handling is a seriously bad unsustainable idea and everything to get rid of it is a good idea.

@skaller: I'm not sure how to say this constructively, but here's my best attempt: Jumping onto an issue about a given feature and starting with a comment says "You should throw this feature out and start again the way I would do it" is not particularly constructive.

I didn't say that at all. I started with a theoretical observation. When you're analysing something, it helps to understand the abstract structure behind it .. the algebra .. before working on concrete implementations.

In C++ for example, which has exception handling, originally a lot of work was put into specifying what could and could not be thrown. There was a whole sub-syntax for it. This was a popular idea at the time (not just in C++). The problem was, when you add polymorphism, you no longer know exactly what functions get executed, and to cope with that, you have to add polymorphism to the exception specifications as well. In practice this turned out to be intractable and modern C++ has thrown out exception specifications, so now you can just say throws or doesn't throw (which is the same as Chapel isn't it?)

I point out a lot of problems in all languages that have with what I call dynamic exception handling causes, and I'm trying to explain how this arises. It arises because the whole idea of subroutines as a fundamental building block is wrong. Chapel uses them Haskel uses them, Ocaml uses them, C++ uses them. Felix uses them too, although it is technically coroutine based.

So the point is to understand at a fundamental level how this problem arises. And I'm claiming it arises because the notion of a subroutine with a single return value and a single control path it can return to is plain wrong in the first place as a basis for computing. In theoretical models, an alternate technique for modelling computations exists, which is called CPS: continuation passing style, which does not have this problem. In this model a subroutine is a routine which is passed the current continuation by the caller (which is just the return address on the stack) and can invoke the continuation by a return statement, which resumes the caller. The point is, the model then allows you to pass multiple continuations. This isn't my idea and it isn't what I do at all in Felix. It's a theoretical model for understanding computing.

What exception handling does is it allows you to establish a continuation (with a try/catch block), which responds to a message somehow thrown in the try block, to invoke the continuation specified in the catch block. That continuation is a subroutine, so you have to eventually continue on up stack or terminate. The problem is the throwing and catching, which unwinds the stack, is unprincipled. There is no theoretical basis for it at all. In general the coupling cannot be statically checked. You know about that: uncaught exception error. And another one that arises, in practice or further analysis, is the double fault. An exception whilst unwinding the stack, before the exception being thrown is caught.

The best solution from a language design point of view when you find you've implemented something without understanding the algebra properly is to contain it, IMHO. If you're doing a new language, you try to avoid it. Golang, for example, just has one exception: panic. Ocaml, on the other hand, allows local exceptions now, which allow static checking that ensures you have handled the exception.

So just FYI, when I'm writing C++ these days, I completely ignore exceptions. I don't throw them, and I don't catch them. If memory allocation fails, my program terminates. I don't write exception safe code, and I don't throw exceptions. If I'm using a library that does throw, then I try to contain the design fault, typically by using a wrapper, if I bother at all.

I think the point to take away here is the original specification by Bjarne in C++ was that exceptions were design to gracefully terminate your program, reporting the kind of error that caused them, instead of a core dump. They were never intended to provide any kind of alternate control flow, or solve the really serious theoretical problem that sometimes a routine needs to have multiple return points. The problem is people started using them for handling so-called exceptional cases, which in fact they were never intended to handle in the first place (in C++).

There's another very serious issue here which is especially important in Chapel. Dynamic exception handling is a stack protocol based concept. Throw an exception, unwind the stack till you find the handler, destroying objects on the way up the stack. The problem is, Chapel is NOT a stack machine! Chapel distributes computations so many run concurrently. Its splits up data so the concurrent computations get what they need locally. how in the heck does one of these subcomputations throw an exception which resumes execution on an up stack handler when you have thousands of stacks on multiple distributed computations?????

The theoretical model called Communicating Sequential Processes (CSP) invented by Tony Hoare in 1978 I think, which golang and Felix both support, is much more appropriate than a stack machine for Chapel, because Chapel really does run thousands of concurrent processes. When you have channels to communicate, channel I/O can easily be used to handle multiple possible results from a computation.

So one possible take away from these observation is that it doesn't matter what you do when an exception is thrown. Your subcomputation is screwed anyhow. The process should terminate, which will free up resources automatically. You don't need to recover, because the idea is to gracefully terminate: in Chapel that would probably mean sending a message on a channel to a process supervisor or something. EH simply doesn't scale and Chapel requires things to scale. So there are two choices I think:

Terminate without bothering with cleanup
Contain the exception tightly

or both. The best way I can help with detail discussions at the moment, not actually knowing the details, is to challenge the basis on which the issue arises in the first place. Why exactly does Chapel have exception handling?

Why exactly does Chapel have exception handling?

That's a reasonable question, I just don't think that this is the right place to be asking it.

Why exactly does Chapel have exception handling?

That's a reasonable question, I just don't think that this is the right place to be asking it.

But understanding the intent is needed in order to decide how to resolve issues. If the intent is simply to gracefully terminate, then the resolution of issues may be different to that which would follow from a desire to allow an alternate normal control flow path out of a function.

For example, if you just want graceful termination of the program, with good error diagnostics, then the sole purpose of catch is to chain error diagnostics from the point of error up the call stack to the top level (as you can see, for example, with Python backtraces, which give you some idea of the control path followed to get to the fault). In particular with this intent, cleaning up resources is not necessary because the OS will clean them up anyhow when the process terminates.

On the other hand if you intend alternate control flow, so that a catch clause will actually allow the program to continue normally, you have to clean up properly, destroying unreachable resources which would otherwise leak. This obviously requires a lot more detailed work by the compiler, and more sophisticated language constructions: note that in C++, although the original intent was merely graceful termination, the actual semantics are based on being able to recover and continue, and this lead to being able to wrap a function in a try catch block directly, including constructors, because there was no other way to actually catch errors in ctor-initialisers, and, for that matter, in things like copy constructors invoked from a return statement (C++ syntax simply didn't have a place to put the try catch block that trapped errors in code the compiler generated itself).

Furthermore, if you happen to take the point that EH is a bad idea in the first place, but you do have it, then the way you'd cope with an issue may be different if you're planning to phase EH out and replace it with something else. Maybe it isn't worth the detail effort in the compiler, to track the initialisation of every individual function parameter or class variable just so you can undo the effects, and maybe there's a way to provide the user a way to do that, instead of making the compiler do it, if the user actually cares. This happened in C++ with autoptr. The developers, principally ME, didn't care if a pointer on the stack leaked, so long as the library had a a smart pointer class that allowed the user to capture the pointer and delete it during stack unwinding. The compiler still has to some some work, but it no longer has to worry about a raw pointer on the stack.

chapel-lang / chapel

Supporting initializers that can throw errors #19179