Implement Defer - Githubissues

lerno commented 6 years ago

Implement Rust/Zig/Jai style defer.

luciusmagn commented 6 years ago

We have a defer in Rust? Never seen one of those before and I am pretty big on Rust :sweat_smile:

Can you show me? I am genuinely interested :smiley:

lerno commented 6 years ago

Well, strictly speaking it doesn’t exist in the language. However, there is scoperguard and a few other implementations of it. It’s also fairly trivial to implement it in C++ thanks to RAII.

(Neither Zig nor Jai has RAII and consequently implement it as a keyword)

What all the solutions have in common is that defer occurs at scope exit.

Go is unique in that it always occurs on function exit. This means that defer cannot be modelled as a goto or inline code.

bvdberg commented 6 years ago

There are 2 open-issues (maybe more) that I can think of:

How does defer work with goto?
How does defer work with longjmp?

I think the goto can be handled, but the longjmp cannot. That is also possible, just specify a longjmp will skip defers.

lerno commented 6 years ago

Goto: depends whether we have the jump tables or not. If goto is explicit, then we know when the current scope exits.
Longjmp: We can't know exactly where that will exit, so the more reasonable assumption would actually more be that longjmp executes all defers.

For the AST, maybe this should be more explicit. If we have this:

{ 
  a();
  defer foo();
  b();
  c();
}

Instead of the naive version of:

 - CompoundedStmt
 -- a()
 -- DeferStmt
 --- foo()
 -- b()
 -- c()

We should perhaps have:

 - CompoundedStmt
 -- a()
 -- DeferCompoundedStmt
 --- foo()
 --- CompoundedStmt
 ---- b()
 ---- c()

That might make analysis simpler.

If we consider a goto:

{
  label1: 
  a();
  label2: 
  defer foo();
  label3: 
  b();
  goto ... // some label
  c();
  {
    label4:
    d();
  }
}
label5:
e();

 - CompoundedStmt
 -- label1:
 -- a()
 -- label2:
 -- DeferCompoundedStmt
 --- label3:
 --- foo()
 --- CompoundedStmt
 ---- b()
 ---- goto ...
 ---- c()
 ---- CompoundedStmt
 ----- label4:
 ----- d()
 - label5:
 - e()

In this code it's obvious that if the goto does to nesting deeper that DeferCompundedStmt (label3, label4) nothing happens. But if there is a goto to less deep nesting - or a different tree (label1, label2, label5) we need to insert the defer code (if that is done by inlining or a "gosub" jump does not matter)

Break, continue and return analysis should be possible to do in the exact same way (break/continue = goto nesting level of switch/while/for/do, return => goto nesting 0)

bvdberg commented 6 years ago

For longjmp() that's just impossible, or you have to unroll stacks like exception handling. For a single function it can work, but if the jump crosses multiple calls (and you never know which), it's just not possible. Not a problem per se. I think defer is an option to developers, if they don't want to use it, fine. There should be no runtime penalty whatsoever if you choose not to use a defer.

lerno commented 6 years ago

For longjmp we assume it works like return.

But there is one more consideration: should we implement it with jumps or inlining?

If we look at the C generation, consider this:

{
  FILE *f = open();
  defer close(f);
  i32 x = a();
  if (x == 0) return false;
  b();
}

Inlining:

{
  FILE *f = open();
  i32 x = a();
  if (x == 0) {
    close(f);
    return false;
  b();
  close(f);
}

Static inline functions:

static inline __defer_1(Foo **x) {
   close(*x);
}
...
{
  FILE *f = open();
  i32 x = a();
  if (x == 0) {
    __defer_1(&f);
    return false;
  b();
  __defer_1(&f);
}

Jump tables:

{
  FILE *f = open();
  void *__after_defer_1_return;
  goto __after_defer_1;
  __defer_1:
  close(f);
  goto __after_defer_1_return;
  i32 x = a();
  if (x == 0) {
    __after_defer_1_return = &&exit_defer_1_1;
    goto __defer_1;
    exit_defer_1_1:
    return false;
  b();
  __after_defer_1_return = &&exit_defer_1_2;
  goto __defer_2;
  exit_defer_1_2:
  __defer_1(&f);
}

Switch:

{
  int __defer_state = 0;
  int __defer_exit_state = -1;
  bool __res_return;
  while (__defer_state >= 0) {
    switch (__defer_state) {
      case 0: {
        FILE *f = open();
        i32 x = a();
        if (x == 0) {
          __res_return = false;
          __defer_exit_state = 2;
          __defer_state = 1;
          break;
        }
        b();
        __defer_exit_state = -1;
        __defer_state = 1;
        break;
      case 1:
        close(f);
        __defer_state = __defer_exit_state;
        break;
      case 2:
        return __res_return;
    }
  }

Of course, we can say "we pick a solution depending on the size of the the defer" but that creates a lot of magic, so I'm thinking that inlining might be the correct solution here, even though that could be a lot of inlining. With inlining there will be quite a difference between:

{
  FILE *f = open();
  defer close(f);
  switch (x) { 
    case 0: return ...
    case 1: return ...
    ...
}

and

{
  FILE *f = open();
  defer close(f);
  int res;
  switch (x) { 
    case 0: res = ....; break;
    case 1: res = ....; break;
    ...
  }
  return res;
}

The first one will have defer inlined at every point in the code. While the second will only have it inlined once (before "return res")

But it's easy to explain: "defer will inline the defer statement at every exit point from the current scope except for longjmp" and it's very predictable.

luciusmagn commented 6 years ago

Speaking of goto and longjmp in contrast to safety with defers, we are reaching the broader question of the safety and utility of goto and jumps in general. As our man Edsger Dijkstra told us in '68, Go To Statement [is] Considered Harmful.

With its ability to break practically any control structure, we can't really ensure any code that uses goto is safe, and like you said, longjmp-ing code is even more complicated. Maybe it might not be a bad idea explicitly flag goto as unsafe. With our current control structures, we mostly cover it's use cases and if we were to introduce something like named loops and blocks, we might be able to delegate goto as an unsafe tool for operating systems/very low-level programmers, who probably know what they are doing anyway.

For reference, named loops which I am talking about look in Rust like this:

// 'outer is the name of the outer loop
'outer: for x in 0..5 {
    // the name is only visible here
    'inner: for y in 0..5 {
        println!("{},{}", x, y);

        if y == 3 {
            break 'outer;
            // or one could possible continue 'outer as well
        }
    }
}

And named blocks (only recently stabilized) can do this

fn my_function(/*...*/) -> ReturnType {
    'my_block: {
        // some code, which may fail
        // and when it fails, we just break the block
        break 'my_block;
        // all here code skipped
    }
    // | cleanup/error-handling code down here
    // v
    do_something_to_cleanup();
    some_other_cleanup();
    result
}

This supplements the goto ON_ERROR; pattern in a clearer and more restrictive manner.

lerno commented 6 years ago

Labled break has been in Java forever. Legitimate uses of it I’ve seen of java code using it (where there are no better solutions): 0 (and I’ve worked with java quite a lot).

The C usages were all subsumed by exceptions.

Defer handles all error cleanup. That together with better error handling (discussed in the forum) should eliminate almost all legitimate cases in C too.

That seems like an argument for eliminating goto, but I instead think it should be kept. - Even though I know new languages are planning to eliminate it (e.g. Zig)

I also think that it should be possible to get the pointer to a label (either using assembler or ”&” - it’s not important). Not necessarily that goto should work on that, but so that one would be able to use it for writing assembler code that does jumps using the labels. This is great for writing fast VMs.

lerno commented 6 years ago

But the more burning question is whether defer should inline, ”inline or call” or jump in the default implementation.

I know Nim uses setjmp, but that’s because they use exceptions and transform it to try { ... } finally { ... }

lerno commented 6 years ago

I'm currently implementing it using inline.

lerno commented 6 years ago

I need to make a change to enforce { } even on single statements in the generated C... Otherwise adding defers becomes a pain.

lerno commented 6 years ago

Should we permit gotos across defers?

That makes things rather complex. Consider this:

{
   if (a == 1) goto label1:
   defer close(f);
   label1:
   printf("Hello world\n");
}

Normally the rewrite would be:

{
   if (a == 1) goto label1:
   label1:
   printf("Hello world\n");
   close(f);
}

But here we would have to do a conditional defer if we wanted to support it! This is ok though:

{
   defer close(f);
   if (a == 1) goto label1:
   label1:
   printf("Hello world\n");
   if (a == 2) goto label2:
   label2:
   defer close(f2);
   printf("Hello world\n");
}

Since in both of those cases the defer does not happen between the jump and the label.

lerno commented 6 years ago

There are ways around this but they require extra bookkeeping. Consider this:

Again this:

{
   if (a == 1) goto label1:
   defer close(f);
   label1:
   printf("Hello world\n");
}

If we use bookkeeping it's possible:

{
   bool __defer_1 = false;       
   if (a == 1) goto label1:
   __defer_1 = true;
   label1:
   printf("Hello world\n");
   if (__defer_1) {
     close(f);
   }
}

lerno commented 6 years ago

Another point to improve might be to either allow break or return in a defer, to give allow easy exit. So break/return in defer basically would mean "exit the defer". Any opinions @bvdberg @luciusmagn ?

luciusmagn commented 6 years ago

Like I said, goto's are very fickle :D I believe that we absolutely need this extra bookkeeping, so that we avoid having such a big source of possible UBs. The overhead this adds is neglibible and in the future, we could maybe just detect code that uses goto and only use book-keeping there.

I think your idea about exiting defers isn't bad either, it provides extra versatility. I am more partial to the break keyword, but the word doesn't matter as much.

bvdberg commented 6 years ago

As I see it, defer() is used to cleanup stuff, like doing close() or free(). I can image that a developer might want to iterator over some list (using for/while) inside a defer and maybe do a 'break' to quit the loop. Otherwise, I cannot see any reason why to allow break/return/goto's inside a defer expression..

lerno commented 6 years ago

I will do the following:

Allow goto in a defer as long as it matches a label within the defer itself.
Allow ”break” to exit the defer early.

(2) means this is ok:

defer {
  if (!f) break;
  close(f);
  printf(”File was closed”);
}

luciusmagn commented 6 years ago

2) This is exactly the type of usage I was thinking of. Nice example 1) Sounds fair. What about scoping, though? Should we care if the same label is defined in a defer and in the body of the function?

lerno commented 6 years ago

I haven't thought about scoping. Also, obviously no-one should be able to jump INTO the defer. I need to remember that case :D

lerno commented 6 years ago

C has function scope for labels, so that should be true for the defer as well I suppose. Interestingly, since the defer is inlined, this actually means that each label inside of the defer needs to be unique per inline location.

lerno commented 6 years ago

Should we allow defer in defer? Is it useful?

lerno commented 6 years ago

BTW, if you are interested in how the implementation looks for the defer right now, you can have a look at this branch: https://github.com/lerno/c2compiler/tree/defer See the comments in the commit to see details of what's missing.

lerno commented 6 years ago

I now detect gotos correctly, will continue with getting the booleans done when goto spans a defer. That's the last part + the break function. Plus I need to figure out how to write additional diagnostics.

bvdberg commented 5 years ago

I think we can safely deny any defers inside another defer.

luciusmagn commented 5 years ago

I agree with @bvdberg. Especially since allowing this further down the line would not be a breaking change. Right now, it can be easily said that if someone needs a defer in a defer, their code is probably too complex and should be rewritten.

lerno commented 5 years ago

Functionality is done now. I'm just going to rebase, do sanity checks and push it in a fresh branch. It was a little tricky finding out the right way to do this in a cheap way (+ avoiding any duplicated logic in the C generator), but I found a halfway decent way to do it in the end. Basically I construct a small, linear piece of code with only relevant AST nodes. Then I can go back and forth in that one in a cheap way, without having to worry too much + it's very tiny since only a subset if the nodes are there.

Not being able to rewrite the AST or walk backwards was a serious limitation, which I in practice solved by linearizing the AST and just storing the scope transitions. If done on the whole AST, it would look like:

[COMPOUND_STMT ScopeStart, depth = 0]
[IF, depth = 1]
[IF DECL ScopeStart, depth = 1]
[DECL, depth = 2]
... code for the decl here...
[IF DECL ScopeEnd, depth = 2]
[IF BODY ScopeStart, depth = 1]
... body ...
[IF BODY ScopeEnd, depth = 2]
[GOTO label1, depth = 1]
[A++, depth = 1]
[COMPOUND_STMT ScopeEnd, depth = 1]

If that makes any sense. Basically state changes can be made explicit. Not as easy to transform back to C though. But the question of LLVM / C is something for the future.

bvdberg commented 5 years ago

Nice! I'll await your branch and take a good look then. Did you add any unit tests?

Can't we keep a pointer to a (Compount)Statement in the Scope stack? then we can always go back during generation. But a separate stack with defers would also work I think. I don't think changing the AST into something linear is a good idea, since the recursion is now so powerfull

lerno commented 5 years ago

I have written some tests but not enough. I still need to look at how to create new diagnostics though, so that the error cases have pretty error messages, double check that labels inside of defers automatically work. But it took longer than I expected it to (I've been spending some late nights on it) just to get corner cases right.

Also, the tests I've written should be expanded. There are subtle problems that any solution needs to show it can handle for gotos and switches. For example:

while (foo) {
   defer a();
   while (bar) {
     defer b();
     ... code ...
   }
   ... code ...
   break; // Needs to ignore the "while (bar)" part entirely when looking for defers.
}

And for gotos, properly skipping things in a lower scope... there are a lot of special cases that I've worked my way through...

lerno commented 5 years ago

By the way, the flat AST is something for the distant future, nothing for now.

bvdberg commented 5 years ago

For diagnostics, just add an entry in include/clang/Basic/DiagnosticParseKinds.td or DiagnosticSemaKinds.td. That's all.

On Thu, Nov 15, 2018, 14:38 Christoffer Lerno <notifications@github.com wrote:

I have written some tests but not enough. I still need to look at how to create new diagnostics though, so that the error cases have pretty error messages, double check that labels inside of defers automatically work. But it took longer than I expected it to (I've been spending some late nights on it) just to get corner cases right.

Also, the tests I've written should be expanded. There are subtle problems that any solution needs to show it can handle for gotos and switches. For example:

while (foo) { defer a(); while (bar) { defer b(); ... code ... } ... code ... break; // Needs to ignore the "while (bar)" part entirely when looking for defers. }

And for gotos, properly skipping things in a lower scope... there are a lot of special cases that I've worked my way through...

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/c2lang/c2compiler/issues/65#issuecomment-439042602, or mute the thread https://github.com/notifications/unsubscribe-auth/AAC1mv4mxJQObB6WJUQbRceNbawJRjg9ks5uvW48gaJpZM4YXgd- .

lerno commented 5 years ago

Done, and finally, FINALLY it's ready for a pull req. Took quite a long time, but there are so many corner cases tracking down and tagging exits. Writing code to handle 90% of the cases: 3 hours. Writing code to handle 100% of the cases: probably a week of nightly hacking.

lerno commented 5 years ago

Just one more thing @bvdberg regarding why function return defers is bad. Consider this:

Lock *lock = aquire_lock(a);
defer release_lock(lock);
int last = 0;
for (int i = 0; i < len; i++) {
  Entry *entry = a[i];
  Lock *entry_lock = aquire_lock(entry);
  defer release_lock(entry_lock);
  if (is_closed(entry)) continue;
  if (is_opening(entry)) continue:
  update(entry);
  if (entry.last_current) {
    last = i;
  } 
}
for (int i = 0; i < last; i++) {
  Lock *entry_lock = aquire_lock(entry);
  defer release_lock(entry_lock);
  entry.order = last - i;
}

Under Go style defers, we'd get stuck in the second loop. Also, even skipping this, imagine that len is big, and update is slow. We then lock the first elements from usage for a very long time even though we don't need to. In order to fix this we'd have to introduce closures:

Lock *lock = aquire_lock(a);
defer release_lock(lock);
int last = 0;
for (int i = 0; i < len; i++) {
  [&a, i, &last] {
    Entry *entry = a[i];
    Lock *entry_lock = aquire_lock(entry);
    defer release_lock(entry_lock);
    if (is_closed(entry)) continue;
    if (is_opening(entry)) continue:
    update(entry);
    if (entry.last_current) {
      last = i;
    } 
  }();
}
for (int i = 0; i < last; i++) {
  [a, i, last] {
    Entry *entry = a[i];
    Lock *entry_lock = aquire_lock(entry);
    defer release_lock(entry_lock);
    entry.order = last - i;
  }();
}

The Go defer solution consequently incurs a lot of unnecessary heap allocations. Not to mention memory leaks. Go can do this because it has a GC.

lerno commented 5 years ago

And flogging that dead horse, the Go style defers was one of the biggest problems I had with the language. I wrote about it: https://medium.com/@christoffer_99666/golang-first-impressions-deb1323be90e ;)

bvdberg commented 5 years ago

I think defer should mainly be used for situation with lots of possible (early) return paths. In a small loop where all code fits 5-10 lines, developers can just call unlock themselves. A defer is not a mandatory way of freeing resources, just an extra way of doing that. I think using defer for small pieces of code with 0-1 return paths is bad practice. Otherwise, a C++ destructor substitute system would be a lot better. There you cannot forget to defer at all.

lerno commented 5 years ago

Defer is a OO-less way to get RAII. I also think that RAII is a bit problematic since it’s implicit and not clear where the actual destructors are. The examples I gave are only very simple ones, there are definitely times where it makes sense to use defer within a loop and not call unlock manually.

But please put the functional defer out of mind, even with closures they're bad. This is how *GO style defers would look for C++

From this:

func void foo() {
  FILE *f = open("foo.txt");
  defer close(f);
  /* ... code ... */
}

To this:

void foo() {
  std::vector<std::function<void()>> __defers {};
  FILE *f = open("foo.txt");
  __defers.push_back([f] { close(f); });
  /* ... code ... */
  while (__defers.size() > 0) {
    __defers.back()();
    __defers.pop_back();
  }
}

Now consider the amount of runtime needed to implement in C/C2...

Compare the scope style defer (the one I implemented):

func void foo() {
  FILE *f = open("foo.txt");
  /* ... code ... */
  close(f);
}

paulie-g commented 5 years ago

I see this as syntactic sugar for the common pattern of defining an exit section with a label and goto'ing to it. I consider this syntactic sugar valuable - if some deallocatable things may exist, but not must exist, you either have to check (branch mispredicts) or create multiple sections, with fallthrough if you're lucky. It also co-locates creation and destruction in the code, which both encourages good practice, avoids bugs and makes it easy to check.

bvdberg commented 5 years ago

Defers are indeed an OO-less way of RAII. The main usage would be in functions that always 'free/close' a resource and has many return paths. If the resource needs to be available after function end if there are no errors, defer is not usable. I agree with lerno that there should be no runtime code, just inserting the deferred statements at all return paths (and adding tmp variables for 'return getvalue();' -> x tmp = getvalue(); defers; return tmp;

lerno commented 5 years ago

When macros are implemented, one could have self cleaning resources using defer:

macro temp_mem!(usize size) // use ! to indicate scopeless macro
{
   void *mem = malloc(size)
   defer free(mem);
   return mem;
}

Usage:

{
    char *c = temp_mem!(10);
    ... do stuff ...
    // Look no explicit free!
}

paulie-g commented 5 years ago

The killer feature for defer + macros + generics, imo, is implementing fast and predictable automatic memory management including for collections and iteration thereof.

There is no reason you can't have easy collections, for-each and friends in a language that targets what C2 targets, provided that the internals of the implementation are predictable, swappable and don't infect scopes where they're not used. Consider the very common case of a long-running server process - network daemon, dbms, window manager, etc - starting up, working with configs.. We don't care as much about memory usage or performance here, and win greatly in terms of productivity if we have Python-like expressiveness (albeit w strict typing). So we create a custom allocator, either with a big arena/epoch we can throw away before we go to next stage of execution, and/or lots of small per-scope ones where we do bump allocation, and use them by default for collections. This would be a) super-convenient, b) much faster than competing implementations, c) still predictable enough in terms of what it does - or swappable if you don't like how it does it, and d) wouldn't infect the rest of the codebase.

lerno commented 5 years ago

Raising a hand This is what I'm aiming for in (C3, my "successor" ;) to C2). Starting with managed pointers https://c3lang.github.io/c3docs/ideas/#managed-pointers which is not completely fleshed out, and context stacks (so you can push a custom memory allocator and later pop it, to use - for example - use an arena allocator on the the underlying library functions)

I proposed this for C2, but I think this might be going too far out for C2 if I understand correctly.

bvdberg commented 5 years ago

I think a mechanism to help avoid resource leaks would be very nice. On the other hand I want C2 to be like C in that it should be very easy to predict what the code does. So no 'underwater/background' functions like garbage collection etc. In C++, there is some non-explicit functionality with destructions (like someone joked in C++ you close sockets and free memory with the '}' character). But in C++ that never bothered me, since it's quite easy to see the behaviour (just look at the destructors of all objects in scope). So maybe defer in combination with macros could also fall in this category.

I think adding managed-pointers will be a step too far. Rust used to have 7 types of pointers (!). Luckily the developers managed to refactor this to the current 3 (?). I never like this at all.

A second use case (the first one being resource management) would indeed be iterators. So ideally you don't need to know the details of a data structure to use it correctly. C++ does this quite well I think. Maybe we can start with just write the C2 code we want and then implement it in the language? So maybe an example of a linked list and array, where the 'user' just uses an iterator and can switch between them with minimal code changes..

paulie-g commented 5 years ago

It's obviously your language and your call, but I think simple predictability, while the main and most important requirement, is not enough. I want to be able to implement or change this underwater machinery. I don't mind if the language does or does not provide a default implementation, but I do want the ability to write the implementation once and reuse it with syntactic sugar.

I like the idea of defer a lot and macros are obviously essential. I don't see managed pointers as overly complex, hand-wavy or magic. If we have macros, everything except the syntax sugar for them can be trivially implemented. In fact, why not extend defer? Unless I'm missing something, the ability to invoke a callable on a specified event (end of scope, end of function) is isomorphic with managed pointers.

I think a python-style iterator and collections interfaces would work well for pluggable implementations, at least as an indicator of what is required. With the obvious difference that its use wouldn't be an actual call or a conditional branch. With that said, I think you absolutely need to have custom allocators and generics sorted first and, preferably, a tuple-like type as well.

As for not knowing the details of an ADT implementation, I actually disagree. If you're the sort of person solving the sort of problems that don't require you to fully understand the implementation and performance implications of said implementation of the data structures you're using, there are already mature languages for you that give you this layer for free and tell you ignorance is bliss. These days, behaviour wrt cache and pipeline dominates so you can safely throw Knuth out the window, therefore you either don't care because your problem is such that you can afford not to, or you should absolutely know.

lerno commented 5 years ago

So can I just explain what my idea was in C3?

Basically introduce a new pointer type with the @ suffix. So foo@ is the managed pointer and foo* is the unmanaged pointer.

What happens is that an implicit Foo.release(x) is inserted as a defer directly after the use. Conversely assigning @foo to a non local slot creates a Foo.retain(x). This can then be used to build everything from RC to auto closing file handles.

An example, this code:

func Foo@ getManagedFoo() { ... }

Foo@ global_var = nil;

func void test() {
  Foo@ a = getManagedFoo();
  Foo@ b = a;
  Foo@ c = getManagedFoo();
  global_var = c;
}

Would be transformed into this:

func Foo@ getManagedFoo() { ... }

Foo* global_var = nil;

func void test() {
  Foo* _temp1 = getManagedFoo();
  defer Foo.release(_temp);
  Foo *a = _temp1;
  Foo* b = a; // No retain, same scope!
  Foo* _temp2 = getManagedFoo();
  defer Foo.release(_temp2);
  Foo* c = _temp2;
  global_var = Foo.retain(c);
}

paulie-g commented 5 years ago

I've read that page on your site (and a few others ;) and that's exactly why I said what I said about it being isomorphic - defer does all the heavy lifting here and all you're doing is adding the syntax sugar. Presumably, if defer is OK then this must be as well. It is also easily implementable via macros (provided the macro system lets them operate on the AST above as well as below the use site).

c2lang / c2compiler

Implement Defer #65