c2lang / c2compiler

the c2 programming language
c2lang.org
Apache License 2.0
703 stars 49 forks source link

Defer statement #66

Closed lerno closed 1 year ago

lerno commented 6 years ago

Includes all of defer. Also changes C code to always output { } even for simple if, while etc. This simplifies defer code generation. Tracking defer adds several parts to analysing the function and outputting the right thing. CGen does not need to involve itself in figuring out how to emit the right code. All of that is already done during the analysis.

lerno commented 5 years ago

Any updates?

bvdberg commented 5 years ago
lerno commented 5 years ago

The only alternative to do { } while (0) is simple {}, since we need to protect the defer scope.

Consider:

defer { int a; } int a;

This does not work unless defer has its own scope. Originally I used simple { } to start a scope. That was much less readable than the do-while block. That’s why it’s everywhere. It was a deliberate decision.

bvdberg commented 5 years ago

Why would anyone create variables in a defer? I think we can safely forbid that. Just allow function calls, some assignments and if statements and so on.

lerno commented 5 years ago

Defer at function end would require closures. Also, it would be completely different to implement. If you want this feature. Call it ”func defer” or something, implement it later. Its use is completely orthogonal to normal defer.

lerno commented 5 years ago

@bvdberg of course one would use variables in defer.

lerno commented 5 years ago

You’re marking obvious requirements ”useless”. I can’t shake the feeling that you’re thinking of some completely different feature.

lerno commented 5 years ago

I will rebase this one, then add more tests, but the feature I NEED is defer that works at the end of scope. At function end is NOT USEFUL and outright BAD. Also, there is no way to "tweak" this current feature to become an "end of function" style defer. Such a functionality is completely and utterly different in implementation from "end of scope"-style defers.

It is possible to write a "func defer" as well. But that one CANNOT use the very simple and straightforward code that scope defers use. Implement it yourself and you'll see that func-defer is a very high level feature, akin to implementing deferred automatic refcounting.

Again: the usecases for such a defer is different from scope-defer.

luciusmagn commented 5 years ago

I'd like to add that in most languages, it is indeed expected from defers to run at scope end rather than function end. The primary reason I can think of is that limiting defer to function kinda breaks defers in ifs, switches and nested blocks and definitely breaks defers in loops.

Variables in defers are useful if cleanup requires temporary variables, for example if you had to free a linked-list like structure iteratively.

lerno commented 5 years ago

One thing not handled:

switch (foo) {
  case 1:
    defer close(bar);
  case 2:
    printf("b");
}

Presumably defer should trigger on leaving the switch if foo == 1. Right?

lerno commented 5 years ago

I'm adding test cases for switch and also make sure that it generates the correct C code.

luciusmagn commented 5 years ago

Yea, the scope here is the one particular case, so I would guess that yes. We wouldn't want defer running in different cases. Not to mention that the defer might use a variable declared in the case, which is entirely possible, although cases can't start with a declaration.

lerno commented 5 years ago

From what I've read now, switch should be treated as a goto with labels, so we can rewrite it (in terms of analysis) like this:

{
  if (foo == 1) goto label1:
  if (foo == 2) goto label2:
  goto exit:
label1:
  defer close(bar);
label2:
  printf("2");
}
exit:

Since this analysis already exists for goto, it's just a matter of changing the parsing and analysis a little bit.

bvdberg commented 5 years ago

Defer is a nice add-on for C2 and offers developers an extra way of cleaning up stuff. It is not a replacement for C++ destructor behavior. There are 2 options for running: function scope and any scope end. The main criteria is that any option will have to be implemented without a runtime. C2c can generate stuff at compile-time, but no list administration must be needed at run-time.

In my mind, defer is a way to fire-and-forget. So claim a resource and defer the free-ing of it. Mostly the claim can also fail, so should be checked first.

i32 fd = open("file");
if (fd == -1) {
  // handle
  return;
}
defer close(fd);
// .. do stuff with file
// many lines possible here, unrelated of file

So in this case it's easy to forget to close, since there can be many lines between the open and the close.

The other example is any scope, for example in a loop:

while (x) {
    i32 fd = open("file");
    if (fd == -1) {
         continue;  // as an example
     }
     // read file
     close(fd);
}

If the loop is small (and good code should have loop bodies that easilyt fit half a screen), using defer is not really needed I think. So the criteria for using defer would be when there are multiple return paths possible (and you have to call close in each one). Otherwise, just using close is better, because it's more readable.

So that's why I prefer function scope. A second thing to remember is that using defer prevents the use of tail-call optimization.

lerno commented 5 years ago

Function scope defer cannot be implemented without a runtime.

lerno commented 5 years ago

Also, tail call optimization is only prevented for function-defer.

So function-defer:

You end by ”that’s why I prefer function scope”. But I don’t ser what the actual argument was. – That it will prevent people from using defer in some cases where you think it should not be used?

The reason we compare it to RAII is that just like with destructors, the execution of the defer + generated code will be will be easily understood.

Note also that scope-defer can be written cleanly in C using C macros in GCC/Clang using ”cleanup” and nested function extensions (using no allocations).

Function-defer can’t.

lerno commented 5 years ago

You can have function style scope withou runtime by excluding defers from jumps, while, do, for. At which point there is honestly very little left. (And you still break tail call optimization)

Whereas scope-defer can retain tail-call optimization when needed AND works with jumps, while, do, for.

Try to transform some longer non-trivial code using defer, then convert that to function style defer and scope-style respectively.

In particular, make sure that there are multiple defers and that their execution requires variables that do not exist at function top scope. E.g.

if (foo) { FILE f = open(x); defer close(f); ... many lines of code with early exit ... }

bvdberg commented 5 years ago

I'm looking at the defer_example.c2 I sent earlier. Using any scope, this means you cant do:

if (x) defer close(x);

Since the defer would be called immediately, right? (we could generate a warning for this). But in this case:

i32 fd = open("file");
if (fd == -1) return;
defer close(file);

The defer would effectively be function scope, even if we implemented any-scope.

Okay, let's go ahead with the any-scope then. I think we still need to some sane restrictions to avoid a very complex implementation.

When I implemented the case-body this morning, I found that storing some things in the Scope is very handy (and almost no cost). That way you can just forget about the actual scope stmt (like case/if) and just store and continue. The analyser code then looks like: ( for case, pseudo-code)

analyseCase(CaseStmt* C) {
   enterScope();
   foreach (stmt) analyseStmt(s);
   if (scope.hasDecls()) C->setHasDecls(); // pick info up from Scope.
   leaveScope();
}

This approach only works during analysis, but it could maybe set something in outer bodies/Compound stmts.

Update: One other thing I thought of. There are many scopes, some not so obvious to developers. For example, a Switch stmt has a scope as does every Case stmt. If we do want 'sane' behavior for the example below, we could add a 'DeferScope' that not all scopes have. If that makes any sense...


switch (x) {
case 0:
   defer close(x);  // if makes no sense to also run defer here
   break;
case 1:
    // ..
    break;
}
lerno commented 5 years ago

if (x) defer close(x) can be written as defer if (x) close(x)

lerno commented 5 years ago

In terms of scope, I think it's unwise to have specialized scope rules for defer. It's just one more pitfall for people to fall into. I think defers in switches will be rare, but have their use in this case:

switch (x) {
  case 0:
    FILE* f = open(a);
    defer close(f);
    if (file_starts_with_foo(f)) break;
    if (file_starts_with_bar(f)) return 0;
    b += file_number_foobar(f);
    break;
}

Here there are multiple exit points and we want to close f at each one of them.

bvdberg commented 5 years ago

yes better to keep it all the same

bvdberg commented 5 years ago

I was thinking about the implementation. The current one seems to produce correct results, I couldn't find any issues (I did not test in combination with goto, as I don't think that has to work).

See the code below:

{
    i32 a = open('a');
    if (a == 0) return -1;
    defer close(a);

    i32 b = open('b');
    if (b == 0) return -1;  // defer: close a
    defer close(b);

    {
        i32 c = open('c');
        if (c == 0) return -1;  // defer: close b, a
        defer close(c);
        // defer: close c
    }

    {
        i32 d = open('d');
        if (d == 0) return -1;  // defer: close b, a
        defer close(d);
        return 0; // defer: close d, b, a
    }

    // defer: close b, a
    return 0;
}

First: If you look at the order of defers needed to be run, they form a directed, acyclic graph. So D -> B -> A or C -> B -> A. So I thought of letting defers point to each other (D-> B, etc).

Second: I could only think of 4 points in code, where defers need to be run (ie end of scope):

The graph and the linking to the 'DeferPoints' can all be done at analysis-time, so code generation should be relatively simple.

The advantage of this approach is that you don't need an array of Defers and you don't need the conditionalDefers list.

The condition for this to work is to specify that goto's don't mix with defers. I'm personally in favor of this, since goto should just do a jump and nothing else. The analyser could give errors/warnings for this maybe.

lerno commented 5 years ago

@bvdberg I also think my solution can be improved upon, but I ran smack into the problem with switches where you decided on opening a new scope for every case. That also means that defer must run when going from one case to another. However, in C cases are the same as labels, and labels DO NOT create a new scope. So I just want to be sure that this is what you want.

In the new solution I'll find a more elegant way of handling defers BUT I think the problem with goto is a general problem that needs to be solved anyway (especially if case does not create a scope) and it will actually serve to make the problem and analysis BETTER for future changes.

I need to free up a few hours to work on it and it will look different, so we could actually close this one and I'll open a new one later.

bvdberg commented 5 years ago

@lerno I need to use nasty looking braces in case statements so often in C, I really want them out. I have (almost) never used the C feature of common scope in cases, except some very weird corner cases (see Duff's Device)..

What I'll do is add a few more unit tests to finalize the wanted behavior. After we agree on those, we can continue with the code, ok?

lerno commented 5 years ago

I feel a little bad for us breaking Duff's Device 😄

lerno commented 5 years ago

BTW I can do the change in parsing of defer anyway. Right now it assumes we can actually make the part AFTER the defer a subscope to defer. This is not really necessary (anymore). It was required in the first draft only. So I'll rewrite that and it will make the implementation cleaner. I'll also see if I can use the normal scope instead of having a parallell one. It should be possible when I don't do the weird parsing of defer.

bvdberg commented 5 years ago

How about this 'specification':

Design:
    - dont allow goto/return/continue/labels/defer(😄) in defer
    - only allow break if defer has body, otherwise not
    - goto/longjmp don't call defers. (give error on both if jumping from scope with defers)
    - run defer at any scope end (also case)
    - Q: dont allow Decls inside defer? (not needed for cleanup?)
Impl:
    - if (x) defer close(x) -> error, superfluous defer, probably unwanted
lerno commented 5 years ago

I want defer to work as seamless as possible. The only thing we rule out in defer is non-local jumps in defer. That means that return/continue is out (as they indicate non-local jumps), goto / label is ok as long as they are local. (Continue will also obviously work if in a subscope where continue is allowed, like a for-loop)

Longjmp will not call defer. In all other situations it is called. I don't think we need to do any errors on

if (x) defer close(x) should be an error yes. But the rule is more generic: We give an error on any scope that ONLY contains a defer AND a warning when defer is the last statement in a scope.

Some examples


if (x) { defer close(foo); } // Error

{
  defer close(foo); // Error
}

{
  LOCK *lock = acquire_lock();
  defer close(lock); // warning
}

switch (foo) {
  case 1:
    LOCK *lock = acquire_lock();
    defer close(lock); // warning
  case 2:
    defer close(file); // error
  case 3:
    ...
}     

for (int i = 0; i < 10; i++) {
  FILE* f = open_file(file[i]);
  ...
  defer close(f); // warning
}
bvdberg commented 5 years ago

Sure, is the analyser code do-able for that (goto/label analysis)?

You're right about the last statement as well. Since the only statement is also the last one then, we only have to check if it's the last one..

lerno commented 5 years ago

goto/label analysis in defer is already done and works in this version already.

bvdberg commented 5 years ago

So a goto outside a defer will cause defers to be called? Like:

func void test(x) {
    defer close(a);
    {
      defer close(b);
      goto out; // will cause close(b) to be called, Not close(a)
      // ..
    }
out:
   // ..
}  // close(a) will be called here
lerno commented 5 years ago

Both b and a are executed yes (first b then a)

bvdberg commented 5 years ago

What about this case:

public func i32 main(i32 argc, const char*[] argv)
{
    i32 a = open('a');
    if (a == 0) return -1;
    defer close(a);

    {
        i32 b = open('b');
        if (b == 0) return -1;  // defer: close a
more:
        defer close(b);
        if (b == 10) goto out;  // defer: close b
        if (b == 11) goto more; // What do to here? (does not change scope), Nasty
    }
out:
    return 0; // defer: close a
}

The line (b == 11) also generates a close(b). This will become very complex I fear..

bvdberg commented 5 years ago

I have some feedback about the patch itself, maybe you can integrate this..

lerno commented 5 years ago

All of what you describe in your example works.

In this case goto more; it will trigger the defer since we're jumping to a place before the defer was created. Then defer will be again scheduled. On the other hand, if put goto AFTER the defer, then the defer would not be triggered.

label1:
        defer close(b);
label2:
        if (b == 10) goto out; 
        // jumping to label1 triggers the defer, then will schedule the defer again.
        if (b == 11) goto label1; 
        if (b == 12) goto label2; // will not trigger the defer on goto.
}

The above becomes:

label1: ;
label2:
  if (b == 10) {
    close(b);
    goto out;
  }
  if (b == 11) {
    close(b);
    goto label1;
  }
  if (b == 12) {
    goto label2;
  }
  close(b);
}
lerno commented 5 years ago

A DeferStmt just has to have a Stmt*

Yes, part of the change I'm planning on

Each Break/Continue/Return stmt can just have a pointer to a single DeferStmt

I will check how it changes when I update the code

I think conditional defers can be removed

No, it's needed for goto, and even if we would have skipped gotos, I think there are corner cases later on that might need it.

Some errors added in DiagnosticSema/ParseKind.inc appear to be missing

Yes, you told me that it was only necessary to add in the inc, but apparently it's added in more places. inc files is the enemy...

Parser: there's no need to push back a token.

Will be fixed since defer parsing will be redone

Only generate do..while(0) if needed.

I'll see if it's possible and doesn't create weird issues... consider this:

{
  i32 i = foo();
  defer decrease_counter_by(i);
  {
    i32 i = bar();
    if (i == 0) return 0;
    /** do stuff **/
  }
}

If we forbid shadowing, then this will never happen. Otherwise we run into an issue that needs to be solved by one or more unique temp references.

I think you forbade shadowing, but I'm unsure.

bvdberg commented 5 years ago

Yes all shadowing is verboten.

bvdberg commented 5 years ago

I still see the other feedback missing (Parsing Pushback token removal etc)..

lerno commented 5 years ago

I'm still working on it @bvdberg, I haven't pushed it yet. I only have a few corner cases left and then the test cases.

lerno commented 5 years ago

@bvdberg It's finished. There are some things one can work on later on:

lerno commented 5 years ago

Changes:

  1. The old implementation would use a shadow hierarchy to traverse, this one efficiently keeps most of it in the scope, and analysis can be done on the fly (except for gotos, which is why they are checked in a later step)
  2. (1) also removes need to track defers, but alas we need to track any conditional defers (defers that might be skipped) by booleans declared in the beginning, so we need to keep them around for the function.
  3. There was a need to attach a defer exit to the last statement. Since we're not always guaranteed to have a compound statement, we wrap everything in a DeferReleasedStmt, which is basically just a wrapper. This was done to avoid adding DeferList to all statements. A more clever way can be devised to handle this. By improving the warnings and checks, DeferReleasedStmt can be completely removed. So this is an improvement point, but it's working completely as is (even though it's ugly). Fixing this would just make the feature even bigger.
  4. do/while is removed unless needed.
  5. I also removed superfluous () around expressions
  6. As long as ExitScope is used, then defers are detected automatically. Same code works the same way for break, continue and return. This is safer than the previous solution where break/continue/return needed separate (and more complicated) analysis.
  7. The whole thing works by scope holding the most recent defer. This will hold a link back to the defer before etc.
  8. When reaching an exit (scope, return, break, continue) we look for the defer most recent defer at the exit point. The chain from the most recent defer until it reaches the defer of the exit point are the defers to release. Goto works the same way actually, but looks for the lowest common scope as exit point.
defer a(); // deferTop = defer1
defer b() // deferTop = defer2 -> defer1
while (x) {
  defer c(); // deferTop = defer3 -> defer2 -> defer1
  defer d(); // deferTop = defer4 -> defer3 -> defer2 -> defer1
  if (foo) {
    // deferTop = defer4. 
    // exit point defer = defer2 =>
    // release defer4 & defer3
    break; 
  }
  if (bar) return 0; // Release the entire defer chain: defer4, 2, 3, 1
  defer e(); // deferTop = defer5 -> defer4 -> defer3 -> defer2 -> defer1
  ...
  // Scope ends, with defer5. Exit point has defer2 so release defer5->defer3
}

In DeferList, "start" is deferTop, "end" is the exit point. An exit point of null means all are released.

bvdberg commented 5 years ago

What does this do? I've not seen that construct in C++ yet.. (in Stmt.h class definition)

DeferList deferList {}; and DeferStmt* deferTop {};

I would expect: DeferList deferList; and DeferStmt* deferTop;

bvdberg commented 5 years ago

A lot of code seems to be change unnecessary, like ASTVisitor.cpp/DepVisitor.cpp I think a lot if indent or something, but It's very hard to see. Please only make the change you need and make styling changes in a separate pull..

Unit tests seem to crash c2c on many tests

Also the other feedback is still missing, like the PushbackToken stuff in Parser.

lerno commented 5 years ago

Initializes the struct / pointer to zero

lerno commented 5 years ago

I thought I minimized the formatting changes. Must have been switch cases where I could not read the cases clearly.

The pushback is something I forgot to revert from the previous implementstion. I can definitely fix that. Any other changes I need to do?

bvdberg commented 5 years ago

Just check my other feedback earlier. Those are my concerns..

lerno commented 5 years ago

I see no crashes @bvdberg – do you have any test case for me?

lerno commented 5 years ago

@bvdberg Changed:

  1. restored formatting where there were whitespace changes (my diff tool can ignore those, so I rarely see formatting changes. Also note that clang styling uses 3 spaces, so is mismatch on the rest of the code).
  2. Removed the pushback token completely.
  3. I think I identified something that could cause memory crashes: conditionalDefers* likely not initialized unless defers were detected – if this is the case then you should only have seen crashes in non-defer code! Of course, crashes only occurred if the memory wasn't set to 0 already. So you'd have intermittent crashes.
  4. I don't know the proper way to add the diagnostics, so I would appreciate if you make a small patch afterwards that correctly defines them.
lerno commented 5 years ago

Rebased on latest.

bvdberg commented 5 years ago

Stmt.h still contains: DeferList deferList {};, please rewrite to DeferList deferList; (unless you have a good reason for using the weird syntax..)

Looking at the implementation, I still have a lot of doubts. I hope you can understand these. One of the biggest is the size change of many AST elements. For example a CaseStmt has grown from 24 to 40 bytes. This may not sound like much to you, but I've spent a lot of time getting it down to that size. This implementation pretty much destroys all that effort. Most case statements will not have a deferlist, so having it as full member (size 16 bytes) instead of just a pointer seems strange.

C2c seems to clash on the following (very complex) code ;)

module test;
import stdio local;  // removing this line fixes segfault

public func i32 main() {
    return 0;
}
`

Another issue I'm still struggling with is the interaction between goto's and defers. I find it very hard to predict what will happen here:

// in some function defer close(a); label1: defer close(b); if (..) goto label1;



Since no scope is left, i wouldn't any defers to be called, but currently the implementation is that it
analyses which defers are jumped over and those all released. So for developers something unexpected might happen, which is what we want to avoid.

To make sure developers can very easily predict the behavior by looking at the code, I propose
to disallow goto statement in any (sub)scope where defers are present. Also dont allow goto's inside defers. This allows developers to either use defers or to use goto's for implementing their control flow.
I think this will also simplify the implementation by removing conditionalDefers..