Closed lerno closed 1 year ago
Any updates?
The only alternative to do { } while (0)
is simple {}, since we need to protect the defer scope.
Consider:
defer { int a; } int a;
This does not work unless defer has its own scope. Originally I used simple { } to start a scope. That was much less readable than the do-while block. That’s why it’s everywhere. It was a deliberate decision.
Why would anyone create variables in a defer? I think we can safely forbid that. Just allow function calls, some assignments and if statements and so on.
Defer at function end would require closures. Also, it would be completely different to implement. If you want this feature. Call it ”func defer” or something, implement it later. Its use is completely orthogonal to normal defer.
@bvdberg of course one would use variables in defer.
You’re marking obvious requirements ”useless”. I can’t shake the feeling that you’re thinking of some completely different feature.
I will rebase this one, then add more tests, but the feature I NEED is defer that works at the end of scope. At function end is NOT USEFUL and outright BAD. Also, there is no way to "tweak" this current feature to become an "end of function" style defer. Such a functionality is completely and utterly different in implementation from "end of scope"-style defers.
It is possible to write a "func defer" as well. But that one CANNOT use the very simple and straightforward code that scope defers use. Implement it yourself and you'll see that func-defer is a very high level feature, akin to implementing deferred automatic refcounting.
Again: the usecases for such a defer is different from scope-defer.
I'd like to add that in most languages, it is indeed expected from defers to run at scope end rather than function end. The primary reason I can think of is that limiting defer to function kinda breaks defers in ifs, switches and nested blocks and definitely breaks defers in loops.
Variables in defers are useful if cleanup requires temporary variables, for example if you had to free a linked-list like structure iteratively.
One thing not handled:
switch (foo) {
case 1:
defer close(bar);
case 2:
printf("b");
}
Presumably defer should trigger on leaving the switch if foo == 1. Right?
I'm adding test cases for switch and also make sure that it generates the correct C code.
Yea, the scope here is the one particular case, so I would guess that yes. We wouldn't want defer running in different cases. Not to mention that the defer might use a variable declared in the case, which is entirely possible, although cases can't start with a declaration.
From what I've read now, switch should be treated as a goto with labels, so we can rewrite it (in terms of analysis) like this:
{
if (foo == 1) goto label1:
if (foo == 2) goto label2:
goto exit:
label1:
defer close(bar);
label2:
printf("2");
}
exit:
Since this analysis already exists for goto, it's just a matter of changing the parsing and analysis a little bit.
Defer is a nice add-on for C2 and offers developers an extra way of cleaning up stuff. It is not a replacement for C++ destructor behavior. There are 2 options for running: function scope and any scope end. The main criteria is that any option will have to be implemented without a runtime. C2c can generate stuff at compile-time, but no list administration must be needed at run-time.
In my mind, defer is a way to fire-and-forget. So claim a resource and defer the free-ing of it. Mostly the claim can also fail, so should be checked first.
i32 fd = open("file");
if (fd == -1) {
// handle
return;
}
defer close(fd);
// .. do stuff with file
// many lines possible here, unrelated of file
So in this case it's easy to forget to close, since there can be many lines between the open and the close.
The other example is any scope, for example in a loop:
while (x) {
i32 fd = open("file");
if (fd == -1) {
continue; // as an example
}
// read file
close(fd);
}
If the loop is small (and good code should have loop bodies that easilyt fit half a screen), using defer is not really needed I think. So the criteria for using defer would be when there are multiple return paths possible (and you have to call close in each one). Otherwise, just using close is better, because it's more readable.
So that's why I prefer function scope. A second thing to remember is that using defer prevents the use of tail-call optimization.
Function scope defer cannot be implemented without a runtime.
Also, tail call optimization is only prevented for function-defer.
So function-defer:
You end by ”that’s why I prefer function scope”. But I don’t ser what the actual argument was. – That it will prevent people from using defer in some cases where you think it should not be used?
The reason we compare it to RAII is that just like with destructors, the execution of the defer + generated code will be will be easily understood.
Note also that scope-defer can be written cleanly in C using C macros in GCC/Clang using ”cleanup” and nested function extensions (using no allocations).
Function-defer can’t.
You can have function style scope withou runtime by excluding defers from jumps, while, do, for. At which point there is honestly very little left. (And you still break tail call optimization)
Whereas scope-defer can retain tail-call optimization when needed AND works with jumps, while, do, for.
Try to transform some longer non-trivial code using defer, then convert that to function style defer and scope-style respectively.
In particular, make sure that there are multiple defers and that their execution requires variables that do not exist at function top scope. E.g.
if (foo) { FILE f = open(x); defer close(f); ... many lines of code with early exit ... }
I'm looking at the defer_example.c2 I sent earlier. Using any scope, this means you cant do:
if (x) defer close(x);
Since the defer would be called immediately, right? (we could generate a warning for this). But in this case:
i32 fd = open("file");
if (fd == -1) return;
defer close(file);
The defer would effectively be function scope, even if we implemented any-scope.
Okay, let's go ahead with the any-scope then. I think we still need to some sane restrictions to avoid a very complex implementation.
When I implemented the case-body this morning, I found that storing some things in the Scope is very handy (and almost no cost). That way you can just forget about the actual scope stmt (like case/if) and just store and continue. The analyser code then looks like: ( for case, pseudo-code)
analyseCase(CaseStmt* C) {
enterScope();
foreach (stmt) analyseStmt(s);
if (scope.hasDecls()) C->setHasDecls(); // pick info up from Scope.
leaveScope();
}
This approach only works during analysis, but it could maybe set something in outer bodies/Compound stmts.
Update: One other thing I thought of. There are many scopes, some not so obvious to developers. For example, a Switch stmt has a scope as does every Case stmt. If we do want 'sane' behavior for the example below, we could add a 'DeferScope' that not all scopes have. If that makes any sense...
switch (x) {
case 0:
defer close(x); // if makes no sense to also run defer here
break;
case 1:
// ..
break;
}
if (x) defer close(x)
can be written as defer if (x) close(x)
In terms of scope, I think it's unwise to have specialized scope rules for defer. It's just one more pitfall for people to fall into. I think defers in switches will be rare, but have their use in this case:
switch (x) {
case 0:
FILE* f = open(a);
defer close(f);
if (file_starts_with_foo(f)) break;
if (file_starts_with_bar(f)) return 0;
b += file_number_foobar(f);
break;
}
Here there are multiple exit points and we want to close f at each one of them.
yes better to keep it all the same
I was thinking about the implementation. The current one seems to produce correct results, I couldn't find any issues (I did not test in combination with goto, as I don't think that has to work).
See the code below:
{
i32 a = open('a');
if (a == 0) return -1;
defer close(a);
i32 b = open('b');
if (b == 0) return -1; // defer: close a
defer close(b);
{
i32 c = open('c');
if (c == 0) return -1; // defer: close b, a
defer close(c);
// defer: close c
}
{
i32 d = open('d');
if (d == 0) return -1; // defer: close b, a
defer close(d);
return 0; // defer: close d, b, a
}
// defer: close b, a
return 0;
}
First: If you look at the order of defers needed to be run, they form a directed, acyclic graph. So D -> B -> A or C -> B -> A. So I thought of letting defers point to each other (D-> B, etc).
Second: I could only think of 4 points in code, where defers need to be run (ie end of scope):
The graph and the linking to the 'DeferPoints' can all be done at analysis-time, so code generation should be relatively simple.
The advantage of this approach is that you don't need an array of Defers and you don't need the conditionalDefers list.
The condition for this to work is to specify that goto's don't mix with defers. I'm personally in favor of this, since goto should just do a jump and nothing else. The analyser could give errors/warnings for this maybe.
@bvdberg I also think my solution can be improved upon, but I ran smack into the problem with switches where you decided on opening a new scope for every case. That also means that defer must run when going from one case to another. However, in C cases are the same as labels, and labels DO NOT create a new scope. So I just want to be sure that this is what you want.
In the new solution I'll find a more elegant way of handling defers BUT I think the problem with goto is a general problem that needs to be solved anyway (especially if case does not create a scope) and it will actually serve to make the problem and analysis BETTER for future changes.
I need to free up a few hours to work on it and it will look different, so we could actually close this one and I'll open a new one later.
@lerno I need to use nasty looking braces in case statements so often in C, I really want them out. I have (almost) never used the C feature of common scope in cases, except some very weird corner cases (see Duff's Device)..
What I'll do is add a few more unit tests to finalize the wanted behavior. After we agree on those, we can continue with the code, ok?
I feel a little bad for us breaking Duff's Device 😄
BTW I can do the change in parsing of defer anyway. Right now it assumes we can actually make the part AFTER the defer a subscope to defer. This is not really necessary (anymore). It was required in the first draft only. So I'll rewrite that and it will make the implementation cleaner. I'll also see if I can use the normal scope instead of having a parallell one. It should be possible when I don't do the weird parsing of defer.
How about this 'specification':
Design:
- dont allow goto/return/continue/labels/defer(😄) in defer
- only allow break if defer has body, otherwise not
- goto/longjmp don't call defers. (give error on both if jumping from scope with defers)
- run defer at any scope end (also case)
- Q: dont allow Decls inside defer? (not needed for cleanup?)
Impl:
- if (x) defer close(x) -> error, superfluous defer, probably unwanted
I want defer to work as seamless as possible. The only thing we rule out in defer is non-local jumps in defer. That means that return/continue is out (as they indicate non-local jumps), goto / label is ok as long as they are local. (Continue will also obviously work if in a subscope where continue is allowed, like a for-loop)
Longjmp will not call defer. In all other situations it is called. I don't think we need to do any errors on
if (x) defer close(x)
should be an error yes. But the rule is more generic: We give an error on any scope that ONLY contains a defer AND a warning when defer is the last statement in a scope.
Some examples
if (x) { defer close(foo); } // Error
{
defer close(foo); // Error
}
{
LOCK *lock = acquire_lock();
defer close(lock); // warning
}
switch (foo) {
case 1:
LOCK *lock = acquire_lock();
defer close(lock); // warning
case 2:
defer close(file); // error
case 3:
...
}
for (int i = 0; i < 10; i++) {
FILE* f = open_file(file[i]);
...
defer close(f); // warning
}
Sure, is the analyser code do-able for that (goto/label analysis)?
You're right about the last statement as well. Since the only statement is also the last one then, we only have to check if it's the last one..
goto/label analysis in defer is already done and works in this version already.
So a goto outside a defer will cause defers to be called? Like:
func void test(x) {
defer close(a);
{
defer close(b);
goto out; // will cause close(b) to be called, Not close(a)
// ..
}
out:
// ..
} // close(a) will be called here
Both b and a are executed yes (first b then a)
What about this case:
public func i32 main(i32 argc, const char*[] argv)
{
i32 a = open('a');
if (a == 0) return -1;
defer close(a);
{
i32 b = open('b');
if (b == 0) return -1; // defer: close a
more:
defer close(b);
if (b == 10) goto out; // defer: close b
if (b == 11) goto more; // What do to here? (does not change scope), Nasty
}
out:
return 0; // defer: close a
}
The line (b == 11) also generates a close(b). This will become very complex I fear..
I have some feedback about the patch itself, maybe you can integrate this..
All of what you describe in your example works.
In this case goto more;
it will trigger the defer since we're jumping to a place before the defer was created. Then defer will be again scheduled. On the other hand, if put goto AFTER the defer, then the defer would not be triggered.
label1:
defer close(b);
label2:
if (b == 10) goto out;
// jumping to label1 triggers the defer, then will schedule the defer again.
if (b == 11) goto label1;
if (b == 12) goto label2; // will not trigger the defer on goto.
}
The above becomes:
label1: ;
label2:
if (b == 10) {
close(b);
goto out;
}
if (b == 11) {
close(b);
goto label1;
}
if (b == 12) {
goto label2;
}
close(b);
}
A DeferStmt just has to have a Stmt*
Yes, part of the change I'm planning on
Each Break/Continue/Return stmt can just have a pointer to a single DeferStmt
I will check how it changes when I update the code
I think conditional defers can be removed
No, it's needed for goto, and even if we would have skipped gotos, I think there are corner cases later on that might need it.
Some errors added in DiagnosticSema/ParseKind.inc appear to be missing
Yes, you told me that it was only necessary to add in the inc, but apparently it's added in more places. inc files is the enemy...
Parser: there's no need to push back a token.
Will be fixed since defer parsing will be redone
Only generate do..while(0) if needed.
I'll see if it's possible and doesn't create weird issues... consider this:
{
i32 i = foo();
defer decrease_counter_by(i);
{
i32 i = bar();
if (i == 0) return 0;
/** do stuff **/
}
}
If we forbid shadowing, then this will never happen. Otherwise we run into an issue that needs to be solved by one or more unique temp references.
I think you forbade shadowing, but I'm unsure.
Yes all shadowing is verboten.
I still see the other feedback missing (Parsing Pushback token removal etc)..
I'm still working on it @bvdberg, I haven't pushed it yet. I only have a few corner cases left and then the test cases.
@bvdberg It's finished. There are some things one can work on later on:
Changes:
defer a(); // deferTop = defer1
defer b() // deferTop = defer2 -> defer1
while (x) {
defer c(); // deferTop = defer3 -> defer2 -> defer1
defer d(); // deferTop = defer4 -> defer3 -> defer2 -> defer1
if (foo) {
// deferTop = defer4.
// exit point defer = defer2 =>
// release defer4 & defer3
break;
}
if (bar) return 0; // Release the entire defer chain: defer4, 2, 3, 1
defer e(); // deferTop = defer5 -> defer4 -> defer3 -> defer2 -> defer1
...
// Scope ends, with defer5. Exit point has defer2 so release defer5->defer3
}
In DeferList, "start" is deferTop, "end" is the exit point. An exit point of null means all are released.
What does this do? I've not seen that construct in C++ yet.. (in Stmt.h class definition)
DeferList deferList {};
and DeferStmt* deferTop {};
I would expect:
DeferList deferList;
and DeferStmt* deferTop;
A lot of code seems to be change unnecessary, like ASTVisitor.cpp/DepVisitor.cpp I think a lot if indent or something, but It's very hard to see. Please only make the change you need and make styling changes in a separate pull..
Unit tests seem to crash c2c on many tests
Also the other feedback is still missing, like the PushbackToken stuff in Parser.
Initializes the struct / pointer to zero
I thought I minimized the formatting changes. Must have been switch cases where I could not read the cases clearly.
The pushback is something I forgot to revert from the previous implementstion. I can definitely fix that. Any other changes I need to do?
Just check my other feedback earlier. Those are my concerns..
I see no crashes @bvdberg – do you have any test case for me?
@bvdberg Changed:
Rebased on latest.
Stmt.h still contains: DeferList deferList {};
, please rewrite to DeferList deferList;
(unless you have a good reason for using the weird syntax..)
Looking at the implementation, I still have a lot of doubts. I hope you can understand these. One of the biggest is the size change of many AST elements. For example a CaseStmt has grown from 24 to 40 bytes. This may not sound like much to you, but I've spent a lot of time getting it down to that size. This implementation pretty much destroys all that effort. Most case statements will not have a deferlist, so having it as full member (size 16 bytes) instead of just a pointer seems strange.
C2c seems to clash on the following (very complex) code ;)
module test;
import stdio local; // removing this line fixes segfault
public func i32 main() {
return 0;
}
`
Another issue I'm still struggling with is the interaction between goto's and defers. I find it very hard to predict what will happen here:
// in some function defer close(a); label1: defer close(b); if (..) goto label1;
Since no scope is left, i wouldn't any defers to be called, but currently the implementation is that it
analyses which defers are jumped over and those all released. So for developers something unexpected might happen, which is what we want to avoid.
To make sure developers can very easily predict the behavior by looking at the code, I propose
to disallow goto statement in any (sub)scope where defers are present. Also dont allow goto's inside defers. This allows developers to either use defers or to use goto's for implementing their control flow.
I think this will also simplify the implementation by removing conditionalDefers..
Includes all of defer. Also changes C code to always output { } even for simple if, while etc. This simplifies defer code generation. Tracking defer adds several parts to analysing the function and outputting the right thing. CGen does not need to involve itself in figuring out how to emit the right code. All of that is already done during the analysis.