ThePhD / future_cxx

Work done today for the glory of tomorrow - or, C and C++ systems programming papers.
https://thephd.dev/portfolio/standard
50 stars 8 forks source link

n3199 - Improved `__attribute__((cleanup))` Through `defer` #67

Open ThePhD opened 1 year ago

ThePhD commented 1 year ago

Latest draft: https://thephd.dev/_vendor/future_cxx/papers/C%20-%20Improved%20__attribute__((cleanup))%20Through%20defer.html

ThePhD commented 10 months ago

It took a while, but the paper was fully written.

Current standard revision: https://www.open-std.org/jtc1/sc22/wg14/www/docs/n3199.htm

jiyuzh commented 8 months ago

Thank you for the proposal. I like it very much! I have one question regarding Section 3.8.3: It seems there is no specification over whether break or continue are allowed to be used to jump over a defer. Do you think this type of jump over should be allowed or disallowed?

Pros

Consider the following use case:

for (int i = 0; i < 10; i++) {
    void *ptr = malloc(8);

    if (!ptr)
        break;

    defer { free(ptr); }
}

It seems we should allow such jump over in this example. Also, break is semi-related to switch, allowing break to jump over seems making loops and switch consistent on the constraint.

Cons

Some common goto usages can also be expressed with/converted to loops plus break or continue, hence if goto is banned, so shall break and continue. The wording of the draft also seems to imply return is an exception. Also, if labeled loops are adopted into C standard, the jump over could be even more error prone.

Personal opinion

I implemented a C-based polyfill using __attribute__((cleanup(...))) and used it in several of my projects.

During my usage, I find goto and switch-jump-over are indeed confusing. However, based on my personal feeling, the break/continue use case could be a good thing to have.

#define _DEFER_MERGE(a,b) a##b
#define _DEFER_VARNAME(a) _DEFER_MERGE(____defer_scopevar_, a)
#define _DEFER_FUNCNAME(a) _DEFER_MERGE(____defer_scopefunc_, a)
#define _DEFER(n) \
    auto void _DEFER_FUNCNAME(n)(int *a); \
    __attribute__((cleanup(_DEFER_FUNCNAME(n)))) int _DEFER_VARNAME(n); \
    void _DEFER_FUNCNAME(n)(int *a)

#define defer _DEFER(__COUNTER__)

It closely resembles some behaviors defined by the draft (i.e., example 2, 4, 5 from Section 6.4), which can be tested here: https://godbolt.org/z/3vv5dMMqG

ThePhD commented 8 months ago

Thanks for your support!

Regarding jumps like goto, break, and continue, the latest version of the proposal has the following constraints in the proposed wording:

Constraints

Jumps by means of goto or switch into E shall not jump over a defer statement in E.

Jumps by means of goto or switch shall not jump into any defer statement.

Jumps by means of return, goto, break, or continue shall not exit S.

S here is the scope of the defer statement itself. So the only jumps that are banned are ones that jump out of a defer, not ones that happen to skip its execution. Your example:

for (int i = 0; i < 10; i++) {
    void *ptr = malloc(8);

    if (!ptr)
        break;

    defer { free(ptr); }
}

is perfectly fine and valid. if ptr is NULL/a null pointer, then this code will simply exit the loop and the defer will not be called because it has not been reached in that scope yet. We do not ban any use of goto or friends that is appropriate, only banning if it jumps over a defer statement. You can goto into some block/scope that contains a defer, so long as you don't jump over it:

int handle = get_handle();
goto meow;  // label after defer: using `goto` to jump forwards or backwards over is illegal
defer { release_handle(handle); }
meow0; ;
meow1: ;
printf("hiiii"); 
goto meow1; // okay

I think that covers the cases you were speaking of. Also, even if someone implements break and continue like a goto, that's not of the C Standard's concern. The examples and informative (not normative) explanatory text demonstrating it in terms of gotos is not a mandate for the implementation to follow: a C compiler (or interpreter, or transpiler, or any C implementation) does not have to care about such hints. They have to uphold the Constraints and Semantics sections, as given. That most flow control can be flattened/rewritten in terms of goto is not something the C standard has to care about, so long as the exact use of continue or break follow what is in the standard before your compiler changes it / reorders it / messes with it.

BatmanAoD commented 6 months ago

One small organizational issue: the last paragraph of 3.4 (all of which is duplicated in N3198) states that "for the macros we provide almost every single one will be defined and have the value of 1", which makes sense in the context of N3198 but not N3199. N3198 is linked later, in 3.4.2; if N3198 were mentioned and linked prior to the note about MSVC and SEH, then the note could allude to "the macros provided in N3198."

tylov commented 4 months ago

There are, as far as most reviewers can tell, no errors in the creation or deletion of the various kinds of resources (particularly, repeated memory allocations).

Most reviewers? Each of the returns inside the loop will leak memory on namelist in addition to all namelist[k]'s for k in [i, n). Great proposal, though!

fuhsnn commented 4 months ago

Greetings, if implementation experience from an amateur compiler could help get this through, I wrote a little bit here along with the implementation.

fuhsnn commented 4 months ago

Some feedback on C++ compatibility (hope this is relevant, since the draft does contain sizable words for that):

The lambda based scoped guard implementation under 4.4. The Polyfill/C++ Fix may not behave as expected (as my understanding of the proposed C defer) when copy-elision is in effect, for example EXAMPLE 4 converted to struct may return 5 instead of 4 on all major C++ compilers: https://godbolt.org/z/q41rMo5Ex

My short discussion with LLVM people: https://github.com/llvm/llvm-project/issues/100869

ThePhD commented 4 months ago

The lambda based scoped guard implementation under 4.4. The Polyfill/C++ Fix may not behave as expected (as my understanding of the proposed C defer) when copy-elision is in effect, for example EXAMPLE 4 converted to struct may return 5 instead of 4 on all major C++ compilers: https://godbolt.org/z/q41rMo5Ex

Nothing I can do about that, unfortunately. Maybe if C gets a form of applicable NRVO, but otherwise the intent and the effect is still matching and the same.

ThePhD commented 4 months ago

Greetings, if implementation experience from an amateur compiler could help get this through, I wrote a little bit here along with the implementation.

Thank you for the implementation. At the moment defer is likely headed to a Techincal Specification unless we can convince folks that it should just go straight into the next working draft instead. I might do that but I'm tired of fighting people on this stuff so I'm just writing a Technical Specification; I will link to and include your work in the next revision of the proposal, however.

fuhsnn commented 3 months ago

Thanks, just keep doing what you're doing. I'll keep stalking wg14 documents for fun stuff to implement.

strncmp commented 2 weeks ago

Hello, I just wanted to ask if any provisions to completely solve the resource leak and dangling resource problems in C were mentioned by the committee during the discussions of this feature. I know this feature is supposed to help with that, and I appreciate it very much, but it doesn't protect the user in a language level manner (you still need "better analysers and linters" and many reviewers to gain confidence about your resource use, and you still cannot be sure that you haven't missed anything). That being said, thank you for working on this, it's really useful.

ThePhD commented 2 weeks ago

C cannot fundamentally solve this problem without changing core parts of C.

Have you considered writing Rust?

strncmp commented 2 weeks ago

I'm currently learning Rust, but that is not what this is about. I was just curious whether the committee considers this a part of a broader strategy, or just a one-off feature. Nevertheless, thank you.

ThePhD commented 2 weeks ago

If you'd like to know the Committee Strategy, you can see the Charter that was updated for C2y, which added a much higher focus on application security: https://www.open-std.org/JTC1/SC22/WG14/www/docs/n3280.htm

Nevertheless, the C committee cannot cook up features. People, like me, have to write proposals. They can be coordinated, but generally they are done independently. Other people are trying their own things which they have shown interest in standardizing but have written no papers yet, such as: https://discourse.llvm.org/t/rfc-enforcing-bounds-safety-in-c-fbounds-safety/70854 https://llvm.org/devmtg/2023-05/slides/TechnicalTalks-May11/01-Na-fbounds-safety.pdf https://www.youtube.com/watch?v=RK9bfrsMdAM https://clang.llvm.org/docs/BoundsSafety.html

ryansuchocki commented 1 week ago

Thanks for all the work that this must have taken. It's exciting to see that things like this are on the cards.

I'm curious about the rationale for proposing the defer {<cleanup code>} <protected code> structure rather than something like do {<protected code>} finally {<cleanup code>}.

The do {...} finally {...} structure would seem dramatically more intuitive to me as a piece of C syntax, but moreover I think it has some distinct advantages which would make it easier to teach and explain the semantics (especially when it comes to the interactions with return/break/continue/goto):

I'm sure this has had consideration, so I'd like to hear your perspective. I think I saw in the minutes of one of the meetings that there had been some debate on this aspect of the proposal (but there was no detail).

ThePhD commented 1 week ago

This is discussed during the "Implementation Experience" section: https://thephd.dev/_vendor/future_cxx/papers/C%20-%20Improved%20__attribute__((cleanup))%20Through%20defer.html#experience

Specifically:

This means one has to move all variable declarations out of the try, in order to reference them properly in the finally. All in all, this means that while such a movement/changing of how variables are created in that block scope, it encourages unsafe practices such as keeping large amounts of uninitialized data. This encourages setting variables to uninitialized / sentinel values, and then properly initializing them in the try block before deploying a finally. We would rather not pursue such an option in C.

tylov commented 6 days ago

There is another option, which is __try { ... __finally: ... }. This has the same benefits of appearing in the order in which it runs, but still within a limited scope. It is also intuitive because the code flows naturally into the __finally: stage unless a return already sent it there before returning. Was this considered?