Open ThePhD opened 3 years ago
This is now the _Alias paper. Article coming soon, when the art is done.
https://thephd.dev/_vendor/future_cxx/papers/C%20-%20Transparent%20Function%20Aliases.html
Still not on Twitter, but saw your wail about how undefs might break this proposal.
It's likely I don't get all the nuances, but I was wondering: since _Alias is a new thing (despite trying to be transparent!), couldn't you just mandate an error for undef'ing an _Alias (basically an exception in 7.1.4p2)? Yes, it will make it impossible to use both old binaries and new libraries (employing _Alias) at the same time, but that sounds like a fair trade-off for rolling your own imaxabs or whatever.
The problem isn't the #undef - that's just there to show how fucked things are. The real problem is the redeclaration: that turns into a hard error right now if you're using the _Alias.
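For readers who have not seen the pattern being discussed, here is a minimal sketch of the kind of user code 7.1.4 permits today: the program removes any macro definition of imaxabs and redeclares the function by hand. The helper name distance is made up for illustration. As I read the thread, the hand-written declaration is the part that would turn into a hard error if the header spelled imaxabs as an _Alias; the sketch only shows the code that currently has to keep working.

```c
#include <inttypes.h>

/* 7.1.4 allows removing any macro definition the header may provide... */
#undef imaxabs

/* ...and redeclaring the library function by hand. */
extern intmax_t imaxabs(intmax_t j);

/* Hypothetical helper: every call after the #undef must still compile and work. */
intmax_t distance(intmax_t a, intmax_t b)
{
    return imaxabs(a - b);
}
```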
Yeah, that's what I meant by "rolling your own". I still think my point stands though - why not forbid redeclaration of an _Alias - effectively canonicalizing that hard error? That would be a break of its transparency, but since it's new, its behaviour is up for definition. I get (I think!) this means that codebases happily redefining stuff could not be compiled against (e.g.) a newer glibc anymore - if they want, they'll have to rename their redeclared function.
I'm sure that would receive some pushback from those who redeclared stdlib-stuff and never want to touch their pustulant codebases yet still get to upgrade their dependencies indefinitely. But there comes a point where someone needs to pay for the crimes of the past (it's also not like renaming the redeclared function would be that hard); and the good thing is that it'd be a smaller group than those that "just" don't want to touch their code (but haven't redeclared critical functions).
Tabled for post-C23.
N2901.
Doesn't a static function pointer already let us do this?
Sorry if this is answered somewhere, I just came here from your blog post and haven't had the time to read the proposal itself. Maybe this can serve as a nice FAQ entry for others with the same question though.
Seems to me that even as far back as C89, we could portably indirect between ABI function name and C function name like this:
int f(int x)
{
    return x - 1;
}
static int (* g)(int x) = f;
int h(int x)
{
    return g(x);
}
When I compile this with a recent clang with optimizations turned up, the g entirely disappears from the resulting object file, as if I simply didn't have it in the source and called f directly.
This seems to achieve the same goal as the proposed _Alias g = f; declaration. You do need to explicitly repeat the function type signature, but that seems fine for something that goes into a header.
Of course, a static function pointer becoming zero cost requires the compiler to implement a modest optimization - but _Alias requires implementation effort too, while that optimization is generally useful to implement and already exists in the wild.
I do realize that "freedom for a bad implementation to needlessly store one pointer and indirect through it" is QoI wiggle room, which your blog post was strongly against, but it's just a tiny bit of performance wiggle room, not code behavior and ABI wiggle room, right?
On the other hand, I can see how having an explicit new syntax would help educate developers on this pattern, and might be useful for tooling?
Any other advantages I missed? (An "already answered in N2901" reply is fine, I'm happy to take the time to find it once I know it's in there.)
There's one true problem with (static) function pointers, and it's that indirection does not work as intended.
int f(int x)
{
    return x - 1;
}
static int (* g)(int x) = f;
int h(int x)
{
    return
        f(x)        // ok
        + (&f)(x)   // ok
        + g(x)      // ok
        + (&g)(x)   // breaks
        ;
}
This matters mostly for macro expansion and the like. That slight syntactic difference makes it painful enough to warrant a better front-end feature. The next iterations of the paper are also going to apply the concept to more than just functions: we are going to allow variables to be _Alias'd as well, which is a prominent secondary feature.
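To make the macro concern concrete, here is a small sketch. CALL_VIA_ADDRESS and use_macro are hypothetical names invented for this illustration; the point is only that a macro which applies & to its argument works with a real function name but not with a function pointer standing in for one.

```c
static int f(int x)
{
    return x - 1;
}

/* A function pointer standing in for an "alias" of f. */
static int (*g)(int x) = f;

/* Hypothetical macro that takes the address of its argument internally. */
#define CALL_VIA_ADDRESS(fn, arg) ((&(fn))(arg))

int use_macro(int x)
{
    /* Fine: &f is a pointer to function, which is callable. */
    int direct = CALL_VIA_ADDRESS(f, x);

    /* CALL_VIA_ADDRESS(g, x) would not compile: &g is a pointer to a
       function pointer, which is not callable. */

    /* g has to be called without the &. */
    return direct + g(x);
}
```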
Ah. As an abuser of macros I do appreciate that edge case. Thank you!
Wait, static no-op wrapper functions solve the indirection problem:
int f(int x)
{
    return x - 1;
}
static int g(int x)
{
    return f(x);
}
int h(int x)
{
    return
        g(x)        // ok
        + (&g)(x)   // ok!
        ;
}
Also would get optimized out to nothing on modern compilers in my experience.
Yes, but that defeats the property that f == g / &f == &g; they must be different functions according to the abstract machine and so even if the optimizer can fold them, code paths which might do things like not store duplicate function pointers (by comparing them) will be forced to store what amounts to the same function twice.
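A rough sketch of the kind of code path meant here, assuming a made-up callback registry that deduplicates by pointer comparison (register_callback, registry, and register_f_and_wrapper are all hypothetical names). Because the wrapper g is a distinct function in the abstract machine, a conforming implementation gives it a distinct address and it takes its own slot, even though it does the same thing as f.

```c
#include <stddef.h>

typedef int (*callback)(int);

static int f(int x) { return x - 1; }
static int g(int x) { return f(x); }   /* no-op wrapper around f */

/* Hypothetical registry that skips callbacks it has already stored. */
#define MAX_CALLBACKS 8
static callback registry[MAX_CALLBACKS];
static size_t registry_count;

/* Returns the number of stored callbacks after the attempt. */
size_t register_callback(callback cb)
{
    for (size_t i = 0; i < registry_count; ++i) {
        if (registry[i] == cb) {
            return registry_count;      /* duplicate: nothing stored */
        }
    }
    if (registry_count < MAX_CALLBACKS) {
        registry[registry_count++] = cb;
    }
    return registry_count;
}

/* Registers f twice (deduplicated) and then the wrapper g (not deduplicated). */
size_t register_f_and_wrapper(void)
{
    registry_count = 0;
    register_callback(f);
    register_callback(f);
    return register_callback(g);
}
```

With a true alias, the last registration would compare equal to f and be skipped.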
Oh, right, my bad, I should've noticed/remembered that as a goal. One more please: what's the downside with using a (not function-like) #define to achieve this?
int f(int x)
{
    return x - 1;
}
#define g f
int h(int x)
{
    return g(x) + (&g)(x) + (g)(x);
}
Off the top of my head I only see things that could be surmountable (tooling having difficulty with symbols that don't exist after the preprocessor is done with the code; developer aversion to using macros like that) or could be reasonably answered with "well don't do that!" (vulnerability to #undef and redefinition)?
Ohhh I found it in the proposal:
This also includes preventing their existence as a whole with #undef imaxabs: every call after that #undef directive to imaxabs(...) must work and compile according to the Standard.
Okay yeah I see it now. If we were writing a third party library this wouldn't matter because we could say "don't #undef it!", but a conforming implementation of the standard C library isn't allowed to say that.
Why _Alias g = f and not _Alias g f? The latter is more consistent with typedef foo bar, which is the most similar construct in C?
To me as a developer that seems like an extra arbitrary difference which I'd have to just memorize, but I suspect you've got great rationale for that equal sign being in the syntax as well?
It's talked about in the proposal. I wrote it that way because it makes sense to me, but honestly it can be spelled any-which-way; I have 0 preference.
Yeah, I saw section 3.5 about syntax; it just didn't seem to have anything about the equal sign vs no equal sign.
Thank you for the replies, by the way! I really appreciate it! (Also, looking at the rest of your blog and proposals, I really appreciate the careful and caring thought you put into improving C for all of us!)
Actually, I want to challenge & sameness being a strictly good thing, re: how function pointers behave versus the proposed aliases:
This matters mostly for macro expansion and the like. That slight syntactic difference makes it painful enough to warrant a better front-end feature.
If a macro wants an address of something which already decays to a pointer, then it is usually worse for clarity, readability, reasoning about what the code will do, and composability, if the macro internally slaps a & on a parameter. If a macro just requires you to pass in a pointer, then:
&
, and users of the macro then have the choice to just write the name as is, or slap a &
on it if they feel that's clearer,&
in front of them, it's clearer and more self-descriptive that MACRO(&foo)
is taking an address than MACRO(foo)
,*&foo
- andMACRO(lookup_the_right_foo(...))
or refactor MACRO(&foo)
into function_that_calls_macro(&foo)
inside of which the type of foo
is explicitly a pointer by the time it gets to MACRO(foo).
So I think it's worth adding into the consideration that maybe the language-level advantage of aliases over static function pointers, of & remaining a no-op, does more to enable code that is harder to read/check/maintain than it does to help well-written code.
But maybe I'm missing good reasons to sometimes write macros that do & on their parameters internally? [Edit: I have come up with a kind of macro that, in combination with a few other tricks, provides really nice ergonomics and code-correctness benefits, which must use & on its arguments internally, or else it might be too annoying for most people to use.]
P.S. Of course that's still not necessarily sufficient reason against this proposal, in fact you could use all this as a strong argument for alias - even if I'm right, developers would benefit from _Alias precisely because it is so much more simple, obvious, and doesn't require so much edge-case deliberation.
I gave more thought to _Alias foo = bar versus _Alias foo bar (and it led to me noticing another difference from static function pointers):
I think it should be determined by how much, if at all, developers need to keep in mind the difference between aliases and regular variables.
If developers can think of _Alias foo = bar as semantically kinda like a variable assignment where the only difference is that it's resolved to the right aliased thing at compile time, and this doesn't cause any subtly wrong code to silently compile, then = is the right syntax, because it matches syntax and expectations to semantics and usage better.
If there are situations where a developer might write the wrong code due to having the misconception that _Alias foo = bar is just a plain regular variable assignment, or due to slipping into some intuition as if it's almost just a plain regular variable assignment, then changing the proposed syntax to _Alias foo bar would be a great reminder to developers that they're dealing with an unusual construct that behaves differently.
To illustrate the difference:
You alias a function name from another library which is dynamically linked at runtime. You call that function once through an alias. Then other code in the same program unlinks/unloads one copy of that library, and links/loads another copy. You call through the function alias again. What happens? Probably undefined behavior, but:
As I understand it these aliases have to compile like the latter, right?
If you generalize aliases to global variables, which was mentioned elsewhere as a possible future direction, this is even more apparent: normally, after foo bar = qux;-shaped statements in C, bar has the value of qux, even if qux is then modified - but if bar is an alias of a global mutable variable, then it has whatever value qux has at that moment, right?
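A tiny sketch of that difference, with one loud assumption: no compiler implements _Alias for variables, so an object-like macro is used here purely as a stand-in for a hypothetical _Alias qux_alias = qux; declaration. The names qux_alias and copy_versus_alias are made up.

```c
static int qux = 1;

/* Stand-in for a hypothetical `_Alias qux_alias = qux;`: the macro makes
   qux_alias another name for the same object, not a second object. */
#define qux_alias qux

int copy_versus_alias(void)
{
    qux = 1;            /* reset so the function gives the same result each call */
    int bar = qux;      /* ordinary initialization: bar is a snapshot of qux */
    qux = 2;            /* modify the original afterwards */

    /* bar still holds 1, while the aliased name sees the new value 2. */
    return bar * 10 + qux_alias;
}
```

The tens digit of the result is the snapshot and the ones digit is what the aliased name observes.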
And in fact, if I didn't know about this proposal and I saw it in code, if there's an equal sign, my first idea would be that _Alias is some kind of standardized type.
So I think dropping the equal sign would also help clue in developers that it is not a normal assignment, not a normal variable, and doesn't behave quite like one.
I've got no problem dropping the equals sign and using the typedef-like syntax! The typedef-like syntax was mostly because typedefs are part of declarators, so evolved out of that since typedef is, grammatically, like a storage class of sorts. I don't want to class it as a declarator, but using a familiar syntax will still be good.
An outsider's first impressions, for what it's worth:
#define myfunc … works for everything outside of libc, as there's no §7.1.4 to fight with there. So this proposal is purely meant for libc. But internal libc details shouldn't really need standardization, right?
#define imaxabs … is dismissed due to §7.1.4. But when this proposal still conflicts with another part of that same section, it's OK because that only affects experts. Are there examples of non-experts relying on #undef to not change what imaxabs points to? Because I certainly don't rely on that, and it seems reasonable to have the same consequences as omitting the #include entirely.
Anyway all of that may be less important than the basic question: Is it OK to break the ABI if migration is possible? There'd be no trouble with libc, but I can still imagine situations where an old third-party DLL wreaks havoc with a new EXE with a different idea of what intmax_t means. To me that's the more interesting part of this proposal. Where is the line exactly - what will Microsoft let us get away with? 😄 Will this proposal actually let us change intmax_t, or is it limited to more specific libc tweaks?
@markdascher
So this proposal is purely meant for libc.
I don't think that's exactly right. Other libraries don't strictly need to avoid #define indirection for their own identifiers the way that libc does, but:
If that was good enough to force libc into it (so that developers could rely on this property as an invariant) then we should probably assume it's at least sometimes good to empower other libraries to do the same thing. For example, I've only ever relied on the ability to #define over some identifier in development/debugging, never in production code, but it's nice to have the ability on standby (sometimes the most efficient way to get insight into a hard-to-reproduce bug while juggling other priorities is to #define over libfoo_whatever so that every call to it expands to also log the arguments and then log the return value, and deploy that build into a test environment for a few days).
There are also arguably benefits for clarity, static analysis, and so on. Especially since the preprocessor is basically a separate language that doesn't talk to the compiler/assembler/linker, and it's all-or-nothing - we either preprocess the file (and thus lose the aliasing information which this proposal would preserve), or we don't preprocess it at all (and thus possibly miss/hide other things that static analysis might be really interested in). If, instead, it's a construct that the compiler knows about, it could even be implemented in terms of native features in ways that are more useful than #define (for example, does the ELF format have built-in support for aliasing two or more symbols to the same thing? I'd bet it does, and that would make aliases visible all the way through from source to inspecting non-stripped binary builds).
I think at the human level it can help adoption of good practices when there are explicit features for them. I think if we give people this _Alias feature, more people are going to actually use it than a #define which would achieve the same purpose, more people are going to talk about how it's useful or when we should use it, it'll be way more searchable (not just in source code and docs but also in tutorials/Q&As online, etc).
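The debugging trick mentioned above (defining a macro over libfoo_whatever so every call logs its argument and result) could look roughly like this. libfoo_whatever is given a placeholder body here so the sketch is self-contained; libfoo_whatever_logged, libfoo_logged_calls, and caller are hypothetical names, and in practice the real function would come from the library's header.

```c
#include <stdio.h>

/* Placeholder for the library function; normally declared in libfoo's header. */
int libfoo_whatever(int x)
{
    return x * 2;
}

static int libfoo_logged_calls;   /* hypothetical counter, for illustration */

/* Defined before the macro below, so this call reaches the real function. */
static int libfoo_whatever_logged(int x)
{
    fprintf(stderr, "libfoo_whatever(%d)\n", x);
    int result = libfoo_whatever(x);
    fprintf(stderr, "  -> %d\n", result);
    ++libfoo_logged_calls;
    return result;
}

/* From here on, every call spelled libfoo_whatever(...) is logged. */
#define libfoo_whatever(x) libfoo_whatever_logged(x)

int caller(int x)
{
    return libfoo_whatever(x);    /* expands to the logging wrapper */
}
```

This only works because a third-party library can tolerate users defining macros over its names; it is a debugging aid, not a substitute for an alias.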
But internal libc details shouldn't really need standardization, right?
I think this is less of an internal libc detail and more of an interface with the compiler which - only incidentally - a libc might need if it wants to decouple ABI compatibility and source-level compatibility.
I also would like to see C have a bias towards minimizing how much functionality is only available to libc implementations which have the manpower and willingness to be coupled to non-standard compiler extensions. (We kinda had this problem in C99 with tgmath.h, before C11 standardized _Generic.)
I also think basically all platforms and compilers currently have a way to alias identifiers like this, and (for example) glibc already uses that. So this is in many ways just taking a feature that implementations have already converged on anyway, and giving it a standard name/interface.
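As one concrete instance of such existing practice, here is a sketch using the alias function attribute, a GCC extension that Clang also accepts on ELF targets (it is rejected on Darwin, so this assumes something like Linux). The helper names same_address and alias_demo are made up. Unlike a wrapper function, the aliased name refers to the very same definition, so the addresses compare equal.

```c
int f(int x)
{
    return x - 1;
}

/* GCC/Clang extension on ELF targets: g becomes a second symbol for f's
   definition. This is not standard C. */
extern int g(int x) __attribute__((alias("f")));

/* Compares two function pointers at run time. */
int same_address(int (*a)(int), int (*b)(int))
{
    return a == b;
}

/* Returns 1 if g behaves like f and shares its address. */
int alias_demo(void)
{
    return g(3) == 2 && same_address(f, g);
}
```

The proposal would give this kind of mechanism a standard spelling instead of a per-compiler attribute.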
I can still imagine situations where an old third-party DLL wreaks havoc with a new EXE with a different idea of what intmax_t means. [...] Will this proposal actually let us change intmax_t, or is it limited to more specific libc tweaks?
Hmm.. maybe we can talk through an example? I imagine it goes like this:
Old DLL was compiled when intmax_t was 64-bit. Let's say it provides a function void f(intmax_t) and expects to call a function void g(intmax_t).
The system decides to offer a new definition of intmax_t. In the relevant header, it changes the definition of intmax_t and defines the old size integer as __intmax_old64_t.
The library providing g decides to upgrade to the larger 128-bit intmax_t. It changes the definition of g to void g(__intmax_old64_t). It adds _Alias g_64bit = g so that new source which needs to target the old ABI can still refer to the old g with an explicitly-old new name. It defines a new void g_128bit(intmax_t) (which is basically a copy of the old g definition, still using intmax_t), and aliases the g name to it with _Alias g = g_128bit so that newly compiled code targeting the void g(intmax_t) API gets the wider type. When this library is built, in the binary g still refers to the function with the 64-bit argument, and g_128bit refers to the new one.
(Aside: a point in favor of making it _Alias foo = bar with the equal sign - it's really much more obvious/intuitive/effortless/unambiguous for my brain to remember/know the order when an = is involved! So much so that I almost want to propose an alternative typedef syntax which uses an = for consistency.... without the = it's kinda arbitrary, with an = it's bound to be consistent with all the other = use.)
g and also links against the old DLL. The old DLL refers to g in the library, which is still there, and the DLL's f hasn't changed either, so if you don't call either f or g, everything is happy. If you only call g, you now get the new wider intmax_t definition, which links against the new g_128bit, and it all works. If you call f, either you or the provider of the DLL's header which defines f must update the header/signature of f to use the __intmax_old64_t definition. But, if you have that, you're good.
Notably, this is still a bad/messy/complected problem, but it doesn't get worse with dependencies-of-dependencies, so that's nice. By enabling old ABI-compatible identifiers to remain at their old names at all but source level, we turn the problem from an ABI break problem (which does ripple down through the dependencies as each dependency is rebuilt) into just a header-updating problem, and you no longer need to worry about anything that your code doesn't call directly.
That's still not always realistically tractable, and I don't know if it will solve the intmax_t problem in particular (that type's goal/meaning/name of being the widest is uniquely problematic in a language whose type details leak directly into commonly-relied-upon public ABI boundaries) but it is better... significantly better in my opinion.
Anything a libc can do, a normal user should be able to do. The less special magic soup implementations have to do things in C code and the more that is normally accessible to the plain user, the better.
Furthermore, this is not just a libc problem. Literally every single implementation -- including proprietary ones like IBM -- that is represented in the Committee has one of these mechanisms. We are standardizing an existing practice for a long-standing issue. We also know that this is a persistent issue because every implementation votes out improvements to libc on ABI grounds, despite every implementation currently attending the C and C++ Committee having a mechanism for this. Going by the letter of the standard, despite knowing every single implementation has had techniques to keep this from being a problem for anywhere from the last 20 to 40 years, is helpful.
Finally, normal users deserve the ability to write better, backwards compatible libraries too. They also deserve the ability to implement new interfaces without stepping all over their existing users, if that's something they would like to support.
Current version of the paper here:
I added the latest version of the paper here, with revamped motivation and everything else: https://thephd.dev/_vendor/future_cxx/papers/C%20-%20Transparent%20Aliases.html
(Will be on the WG14 Document log as N3329 sometime soonish.)
The first paragraph of 4.7 (Variables) at https://thephd.dev/_vendor/future_cxx/papers/C%20-%20Transparent%20Aliases.html#design-variables currently ends abruptly:
[..]syntaxes all support using this to rename variables. Therefore, alias declarations
There has been expressed want for transparent aliases for non-functions. [..]
A few paragraphs later has a double word:
Furthermore, it can allow for specific constructs constructs to be renamed [..]
The code sample has a typo "deprecatad".
EDIT - Latest Draft: https://thephd.dev/_vendor/future_cxx/papers/C%20-%20Transparent%20Aliases.html
For C. The goal is to defend against ABI by allowing function renaming. Attributes on the renaming declarations can also provide an easy-out for weak symbols.