[tool conventions] Should a function pointer be 64-bits?

binji commented 4 years ago

The tools will likely consider this feature to be "wasm64", but it doesn't yet have 64-bit function table indexes. Since C/C++ function pointers are lowered to function table entries, this means that they are 32-bit pointers. Should they be? Or should the tools use 64-bit "pointers" where the top 32-bits are empty?

cc @sunfishcode @jgravelle-google

aardappel commented 3 years ago

The current LLVM implementation uses 64-bit function pointers (because having different size pointers for different things, while possible in LLVM, is going to trip up all sorts of C/C++ code that assumes all pointers are the same size). These values then get truncated just before a call_indirect. See WebAssemblyISelLowering.cpp in https://reviews.llvm.org/D83729 for details.

It would of course be nice not to have that truncate op, but personally I would estimate call_indirect to be less performance critical than if and the other operations mentioned in https://github.com/WebAssembly/memory64/issues/10, so if we're going to postpone those, there's no need to address this one either.

In an ideal world, it would be nice for call_indirect to accept i64 operands, even if the underlying table is never allowed to be >4GB. I certainly hope that "4GB of unique function pointers should be enough for everyone" ;)

sunfishcode commented 3 years ago

I don't have a strong opinion about this; big picture, 32-bit function pointers would be more work for small gains.

This is a different situation from if and other operations mentioned in #10 though -- those are just local optimizations which could be added at any time. The size of function pointers is part of the C ABI, and not just the .o file ABI, but the C struct layout ABI, so changing it later would be a major breaking change. If we're going to do this, we should really think about doing it now.

For completeness, here's what a case for making function pointers on wasm64 32-bit might look like:

It would make in-memory vtables and similar data structures more compact, and would follow the spirit of C of having the builtin types reflect what's natural for the target machine. Wasm's "function pointers" really are in a different "address space".

And it appears LLVM does have support for pointers of different sizes; see the p specifier in LLVM's data layout. Address space 0 would remain for data pointers, and a new address number could be defined for function pointers. Perhaps 256, to correspond to how x86 starts its architecture-specific address spaces at 256. C/C++ for their part allow this, and it could easily be made to work with code that expects to be able to store function pointers in void * or intptr_t, which is common. This would look unconventional, but it's ultimately just faithfully exposing the unconventional nature of the underlying platform.

aardappel commented 3 years ago

@sunfishcode I am aware it is possible to make them 32-bit in LLVM, I am just not following how this is not going to break a ton of C/C++ code (code that is maybe not legal, but never the less).

While by-value conversion to and from void * may work fine, any code that refers to a void * as part of a struct or array and expects that to be compatible with other such structs or arrays that differ only by pointer types is going to fall over. Code that wants to re-use / cast existing memory is common in C/C++ and could run into these issues.

I'm aware such code would likely not be strict aliasing safe either, but that probably holds for a lot of existing C/C++ code out there.

This could happen in some legacy library that someone is trying to port, and would be super hard to debug, and only happen when running under Wasm. These bugs would likely be entirely new to that code, since its never been built before for a platform that has different size pointers. I wouldn't think such issues are worth the savings for e.g. vtables.

SoniEx2 commented 1 year ago

does wasm64 want to retain POSIX compatibility?

dschuff commented 1 year ago

I don't think there's one single answer to that question. POSIX covers a lot of things (most of them not relevant to this particular issue) and in general we try to be compatible where it makes sense, but don't make huge compromises to get it. Most of those tradeoffs happen in e.g. emscripten. For this particular issue, I actually don't think (AFAIK anyway) POSIX says anything about the size of function pointers. Different ABIs on the same architecture can already have different C type sizes and layouts, but those ABIs are defined by documents such as the SYSV ABI (Linux) or Win32 ABI (Windows) on x86, or the ARM AAPCS spec. They don't necessarily match across architectures or operating systems.

SoniEx2 commented 1 year ago

we know there are a few places where POSIX actively assumes the memory and function pointer sizes are compatible; dlsym being one of the more obvious.

dschuff commented 1 year ago

Ah, I see what you mean. I think the general answer to your question is still less about conforming to POSIX per se, and more about balancing trying to work with as much existing code as possible, while having the best possible quality by other metrics, and dealing with the constraints of the platform. Emscripten's dynamic loading code is actually a pretty good example of this. We try to implement dlopen and dlsym, but the web environment is quite unlike the POSIX-y environments wrt code loading, since there may not be a filesystem (so we support URLs instead) and loading code is asynchronous on the main thread (so correct dlopen semantics are not always even possible).

wrt your point here, I think you are getting at a similar issue as @aardappel's last comment. In the case of dlsym I think casting from void to a function pointer and back should work ok as long as the underlying value is correct (i.e. within the table bounds) in the first place. It will be a size-changing cast (which the author likely did not expect) and there could be code that breaks when void is not the same size as function pointer, as @aardappel also pointed out. So here we are balancing the likelihood of breaking some code against the benefits such as the ones @sunfishcode mentioned (e.g. saving memory in structures such as vtables).

SoniEx2 commented 1 year ago

~~While the POSIX dlsym example code would still work due to a property of little-endian, it still doesn't feel right...~~ edit: no it wouldn't. you're writing 8 bytes (void*) to a 4-byte location (function pointer).

Personally, we know we treat them as having the same size when dealing with hexchat plugins, but we would never expect hexchat to run in a wasm environment anyway. Further, we wonder if it'd be feasible to have a native vtable format, instead of having to load vtables into wasm memory? That would be very useful. You could transparently compress those.

aardappel commented 1 year ago

@SoniEx2 Function pointers are implemented in LLVM as 64-bit for wasm64, so unless someone intends to reverse that design/implementation, you have the compatibility you're looking for.

dschuff commented 1 year ago

While the POSIX dlsym example code would still work due to a property of little-endian, it still doesn't feel right... edit: no it wouldn't. you're writing 8 bytes (void*) to a 4-byte location (function pointer).

If you mean writing the value into a memory location like a function-pointer struct member, then you'd first have to cast the void* to a function pointer (reducing its size) before writing it to memory. Of course in C it's possible do this in some other type-unsafe ways (which ends up being undefined behavior), which wouldn't work. Surely there would exist programs that don't do it right, and these would break. Technically if there is UB it's the program's problem and not the compiler's, but of course all else being equal it's still better not to break such programs than to break them.

Further, we wonder if it'd be feasible to have a native vtable format, instead of having to load vtables into wasm memory? That would be very useful. You could transparently compress those.

With the GC proposal you could build vtables out of GC structs or arrays. In the past we even kicked around the idea of trying to make clang lay out vtables as slices of the table rather than in memory (thus allowing method calls to use fewer loads). I think that might be technically possible but it would be a substantial change to clang, and there may be other showstoppers we haven't thought of.

SoniEx2 commented 1 year ago

This is an example from an older version of POSIX: https://pubs.opengroup.org/onlinepubs/009604499/functions/dlsym.html
This would be very bad if function pointers were 32 bits while memory pointers are 64 bits.

The example from the current version of POSIX would work correctly, however: https://pubs.opengroup.org/onlinepubs/9699919799/functions/dlsym.html

The only other discussion we can find related to this is in the gcc mailing list: https://www.mail-archive.com/gcc@gcc.gnu.org/msg93399.html

sbc100 commented 1 month ago

Now that #46 is resolved and we are getting 64-tables it seems like there is no good reason to pursue this anymore.

WebAssembly / memory64

[tool conventions] Should a function pointer be 64-bits? #2