WebAssembly / binaryen

Optimizer and compiler/toolchain library for WebAssembly
Apache License 2.0
7.45k stars 737 forks source link

operator new param? #294

Closed kripken closed 8 years ago

kripken commented 8 years ago

In emscripten's libc++ port, operator new takes a size_t (unsigned int). However, when using the wasm backend, clang is not pleased, and says "no, that must be a size_t (unsigned long)",

error: 'operator new' takes type size_t ('unsigned long') as first parameter

This blocks many (all?) tests using c++ STL stuff.

What's going on here?

cc @sunfishcode @jfbastien @dschuff

jfbastien commented 8 years ago

It should be std::size_t, but otherwise I'd expect it to work as long as you've included cstddef or another one of the headers that declares std::size_t.

Repro instructions?

kripken commented 8 years ago

@jfbastien: STR is e.g. emcc tests/hello_libcxx.cpp in emscripten (latest incoming branch, and with vanilla llvm set up as your llvm), which builds this file:

#include <iostream>
int main() {
  std::cout << "hello, world!" << std::endl;
}
sunfishcode commented 8 years ago

Clang has a builtin notion of what type "size_t" is a typedef for. For the wasm target, it's currently set to unsigned long. One of emscripten's headers is defining size_t to be unsigned int. The two types are effectively interchangeable in terms of the generated code, but in terms of C++ type checking they have to match.

I chose unsigned long in clang because the wasm32 target is ILP32, while wasm64 is LP64, so unsigned long is 32-bit on wasm32 and 64-bit on wasm64, so using unsigned long for both means each target gets the size it needs, while using the same type. It isn't necessary that we do this, but it is convenient that, for example, operator new mangles to _Znwm on both (instead of having _Znwj on wasm32).

Looking more closely, it looks like it's system/lib/libc/musl/arch/emscripten/bits/alltypes.h, with

#define _Addr int

and

typedef unsigned _Addr size_t;

This might be fixable by adding an #ifdef __wasm__ and #define _Addr long in alltypes.h, though I haven't actually tried it.

kripken commented 8 years ago

Thanks for the info. So, what are the benefits and downsides to fixing this in the musl headers vs fixing this in the wasm backend?

sunfishcode commented 8 years ago

Benefits to fixing it in the musl headers:

Benefits to fixing it in clang:

kripken commented 8 years ago

An additional benefit to fixing it in clang might be

sunfishcode commented 8 years ago

A simpler header fix than what I suggested above would be to define _Addr to be __INTPTR_TYPE__, a predefined macro in clang (and gcc and others). That avoids the need for an ifdef.

I do see that musl has a consistency for using unsigned int on 32-bit platforms, but I don't see what the advantage of the consistency is. The vast majority of musl code is either portable and can't depend on the type of size_t anyway, or fully platform-specific and can but doesn't affect us.

User code depending on size_t being unsigned int is already not very portable. And, it is common to write tools such as ABI checkers which work in terms of symbol names, and avoiding gratuitous churn between wasm32 and wasm64 seems convenient, since wasm32 and wasm64 should ideally be ABI-compatible in every non-essential way.

jfbastien commented 8 years ago

Odd, musl seems self-consistent but on its own compared to other places. newlib uses __SIZE_TYPE__ which is provided by the compiler, and both clang and gcc's provided stddef.h does typedef __SIZE_TYPE__ size_t;.

It would be useful to ask on the musl list why things are done this way.

sunfishcode commented 8 years ago

It happens that I did ask the musl list a while ago; the concerns were: using non-standard extensions require clang or gcc or a compatible compiler, using compiler macros means that musl's headers aren't the place to look to find out what something is, having musl's headers be explicit is more likely to expose problems where the compiler's ABI doesn't match musl's ABI, and one person asserted that C compilers shouldn't need to know these types.

I find these reasons to be less applicable in wasm's situation, but don't feel strongly about it. I'd be ok with either the ifdef approach described above, or using __INTPTR_TYPE__ and similar macros.

jfbastien commented 8 years ago

This thread? http://www.openwall.com/lists/musl/2016/01/25/10

I would rather go for the "non-odd" thing in this context, i.e. musl's approach. We could also consider a different libc where the approach is more in-line with LLVM / GCC, but clearly musl's approach works for multiple other architectures so I'm not super thrilled to move away from it.

One big downside if we diverge is that it'll basically be impossible to upstream changes.

sunfishcode commented 8 years ago

Yes. I was encouraged by this reply: http://www.openwall.com/lists/musl/2016/01/26/8

I claim that I'm familiar with the toolchain work needed to support this, and with historical C code with invalid assumptions, and am willing to pay the price. If we ever have glibc/newlib/bionic/etc. ports, there are real advantages to making them ABI-compatible at the freestanding level, even if that means not following some conventions about what size_t "should" be in various places.

jfbastien commented 8 years ago

I claim that I'm familiar with the toolchain work needed to support this, and with historical C code with invalid assumptions, and am willing to pay the price. If we ever have glibc/newlib/bionic/etc. ports, there are real advantages to making them ABI-compatible at the freestanding level, even if that means not following some conventions about what size_t "should" be in various places.

I'm not sure I understand what you're advocating for.

Maybe upstream musl would be OK with include/alltypes.h.in containing:

#if defined(__SIZE_T__)
#define _Addr __SIZE_T__
#endif

?

It would keep the per-arch _Addr for compilers without __SIZE_T__ while honoring the compiler's guidance when it does. That does cause the same ABI issue if musl is used with different compilers, though.

kripken commented 8 years ago

It seems that there are two issues here. Compatibility risk with code not working, which @sunfishcode says he can handle, and upstreamability risk with us not doing the standard musl thing.

In both cases, the safe route is to do the stuff musl does in all other backends, and modify the wasm backend. That avoids both types of risk.

If we do want to take the riskier path, let's at least get rid of the upstreamability risk now. Because if they don't say "sure, we'd accept such a new backend even with it differing from the others on that aspect", then we might end up with them saying "no" later, and later it would be harder to change the wasm backend.

sunfishcode commented 8 years ago

I asked the musl developers, and they seem ok with the approach of defining _Addr to be long even on 32-bit platforms. See the thread here.

Consequently, I support resolving this as:

#ifdef __wasm__
#define _Addr long
#else
#define _Addr int
#endif

The reason for the ifdef is to continue to also support the current asm.js ABI. An upstream submission would presumably want to switch from arch/emscripten to arch/wasm or so, at which time it might also make sense to only include the code for the wasm ABI.

kripken commented 8 years ago

Thanks, and that fix sounds good to me.

Not sure though about arch/wasm in the future, for reasons discussed in the past.

jfbastien commented 8 years ago

Awesome, thanks for driving this.

I'd like arch/wasm, but we can discuss separately.

kripken commented 8 years ago

Fixed size_t in emscripten in https://github.com/kripken/emscripten/commit/72856ea6c255da6a9f56343522d92bea69859fa2

Now compiling hello_libcxx.cpp (hello world with iostream) crashes in s2wasm, on the line with .int32 _GLOBAL__I_000101@FUNCTION (which surprises it), https://gist.githubusercontent.com/kripken/5a2e32e4e1aa10d206136204583f23ac/raw/01a7ce032a582b99be8c52af5b83e56858b6d2e5/gistfile1.txt

I don't really understand what the sections etc. mean in .s files, so I have no idea what's going wrong there.

dschuff commented 8 years ago

Interesting. If I manually do something like this in a C file: int myfunc(void) __attribute__((section(".text.__startup"))); it doesn't complain. I'll take a closer look

dschuff commented 8 years ago

So this also reproduces if you have a function marked __attribute__((constructor)). It's because it emits a bare .int32 into the .init_array section, i.e.

        .section        .init_array,"aw",@init_array
        .p2align        2
        .int32  myfunc@FUNCTION

Whereas s2wasm always expects directives like .int32 to be part of an object marked with a .type directive, e.g.

        .type   .L.str,@object          # @.str
        .section        .rodata.str1.1,"aMS",@progbits,1
.L.str:
        .asciz  "Hello, world!\n"
        .size   .L.str, 15

Constructors that go into .init_array are just pointers to functions, but are not objects with names. Probably we should just add support for raw emission into sections like that.

kripken commented 8 years ago

Makes sense, yeah. I opened a new issue for ctors specifically, #306, as this one was about new() and I guess we can close it following the size_t fix.

sunfishcode commented 6 years ago

This patch happened, which changed clang's definition, but it looks like you're right that emscripten's alltypes.in was not updated with it.

FWIW, I am actually preparing patches to change it back now, with a clang patch here and a corresponding Emscripten patch I expect to post soon.

vvuk commented 6 years ago

Thanks -- deleted my comment, because my actual bug was that I had a stray -m64 that snuck in to my compile args, which was selecting the wasm64 target. That was what was actually causing my problem.

FrancoisChastel commented 5 years ago

Hello, does this issue is really closed ? Seem like I face the same issue when trying to compile raknet using emscripten :

In file included from /private/var/tmp/_bazel_root/7eac25f81ba1cb3897eb750dd5c677da/external/raknet/Source/Router2.cpp:14:
In file included from /private/var/tmp/_bazel_root/7eac25f81ba1cb3897eb750dd5c677da/external/raknet/Source/Router2.h:26:
In file included from /private/var/tmp/_bazel_root/7eac25f81ba1cb3897eb750dd5c677da/external/raknet/Source/UDPForwarder.h:27:
In file included from /private/var/tmp/_bazel_root/7eac25f81ba1cb3897eb750dd5c677da/external/raknet/Source/SimpleMutex.h:20:
In file included from /private/var/tmp/_bazel_root/7eac25f81ba1cb3897eb750dd5c677da/external/raknet/Source/RakMemoryOverride.h:22:
external/emscripten_toolchain/system/include/libcxx/new:181:66: error: 'operator new' takes type size_t ('unsigned long') as first parameter
_LIBCPP_NODISCARD_AFTER_CXX17 _LIBCPP_OVERRIDABLE_FUNC_VIS void* operator new(std::size_t __sz) _THROW_BAD_ALLOC;
                                                                 ^
external/emscripten_toolchain/system/include/libcxx/new:182:66: error: 'operator new' takes type size_t ('unsigned long') as first parameter
_LIBCPP_NODISCARD_AFTER_CXX17 _LIBCPP_OVERRIDABLE_FUNC_VIS void* operator new(std::size_t __sz, const std::nothrow_t&) _NOEXCEPT _NOALIAS;
                                                                 ^
external/emscripten_toolchain/system/include/libcxx/new:189:66: error: 'operator new[]' takes type size_t ('unsigned long') as first parameter
_LIBCPP_NODISCARD_AFTER_CXX17 _LIBCPP_OVERRIDABLE_FUNC_VIS void* operator new[](std::size_t __sz) _THROW_BAD_ALLOC;
                                                                 ^
external/emscripten_toolchain/system/include/libcxx/new:190:66: error: 'operator new[]' takes type size_t ('unsigned long') as first parameter
_LIBCPP_NODISCARD_AFTER_CXX17 _LIBCPP_OVERRIDABLE_FUNC_VIS void* operator new[](std::size_t __sz, const std::nothrow_t&) _NOEXCEPT _NOALIAS;
                                                                 ^
external/emscripten_toolchain/system/include/libcxx/new:215:70: error: 'operator new' takes type size_t ('unsigned long') as first parameter
_LIBCPP_NODISCARD_AFTER_CXX17 inline _LIBCPP_INLINE_VISIBILITY void* operator new  (std::size_t, void* __p) _NOEXCEPT {return __p;}
                                                                     ^
external/emscripten_toolchain/system/include/libcxx/new:216:70: error: 'operator new[]' takes type size_t ('unsigned long') as first parameter
_LIBCPP_NODISCARD_AFTER_CXX17 inline _LIBCPP_INLINE_VISIBILITY void* operator new[](std::size_t, void* __p) _NOEXCEPT {return __p;}

Which version of emscripten_toolchain / emscripten_clang fix this issue ?

FrancoisChastel commented 5 years ago

@kripken some clue ? :)

kripken commented 5 years ago

@FrancoisChastel you may be seeing a different issue, if you see that error on the latest release (1.38.13). Do you have a testcase showing the issue?