Closed kripken closed 8 years ago
It should be std::size_t
, but otherwise I'd expect it to work as long as you've included cstddef
or another one of the headers that declares std::size_t
.
Repro instructions?
@jfbastien: STR is e.g. emcc tests/hello_libcxx.cpp
in emscripten (latest incoming branch, and with vanilla llvm set up as your llvm), which builds this file:
#include <iostream>
int main() {
std::cout << "hello, world!" << std::endl;
}
Clang has a builtin notion of what type "size_t" is a typedef for. For the wasm target, it's currently set to unsigned long
. One of emscripten's headers is defining size_t
to be unsigned int
. The two types are effectively interchangeable in terms of the generated code, but in terms of C++ type checking they have to match.
I chose unsigned long
in clang because the wasm32 target is ILP32, while wasm64 is LP64, so unsigned long
is 32-bit on wasm32 and 64-bit on wasm64, so using unsigned long
for both means each target gets the size it needs, while using the same type. It isn't necessary that we do this, but it is convenient that, for example, operator new mangles to _Znwm
on both (instead of having _Znwj
on wasm32).
Looking more closely, it looks like it's system/lib/libc/musl/arch/emscripten/bits/alltypes.h, with
#define _Addr int
and
typedef unsigned _Addr size_t;
This might be fixable by adding an #ifdef __wasm__
and #define _Addr long
in alltypes.h, though I haven't actually tried it.
Thanks for the info. So, what are the benefits and downsides to fixing this in the musl headers vs fixing this in the wasm backend?
Benefits to fixing it in the musl headers:
Benefits to fixing it in clang:
An additional benefit to fixing it in clang might be
A simpler header fix than what I suggested above would be to define _Addr
to be __INTPTR_TYPE__
, a predefined macro in clang (and gcc and others). That avoids the need for an ifdef.
I do see that musl has a consistency for using unsigned int
on 32-bit platforms, but I don't see what the advantage of the consistency is. The vast majority of musl code is either portable and can't depend on the type of size_t
anyway, or fully platform-specific and can but doesn't affect us.
User code depending on size_t being unsigned int
is already not very portable. And, it is common to write tools such as ABI checkers which work in terms of symbol names, and avoiding gratuitous churn between wasm32 and wasm64 seems convenient, since wasm32 and wasm64 should ideally be ABI-compatible in every non-essential way.
Odd, musl seems self-consistent but on its own compared to other places. newlib uses __SIZE_TYPE__
which is provided by the compiler, and both clang and gcc's provided stddef.h
does typedef __SIZE_TYPE__ size_t;
.
It would be useful to ask on the musl list why things are done this way.
It happens that I did ask the musl list a while ago; the concerns were: using non-standard extensions require clang or gcc or a compatible compiler, using compiler macros means that musl's headers aren't the place to look to find out what something is, having musl's headers be explicit is more likely to expose problems where the compiler's ABI doesn't match musl's ABI, and one person asserted that C compilers shouldn't need to know these types.
I find these reasons to be less applicable in wasm's situation, but don't feel strongly about it. I'd be ok with either the ifdef approach described above, or using __INTPTR_TYPE__
and similar macros.
This thread? http://www.openwall.com/lists/musl/2016/01/25/10
I would rather go for the "non-odd" thing in this context, i.e. musl's approach. We could also consider a different libc where the approach is more in-line with LLVM / GCC, but clearly musl's approach works for multiple other architectures so I'm not super thrilled to move away from it.
One big downside if we diverge is that it'll basically be impossible to upstream changes.
Yes. I was encouraged by this reply: http://www.openwall.com/lists/musl/2016/01/26/8
I claim that I'm familiar with the toolchain work needed to support this, and with historical C code with invalid assumptions, and am willing to pay the price. If we ever have glibc/newlib/bionic/etc. ports, there are real advantages to making them ABI-compatible at the freestanding level, even if that means not following some conventions about what size_t "should" be in various places.
I claim that I'm familiar with the toolchain work needed to support this, and with historical C code with invalid assumptions, and am willing to pay the price. If we ever have glibc/newlib/bionic/etc. ports, there are real advantages to making them ABI-compatible at the freestanding level, even if that means not following some conventions about what size_t "should" be in various places.
I'm not sure I understand what you're advocating for.
Maybe upstream musl would be OK with include/alltypes.h.in
containing:
#if defined(__SIZE_T__)
#define _Addr __SIZE_T__
#endif
?
It would keep the per-arch _Addr
for compilers without __SIZE_T__
while honoring the compiler's guidance when it does. That does cause the same ABI issue if musl is used with different compilers, though.
It seems that there are two issues here. Compatibility risk with code not working, which @sunfishcode says he can handle, and upstreamability risk with us not doing the standard musl thing.
In both cases, the safe route is to do the stuff musl does in all other backends, and modify the wasm backend. That avoids both types of risk.
If we do want to take the riskier path, let's at least get rid of the upstreamability risk now. Because if they don't say "sure, we'd accept such a new backend even with it differing from the others on that aspect", then we might end up with them saying "no" later, and later it would be harder to change the wasm backend.
I asked the musl developers, and they seem ok with the approach of defining _Addr to be long
even on 32-bit platforms. See the thread here.
Consequently, I support resolving this as:
#ifdef __wasm__
#define _Addr long
#else
#define _Addr int
#endif
The reason for the ifdef is to continue to also support the current asm.js ABI. An upstream submission would presumably want to switch from arch/emscripten to arch/wasm or so, at which time it might also make sense to only include the code for the wasm ABI.
Thanks, and that fix sounds good to me.
Not sure though about arch/wasm in the future, for reasons discussed in the past.
Awesome, thanks for driving this.
I'd like arch/wasm, but we can discuss separately.
Fixed size_t
in emscripten in https://github.com/kripken/emscripten/commit/72856ea6c255da6a9f56343522d92bea69859fa2
Now compiling hello_libcxx.cpp (hello world with iostream) crashes in s2wasm, on the line with .int32 _GLOBAL__I_000101@FUNCTION
(which surprises it), https://gist.githubusercontent.com/kripken/5a2e32e4e1aa10d206136204583f23ac/raw/01a7ce032a582b99be8c52af5b83e56858b6d2e5/gistfile1.txt
I don't really understand what the sections etc. mean in .s files, so I have no idea what's going wrong there.
Interesting. If I manually do something like this in a C file:
int myfunc(void) __attribute__((section(".text.__startup")));
it doesn't complain. I'll take a closer look
So this also reproduces if you have a function marked __attribute__((constructor))
.
It's because it emits a bare .int32
into the .init_array
section, i.e.
.section .init_array,"aw",@init_array
.p2align 2
.int32 myfunc@FUNCTION
Whereas s2wasm always expects directives like .int32
to be part of an object marked with a .type
directive, e.g.
.type .L.str,@object # @.str
.section .rodata.str1.1,"aMS",@progbits,1
.L.str:
.asciz "Hello, world!\n"
.size .L.str, 15
Constructors that go into .init_array
are just pointers to functions, but are not objects with names. Probably we should just add support for raw emission into sections like that.
Makes sense, yeah. I opened a new issue for ctors specifically, #306, as this one was about new() and I guess we can close it following the size_t fix.
This patch happened, which changed clang's definition, but it looks like you're right that emscripten's alltypes.in was not updated with it.
FWIW, I am actually preparing patches to change it back now, with a clang patch here and a corresponding Emscripten patch I expect to post soon.
Thanks -- deleted my comment, because my actual bug was that I had a stray -m64
that snuck in to my compile args, which was selecting the wasm64
target. That was what was actually causing my problem.
Hello, does this issue is really closed ? Seem like I face the same issue when trying to compile raknet
using emscripten :
In file included from /private/var/tmp/_bazel_root/7eac25f81ba1cb3897eb750dd5c677da/external/raknet/Source/Router2.cpp:14:
In file included from /private/var/tmp/_bazel_root/7eac25f81ba1cb3897eb750dd5c677da/external/raknet/Source/Router2.h:26:
In file included from /private/var/tmp/_bazel_root/7eac25f81ba1cb3897eb750dd5c677da/external/raknet/Source/UDPForwarder.h:27:
In file included from /private/var/tmp/_bazel_root/7eac25f81ba1cb3897eb750dd5c677da/external/raknet/Source/SimpleMutex.h:20:
In file included from /private/var/tmp/_bazel_root/7eac25f81ba1cb3897eb750dd5c677da/external/raknet/Source/RakMemoryOverride.h:22:
external/emscripten_toolchain/system/include/libcxx/new:181:66: error: 'operator new' takes type size_t ('unsigned long') as first parameter
_LIBCPP_NODISCARD_AFTER_CXX17 _LIBCPP_OVERRIDABLE_FUNC_VIS void* operator new(std::size_t __sz) _THROW_BAD_ALLOC;
^
external/emscripten_toolchain/system/include/libcxx/new:182:66: error: 'operator new' takes type size_t ('unsigned long') as first parameter
_LIBCPP_NODISCARD_AFTER_CXX17 _LIBCPP_OVERRIDABLE_FUNC_VIS void* operator new(std::size_t __sz, const std::nothrow_t&) _NOEXCEPT _NOALIAS;
^
external/emscripten_toolchain/system/include/libcxx/new:189:66: error: 'operator new[]' takes type size_t ('unsigned long') as first parameter
_LIBCPP_NODISCARD_AFTER_CXX17 _LIBCPP_OVERRIDABLE_FUNC_VIS void* operator new[](std::size_t __sz) _THROW_BAD_ALLOC;
^
external/emscripten_toolchain/system/include/libcxx/new:190:66: error: 'operator new[]' takes type size_t ('unsigned long') as first parameter
_LIBCPP_NODISCARD_AFTER_CXX17 _LIBCPP_OVERRIDABLE_FUNC_VIS void* operator new[](std::size_t __sz, const std::nothrow_t&) _NOEXCEPT _NOALIAS;
^
external/emscripten_toolchain/system/include/libcxx/new:215:70: error: 'operator new' takes type size_t ('unsigned long') as first parameter
_LIBCPP_NODISCARD_AFTER_CXX17 inline _LIBCPP_INLINE_VISIBILITY void* operator new (std::size_t, void* __p) _NOEXCEPT {return __p;}
^
external/emscripten_toolchain/system/include/libcxx/new:216:70: error: 'operator new[]' takes type size_t ('unsigned long') as first parameter
_LIBCPP_NODISCARD_AFTER_CXX17 inline _LIBCPP_INLINE_VISIBILITY void* operator new[](std::size_t, void* __p) _NOEXCEPT {return __p;}
Which version of emscripten_toolchain / emscripten_clang fix this issue ?
@kripken some clue ? :)
@FrancoisChastel you may be seeing a different issue, if you see that error on the latest release (1.38.13). Do you have a testcase showing the issue?
In emscripten's libc++ port, operator new takes a
size_t (unsigned int)
. However, when using the wasm backend, clang is not pleased, and says "no, that must be asize_t (unsigned long)
",This blocks many (all?) tests using c++ STL stuff.
What's going on here?
cc @sunfishcode @jfbastien @dschuff