Closed sunfishcode closed 4 years ago
As a heads-up, as I was writing this up, it occurred to me that renaming the argc/argv form would be a little nicer than renaming the no-arg form, but I forgot to adjust the title before submitting the pull request. I've now corrected the title.
Yes, that's right. It simplifies things to say that we just rename main
, rather than that we keep main
around and add a wrapper.
I also plan to submit a clang patch to do this rename in the frontend, instead of in the LLVM backend, to eliminate the wrapper, which has turned out to be a nuisance in practice.
In that case I'm not that happy about the part where we export a different name from the wasm module. That's more complicated than the current single export, and is user-visible and perhaps confusing.
I've now posted a clang patch which implements the __main_argc_argv
proposal here.
It's a name mangling rule, which means:
main
from other translation units, those callsites are also renamed to match.It's very simple. There's no generated wrapper function, linking works the same way for LTO and non-LTO, and libc code can easily detect whether to link in the command-line argument initialization code by defining its own main
function in libc.a:
__attribute__((weak))
int main(void) {
int argc = ...;
char **argv = ...;
return __main_argc_argv(argc, argv);
}
It also isn't that different from the __original_main
situation we have right now, where Binaryen is special-case inlining it. If you inline __main_argc_argv
into main
here, you'll end up with a function called main
containing the user main code, so backtraces etc. will look like you want them to.
Sorry @sunfishcode , I'm not sure whether your answer addresses or does not the topic of a different export name?
It doesn't change the _start
export. I'd ideally like to rename _start
also, but that's a bigger change we can talk about separately.
I apologize, maybe I wasn't clear before. I wasn't talking about _start
which I agree is a separate issue. What I'm asking about is what I asked about earlier too, that this proposal makes the wasm module export either main
or __main_argc_argv
. I wasn't sure if your last comment refers to this (as you mention other callsites etc. which I think may involve exports?).
Ah; in wasi-sdk, main
isn't exported from the final wasm binary. I guess in Emscripten it is exported, or it can be?
It is possible for a symbol name to differ from the wasm export name -- @sbc100's export attribute patch allows this, for example. So one possible option is to export __main_argc_argv
as main
at the wasm module level.
Yes, in non-standalone wasm, emscripten exports main
from the wasm binary. I wouldn't want to have a convention that where the export name can differ, so yeah, keeping the same export name as always sounds best.
I'm not sure what "entrypoint" means, in the text of this PR, if it isn't an export? Did we define it somewhere?
I've now rewritten the wording to give a definition of "entrypoint" and mention that export names may differ from ABI symbol names, and to hopefully be more clear overall. Let me know what you think!
Right now emscripten has two entry-point modes. The primary mode is where "main" is exported and called directly from JS after the constructors have been called. In this mode there is no _start
, only main
.
The second mode we what we call "standalone wasm mode" where _start
is exported and main is not.
I'm not sure how this change will effect the first/primary mode that emscripten uses, but in general it looks like a good change. We currently export main by passing --export=main
which is a linker flag (and which will work with wask-sdk too btw). I think this change as proposed might require some emscripten-side changes. Since we assume we can pass args to main
from JS and due to the JS calling conventions that works for 0-arg and 2-arg main both.
If we can move emscripten over to always use _start
then this problem would go away. Do you know of any reasons why that might be a bad idea @kripken ?
@sbc100 Emscripten changing from main
to _start
for the export to be called might be disruptive for existing users, that call main themselves directly. (Also, I do think main
is the better name!) So I'd prefer to not do that, but it looks like this PR isn't?
The current wording looks mostly reasonable to me, but I don't know the linking process well enough to say if there is any downside to this over the current __original_main
. @sbc100 do you think it would have problems with our standalone support in emscripten?
To support the status quo in emscripten we would need way to export a symbol called main
which takes args. I sounds like under this proposal --export=main
would result in the zero arg main being exported.
I'm not a fan of the current __original_main
mangling or the way its found its way into our crt1.c code in both wasi and emscripten, so if we can move away from that I would be happy.
As a possible path forward, I've now updated the clang patch which implements this: https://reviews.llvm.org/D70700 so that it checks the target, and doesn't do the mangling on Emscripten. That way, Emscripten is unaffected by this change.
@sbc100 @kripken Any update here? I need some solution to this problem in order to enable LTO builds of libc, and right now, compatibility with Emscripten appears to be the only thing holding this back.
I don't quite understand what that patch does (not familiar with that part of clang), but if it doesn't affect emscripten then I have no objection to it myself. (That seems separate from the discussion in this PR, but I don't think we have a clear answer here?)
In the LLVM patch, this change is disabled when the target triple OS is Emscripten. Also, @sbc100's comments above suggest there might be a reasonable path for implementing this in Emscripten when it's ready as well. With that, are there any remaining concerns about landing this and proceeding with the changes to LLVM, wasi-libc?
I support this change, but I'd be sad if can't find a way to make this work in emscripten too. I fairly certain that we can but I've not invested any time in it yet. I can you give me a little more time to prototype. Maybe by EOD Wednesday? If don't find any fundamental issues by then we can go ahead and land?
@sbc100 Sounds good to me.
It's now EOD on Wednesday; is there an update?
Did you verify that the debug name in the dwarf section is correct? I couldn't see the functions names in the output of llvm-dwarfdump --all
which means I'm probably doing something wrong.
I did confirm that the name section does have the mangled name in it, despite then name section supposedly containing the demangled names. This I believe is because the name section is generated by the linker by demanging the mangled names and I guess doesn't (yet) know how to demangle __main_argv_argv
back into main
.
The llvm-dwarfdump --all output contains the name "main" as the DW_AT_name
field for me (and "__main_argc_argv" in the DW_AT_linkage_name, as expected).
Confirmed about the name section; I propose the following patch, which fixes it:
diff --git a/lld/wasm/Symbols.cpp b/lld/wasm/Symbols.cpp
index 88e6f217c0d..0d4f7f648d5 100644
--- a/lld/wasm/Symbols.cpp
+++ b/lld/wasm/Symbols.cpp
@@ -29,6 +29,8 @@ std::string toString(const wasm::Symbol &sym) {
}
std::string maybeDemangleSymbol(StringRef name) {
+ if (name == "__main_argc_argv")
+ return "main";
if (wasm::config->demangle)
return demangleItanium(name);
return std::string(name);
OK, I've played with the patch in emscripten and I think we have a way forward. I'm going ahead with this and the llvm patch.
I have provisional patch for binaryen and emscripten so we should be able remove the emscripten exclusion from the llvm patch (either before or after landing it).
Change the ABI for the user entry point to rename
main
to__main_argc_argv
when it has argc/argv. This is needed because wasm requires caller and callee signatures to match, so the traditional trick of passing main arguments even if it doesn't expect them doesn't work.LLVM and related tools have been using the name
__original_main
for a similar purpose, but that's a confusing name.There's also a change here in that this is renaming the argc/argv form rather than renaming the nullary form. The choice is somewhat arbitrary, but I think it's slightly nicer to bias the aesthetics toward the no-argument form, because that's the smaller and simpler form.