What changed for proxying between 2.0.8 and 2.0.23?

jozefchutka commented 3 years ago

Hi, in emsdk 2.0.8 my emcc args included:

-s USE_PTHREADS=1
-s PROXY_TO_PTHREAD=1
-s MODULARIZE=1
-s EXPORTED_FUNCTIONS="[_main, _proxy_main]"

followed by javascript:

core.ccall('proxy_main', ...);

However, building with 2.0.23 fails with: emcc: error: undefined exported symbol: "_proxy_main" [-Wundefined] [-Werror]

I can remove _proxy_main from the list, but then how do I run main proxied?

jozefchutka commented 3 years ago

It seems I can ccall('emscripten_proxy_main'), is that the intended change?

kripken commented 3 years ago

I think proxy_main has always been an internal detail - you should not need to export it, or call it, or even be aware of it. Just setting PROXY_TO_PTHREAD will do everything for you automatically.

For examples, you can look in tests/*, which has many tests for PROXY_TO_PTHREAD, and it never mentions proxy_main.

jozefchutka commented 3 years ago

Hi @kripken ,

Following Build FFmpeg WebAssembly version, my compilation uses -s PROXY_TO_PTHREAD=1; however, there is a significant difference between running:

core.ccall('proxy_main', ...); // 2.0.8
core._emscripten_proxy_main(...) // 2.0.23

vs.

core._main(...)

The difference is that if I do not call the proxy variants, execution seems to happen on the main thread. Am I doing anything wrong?

sbc100 commented 3 years ago

Normally one does not call main directly with emscripten; it gets called from callMain(), which is called from run(), which in turn is called automatically during startup.

See: https://github.com/emscripten-core/emscripten/blob/087ca39beeb6203ac838925f4a3e9c67623fb2fc/src/postamble.js#L118-L138

Because you are building with INVOKE_RUN=0 this does not happen in your case. Looking at the docs for INVOKE_RUN, it seems that they recommend calling Module.callMain(): https://github.com/emscripten-core/emscripten/blob/087ca39beeb6203ac838925f4a3e9c67623fb2fc/src/settings.js#L85-L89

jozefchutka commented 3 years ago

Hi @sbc100 , thanks for the instructions, I am trying to follow, but my module has no callMain()

These are my make instructions: https://github.com/jozefchutka/ffmpeg.wasm-core/blob/yscene/wasm/build-scripts/build-ffmpeg.sh . After loading and calling const module = await createFFmpegCore(), the module has module._main(), module._emscripten_proxy_main(), module.run() etc., but there is no module.callMain() ...

I am still not sure why it diverges from what was suggested. Do I need to explicitly expose callMain?

I will try to expose it and see if it works. Should I expect any different behavior calling callMain vs. _emscripten_proxy_main ?

sbc100 commented 3 years ago

I think you need to add -sEXPORTED_RUNTIME_METHODS=callMain.
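
A minimal sketch of that suggestion, assuming the build uses MODULARIZE=1 with INVOKE_RUN=0 and adds -sEXPORTED_RUNTIME_METHODS=callMain. The helper name runMain is hypothetical, and createModule stands for the generated factory (createFFmpegCore in the build script linked above).

```javascript
// Hypothetical helper: instantiates the module and runs main() once via callMain.
// `createModule` is the MODULARIZE factory. `args` must not include the program
// name, because callMain prepends it as argv[0] itself.
async function runMain(createModule, args) {
  const module = await createModule();
  if (typeof module.callMain !== 'function') {
    throw new Error(
      'callMain is not exported; rebuild with -sEXPORTED_RUNTIME_METHODS=callMain'
    );
  }
  return module.callMain(args);
}
```

The explicit check turns a missing export into a readable error instead of a "module.callMain is not a function" TypeError.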

jozefchutka commented 3 years ago

I have exported callMain() and am able to call it now. But sometimes, when it is called multiple times in a row, it throws:

ErrnoError {node: undefined, errno: 44, message: "FS error", stack: "<generic error, no stack>", setErrno: ƒ}
errno: 44
message: "FS error"
node: undefined
setErrno: ƒ (errno)
stack: "<generic error, no stack>"

...which differs from calling _emscripten_proxy_main() directly, which throws no exception.

Any ideas?

jozefchutka commented 3 years ago

It seems to have something to do with how the arguments are generated.

The algorithm in callMain does:

var argc=args.length+1;
var argv=stackAlloc((argc+1)*4);
GROWABLE_HEAP_I32()[argv>>>2]=allocateUTF8OnStack(thisProgram);
for(var i=1;i<argc;i++){
    GROWABLE_HEAP_I32()[(argv>>2)+i>>>0]=allocateUTF8OnStack(args[i-1])
}
GROWABLE_HEAP_I32()[(argv>>2)+argc>>>0]=0;

while the one from the ffmpeg article does:

const args = ['ffmpeg', '-hide_banner'];
const argsPtr = Module._malloc(args.length * Uint32Array.BYTES_PER_ELEMENT);
args.forEach((s, idx) => {
  const buf = Module._malloc(s.length + 1);
  Module.writeAsciiToMemory(s, buf);
  Module.setValue(argsPtr + (Uint32Array.BYTES_PER_ELEMENT * idx), buf, 'i32');
});

The latter seems more stable in my case.

I am considering sticking with emsdk 2.0.8, which:

seems more stable
allows me to re-call my main method while using EXIT_RUNTIME=1
nicely disposes workers when exit() is called

2.0.23 falls short on all three of these points compared to 2.0.8.

kripken commented 3 years ago

main() is meant to be called once, and so calling callMain multiple times is not supported. It is possible that calling main in an internal/direct way happens to work, but that is not safe in general - it depends on how things are set up in the startup process.

I'd recommend avoiding calling main multiple times, and instead export a function for that purpose, and call it as many times as you need.

(The "dispose of workers when exit" issue seems separate, and may be worth filing a bug if you see an issue there?)

jozefchutka commented 3 years ago

Hi @kripken ,

I managed to export an additional function, notmain. ffmpeg does a good job of reinitializing everything necessary inside main(), so the code looks like:

int notmain(int argc, char **argv)
{
    return main(argc, argv);
}

Can you please point me in the right direction, or to documentation, so that I can call it multiple times via pthreads?

What's the proper way of calling my notmain from JavaScript? I can see Module._notmain but not Module._emscripten_proxy_notmain, and I need the executions to run in pthreads/workers.
How do I properly pass arguments? Which method should I use (see https://github.com/emscripten-core/emscripten/issues/14312#issuecomment-853700548), and is there anything available in Module I could call instead of rebuilding this algorithm myself?
I find it hard to follow the docs when it comes to EXIT_RUNTIME=1. My intention is to call notmain() multiple times on the same Module instance, as each execution creates files in the filesystem etc. However, at some later point I want to exit the Module and make sure there are no leaks.

I have seen some related activity on https://github.com/emscripten-core/emscripten/pull/14367 , thanks for that. I would love to see more docs or any solid instructions on how to proceed when main() (or any other function) is to be called multiple times.

kripken commented 3 years ago

notmain calling main is also not valid, I think. main is a very special function in the C world - it's called automatically by the runtime for you, and exactly once.

It sounds like your codebase wants to actually run main multiple times - perhaps using MODULARIZE and creating a new instance for each invocation is the right thing?
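
A rough sketch of the one-instance-per-invocation idea, assuming a MODULARIZE build that exports callMain. The helper name runEachInFreshInstance is hypothetical, and createModule stands for the generated factory.

```javascript
// Hypothetical helper: runs each argument list in its own module instance,
// so main() is called at most once per instance.
async function runEachInFreshInstance(createModule, jobs) {
  const results = [];
  for (const args of jobs) {
    // A new instance starts with fresh memory and runtime state.
    const module = await createModule();
    results.push(module.callMain(args));
  }
  return results;
}
```

The trade-off is that state does not carry over between runs: with the default in-memory filesystem, files written by one invocation are not visible to the next unless they are copied across explicitly.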

sbc100 commented 3 years ago

Although calling main multiple times is technically not a good idea, I think in this case it is probably OK.

You could solve both issues by modifying the code or by using -Dmain=notmain on the command line. That way main no longer exists and you would just export _notmain. This would avoid the wrapper function and avoid the "calling main more than once" issue.

The important thing is that all the static constructors and startup code in emscripten's run() method are called just once (i.e. just as if we were using the module as a library).

sbc100 commented 3 years ago

What's the proper way of calling my notmain from JavaScript? I can see Module._notmain but not Module._emscripten_proxy_notmain, and I need the executions to run in pthreads/workers.

You probably want to create your own wrapper that creates the thread and just returns. PROXY_TO_PTHREAD only really works for programs with a single main that is called once. So in this case you don't want PROXY_TO_PTHREAD but would need to roll your own runner. The code is pretty simple:

https://github.com/emscripten-core/emscripten/blob/bdc97fe0de1daf9e66aea4fa1d30b3dc3bd907c3/system/lib/pthread/library_pthread.c#L935-L951

Alternatively, you could create your worker in JS and use postMessage to send the work over to it.
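
A minimal sketch of the postMessage alternative, seen from the main thread. It assumes a browser-style Worker whose script loads the module, runs the exported function, and replies with the job id. The helper name createJobRunner and the message shapes { id, args } and { id, exitCode } are hypothetical.

```javascript
// Hypothetical helper: sends each job to a Worker-like object with postMessage
// and resolves the returned promise when a reply with the same id arrives.
function createJobRunner(worker) {
  let nextId = 0;
  const pending = new Map(); // job id -> resolve callback

  worker.onmessage = (event) => {
    const { id, exitCode } = event.data;
    const resolve = pending.get(id);
    if (resolve) {
      pending.delete(id);
      resolve(exitCode);
    }
  };

  return function run(args) {
    const id = nextId++;
    return new Promise((resolve) => {
      pending.set(id, resolve);
      worker.postMessage({ id, args });
    });
  };
}
```

Tagging each message with an id lets several jobs be in flight at once without mixing up their results.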

How to properly pass arguments? Which method should I use #14312 (comment) and is there anything available in Module I could call instead of rebuilding this algorithm myself?

The code in that comment looks reasonable, assuming you need to pass strings. You can write a simple wrapper that takes JS strings and converts them to an array of strings.

BTW, when building a library it's often easier to avoid the generic argv-style argument passing and use something more specific.
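
One possible shape for such a wrapper, as a sketch. It assumes a 32-bit (wasm32) build with _malloc and _free in EXPORTED_FUNCTIONS and setValue, stringToUTF8 and lengthBytesUTF8 in EXPORTED_RUNTIME_METHODS. The names allocArgv and freeArgv are hypothetical. Unlike the snippet from the article, it sizes buffers by UTF-8 byte length and NULL-terminates the array, since C expects argv[argc] to be a null pointer.

```javascript
// Hypothetical helper: copies JS strings into the module heap and builds a
// NULL-terminated argv array of 32-bit pointers.
function allocArgv(module, args) {
  const ptrSize = 4; // pointer size on wasm32
  const argv = module._malloc((args.length + 1) * ptrSize);
  const strings = args.map((s) => {
    const size = module.lengthBytesUTF8(s) + 1; // +1 for the trailing NUL
    const buf = module._malloc(size);
    module.stringToUTF8(s, buf, size);
    return buf;
  });
  strings.forEach((buf, i) => module.setValue(argv + i * ptrSize, buf, 'i32'));
  module.setValue(argv + args.length * ptrSize, 0, 'i32'); // argv[argc] = NULL
  return { argc: args.length, argv, strings };
}

// Hypothetical helper: releases everything allocated by allocArgv.
function freeArgv(module, { argv, strings }) {
  strings.forEach((buf) => module._free(buf));
  module._free(argv);
}
```

With -Dmain=notmain and _notmain exported, a call could then be written as module._notmain(a.argc, a.argv) followed by freeArgv(module, a), where a is the value returned by allocArgv.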

I find it hard to follow the docs when it comes to EXIT_RUNTIME=1. My intention is to call notmain() multiple times on the same Module instance, as each execution creates files in the filesystem etc. However, at some later point I want to exit the Module and make sure there are no leaks.

You can set EXIT_RUNTIME=1 and it should do what you want. The module will stay alive until someone calls exit(). One issue to be aware of is that if any of your calls to notmain ever calls exit (or, for example, abort or assert), the module then becomes technically unusable, since it will "exit" and run any static destructors. (This might not matter for your app since it is C and not C++, but technically you should not call into the module after it has exited, and we have asserts in debug builds that let you know if you do this.)
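
A small sketch of guarding against calls after exit on the JavaScript side. It assumes the MODULARIZE factory accepts an onExit callback in its options and that the callback fires when the runtime exits. It does not cover abort(). The helper name createGuarded and its call method are hypothetical.

```javascript
// Hypothetical helper: wraps a module instance and refuses to call exports
// once the runtime has reported that it exited.
async function createGuarded(createModule) {
  let exitStatus = null; // stays null while the runtime is alive
  const module = await createModule({
    onExit: (status) => { exitStatus = status; },
  });
  return {
    module,
    call(exportName, ...args) {
      if (exitStatus !== null) {
        throw new Error(
          `runtime exited with status ${exitStatus}; create a new instance`
        );
      }
      return module[exportName](...args);
    },
  };
}
```

A caller could catch this error and build a fresh instance instead of calling into a module whose destructors have already run.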

jozefchutka commented 3 years ago

Hi @kripken , @sbc100 thanks for your valuable instructions. I think I have a solid starting point now. It would also be nice to have this in the documentation so it can help more people struggling with the same problem.
