emscripten-core / emscripten

Emscripten: An LLVM-to-WebAssembly Compiler
Other
25.81k stars 3.31k forks source link

Questions about Module.onExit #14940

Open jhirschberg70 opened 3 years ago

jhirschberg70 commented 3 years ago

I see that there's an open issue regarding Module.onExit not being documented #6451. Is this simply an oversight or is there the possibility of Module.onExit going away?

I'm operating under the assumption that Module.onExit isn't going away, so my second question is about usage. I am building with -s ASYNCIFY and EXIT_RUNTIME=1 and using IDBFS for persistent storage. I have implemented the following:

Module["onExit"] = async function() {
  await IO.closeFiles()
          .catch(err => { console.log("Error closing files"); });
}

The IO.closeFiles() referenced above has the following implementation:

 function closeFiles() {
    _fsSynced = false;

    return new Promise(
      function (resolve, reject) {
        FS.syncfs(false, function (err) {
          if (err) {
            reject(err);
          }
          else {
            _fsSynced = true;
            resolve(_fsSynced);
          }
        });
    });
 }

What I'm trying to accomplish is a final sync of the filesystem when my C++ code exits. My initial solution was to just initiate FS.syncfs from within the C++ code with EM_ASM and wait for the sync to complete using a loop and emscripten_sleep before exiting. That worked fine, but I'm using an open source code base, and I'm trying to avoid modifying files. I figured if there was a reliable way to sync from javascript after the C++ had exited, then I could avoid having to change the C++ source. The above code seems to be working, but I want to make sure it's reliable. I'm worried about a potential race condition and the possibility of data being destroyed before the FS.syncfs completes. Can I rely on this implementation or is there a better way to accomplish what I want?

sbc100 commented 3 years ago

I don't believe there are any plans to remove the onExit callback.

A couple of notes though:

Firstly you might want to consider instead using atexit handlers (since they run only on normal shutdown rather that, for example, on abort/assert when the program can be in an undefined state). To register and atexit handler from JS you can call Module.addOnExit() at runtime or you can addAtExit from at build time in your library JS file. e.g: https://github.com/emscripten-core/emscripten/blob/9a644e1a878863b9dc168401174c9f5c775e8c1b/src/library_fs.js#L35

Which leads me the next question, is this filesystem syncing on exit something you need to do for you custom filesystem or are you just using emscripten filesystem, in which case it would be bug if they did not sync on exit.

jhirschberg70 commented 3 years ago

Thanks for responding so quickly!

For a long time I wasn't using EXIT_RUNTIME=1, so atexit wouldn't work (if I understand correctly). I'm okay with using EXIT_RUNTIME=1, so I can try using atexit. Thank you.

I'm using the emscripten filesystem with IDBFS. The C++ is saving a couple of files prior to exit. If I understand correctly, the C++ is writing to memory when saving these files, and I need to do FS.syncfs to get the data copied from memory into IDB. I'm just looking for a way, if possible, to do this reliably after the C++ has exited, so I don't have to modify the C++ to do the sync.

sbc100 commented 3 years ago

All the filesystems (including IDBFS) should sync on exit:

https://github.com/emscripten-core/emscripten/blob/246209c1dcbcce18a57788d165d891f40133c60f/src/library_fs.js#L1498-L1511

sbc100 commented 3 years ago

Actually it looks like syncfs is not called as part of quit... I wonder why. Maybe @juj or @kripken might know?

jhirschberg70 commented 3 years ago

I tried using addAtExit and Module["addOnExit"] and neither worked for me. addAtExit was undefined when I ran my code, so I likely did something wrong there. The function I defined for Module["addOnExit"] never got called. So far, the only thing that seems to be working is the method I outlined originally. Are there particular compiler flags I need to use/not use in order to make addAtExit or Module["addOnExit"] work?

sbc100 commented 3 years ago

When are you trying to call addOnExit? In general you can't call such methods during --pre-js. Try doing it a --post-js maybe?

sbc100 commented 3 years ago

In --post-js I think you can just call addOnExit (no need to export it on the Module object or call it there).

jhirschberg70 commented 3 years ago

I'm a little confused by addAtExit vs. addOnExit. You included a link to addAtExit being used in library_fs.js. You mentioned calling addOnExit in your most recent comment. Are they both supposed to be available and equivalent?

As for using --pre-js vs. --post-js, I believe I tried addAtExit and Module["addOnExit"] with both --pre-js and --post-js, and nothing worked. I don't have access to my code right now, so I can't confirm that tried all the permutations, but I'm pretty confident I did. I've definitely tried Module["onExit"] with both --pre-js and --post-js, and both ways worked.

sbc100 commented 3 years ago

I agree its a little confusing. I don't know why the names differ.

addAtExit only exists during JS library processing, it cannot be used at runtime which means it cannot be used in --pre-js or --post-js.

addOnExit is runtime function defined in src/preamble.js. This means you can call it directly from --post-js but I won't exist in --pre-js since pre-js is included before src/preamble.js. You don't need to Module["addOnExit"] and indeed addOnExit won't be defined on the module unless you do also add it to EXPORTED_FUNCTIONS.

At least that is my understanding of of how things work right now :)

You can obviously keep using Module['onExit'] too.. but just note that unlike addOnExit it gets called not just on normal program exit but also if the program crashes/aborts/asserts.

jhirschberg70 commented 3 years ago

Thanks for the clarification on addAtExit and addOnExit.

Regardless of whether I continue using the code I have or switch to using addOnExit (I can see arguments for both) my overall question remains: since FS.syncfs is asynchronous and I have EXIT_RUNTIME=1 in my build, is it guaranteed that the sync will complete before data in memory is destroyed and the filesystem quits? You pointed out yourself that library_fs.js is calling addAtExit('FS.quit();');. I'm just worried that there's a race between my code and the emscripten code, and I've just been getting lucky so far.

Thanks for all your help!

sbc100 commented 3 years ago

I don't really know what FS.quit() is doing, but its hard to imagine that the data in memory will be lost. As long as someone has a reference to the data it will not get GC'd by the JS engine and presumably the syncfs process holds a reference to the data it is sync'ing so it probably fine, but you might want to get a deeper understanding of the code to confirm.

jhirschberg70 commented 3 years ago

What you say about a reference to the data and garbage collection makes sense, but since this involves code I didn't write and don't fully understand, it makes me wary. It sounds like digging deeper into the code may be in my future! Thanks for all the help!

juj commented 3 years ago

Actually it looks like syncfs is not called as part of quit... I wonder why. Maybe @juj or @kripken might know?

IDBFS syncs after each file write operation, because otherwise there would be common data loss if syncing (only) at quit time. Then if one is syncing after each write, that makes syncing at quit redundant.

jhirschberg70 commented 3 years ago

I'm very confused. Here's a simplified pseudo version of the flow I've been using in my code:

FS.mkdir(root);
FS.mount(IDBFS, {}, root);
FS.syncfs(true, (err) => { assert(!err); });
Wait for FS.syncfs to complete
callMain()
onExit FS.syncfs(false, (err) => { assert(!err); });

Are you saying I shouldn't have to make the second FS.syncfs call, that it should be happening automatically as my C/C++ writes files?

juj commented 3 years ago

err sorry - actually I miswrote. In the recent application that I have been working on we have been syncing after each write, so I misthought that it would be built-in behavior, but naturally it is not.

Recalling this with more attention: in general the filesystem runtime has not made decisions (on behalf of the user) on whether the data in an IDBFS mount is immutable over page reloads (data stored in IndexedDB is read-only, never persists back) or read-write (writes persist back). So if one only read-mounts an IDBFS mount, they can regard it as an immutable read-only mount, or they can choose to write changes back to the storage.

Automatically persisting on exit (or after writes for that matter) would prevent the immutable/read-only use case where an app will only do an initial population to IDBFS, but then want to reload the original set of data on each page reload. That is why syncing back is an explicit operation that the user can choose when to initiate.

jhirschberg70 commented 3 years ago

Thanks for clarifying that. It sounds like my basic understanding of how things work is correct. In my particular case, the data is never immutable, so I will always want to sync when the C/C++ exits. I just want to make sure that if I'm using Module["onExit"] or addOnExit() to initiate that final sync that the data in memory that I'm copying back to IDB won't be changed/deleted before the sync completes. This seems unlikely with how javascript garbage collection works, but there's plenty about javascript I still don't know, and I don't know everything emscripten does when the runtime exits. I just want to know that using Module["onExit"] or addOnExit() to initiate that final sync is going to be reliable.

sbc100 commented 3 years ago

If you want persistence then perhaps it would be better to sync right away rather than only onExit? I wonder if we could/should add a mount option what would sync every time a file is closed? Seems like it would be nice if users could get a persistent IDBFS mount without having to write JS code.

jhirschberg70 commented 3 years ago

What you suggest about syncing when files are closed would certainly work, and having the option to do that automatically could be nice (leaving out automatic syncing on files that are opened read-only). In my particular case, I don't want to do this myself, as I don't want to alter the C++ source if I can avoid it, so I'm fine with initiating the sync in javascript. I've put my code below for anybody looking for an example of how to initialize the filesystem and then do the final sync. I've included only the relevant portions of IO. I use the --pre-js and EXIT_RUNTIME=1 options at build. _fsSynced is not strictly necessary. It's a holdover from how I used to do things before switching to using Promises. I don't need to use a Promise with IO.closeFiles(), either. I did it to provide a consistent interface between IO.initFiles() (where the Promise makes for a cleaner implementation than my old implementation of looping to check if _fsSynced is true) and IO.closeFiles().

var IO = (function() {
 var _fsSynced = false;
...

  function closeFiles() {
    _fsSynced = false;

    return new Promise((resolve, reject) => {
      FS.syncfs(false, (err) => {
        if (err) {
          reject(err);
        }
        else {
          _fsSynced = true;
          resolve(_fsSynced);
        }
      });
    });
  }

  function initFiles(path) {
    _fsSynced = false;
    FS.mkdir(path);
    FS.mount(IDBFS, {}, path);

    return new Promise((resolve, reject) => {
      FS.syncfs(true, (err) => {
        if (err) {
          reject(err);
        }
        else {
          _fsSynced = true;
          resolve(_fsSynced);
        }
      });
    });
  }

  return {
    closeFiles:closeFiles,
    initFiles:initFiles
  }
})();

Module["onExit"] = async () => {
  await IO.closeFiles()
          .catch(err => { console.log("Error closing files"); });
}

Module["onRuntimeInitialized"] = async () => {
  let root = "/root";
  await IO.initFiles(root)
          .catch(err => { console.log("Error initializing files"); });
  callMain(args);
}
jhirschberg70 commented 3 years ago

Is there an equivalent of Module["onExit"] for when main returns and exit isn't called?

sbc100 commented 3 years ago

onExit should be called in both cases yes

sbc100 commented 3 years ago

If you are not seeing onExit being called on return from main that would be bug.

jhirschberg70 commented 3 years ago

Let me double-check that onExit isn't being called on returning from main in my code. I could've done something boneheaded. I won't be able to confirm until tonight. Thanks!

sbc100 commented 3 years ago

I think you need to set EXIT_RUNTIME in order to get the onExit to fire.. but that should be true if call exit() directly or if you return from main.

Please make sure you test with the latest version of emscripten since the exit behavior is something I've bee iterating on recently.

jhirschberg70 commented 3 years ago

Yes, you need to set EXIT_RUNTIME for onExit to fire. At least that's been my experience, and I should be doing that. I'll double-check that, as well, and I'll make sure to update to the latest emscripten when I test again. I updated to 2.0.27 to recently, so I don't know if there's anything newer than that.

jhirschberg70 commented 3 years ago

Okay, I've had some time to play with this. I updated emsdk to the latest release (not tip of tree). I wrote a very simple test with main just immediately returning 0, and onExit fired, as expected. My actual code, however, won't fire onExit unless I explicitly call exit. I whittled this down to the barebones case that would fail, and I believe the problem comes from EM_ASYNC_JS. Here's all my code:

em_async_js.zip

Run build-debug.sh to build everything and then load up em_async.html in a browser and look at the console. The code defines a keydown listener to push keyCodes into a buffer whenever the user presses a key. An asynchronous function defined with an EM_ASYNC_JS macro can be used to retrieve the keyCodes of the keys pressed. I've defined three main functions in main.cpp that can be commented/uncommented to test various scenarios. Those scenarios are

  1. Wait for user to press a key, then print the keyCode and return from main. In this scenario, onExit is never called.
  2. Wait for user to press a key, then print the keyCode and exit from main. In this scenario, onExit is called, but you get an Uncaught (in promise) error.
  3. Wait for user to press a key, then print the keyCode, call emscripten_sleep(5) then exit from main. In this scenario, onExit is called and there are no errors. The issue is that the emscripten_sleep(5) shouldn't be necessary.

I suspect this could be related to #14977. In that issue, the programmer sets up a Promise to resolve after a 50-second timeout, but it appears to resolve in less time. I say my situation could be related because the only case that generates the expected output is when I create extra delay with emscripten_sleep(5) after the Promise resolves. Its like the Promise hasn't really resolved or something. I know emscripten has nothing to do with Promise implementation. Just describing what it seems like.

sbc100 commented 3 years ago

In case (1) can you step through the code to see why onExit is never called?

Presumably procExit is called but keepRuntimeAlive is returning true for some reason?:

https://github.com/emscripten-core/emscripten/blob/4e95fbbebc4b8dfd04a0bfb09f7035aabd82aab2/src/postamble.js#L452-L464

sbc100 commented 3 years ago

If you want be sure the runtime exists, even if there is outstanding work (promises) you can also use emscripten_force_exit instead of exit, although I think it would be been to understand exactly what is keeping the runtime alive and why in your case.

jhirschberg70 commented 3 years ago

I tried stepping through the code for case 1, as you requested. I can tell you that the keepRuntimeAlive returns true because runtimeKeepaliveCounter is 1. What I don't understand is the overall flow of the code. callMain is executing a try-catch block which calls exit before I've even had a chance to press a key. In other words, it doesn't seem to be waiting for the Promise to resolve. It's like it's exiting before I even have a chance to do anything. When I do press a key, it prints out the keyCode to the console, but there's nothing left to do at that point. main has already returned.

Maybe I just don't understand how things are supposed to work. I'm new to asynchronous functions and Promises. My code flow is supposed to be call an asynchronous function that waits for you to press a key. When you press a key, send the keyCode to cout and then return from main. In other words, C++ is waiting for the asynchronous function to resolve before continuing on. This seems like it's following the EM_ASYNC_JS example from the docs with fetch exactly, and based on this statement in the docs "That shows that the C code only continued to execute after the async JS completed." it make me think that my understanding of how it's supposed to work is correct. It doesn't seem to be working that way though. While the output to cout doesn't happen until I press a key, the return from main doesn't wait for me to press a key.

I included a zip file above with all the code, but I'll paste it in here so people can see the actual code without having to grab the zip.

main.cpp

#include <iostream>
#include "emscripten.h"

EM_ASYNC_JS(int, getch, (), {
    return await IO.getch();
});

// No error or assertions but onExit not called
int main(int argc, char *argv[]) {
  int ch = getch();
  std::cout << "ch is " << ch << std::endl;

  return 0;
}

lib.js

var IO = (function() {
  "use strict";

  /* Constants */
  const ERR = -1;

  /* Private variables */
  var _keyBuffer = [];

  /* Public functions */
  function getch() {
    return new Promise((resolve, reject) => {
      let keyBufferCheckID = setInterval(() => {
        if (_keyBuffer.length) {
          clearInterval(keyBufferCheckID);
          resolve(_keyBuffer.shift());
        }
      });
    });
  }

  /* Private functions */
  function _initEventHandlers() {
    document.addEventListener("keydown", function(event) {
      event.preventDefault();
      _keyBuffer.push(event.keyCode);
    });
  }

  /* When page has loaded, initialize event handlers */
  window.addEventListener("load", () => {
    _initEventHandlers();
  });

  return {
    getch:getch
  }
})();

Module["onExit"] = () => {
  console.log("Called onExit");
}

em_async_js.html

<!DOCTYPE html>
<html lang="en">
  <head>
    <meta charset="utf-8">
    <meta name="viewport" content="width=device-width, initial-scale=1, shrink-to-fit=no">
    <script src="em_async_js.min.js"></script>
  </head>
  <body>
  </body>
</html>
jhirschberg70 commented 3 years ago

Should've included my build script for completeness. Here it is:

#!/bin/bash

rm em_async_js.min.js &> /dev/null

em++ \
    -gsource-map \
    -o em_async_js.min.js \
    -s ASSERTIONS=2 \
    -s ASYNCIFY \
    -s ASYNCIFY_IMPORTS="['exit']" \
    -s DEMANGLE_SUPPORT=1 \
    -s ENVIRONMENT=web \
    -s EXIT_RUNTIME=1 \
    -s NO_DISABLE_EXCEPTION_CATCHING \
    -s SAFE_HEAP=1 \
    -s SINGLE_FILE=1 \
    -s STACK_OVERFLOW_CHECK=2 \
    -std=c++17 \
    --pre-js lib.js \
    *.cpp
sbc100 commented 3 years ago

I think the issue is with EruntimeKeepalivePopM_ASYNC_JS and the way keepRuntimeAlive works.

Calling an async function causes your main function to unwind and then re-wind when its ready. ASYNCIFY keeps the runtime alive by calling runtimeKeepalivePush and runtimeKeepalivePop as it unwinds and rewinds. However, when it rewinds back into main I believe it does includes the post-main-exit code that was original in the code that it re-wound out of.

In short, I think that the use of ASYNCIFY currently breaks the "exit after main returns" logic.

This should be fixable but I'm not sure when we will get around to it.

sbc100 commented 3 years ago

Can you work around this by calling the exit() C function rather than relying on the return value from main (or avoiding ASYNCIFY?) for now?

jhirschberg70 commented 3 years ago

A couple comments. First, the code base I'm working with does in fact exit C++ by calling exit and never returns from main, so I don't have a problem with that. However, as I mentioned for case 2 in my test code, just calling exit results in an Uncaught (in promise) error. My code still functions okay. It does what it's supposed to do. I'm just anal about having errors pop up in the code, but I can live with it.

I'd love to avoid ASYNCIFY entirely, but that would require major refactoring of the code base. It currently blocks waiting for the user to press a key in, I believe, 27 different places. It's a game, and it was written without any consideration to having a non-blocking game loop.

My implementation of getch on the C++ side used to call emscripten_sleep(5) in a loop waiting for a keypress instead of using an asynchronous function. This worked without error. I just thought, why call emscripten_sleep when I shouldn't have to? If the main browser thread needs to keep running, I'll just do all my waiting on the javascript side. Why keep unwinding and rewinding with emscripten_sleep? I know the net effect is the same, it just seemed like a cleaner design using the asynchronous function. Obviously, emscripten_sleep works differently than just a generic asynchronous function, because, as I said, when I use it, things work without error.

At least you've confirmed that my understanding of how things are supposed to work is correct and that this is a bug in emscripten. Thanks!