Open jhirschberg70 opened 3 years ago
I don't believe there are any plans to remove the onExit
callback.
A couple of notes though:
Firstly you might want to consider instead using atexit
handlers (since they run only on normal shutdown rather that, for example, on abort/assert when the program can be in an undefined state). To register and atexit
handler from JS you can call Module.addOnExit()
at runtime or you can addAtExit
from at build time in your library JS file. e.g: https://github.com/emscripten-core/emscripten/blob/9a644e1a878863b9dc168401174c9f5c775e8c1b/src/library_fs.js#L35
Which leads me the next question, is this filesystem syncing on exit something you need to do for you custom filesystem or are you just using emscripten filesystem, in which case it would be bug if they did not sync on exit.
Thanks for responding so quickly!
For a long time I wasn't using EXIT_RUNTIME=1, so atexit
wouldn't work (if I understand correctly). I'm okay with using EXIT_RUNTIME=1, so I can try using atexit
. Thank you.
I'm using the emscripten filesystem with IDBFS. The C++ is saving a couple of files prior to exit. If I understand correctly, the C++ is writing to memory when saving these files, and I need to do FS.syncfs
to get the data copied from memory into IDB. I'm just looking for a way, if possible, to do this reliably after the C++ has exited, so I don't have to modify the C++ to do the sync.
All the filesystems (including IDBFS) should sync on exit:
Actually it looks like syncfs
is not called as part of quit
... I wonder why. Maybe @juj or @kripken might know?
I tried using addAtExit
and Module["addOnExit"]
and neither worked for me. addAtExit
was undefined when I ran my code, so I likely did something wrong there. The function I defined for Module["addOnExit"]
never got called. So far, the only thing that seems to be working is the method I outlined originally. Are there particular compiler flags I need to use/not use in order to make addAtExit
or Module["addOnExit"]
work?
When are you trying to call addOnExit
? In general you can't call such methods during --pre-js
. Try doing it a --post-js
maybe?
In --post-js
I think you can just call addOnExit
(no need to export it on the Module object or call it there).
I'm a little confused by addAtExit
vs. addOnExit
. You included a link to addAtExit
being used in library_fs.js. You mentioned calling addOnExit
in your most recent comment. Are they both supposed to be available and equivalent?
As for using --pre-js
vs. --post-js
, I believe I tried addAtExit
and Module["addOnExit"]
with both --pre-js
and --post-js
, and nothing worked. I don't have access to my code right now, so I can't confirm that tried all the permutations, but I'm pretty confident I did. I've definitely tried Module["onExit"]
with both --pre-js
and --post-js
, and both ways worked.
I agree its a little confusing. I don't know why the names differ.
addAtExit
only exists during JS library processing, it cannot be used at runtime which means it cannot be used in --pre-js
or --post-js
.
addOnExit
is runtime function defined in src/preamble.js
. This means you can call it directly from --post-js
but I won't exist in --pre-js
since pre-js is included before src/preamble.js
. You don't need to Module["addOnExit"]
and indeed addOnExit
won't be defined on the module unless you do also add it to EXPORTED_FUNCTIONS
.
At least that is my understanding of of how things work right now :)
You can obviously keep using Module['onExit']
too.. but just note that unlike addOnExit it gets called not just on normal program exit but also if the program crashes/aborts/asserts.
Thanks for the clarification on addAtExit
and addOnExit
.
Regardless of whether I continue using the code I have or switch to using addOnExit
(I can see arguments for both) my overall question remains: since FS.syncfs
is asynchronous and I have EXIT_RUNTIME=1
in my build, is it guaranteed that the sync will complete before data in memory is destroyed and the filesystem quits? You pointed out yourself that library_fs.js is calling addAtExit('FS.quit();');
. I'm just worried that there's a race between my code and the emscripten code, and I've just been getting lucky so far.
Thanks for all your help!
I don't really know what FS.quit()
is doing, but its hard to imagine that the data in memory will be lost. As long as someone has a reference to the data it will not get GC'd by the JS engine and presumably the syncfs process holds a reference to the data it is sync'ing so it probably fine, but you might want to get a deeper understanding of the code to confirm.
What you say about a reference to the data and garbage collection makes sense, but since this involves code I didn't write and don't fully understand, it makes me wary. It sounds like digging deeper into the code may be in my future! Thanks for all the help!
Actually it looks like
syncfs
is not called as part ofquit
... I wonder why. Maybe @juj or @kripken might know?
IDBFS syncs after each file write operation, because otherwise there would be common data loss if syncing (only) at quit time. Then if one is syncing after each write, that makes syncing at quit redundant.
I'm very confused. Here's a simplified pseudo version of the flow I've been using in my code:
FS.mkdir(root);
FS.mount(IDBFS, {}, root);
FS.syncfs(true, (err) => { assert(!err); });
Wait for FS.syncfs to complete
callMain()
onExit FS.syncfs(false, (err) => { assert(!err); });
Are you saying I shouldn't have to make the second FS.syncfs
call, that it should be happening automatically as my C/C++ writes files?
err sorry - actually I miswrote. In the recent application that I have been working on we have been syncing after each write, so I misthought that it would be built-in behavior, but naturally it is not.
Recalling this with more attention: in general the filesystem runtime has not made decisions (on behalf of the user) on whether the data in an IDBFS mount is immutable over page reloads (data stored in IndexedDB is read-only, never persists back) or read-write (writes persist back). So if one only read-mounts an IDBFS mount, they can regard it as an immutable read-only mount, or they can choose to write changes back to the storage.
Automatically persisting on exit (or after writes for that matter) would prevent the immutable/read-only use case where an app will only do an initial population to IDBFS, but then want to reload the original set of data on each page reload. That is why syncing back is an explicit operation that the user can choose when to initiate.
Thanks for clarifying that. It sounds like my basic understanding of how things work is correct. In my particular case, the data is never immutable, so I will always want to sync when the C/C++ exits. I just want to make sure that if I'm using Module["onExit"]
or addOnExit()
to initiate that final sync that the data in memory that I'm copying back to IDB won't be changed/deleted before the sync completes. This seems unlikely with how javascript garbage collection works, but there's plenty about javascript I still don't know, and I don't know everything emscripten does when the runtime exits. I just want to know that using Module["onExit"]
or addOnExit()
to initiate that final sync is going to be reliable.
If you want persistence then perhaps it would be better to sync right away rather than only onExit
? I wonder if we could/should add a mount option what would sync every time a file is closed? Seems like it would be nice if users could get a persistent IDBFS mount without having to write JS code.
What you suggest about syncing when files are closed would certainly work, and having the option to do that automatically could be nice (leaving out automatic syncing on files that are opened read-only). In my particular case, I don't want to do this myself, as I don't want to alter the C++ source if I can avoid it, so I'm fine with initiating the sync in javascript. I've put my code below for anybody looking for an example of how to initialize the filesystem and then do the final sync. I've included only the relevant portions of IO
. I use the --pre-js
and EXIT_RUNTIME=1
options at build. _fsSynced
is not strictly necessary. It's a holdover from how I used to do things before switching to using Promises
. I don't need to use a Promise
with IO.closeFiles()
, either. I did it to provide a consistent interface between IO.initFiles()
(where the Promise
makes for a cleaner implementation than my old implementation of looping to check if _fsSynced
is true) and IO.closeFiles()
.
var IO = (function() {
var _fsSynced = false;
...
function closeFiles() {
_fsSynced = false;
return new Promise((resolve, reject) => {
FS.syncfs(false, (err) => {
if (err) {
reject(err);
}
else {
_fsSynced = true;
resolve(_fsSynced);
}
});
});
}
function initFiles(path) {
_fsSynced = false;
FS.mkdir(path);
FS.mount(IDBFS, {}, path);
return new Promise((resolve, reject) => {
FS.syncfs(true, (err) => {
if (err) {
reject(err);
}
else {
_fsSynced = true;
resolve(_fsSynced);
}
});
});
}
return {
closeFiles:closeFiles,
initFiles:initFiles
}
})();
Module["onExit"] = async () => {
await IO.closeFiles()
.catch(err => { console.log("Error closing files"); });
}
Module["onRuntimeInitialized"] = async () => {
let root = "/root";
await IO.initFiles(root)
.catch(err => { console.log("Error initializing files"); });
callMain(args);
}
Is there an equivalent of Module["onExit"]
for when main
returns and exit
isn't called?
onExit
should be called in both cases yes
If you are not seeing onExit
being called on return from main that would be bug.
Let me double-check that onExit
isn't being called on returning from main
in my code. I could've done something boneheaded. I won't be able to confirm until tonight. Thanks!
I think you need to set EXIT_RUNTIME in order to get the onExit
to fire.. but that should be true if call exit() directly or if you return from main.
Please make sure you test with the latest version of emscripten since the exit behavior is something I've bee iterating on recently.
Yes, you need to set EXIT_RUNTIME
for onExit
to fire. At least that's been my experience, and I should be doing that. I'll double-check that, as well, and I'll make sure to update to the latest emscripten when I test again. I updated to 2.0.27 to recently, so I don't know if there's anything newer than that.
Okay, I've had some time to play with this. I updated emsdk to the latest release (not tip of tree). I wrote a very simple test with main
just immediately returning 0, and onExit
fired, as expected. My actual code, however, won't fire onExit
unless I explicitly call exit
. I whittled this down to the barebones case that would fail, and I believe the problem comes from EM_ASYNC_JS
. Here's all my code:
Run build-debug.sh to build everything and then load up em_async.html in a browser and look at the console. The code defines a keydown listener to push keyCodes into a buffer whenever the user presses a key. An asynchronous function defined with an EM_ASYNC_JS
macro can be used to retrieve the keyCodes of the keys pressed. I've defined three main
functions in main.cpp that can be commented/uncommented to test various scenarios. Those scenarios are
return
from main
. In this scenario, onExit
is never called.exit
from main
. In this scenario, onExit
is called, but you get an Uncaught (in promise) error.emscripten_sleep(5)
then exit
from main
. In this scenario, onExit
is called and there are no errors. The issue is that the emscripten_sleep(5)
shouldn't be necessary.I suspect this could be related to #14977. In that issue, the programmer sets up a Promise
to resolve after a 50-second timeout, but it appears to resolve in less time. I say my situation could be related because the only case that generates the expected output is when I create extra delay with emscripten_sleep(5)
after the Promise
resolves. Its like the Promise
hasn't really resolved or something. I know emscripten has nothing to do with Promise
implementation. Just describing what it seems like.
In case (1) can you step through the code to see why onExit
is never called?
Presumably procExit
is called but keepRuntimeAlive
is returning true for some reason?:
If you want be sure the runtime exists, even if there is outstanding work (promises) you can also use emscripten_force_exit
instead of exit
, although I think it would be been to understand exactly what is keeping the runtime alive and why in your case.
I tried stepping through the code for case 1, as you requested. I can tell you that the keepRuntimeAlive
returns true because runtimeKeepaliveCounter
is 1. What I don't understand is the overall flow of the code. callMain
is executing a try-catch
block which calls exit
before I've even had a chance to press a key. In other words, it doesn't seem to be waiting for the Promise
to resolve. It's like it's exiting before I even have a chance to do anything. When I do press a key, it prints out the keyCode to the console, but there's nothing left to do at that point. main
has already returned.
Maybe I just don't understand how things are supposed to work. I'm new to asynchronous functions and Promises
. My code flow is supposed to be call an asynchronous function that waits for you to press a key. When you press a key, send the keyCode to cout and then return
from main
. In other words, C++ is waiting for the asynchronous function to resolve before continuing on. This seems like it's following the EM_ASYNC_JS
example from the docs with fetch
exactly, and based on this statement in the docs "That shows that the C code only continued to execute after the async JS completed." it make me think that my understanding of how it's supposed to work is correct. It doesn't seem to be working that way though. While the output to cout
doesn't happen until I press a key, the return
from main
doesn't wait for me to press a key.
I included a zip file above with all the code, but I'll paste it in here so people can see the actual code without having to grab the zip.
main.cpp
#include <iostream>
#include "emscripten.h"
EM_ASYNC_JS(int, getch, (), {
return await IO.getch();
});
// No error or assertions but onExit not called
int main(int argc, char *argv[]) {
int ch = getch();
std::cout << "ch is " << ch << std::endl;
return 0;
}
lib.js
var IO = (function() {
"use strict";
/* Constants */
const ERR = -1;
/* Private variables */
var _keyBuffer = [];
/* Public functions */
function getch() {
return new Promise((resolve, reject) => {
let keyBufferCheckID = setInterval(() => {
if (_keyBuffer.length) {
clearInterval(keyBufferCheckID);
resolve(_keyBuffer.shift());
}
});
});
}
/* Private functions */
function _initEventHandlers() {
document.addEventListener("keydown", function(event) {
event.preventDefault();
_keyBuffer.push(event.keyCode);
});
}
/* When page has loaded, initialize event handlers */
window.addEventListener("load", () => {
_initEventHandlers();
});
return {
getch:getch
}
})();
Module["onExit"] = () => {
console.log("Called onExit");
}
em_async_js.html
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="utf-8">
<meta name="viewport" content="width=device-width, initial-scale=1, shrink-to-fit=no">
<script src="em_async_js.min.js"></script>
</head>
<body>
</body>
</html>
Should've included my build script for completeness. Here it is:
#!/bin/bash
rm em_async_js.min.js &> /dev/null
em++ \
-gsource-map \
-o em_async_js.min.js \
-s ASSERTIONS=2 \
-s ASYNCIFY \
-s ASYNCIFY_IMPORTS="['exit']" \
-s DEMANGLE_SUPPORT=1 \
-s ENVIRONMENT=web \
-s EXIT_RUNTIME=1 \
-s NO_DISABLE_EXCEPTION_CATCHING \
-s SAFE_HEAP=1 \
-s SINGLE_FILE=1 \
-s STACK_OVERFLOW_CHECK=2 \
-std=c++17 \
--pre-js lib.js \
*.cpp
I think the issue is with EruntimeKeepalivePopM_ASYNC_JS
and the way keepRuntimeAlive works.
Calling an async function causes your main function to unwind and then re-wind when its ready. ASYNCIFY keeps the runtime alive by calling runtimeKeepalivePush and runtimeKeepalivePop as it unwinds and rewinds. However, when it rewinds back into main I believe it does includes the post-main-exit code that was original in the code that it re-wound out of.
In short, I think that the use of ASYNCIFY
currently breaks the "exit after main returns" logic.
This should be fixable but I'm not sure when we will get around to it.
Can you work around this by calling the exit()
C function rather than relying on the return value from main (or avoiding ASYNCIFY?) for now?
A couple comments. First, the code base I'm working with does in fact exit C++ by calling exit
and never returns
from main
, so I don't have a problem with that. However, as I mentioned for case 2 in my test code, just calling exit
results in an Uncaught (in promise) error. My code still functions okay. It does what it's supposed to do. I'm just anal about having errors pop up in the code, but I can live with it.
I'd love to avoid ASYNCIFY entirely, but that would require major refactoring of the code base. It currently blocks waiting for the user to press a key in, I believe, 27 different places. It's a game, and it was written without any consideration to having a non-blocking game loop.
My implementation of getch
on the C++ side used to call emscripten_sleep(5)
in a loop waiting for a keypress instead of using an asynchronous function. This worked without error. I just thought, why call emscripten_sleep
when I shouldn't have to? If the main browser thread needs to keep running, I'll just do all my waiting on the javascript side. Why keep unwinding and rewinding with emscripten_sleep
? I know the net effect is the same, it just seemed like a cleaner design using the asynchronous function. Obviously, emscripten_sleep
works differently than just a generic asynchronous function, because, as I said, when I use it, things work without error.
At least you've confirmed that my understanding of how things are supposed to work is correct and that this is a bug in emscripten. Thanks!
I see that there's an open issue regarding Module.onExit not being documented #6451. Is this simply an oversight or is there the possibility of Module.onExit going away?
I'm operating under the assumption that Module.onExit isn't going away, so my second question is about usage. I am building with -s ASYNCIFY and EXIT_RUNTIME=1 and using IDBFS for persistent storage. I have implemented the following:
The IO.closeFiles() referenced above has the following implementation:
What I'm trying to accomplish is a final sync of the filesystem when my C++ code exits. My initial solution was to just initiate FS.syncfs from within the C++ code with EM_ASM and wait for the sync to complete using a loop and emscripten_sleep before exiting. That worked fine, but I'm using an open source code base, and I'm trying to avoid modifying files. I figured if there was a reliable way to sync from javascript after the C++ had exited, then I could avoid having to change the C++ source. The above code seems to be working, but I want to make sure it's reliable. I'm worried about a potential race condition and the possibility of data being destroyed before the FS.syncfs completes. Can I rely on this implementation or is there a better way to accomplish what I want?