Move stdout and stderr line buffering responsibility to shell.html

juj commented 7 years ago

Something I've been thinking of doing for quite a while: currently we require the stdout and stderr streams to be line buffered (as console.log() is), so for example fflush() can only flush full lines. However, different environments support different granularities. For example:

- we can flush individual characters when printing to a text box on a web page, so that does not need to depend on line buffering
- when one is using window.dump() to debug, then we would be able to support char buffering
- when one is using emrun debug harness, then we can support char buffering
- when one is using a shell like node.js or SpiderMonkey, then we can also do char buffering

It would make sense to remove the abstractions for line buffering and rebuild them in the default shell.html file instead. That would allow a clean way to fix #2770.
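
A minimal sketch of what shell-side line buffering could look like, assuming the runtime hands the shell arbitrary chunks of text. The `LineBuffer` class and its `write`/`flush` methods are hypothetical names used for illustration, not an existing Emscripten API.

```javascript
// Hypothetical shell-side line buffer: accepts arbitrary text chunks and
// forwards only complete lines to a line-oriented sink such as console.log.
class LineBuffer {
  constructor(sink) {
    this.sink = sink;   // called once per complete line, without the '\n'
    this.pending = '';  // text received since the last newline
  }

  write(chunk) {
    this.pending += chunk;
    const lines = this.pending.split('\n');
    // The last element is the (possibly empty) unterminated remainder.
    this.pending = lines.pop();
    for (const line of lines) this.sink(line);
  }

  // Emit whatever is pending, even though it is not newline-terminated.
  flush() {
    if (this.pending.length > 0) {
      this.sink(this.pending);
      this.pending = '';
    }
  }
}

// Usage: collect lines in an array instead of calling console.log.
const shown = [];
const stdoutBuffer = new LineBuffer((line) => shown.push(line));
stdoutBuffer.write('hello ');
stdoutBuffer.write('world\npartial');
stdoutBuffer.flush();
```

With this split, an environment that can show partial lines would skip the buffer entirely, while a console.log-based shell keeps it.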

The only drawback is for people who use a custom shell file: if the behavior suddenly changed to buffer chars instead of lines, they would see incorrectly split console.log() prints. Therefore the feature would perhaps need to be something one opts in to.

Any objections?

kripken commented 7 years ago

How about something like this: we add Module.printChar and Module.printCharErr, that receive single characters, and where we can, we implement them as printing out those single characters (like to an html textbox)? Then the few places that do want to print individual chars (like the syscall for printing) would call that. Everything else can continue to call print normally, and would still work.

Then if people use their own shell file, if they implement printChar we would call it, if not then we have a standard buffering impl that calls print when a full line is buffered, etc.
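
The fallback described above could be sketched roughly as follows. `Module.printChar` is the hook proposed in this comment, not an existing API, and `makeCharWriter` is a hypothetical helper name.

```javascript
// Hypothetical dispatch: prefer a character-level hook when the shell
// provides one, otherwise fall back to line buffering on top of print.
function makeCharWriter(Module) {
  if (typeof Module.printChar === 'function') {
    return (ch) => Module.printChar(ch);
  }
  let line = '';
  return (ch) => {
    if (ch === '\n') {
      Module.print(line);
      line = '';
    } else {
      line += ch;
    }
  };
}

// Shell without printChar: output arrives one full line at a time.
const lineShell = { lines: [], print(text) { this.lines.push(text); } };
const writeToLineShell = makeCharWriter(lineShell);
for (const ch of 'ab\nc') writeToLineShell(ch);

// Shell with printChar: every character is delivered immediately.
const charShell = { chars: [], printChar(ch) { this.chars.push(ch); } };
const writeToCharShell = makeCharWriter(charShell);
for (const ch of 'ab\nc') writeToCharShell(ch);
```

In the first shell the trailing `c` stays buffered until a newline arrives, which is the behavior custom shell files would keep by default.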

(Maybe with a better name than printChar.)

juj commented 7 years ago

That sounds workable (with the change that the function should not be restricted to receiving only single characters, but should work with arbitrary-length strings that may contain \n).

How about Module.writeStdout and Module.writeStderr?

kripken commented 7 years ago

So print/printErr would be full lines, while write/writeErr receives characters not ending in a newline, and can print them when possible? I wish we had a better name here than print vs write, but maybe write is "low-level" and it's ok.

rajsite commented 6 years ago

I'm also interested in having a lower-level function to plug into. We print some binary data to stdout and with NO_FILESYSTEM=1 and overriding the current Module.print and Module.printErr I can't distinguish between null and newline: https://github.com/kripken/emscripten/blob/52ff847187ee30fba48d611e64b5d10e2498fe0f/src/library_syscall.js#L162

I also don't need the UTF8ArrayToString behavior, so having Module.write / Module.writeErr or the ability to overwrite SYSCALLS.printChar seems useful.
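
To illustrate the ambiguity, here is a simplified model of a printChar-style helper that flushes on either a 0 or a 10 byte, as described above. It is a sketch for illustration and not the actual Emscripten source; `makePrintChar`, `collect`, and `makeRawWriter` are hypothetical names.

```javascript
// Simplified model: bytes accumulate until a 0 or a 10 arrives, then the
// buffered bytes are decoded (ASCII only here) and passed to print.
function makePrintChar(print) {
  const buffer = [];
  return (byte) => {
    if (byte === 0 || byte === 10) {
      print(String.fromCharCode(...buffer));
      buffer.length = 0;
    } else {
      buffer.push(byte);
    }
  };
}

// Feed a byte sequence through the model and record each print call.
function collect(bytes) {
  const lines = [];
  const printChar = makePrintChar((text) => lines.push(text));
  for (const byte of bytes) printChar(byte);
  return lines;
}

const withNull = collect([65, 0, 66, 10]);     // 'A', NUL, 'B', '\n'
const withNewline = collect([65, 10, 66, 10]); // 'A', '\n', 'B', '\n'
// Both inputs produce the same print calls, so the separator byte is lost.

// Hypothetical raw hook in the spirit of Module.write: bytes pass through
// untouched and the consumer decides how to interpret them.
function makeRawWriter(write) {
  return (byte) => write(Uint8Array.of(byte));
}

const rawBytes = [];
const writeRaw = makeRawWriter((chunk) => rawBytes.push(...chunk));
for (const byte of [65, 0, 66, 10]) writeRaw(byte);
```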

stale[bot] commented 5 years ago

This issue has been automatically marked as stale because there has been no activity in the past year. It will be closed automatically if no further activity occurs in the next 7 days. Feel free to re-open at any time if this issue is still relevant.

vadimkantorov commented 3 years ago

@kripken @sbc100 Could you please reopen this issue? I stumbled on this when porting the clear program, which does not print a trailing newline: https://github.com/emscripten-core/emscripten/issues/2770#issuecomment-749108170 How do I force a stream flush from JavaScript? I'm calling the _main function manually, so libc cleanup does not apply in my case. Having more flexibility on buffering / stream flushing would be very useful.

sbc100 commented 3 years ago

I think you can just call fflush(). If that isn't enough you can also call fsync(). Bypassing the buffering that musl is doing is probably not something we want to do.

vadimkantorov commented 3 years ago

I'll call fflush, but I think it'd be good to have it bound by default by Emscripten

sbc100 commented 3 years ago

What do you mean by "bound by default"?

vadimkantorov commented 3 years ago

I mean flushing the buffer externally is a useful functionality when porting some existing software.

I meant having a JavaScript method that calls libc's fflush internally. And having some API docs around that. Alternatively, just a recipe in the docs would be already good.

How do I export fflush? as _fflush similar to _main?

sbc100 commented 3 years ago

Yes _fflush.

Sure I think we can find a way to improve in this area. Ideally I think we would use musl's exit function which calls __stdio_exit.

vadimkantorov commented 3 years ago

Oh, maybe I should bind _exit then :) Or is it already exported by default?

sbc100 commented 3 years ago

We don't currently compile musl's exit so that won't work.

vadimkantorov commented 3 years ago

One question is how to export a static symbol such as stdout/stderr in order to pass it to fflush.

sbc100 commented 3 years ago

You can't easily do that, but you probably just want to pass NULL/0 to flush all streams.
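
A minimal sketch of calling fflush(NULL) from JavaScript, assuming the module was built with `_fflush` exported (for example via `-sEXPORTED_FUNCTIONS=_main,_fflush`). `flushAllStreams` is a hypothetical helper name, and the fake module below only stands in for a real build.

```javascript
// Sketch: flush every open libc output stream by passing NULL (0).
function flushAllStreams(Module) {
  if (typeof Module._fflush !== 'function') {
    throw new Error('_fflush is not exported from this build');
  }
  return Module._fflush(0); // fflush(NULL) flushes all output streams
}

// Stand-in for a real module so the sketch can run on its own.
const fakeModule = {
  calls: [],
  _fflush(stream) { this.calls.push(stream); return 0; },
};
const flushResult = flushAllStreams(fakeModule);
```

Later comments in this thread report that this call alone may not make output appear when it does not end in a newline.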

vadimkantorov commented 3 years ago

pass NULL/0 to flush all streams.

Great idea!

I wonder, would fopen("/dev/stdout", "r") return the stdout FILE*?

sbc100 commented 3 years ago

No, I think each fopen call returns a different pointer.

vadimkantorov commented 3 years ago

fflush does not help :( Calling fflush after ExitStatus was thrown does not lead to print/printErr being called. Can there be some extra Emscripten-specific buffering? So that libc's streams are flushed, but print is not called?

main is just clear, which prints the screen-clearing sequence: https://git.busybox.net/busybox/tree/console-tools/clear.c

    const NOCLEANUP_callMain = (Module, args) =>
    {
        const main = Module['_main'], fflush = Module['_fflush'];
        const argc = args.length + 1;
        const argv = Module.stackAlloc((argc + 1) * 4);
        const NULL = 0;

        // Store all arguments as one NUL-separated string on the stack,
        // then point each argv[i] at the start of its argument.
        args = [Module.thisProgram].concat(args);
        const lens = args.map(a => Module.lengthBytesUTF8(a));
        Module.HEAP32[argv >> 2] = Module.allocateUTF8OnStack(args.join('\0'));
        for (let i = 1; i < argc; i++)
            Module.HEAP32[(argv >> 2) + i] = Module.HEAP32[(argv >> 2) + i - 1] + lens[i - 1] + 1;
        Module.HEAP32[(argv >> 2) + argc] = NULL;

        try
        {
            main(argc, argv);
        }
        catch (e)
        {
            // Reached when ExitStatus is thrown.
            this.print('callMain: ' + e.message);
            fflush(NULL);
            return e.status;
        }

        return 0;
    }

vadimkantorov commented 3 years ago

Should I also somehow "flush" the TTY? akin to https://github.com/emscripten-core/emscripten/blob/0f298808fcad3a9ef25966154b83e1cb3a520901/src/postamble.js#L344

vadimkantorov commented 3 years ago

Somehow, busybox's base64 also has the flushing problem, even though fputs+fflush is used (I'm using the -w 0 case): https://git.busybox.net/busybox/tree/coreutils/uudecode.c#n322

vadimkantorov commented 3 years ago

I think something is getting buffered on Emscripten's JavaScript side, since the output only appears when the next "echo" runs

vadimkantorov commented 3 years ago

To work around, I had to call putchar('\n') before fflush
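
The same workaround, driven from JavaScript instead of C, might look like the sketch below. It assumes `_putchar` and `_fflush` are both exported from the build. `forceFlushWithNewline` and `makeFakeModule` are hypothetical names, and the fake module is only a toy model of the buffering reported in this thread, not Emscripten code.

```javascript
// Toy stand-in for a runtime whose print callback only fires on '\n'.
function makeFakeModule(print) {
  let libcBuffer = ''; // bytes written by the program but not yet flushed
  let jsLine = '';     // JS-side partial line waiting for a newline
  return {
    writeText(text) { libcBuffer += text; },
    _putchar(code) { libcBuffer += String.fromCharCode(code); return code; },
    _fflush(stream) {
      for (const ch of libcBuffer) {
        if (ch === '\n') { print(jsLine); jsLine = ''; } else { jsLine += ch; }
      }
      libcBuffer = '';
      return 0;
    },
  };
}

// Sketch of the workaround: write a newline, then flush all streams.
function forceFlushWithNewline(Module) {
  Module._putchar(10); // '\n' pushes the pending partial line through
  Module._fflush(0);   // fflush(NULL)
}

const printed = [];
const fake = makeFakeModule((line) => printed.push(line));
fake.writeText('abc');          // program output without a trailing newline
fake._fflush(0);                // plain fflush: nothing reaches print yet
const printedAfterPlainFlush = printed.length;
forceFlushWithNewline(fake);    // now 'abc' is delivered
```

The cost is that the consumer sees a line ending the program never wrote.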

stale[bot] commented 2 years ago

This issue has been automatically marked as stale because there has been no activity in the past year. It will be closed automatically if no further activity occurs in the next 30 days. Feel free to re-open at any time if this issue is still relevant.

rajsite commented 2 years ago

This was a bit of a pain point for me as well. I'm using -s FILESYSTEM=0 with a library, and in order to flush stdout / stderr I added a --post-js file with the following:

    Module.doMyFlush = flush_NO_FILESYSTEM;

Then after each entry point I flush:

    Module._doTheThing();
    Module.doMyFlush();

It's unfortunate that the library being used relies on stderr for error reporting but it happens. Could be nice to have a flush method exposed on Module that works regardless of FILESYSTEM configuration.

sbc100 commented 2 years ago

@rajsite does your program write strings to stdout/stderr that do not end in newline?

IIUC as long as the strings end in a newline, stdout and stderr should not be buffered.

Also, flush_NO_FILESYSTEM might be dangerous to call in this way for a couple of reasons:

- It calls __stdio_exit, which shuts down the libc stdio system. Using libc functions after calling this function is not expected to work.
- It injects extra \n characters in order to trigger the flushing of the buffers, so you would see those extra/unexpected newlines in your output.

rajsite commented 2 years ago

> @rajsite does your program write strings to stdout/stderr that do not end in newline?

yep it can (a script runtime with user defined functions that can call print with or without newlines)

> It calls __stdio_exit which shuts down the libc stdio system. Using libc functions after calling this function is not expected to work.

😨

> It injects extra \n characters in order to trigger the flushing of the buffers, so you would see those extra/unexpected newlines in your output.

Haven't checked for that specifically but looking at the implementation that sounds right, which sounds wrong for my use-case. Need to add a test for that. (I think I haven't noticed yet because I'm currently still just pushing those to console.log but plan to capture and do some parsing on the output which may have issues then).

Know of a better workaround? And yea maybe a bolder +1 to having something on Module to synchronously flush the buffers without mutating the content.

vadimkantorov commented 2 years ago

> IIUC as long as the strings end in a newline, stdout and stderr should not be buffered.

for me, that was not the case for the clear program and quite a few UNIX programs from busybox

sbc100 commented 2 years ago

> > IIUC as long as the strings end in a newline, stdout and stderr should not be buffered.
>
> for me, that was not the case for the clear program and quite a few UNIX programs from busybox

What wasn't the case? Are you saying the programs didn't write newlines, or that they did write newlines but the output was buffered in emscripten regardless?

vadimkantorov commented 2 years ago

These programs don't write newlines, and then it's difficult to force emscripten to flush from JavaScript. Basically I had to first print a newline character myself, and only then did fflush help.

sbc100 commented 2 years ago

Yes that makes sense. The buffering only happens when programs don't write newlines.

vadimkantorov commented 2 years ago

My suggestion is to include this force-flush functionality in Emscripten as a reusable function... If the program output is consumed by other code, it's easier not to have to work out whether the last newline was produced by the program or is just an artifact of forcing the flush
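
Until such a function exists, a consumer could track the artificial newline itself. The sketch below assumes a line-oriented `print` callback and a forced flush that works by injecting a single `\n`. `OutputCollector` and `flushWith` are hypothetical names, not part of Emscripten.

```javascript
// Hypothetical collector that reconstructs the program's exact output,
// leaving out the newline that was injected only to force a flush.
class OutputCollector {
  constructor() {
    this.text = '';
    this.forced = null; // lines received while a forced flush is running
  }

  // Install as the print callback: receives one line without its '\n'.
  print(line) {
    if (this.forced) this.forced.push(line);
    else this.text += line + '\n';
  }

  // forceFlush is any function that injects one '\n' and flushes.
  flushWith(forceFlush) {
    this.forced = [];
    try {
      forceFlush();
    } finally {
      // Only the final newline was artificial, so rejoin without it.
      this.text += this.forced.join('\n');
      this.forced = null;
    }
  }
}

const collector = new OutputCollector();
collector.print('first line');                       // program wrote "first line\n"
collector.flushWith(() => collector.print('tail'));  // "tail" plus injected '\n'
```

After these calls `collector.text` holds the program's output with no extra trailing newline, so downstream parsing does not need to guess.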

sbc100 commented 2 years ago

One problem here is that console.log cannot output part of a line, so we have no nice way to flush output that doesn't end in a newline, at least not as long as console.log is where things are getting flushed to.

But still, I agree we could add such a function for this type of case. @vadimkantorov is your issue the same as that of @rajsite in that you are hitting the SYSCALLS_REQUIRE_FILESYSTEM == 0 path where output is performed via the printChar helper?

vadimkantorov commented 2 years ago

nope, I always had filesystem enabled and i didn't hit this path

my problem was just the buffering and that fflush itself was not sufficient

vadimkantorov commented 2 years ago

also I'd rather callMain did this flush by itself regardless of call mode - but user-callable flush is also needed (because callMain currently has other problems that we had discussed elsewhere)

emscripten-core / emscripten

Move stdout and stderr line buffering responsibility to shell.html #5290