MatrixAI / js-logger

TypeScript/JavaScript Logger Library
https://polykey.com
Apache License 2.0

Standardise Asynchronous Logging and Message Encoding on Windows #22

Open CMCDragonkai opened 2 years ago

CMCDragonkai commented 2 years ago

Specification

Using process.stderr and console.error behaves differently on Windows compared to Linux and macOS.

By applying this fix to our stream handlers:

// Default behaviour on Node.js:
// Files: synchronous on Windows and POSIX
// TTYs (Terminals): asynchronous on Windows, synchronous on POSIX
// Pipes (and sockets): synchronous on Windows, asynchronous on POSIX
// In order to align Windows with POSIX behaviour:
if (process.stdout.isTTY) {
  process.stdout._handle.setBlocking(true);
} else if (os.platform() === 'win32') {
  // Already known to not be a TTY in this branch
  process.stdout._handle.setBlocking(false);
}

It ends up making Windows behave the same as Linux and MacOS.

The same needs to be applied to stderr as well.

Note that we leave ConsoleErrHandler the same as usual, because its output isn't necessarily going to Node.js's stdio streams.

This means we should probably create StandardOutHandler and StandardErrHandler, which is a breaking change for any usage of StreamHandler. This requires a rename in downstream projects... primarily tests in DB, EFS and PK.

Additionally Powershell currently outputs in UTF-16LE. This is a bit weird when piping the output of the stderr/stdout to a file. We can write in code:

process.stdout.setDefaultEncoding('utf-8');
process.stderr.setDefaultEncoding('utf-8');

Just to be really sure (though note that this is a global change). One idea to avoid global changes is to only set the encoding when we output a log; alternatively, we can respect whatever encoding has already been set by the application.

That being said, it won't work for PowerShell, only CMD. So PowerShell requires an update to the latest PowerShell Core for it to work properly.

As for CRLF vs LF, always use LF in our logs. Both powershell and cmd understand and render this properly.
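One way to sketch the "only set it when outputting a log" idea: pass the encoding explicitly on each write call, so only our own log writes are affected and no global default changes.

```javascript
// Sketch: per-call UTF-8 encoding instead of a global setDefaultEncoding.
// write(chunk, encoding, callback) encodes just this chunk as UTF-8.
function writeUtf8(stream, message, callback) {
  return stream.write(message, 'utf-8', callback);
}
```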

Additional context

Tasks

  1. ...
  2. ...
  3. ...
CMCDragonkai commented 2 years ago

See https://github.com/facebook/jest/issues/9324#issuecomment-1199457498 for the buggy jest that occurs when asynchronous logging is occurring.

This must be because process.stdout and process.stderr are asynchronous on Windows.

As a result, logs can still be output after the tests finish, and we get an exit code of 1.

Fixing that problem is more than fixing up the js-logger, since any usage of process.stdout or process.stderr could be problematic. These may still be used by programs outside of js-logger.

It may be a good idea to provide a different library to just fix this globally, not just for the js-logger.

CMCDragonkai commented 2 years ago

Note that process.stdout.write and process.stderr.write does have a callback option. I wonder if we can solve this more specifically by just adding a callback to these calls and returning a promise instead.

CMCDragonkai commented 2 years ago

Ok I've found out some more interesting information:

This means process.exit is dangerous as it doesn't flush stdout and stderr buffers.

I can see that in PK this is currently being used:

»» ~/Projects/Polykey
 ♖ ag 'process.exit\('                               (feature-concurrent-state) pts/6 15:02:26
src/bin/utils/ExitHandlers.ts
43:    process.exit();

tests/grpc/utils/testService.ts
215:        if (message === 'exit') process.exit(0);
235:        if (message === 'exit') process.exit(0);

These should be refactored out.
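A sketch of a replacement pattern (these helper names are illustrative, not existing APIs): wait for the stdio write callbacks to fire before exiting.

```javascript
// Sketch: flush stdio before exiting, since process.exit() does not wait
// for pending asynchronous writes. Helper names here are hypothetical.
function flushStream(stream) {
  return new Promise((resolve) => {
    // The write callback fires after previously buffered writes are flushed
    stream.write('', () => resolve());
  });
}

async function exitGracefully(code = 0) {
  await Promise.all([flushStream(process.stdout), flushStream(process.stderr)]);
  process.exit(code);
}
```

An even simpler alternative is to set process.exitCode = code and return, letting Node exit on its own once the event loop drains.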

The usage of that package is interesting.

I'm a bit confused right now what is synchronous and what isn't synchronous. I searched over the nodejs codebase and I can see that it already calls this._handle.setBlocking(true); when it is a TTY. I suppose this should be true even on Windows, but maybe it isn't.

  // Prevents interleaved or dropped stdout/stderr output for terminals.
  // As noted in the following reference, local TTYs tend to be quite fast and
  // this behavior has become expected due historical functionality on OS X,
  // even though it was originally intended to change in v1.0.2 (Libuv 1.2.1).
  // Ref: https://github.com/nodejs/node/pull/1771#issuecomment-119351671
  this._handle.setBlocking(true);

This is referenced from lib/internal/bootstrap/switches/is_main_thread.js.

But the comments in the issue seem to indicate that piping is asynchronous; however, this is not the case on Windows, because Windows doesn't support asynchronous piping. In that case, it's possible to run into memory exhaustion when logging too much to a pipe.

I wonder then...

  1. Is it not possible to have asynchronous piping on windows simply by maintaining an internal buffer?
  2. Should we be using .write callbacks to know when the buffer is flushed?
  3. What happens to the windows tests when making the TTY output blocking, does it end up passing?
  4. Maybe jest shouldn't be throwing an error just because logs are outputted at the end.
  5. Should we be buffering or allow the user to control the buffering of the logs to increase the logging throughput in our benchmarks? (Yes when piping to files, no when outputting to TTY).
CMCDragonkai commented 2 years ago

I did a few tests on Windows and Linux... note that process.stdout.isTTY and process.stderr.isTTY are either true if the stream is a TTY or undefined if it is not. So it's not just true/false.

It is in fact accurate when you run a script:

console.log(process.stdout.isTTY);
console.log(process.stderr.isTTY);

And if on Windows you run node .\test.js 2>&1 > testfile, you will get:

undefined
undefined

And if you don't redirect stderr, then it stays as true.
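Since the value is true or undefined rather than a strict boolean, a small sketch of coercing it when a boolean is needed:

```javascript
// isTTY is `true` on a TTY and `undefined` otherwise (never `false`),
// so coerce explicitly when a strict boolean is needed.
const stdoutIsTTY = Boolean(process.stdout.isTTY);
const stderrIsTTY = Boolean(process.stderr.isTTY);
```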

CMCDragonkai commented 2 years ago

Further testing...

process.stdout.write(
  'First\n',
  (e) => {
    process.stdout.write('Last\n');
  }
);

process.stdout.write('Second\n');

The above always prints:

First
Second
Last

Regardless of whether the stream is blocking or not.

This is important to note, because that means (which is obvious in hindsight) that setting blocking or non-blocking on the stream handler is independent of the callback execution order.

One might think that if it was synchronous, the callback should execute before Second is written, or, more likely, that the callback parameter wouldn't exist at all.

But this does mean that the function call itself might be asynchronous even if the stream itself is blocking.

CMCDragonkai commented 2 years ago

The most recent changes (https://gitlab.com/MatrixAI/open-source/typescript-demo-lib/-/jobs/2795324826), which make the stream blocking from within tests/setup.ts, seem to have helped make it work. The glaring issue is that the logs are out of order when running on Windows. You can see in the CI/CD logs that they are not printed in the correct order. For example, here: https://gitlab.com/MatrixAI/open-source/typescript-demo-lib/-/jobs/2793772454, the coverage report is printed first, then the test results, and no GLOBAL TEARDOWN is printed.

This is all quite confusing. When using process.stdout.write, we can definitely see that the order of execution here matters.

CMCDragonkai commented 2 years ago

Also it's important to note that process.stdout._handle (and the same for stderr) does not exist if the program is being piped. It appears that this only exists when it is in fact a TTY.

So I guess you cannot control it if you are piping it.

CMCDragonkai commented 2 years ago

Ok so it appears that we can gain some determinism by always writing with a promisified version of process.stdout.write (and likewise process.stderr.write).

const stdoutWrite = async (message) => {
  return new Promise((resolve, reject) => {
    const flushed = process.stdout.write(message, (e) => {
      if (e != null) return reject(e);
      return resolve(flushed);
    });
  });
};

However this is a different problem altogether about whether it is buffered/blocking or not.

A promisified version of stdout.write and stderr.write simply means you can choose to await for the results. This then synchronises the writes along with the rest of the operations.

However this changes the API of the logger, you now do await logger.info() instead of just logger.info. Although you can now choose to ignore the promise, and just call it as normal.

I'm not really sure if this fixes the problem though... I can only imagine out of order messages due to interleaved writes, and this can still happen with await.
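A hypothetical sketch of what that API change could look like (the class and method bodies here are illustrative, not js-logger's actual implementation):

```javascript
// Hypothetical sketch: a log method that returns a promise which resolves
// once the write callback fires. Not js-logger's actual implementation.
function promisifiedWrite(stream, message) {
  return new Promise((resolve, reject) => {
    stream.write(message, (e) => (e != null ? reject(e) : resolve()));
  });
}

class Logger {
  info(message) {
    // Callers can `await` this, or ignore the returned promise entirely
    return promisifiedWrite(process.stderr, `INFO: ${message}\n`);
  }
}

// Fire-and-forget: logger.info('hello');
// Synchronised:    await logger.info('hello');
```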

CMCDragonkai commented 2 years ago

Ok some more information.

The _handle.setBlocking is not implemented in JS; it ends up calling LibuvStreamWrap::SetBlocking in src/stream_wrap.cc.

This ends up calling:

  args.GetReturnValue().Set(uv_stream_set_blocking(wrap->stream(), enable));

This function is documented in http://docs.libuv.org/en/v1.x/stream.html#c.uv_stream_set_blocking.

Note:

Enable or disable blocking mode for a stream.

When blocking mode is enabled all writes complete synchronously. The interface remains unchanged otherwise, e.g. completion or failure of the operation will still be reported through a callback which is made asynchronously.

Warning Relying too much on this API is not recommended. It is likely to change significantly in the future. Currently only works on Windows for uv_pipe_t handles. On UNIX platforms, all uv_stream_t handles are supported.

Also libuv currently makes no ordering guarantee when the blocking mode is changed after write requests have already been submitted. Therefore it is recommended to set the blocking mode immediately after opening or creating the stream.

So at the libuv layer, this implies it is non-blocking by default.

This explains why the node code ends up setting it to be blocking mode in lib/internal/bootstrap/switches/is_main_thread.js.

It does say that on Windows, it is only supported for uv_pipe_t.

And so this comment:

A bit of background: some years ago, I think it was in v0.7, it was decided to make stdout and stderr blocking. Turns out it doesn't work so well for pipes; ttys and files are usually very fast (local ones anyway) but pipes tend to fill up rapidly.

A number of people complained about it so we made stdio-to-pipe non-blocking again (except on Windows, where it's not supported.) I forgot the exact bug reports but the theme was that stdio was too slow; on OS X, the kernel pipe buffer is only about 4 kB, so it's easy to max out. https://github.com/nodejs/node/pull/1771#issuecomment-119351671

Explains why.

Let's settle this a bit.

So this explains the situation. Now, how do we make Windows TTYs blocking? We can't; libuv doesn't support it. So the fact that Node.js makes TTYs blocking on Linux and macOS is only a convenience.

So in order to standardise behaviour here, one would instead need to promisify the callback. There are technically 2 ways to do this, as documented in https://nodejs.org/api/stream.html#writablewritechunk-encoding-callback:

While a stream is not draining, calls to write() will buffer chunk, and return false. Once all currently buffered chunks are drained (accepted for delivery by the operating system), the 'drain' event will be emitted. Once write() returns false, do not write more chunks until the 'drain' event is emitted. While calling write() on a stream that is not draining is allowed, Node.js will buffer all written chunks until maximum memory usage occurs, at which point it will abort unconditionally. Even before it aborts, high memory usage will cause poor garbage collector performance and high RSS (which is not typically released back to the system, even after the memory is no longer required). Since TCP sockets may never drain if the remote peer does not read the data, writing a socket that is not draining may lead to a remotely exploitable vulnerability.

Relying on the drain event allows one to make use of Node's own internal in-memory buffer for the stream.

If however we instead used the callback, that would bypass relying on the internal in-memory buffer, and instead wait for it to flush on every write. I guess that would be less efficient, since you're effectively waiting for a drain even when the internal in-memory buffer isn't exhausted.
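A sketch of the drain-based approach, which leans on Node's internal buffer and only pauses when write() signals backpressure:

```javascript
// Sketch: wait for 'drain' only when write() reports that the internal
// buffer is over its highWaterMark, instead of waiting on every callback.
function writeWithBackpressure(stream, chunk) {
  return new Promise((resolve) => {
    if (stream.write(chunk)) {
      resolve(); // buffered below the highWaterMark; keep going
    } else {
      stream.once('drain', resolve); // pause until the buffer drains
    }
  });
}
```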

Now when flushed, it's still going to go to OS in-kernel buffer if the socket is set to be non-blocking (which is the case for Windows).

CMCDragonkai commented 2 years ago

This is very confusing because the settings have changed over and over.

Ok I can see that in net.js, it does set it to be blocking if it is a pipe and it is on Windows. And since libuv does support blocking mode for pipes, it does set it to be blocking.

This was done 9 years ago in https://github.com/nodejs/node/commit/20176a98416353d4596900793f739d5ebf4f0ee1, and it seems like an oversight. Now Windows does something different from the rest of the systems, but it has been kept for backwards compatibility.

CMCDragonkai commented 2 years ago

This would mean any kind of piping on Windows is going to be quite slow for Node.js apps when writing to process.stdout or process.stderr.

CMCDragonkai commented 2 years ago

Ok based on all this information, it seems what we need to do is promisify logger.info... etc. So they return promises.

The promise will resolve when the data is flushed to in-kernel buffer (if any).

The logger library won't control whether the streams are blocking or non-blocking. That will be left to the application.

But the application can maintain some consistency by always awaiting the logging.

This would mean application-wise we would need to change to using await logger.info.

Users can still choose not to do that, and just let logger.info do what it does by default, which in this case is to build up an in-memory buffer, then potentially submit to an in-kernel buffer.

Error-level logging is best done with an await, because that ensures STDERR logs are written immediately.

This should have minimal performance impact on Linux and macOS for TTYs, because they are already blocking, and on Windows it aligns the behaviour.

However, it can cause performance drops if the output goes to a pipe or file.

CMCDragonkai commented 2 years ago

As for fixing the bug in jest, this seems like a case where Jest shouldn't be exiting with non-0 code just because of additional logging.

There's also no point in setting blocking, especially because on Windows only pipes support it, and they are already blocking by default there.

CMCDragonkai commented 2 years ago

Ok at the very least, we have to return a promise that can be optionally awaited for.

As for console.log, there's no callback to wait on, so returning an already-resolved promise is fine too.
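As a sketch (the class name and body here are illustrative, not the real ConsoleErrHandler):

```javascript
// Sketch: a console-based handler has no flush callback to await, so it
// returns an already-resolved promise to keep the API uniform.
class ConsoleErrHandlerSketch {
  handle(message) {
    console.error(message);
    return Promise.resolve();
  }
}
```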

CMCDragonkai commented 2 years ago

It turns out that when running jest tests, the stdio are not TTY streams, but they are pipe streams instead.

See the jobs:

In all 3 cases, checking isTTY shows up as undefined.

The _handle in all 3 cases is a pipe.

I also tested this locally, and they are still pipes. In that case, any divergence in behaviour is not due to TTY differences in blocking/non-blocking, but due to Windows actually being blocking on pipes, while Linux and macOS are non-blocking on their pipes.

Therefore again setting setBlocking should have no real effect.


Ok so a lot of the above about TTYs doesn't apply to this problem, because the stdio streams are just pipes, and they behave in the opposite way: non-blocking on Linux/macOS, while blocking on Windows.