emmacasolin opened 2 years ago
Full output from failing tests:
Fixing the failures mentioned in this issue needs to be done before working on the integration tests for these platforms, since there are likely underlying issues we aren't aware of and that would make it difficult to work on integration tests. The failures can be grouped into a few different categories:
- Path separator differences: Windows uses `\` to separate pathnames rather than `/`, so tests that expect `\` instead of `/` (or vice versa) fail.
- `EPERM` errors, which we often see when performing vaults operations on remote agents:
  `Error: EPERM: operation not permitted, unlink 'C:\Users\GITLAB~1\AppData\Local\Temp\polykey-test-m4rbmK\token'`
- `EBUSY` errors in the sessions tests (Windows):
  `EBUSY: resource busy or locked, unlink 'C:\Users\GITLAB~1\AppData\Local\Temp\polykey-test-4f3f1G\db\000005.ldb'`
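For the first category, a minimal sketch of the usual fix, assuming the tests can build expectations with Node's `path` module rather than hard-coded separators (the paths here are illustrative, not actual PK test code):

```js
const path = require('path');
const os = require('os');

// Build paths with path.join so the platform-specific separator is used
const tokenPath = path.join(os.tmpdir(), 'polykey-test', 'token');

// When asserting on paths in tests, derive the expected value from path.sep
// instead of hard-coding '/' or '\\'
const expected = ['polykey-test', 'token'].join(path.sep);
console.log(tokenPath.endsWith(expected)); // true on both Windows and POSIX
```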
I've found additional problems on Windows with respect to `process.stdout.write` and console programs.
There are 3 dimensions to dealing with Windows for console programs: the character encoding, the line endings (e.g. whether to use `os.EOL` to auto switch), and how the console renders the output (code page and fonts). I've found that PowerShell and CMD also behave differently.
I've written this test script:
```js
const os = require('os');
const process = require('process');
const fs = require('fs');
const { Buffer } = require('buffer');

async function main(argv = process.argv) {
  argv = argv.slice(2);
  if (argv[0] === '0') {
    console.log('Hello 盘');
  } else if (argv[0] === '1') {
    process.stdout.write(`HELLO${os.EOL}盘${os.EOL}`);
  } else if (argv[0] === '2') {
    // LF
    process.stdout.write('HELLO\n盘\n');
  } else if (argv[0] === '3') {
    // CRLF
    process.stdout.write('HELLO\r\n盘\r\n');
  } else if (argv[0] === '4') {
    // WITH setting the utf-8 encoding
    process.stdout.setDefaultEncoding('utf-8');
    // process.stdout.setEncoding('utf-8');
    process.stdout.write('HELLO\n盘\n');
  } else if (argv[0] === '5') {
    // WITH setting the utf-8 encoding
    process.stdout.setDefaultEncoding('utf-8');
    process.stdout.write('HELLO\r\n盘\r\n');
  } else if (argv[0] === '6') {
    // Writing buffers directly
    process.stdout.write(Buffer.from('hello\n盘\n', 'utf-8'));
  } else if (argv[0] === '7') {
    // Writing buffers directly
    process.stdout.write(Buffer.from('hello\r\n盘\r\n', 'utf-8'));
  } else if (argv[0] === '8') {
    // Setting encoding AND writing buffers
    process.stdout.setDefaultEncoding('utf-8');
    // process.stdout.setEncoding('utf-8');
    process.stdout.write(Buffer.from('hello\n盘\n', 'utf-8'));
  } else if (argv[0] === '9') {
    // Setting encoding AND writing buffers
    process.stdout.setDefaultEncoding('utf-8');
    process.stdout.write(Buffer.from('hello\r\n盘\r\n', 'utf-8'));
  } else if (argv[0] === '10') {
    process.stdout.write(Buffer.from('hello\n盘\n', 'utf-8'), 'utf-8');
  } else if (argv[0] === '11') {
    // Writing with fs
    fs.writeFileSync(1, 'hello\n盘\n');
  } else if (argv[0] === '12') {
    // Writing with fs
    fs.writeFileSync(1, 'hello\r\n盘\r\n');
  } else if (argv[0] === '13') {
    // Writing with fs and specifying encoding
    fs.writeFileSync(1, 'hello\n盘\n', 'utf-8');
  } else if (argv[0] === '14') {
    // Writing with fs and specifying encoding
    fs.writeFileSync(1, 'hello\r\n盘\r\n', 'utf-8');
  } else if (argv[0] === '15') {
    // Writing with fs and buffers
    fs.writeFileSync(1, Buffer.from('hello\n盘\n', 'utf-8'));
  } else if (argv[0] === '16') {
    // Writing with fs and buffers
    fs.writeFileSync(1, Buffer.from('hello\r\n盘\r\n', 'utf-8'));
  } else if (argv[0] === '17') {
    // Writing to a file
    fs.writeFileSync('./lf.txt', 'hello\n盘\n');
    fs.writeFileSync('./crlf.txt', 'hello\r\n盘\r\n');
  }
}
void main();
```
Then I ran `node ./test.js X > ./tmp/X.txt`, varying X from 0 to 16, then ran `node ./test.js 17`. Or use this script in PowerShell:

```powershell
for ($i = 0; $i -lt 17; $i++) { node ./test.js $i > ./tmp/$i.txt }
node ./test.js 17
```
The results show this:
PowerShell ALWAYS converts everything to UTF-16 LE with CRLF, no exceptions; the only case where this doesn't occur is with 17, where Node.js directly writes to files.
```
$ file *.txt
0.txt:    Unicode text, UTF-16, little-endian text, with CRLF line terminators
1.txt:    Unicode text, UTF-16, little-endian text, with CRLF line terminators
2.txt:    Unicode text, UTF-16, little-endian text, with CRLF line terminators
3.txt:    Unicode text, UTF-16, little-endian text, with CRLF line terminators
4.txt:    Unicode text, UTF-16, little-endian text, with CRLF line terminators
5.txt:    Unicode text, UTF-16, little-endian text, with CRLF line terminators
6.txt:    Unicode text, UTF-16, little-endian text, with CRLF line terminators
7.txt:    Unicode text, UTF-16, little-endian text, with CRLF line terminators
8.txt:    Unicode text, UTF-16, little-endian text, with CRLF line terminators
9.txt:    Unicode text, UTF-16, little-endian text, with CRLF line terminators
10.txt:   Unicode text, UTF-16, little-endian text, with CRLF line terminators
11.txt:   Unicode text, UTF-16, little-endian text, with CRLF line terminators
12.txt:   Unicode text, UTF-16, little-endian text, with CRLF line terminators
13.txt:   Unicode text, UTF-16, little-endian text, with CRLF line terminators
14.txt:   Unicode text, UTF-16, little-endian text, with CRLF line terminators
15.txt:   Unicode text, UTF-16, little-endian text, with CRLF line terminators
16.txt:   Unicode text, UTF-16, little-endian text, with CRLF line terminators
crlf.txt: Unicode text, UTF-8 text, with CRLF line terminators
lf.txt:   Unicode text, UTF-8 text
```
CMD doesn't actually render Unicode characters like these, so the Chinese character becomes just a question mark box. But what about LF/CRLF and encoding? Well, it turns out that CMD preserves the Unicode encoding during piping: it does not convert it to UTF-16 LE. It also preserves line endings; it does not auto convert LF to CRLF. Basically it does what Linux does, and it is far more obvious about the behaviour.
```
$ file *.txt
0.txt:    Unicode text, UTF-8 text
1.txt:    Unicode text, UTF-8 text, with CRLF line terminators
2.txt:    Unicode text, UTF-8 text
3.txt:    Unicode text, UTF-8 text, with CRLF line terminators
4.txt:    Unicode text, UTF-8 text
5.txt:    Unicode text, UTF-8 text, with CRLF line terminators
6.txt:    Unicode text, UTF-8 text
7.txt:    Unicode text, UTF-8 text, with CRLF line terminators
8.txt:    Unicode text, UTF-8 text
9.txt:    Unicode text, UTF-8 text, with CRLF line terminators
10.txt:   Unicode text, UTF-8 text
11.txt:   Unicode text, UTF-16, little-endian text, with CRLF line terminators
12.txt:   Unicode text, UTF-8 text
13.txt:   Unicode text, UTF-8 text
14.txt:   Unicode text, UTF-8 text, with CRLF line terminators
15.txt:   Unicode text, UTF-8 text
16.txt:   Unicode text, UTF-8 text, with CRLF line terminators
crlf.txt: Unicode text, UTF-8 text, with CRLF line terminators
lf.txt:   Unicode text, UTF-8 text
```
Let's figure out a principle of least surprise here.
At first, I was thinking that we should use `os.EOL` to ensure that when printing to a terminal, we are always outputting the OS-specific line endings. However, if someone is working across platforms, it can be quite surprising if a text file that they output/pipe contains CRLF while normal file writes from PK create LF line endings. This is because we do tell users to use `>` and redirection operators to write to files with our CLI. We would preserve whatever line endings the user uses when they write the file to PK. At the same time, it appears Windows CMD and PowerShell understand `\n` and will create newlines.
Therefore: use `\n` and not `os.EOL` when outputting anything on the terminal. Both CMD and PowerShell will render it as a newline. However we still have a problem:
Now according to https://nodejs.org/api/fs.html#fswritefd-string-position-encoding-callback:

> On Windows, if the file descriptor is connected to the console (e.g. fd == 1 or stdout) a string containing non-ASCII characters will not be rendered properly by default, regardless of the encoding used. It is possible to configure the console to render UTF-8 properly by changing the active codepage with the chcp 65001 command. See the chcp docs for more details.
But of course this would be an instruction to end users to set this up. Let me check.
Using `chcp 65001` is actually not enough. One must also be using a font that supports Unicode characters. I switched to Lucida Console from the default Consolas, but again it is not capable of displaying Chinese characters.
These 2 Q&A show how complex it is for Windows to be set up to properly use UTF-8:
Using `intl.cpl` with the system-wide UTF-8 option is sufficient to change to `chcp 65001` by default across the entire operating system. This could be recommended to end users... but again it could be quite problematic if the rest of the operating system isn't ready.
At the same time, one must choose a correct font. Lucida Console or the default Consolas is fine for displaying some Unicode, but only SimSun-ExtB can do Chinese characters.
The font selection doesn't appear to affect SSHing into PowerShell; that probably relies on whatever `chcp` is active. So at the end of the day, users may be recommended to use `chcp 65001` or stick it into their profile.
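As a hedged sketch, PK could even detect a non-UTF-8 code page at startup and print this recommendation itself; `chcp` with no arguments prints the active code page, though the `warnIfNotUtf8CodePage` helper is hypothetical and parsing the output like this assumes an English-localised message:

```js
const childProcess = require('child_process');

// Warn Windows users whose console code page is not UTF-8 (65001)
function warnIfNotUtf8CodePage() {
  if (process.platform !== 'win32') return;
  try {
    // `chcp` prints e.g. "Active code page: 437" (assumes English locale)
    const out = childProcess.execSync('chcp', { encoding: 'utf-8' });
    const match = out.match(/(\d+)\s*$/);
    if (match != null && match[1] !== '65001') {
      process.stderr.write(
        `Active code page is ${match[1]}; run \`chcp 65001\` for proper UTF-8 output\n`,
      );
    }
  } catch (e) {
    // chcp not available (e.g. non-Windows shells); nothing to do
  }
}
```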
THIS however does not fix the problem of piping: piping into any file still converts LF to CRLF and converts to UTF-16 LE.
Users can also be recommended to use the new Windows Terminal program, or ConHost or ConEmu, as alternatives.
It just seems that on Windows, terminal programs are not first class atm.
This Q&A https://stackoverflow.com/questions/40098771/changing-powershells-default-output-encoding-to-utf-8 explains how to actually change things like `>` and `>>` to using UTF-8. However on 5.1, this still creates UTF-8 BOM.
PowerShell Core, which is v6+ and not installed by default on Windows atm, does default to UTF-8 BOMless. So we can also point users to downloading and installing PowerShell v6 (PowerShell Core) instead of the original PowerShell. I suspect it is a similar issue with LF to CRLF; we may just wait for a later version of PowerShell.
The end result is that:

1. Use `utf-8` as the second parameter to any `process.stdout.write(..., 'utf-8')`; this just ensures that we are in fact writing with UTF-8. Alternatively we can use `process.stdout.setDefaultEncoding('utf-8')` and `process.stderr.setDefaultEncoding('utf-8')`, at the beginning! If we are using stdin, we should also set `process.stdin.setEncoding('utf-8')`; this is for the read stream. A minimal sketch is below.
2. PowerShell may be converting this to UTF-16 LE, but we will point to fixes relevant to CMD and PowerShell for the end user to deal with. CMD does not auto convert, but people don't really use CMD anymore.
3. Output `\n` and not CRLF; it is up to the end user to fix this appropriately. CMD and PowerShell both render `\n` properly though.

These 2 issues are relevant though:
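A minimal sketch of point 1, setting the encodings once at the start of the CLI entrypoint (where exactly this runs in PK is left open):

```js
const process = require('process');

// Set default encodings at the very beginning of the entrypoint
process.stdout.setDefaultEncoding('utf-8');
process.stderr.setDefaultEncoding('utf-8');
// Only needed if we read from stdin; this applies to the read stream
process.stdin.setEncoding('utf-8');

// All subsequent writes are then known to be UTF-8, using \n and not os.EOL
process.stdout.write('HELLO\n盘\n');
```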
@tegefaulkes @emmacasolin I've left this change https://github.com/MatrixAI/js-polykey/issues/401#issuecomment-1186819981 on by default in matrix-win-1. Note that SSH does not get affected by font changes; it does get affected by the code page though. GitLab SaaS should also make the relevant changes, but I don't know if they did.
Some additional resources here: https://shapeshed.com/writing-cross-platform-node/. Ignore the `$HOMEPATH`; `%USERPROFILE%` (or `$USERPROFILE`) is actually more correct.
Regarding async vs sync: as I've been trying to debug some test issues, having async console output by default is making this difficult. So let's standardise this as well.
```js
const os = require('os');

// Default behaviour on Node.js:
// Files: synchronous on Windows and POSIX
// TTYs (Terminals): asynchronous on Windows, synchronous on POSIX
// Pipes (and sockets): synchronous on Windows, asynchronous on POSIX
// In order to align Windows with POSIX behaviour:
if (process.stdout.isTTY) {
  process.stdout._handle.setBlocking(true);
} else if (os.platform() === 'win32' && !process.stdout.isTTY) {
  process.stdout._handle.setBlocking(false);
}
```
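Since `_handle` and `setBlocking` are undocumented Node internals, a guarded variant may be safer; this is a sketch, assuming those internals exist at runtime:

```js
// Guarded variant: _handle/setBlocking are internals and may be absent
// depending on what stdout is connected to
function setStdoutBlocking(blocking) {
  const handle = process.stdout._handle;
  if (handle != null && typeof handle.setBlocking === 'function') {
    handle.setBlocking(blocking);
  }
}

if (process.stdout.isTTY) {
  setStdoutBlocking(true);
} else if (process.platform === 'win32') {
  setStdoutBlocking(false);
}
```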
The relevant issue is https://github.com/nodejs/node/issues/11568.
The above might need to be done in PK, as it is a "global" application setting, rather than in js-logger. However if we want this to be the case for tests, which use the js-logger directly, they will need to set those settings in js-logger instead.
I think this needs a PR on js-logger for the `StreamHandler`. It will be necessary to ensure consistent behaviour.
This has been created https://github.com/MatrixAI/js-logger/issues/22
These build tests run on the build stage, not the integration stage. So these have to be done before any further integration testing on Windows or macOS specified in MatrixAI/Polykey-CLI#10.
This can only be done after MatrixAI/Polykey#419 is merged. In MatrixAI/Polykey#419, I want to integrate `jest-extended` and `fast-check` for more robust concurrency testing; obviously js-db 5.0.0 will also solve a bunch of concurrency problems too. A lot of testing non-determinism may happen due to concurrency scheduling, so more robust model-based checks should be done when testing asynchronous effects.
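As a hedged sketch of what such a model-based concurrency check with `fast-check` could look like (the logging example is illustrative, not PK code): the scheduler arbitrary reorders scheduled asynchronous tasks, and the property must hold under every interleaving it explores:

```js
const fc = require('fast-check');

test('no writes are lost regardless of interleaving', async () => {
  await fc.assert(
    fc.asyncProperty(fc.scheduler(), async (s) => {
      const log = [];
      // Wrap the async operation so the scheduler controls when it runs
      const write = s.scheduleFunction(async function write(x) {
        log.push(x);
      });
      const pending = Promise.all([write('a'), write('b'), write('c')]);
      // Release the scheduled tasks in a fuzzed order
      await s.waitAll();
      await pending;
      // Completion order varies, but nothing may be lost
      expect(log.slice().sort()).toEqual(['a', 'b', 'c']);
    }),
  );
});
```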
I'm trying to gauge whether this is done. I've recently fixed the Mac and Windows builds. There has been a bunch of manual testing on the Mac build, so we can safely say that is working. Windows should be working, but it has its own idiosyncrasies, so it needs some of its own manual testing. But the point is these builds are working.
There are CI integration jobs for all builds, and they are running and passing, but the Windows and Mac jobs just run the help text. So minimally it's checking runtime loading of all dependencies. The platform-specific integration tests need to be expanded, but I'm not sure if that's a requirement of this issue.
I think this issue would evolve into a high-level issue targeting all sorts of Windows-related compatibility. We haven't done a lot of work on Windows yet, as we still have problems to fix in general. So fixing up platform-specific issues for Windows has to be put off till later, along with identifying which tests are relevant.
The above comments are all relevant to various Windows-related issues. We can group these all together under a Windows compatibility project.
### Specification
This is not about integration testing; this is about our unit tests.
Not all of our tests are passing on Windows/Mac, both in the CI/CD and on matrix-win-1/matrix-mac-1. Some of these are due to obvious platform differences (e.g. the usage of `/` vs `\`), but others may be more difficult to debug.
Failing tests on Windows:
- tests/bin
  - /vaults/vaults.test.ts
  - /agent
    - /start.test.ts
    - /stop.test.ts
    - /status.test.ts
  - /secrets/secrets.test.ts
  - /bootstrap.test.ts
  - /utils.test.ts
- tests/keys/KeyManager.test.ts
- tests/network/Proxy.test.ts
- tests/nodes/NodeConnection.test.ts
- tests/vaults
  - /VaultInternal.test.ts
  - /VaultManager.test.ts
  - /utils.test.ts
  - /VaultOps.test.ts
Failing tests on Mac (all timeouts, even when only a single test is run):

- tests/network/proxy.test.ts
- tests/bin/agent/start.test.ts
### Additional context
### Tasks