Closed warner closed 1 month ago
I don't remember if you can set
CLOEXEC
on a filehandle after it's been opened
CLOEXEC discussion on stackoverflow strongly suggests that we can.
We should look at the LMDB JS bindings and see if it has a way to influence this.
Doesn't look like it. ldmb.c
provides mdb_env_get_fd()
which seems to be part of coordinating between processes...
LMDB and this library are designed for high concurrency, and we recommend using multiple processes to achieve concurrency
It's not clear whether this is exposed to JS; I do see getEnvMap
exported from native.js.
Any reason we're not spawning xsnap processes as detached
?
Edit: I'm not sure if it'd help in this case.
We no longer have LMDB at all so this particular fd is no longer an issue.
The larger issue of fds being inherited is non-ideal. It is low risk because a contract still will not have access to a file descriptor duped into its vat.
sqlite.c
uses O_CLOEXEC
in their robust_open()
call, so I'm betting the DB's filehandles (swing-store.sqlite
and sqlite-store.sqlite-wal
) are no longer a problem. I'd like to test this explicitly first, though, just in case robust_open()
isn't what they use for those files.
We might still be leaking other FDs into the worker, like the SLOG_SENDER pipe.
We should triple-check that we aren't leaking existing worker pipes into the environment of a new worker, but I just looked at a pismoA follower node and data.mdb
(and stdout/stderr) were the only ones that seemed out of place.
We might still be leaking other FDs into the worker, like the SLOG_SENDER pipe.
If SLOGSENDER_AGENT=process
is set, the slog sender is simply started in a sub process similar to how xsnap-worker processes are. I don't see a reason these subprocess pipes would leak between siblings.
I just did a new lsof
on a worker process (v9-zoe on a mainnet follower), and here's what it's got open:
warner@agreeable:~$ lsof -p 2598567
COMMAND PID USER FD TYPE DEVICE SIZE/OFF NODE NAME
xsnap-wor 2598567 warner cwd DIR 259,2 4096 41811970 /home/warner
xsnap-wor 2598567 warner rtd DIR 259,2 4096 2 /
xsnap-wor 2598567 warner txt REG 259,2 1778960 51649586 /home/warner/stuff/agoric/upgrade-16/packages/xsnap/xsnap-native/xsnap/build/bin/lin/release/xsnap-worker
xsnap-wor 2598567 warner mem REG 259,2 2220400 13631560 /usr/lib/x86_64-linux-gnu/libc.so.6
xsnap-wor 2598567 warner mem REG 259,2 940560 13632042 /usr/lib/x86_64-linux-gnu/libm.so.6
xsnap-wor 2598567 warner mem REG 259,2 240936 13631547 /usr/lib/x86_64-linux-gnu/ld-linux-x86-64.so.2
xsnap-wor 2598567 warner 0r CHR 1,3 0t0 5 /dev/null
xsnap-wor 2598567 warner 1w REG 259,2 21668914 41817615 /home/warner/chain.out
xsnap-wor 2598567 warner 2w REG 259,2 21668914 41817615 /home/warner/chain.out
xsnap-wor 2598567 warner 3u unix 0x0000000000000000 0t0 1817937 type=STREAM
xsnap-wor 2598567 warner 4u unix 0x0000000000000000 0t0 1817939 type=STREAM
xsnap-wor 2598567 warner 7u unix 0x0000000000000000 0t0 1817941 type=STREAM
xsnap-wor 2598567 warner 8u unix 0x0000000000000000 0t0 1817943 type=STREAM
which says that there are no DB files open, just the stdout
/stderr
of the parent process (the main Node.js agd
process, whose output I'm capturing to chain.out
).
That sounds like we can close this issue then ?
I've straced scenario2 to capture all of the files opened, until it settled into a steady state:
$ strace -e trace=open,openat,openat2 --follow-forks -o strace.txt make scenario2-run-chain
CHAIN_BOOTSTRAP_VAT_CONFIG="${CHAIN_BOOTSTRAP_VAT_CONFIG-$PWD/../vm-config/decentral-devnet-config.json}" \
OTEL_EXPORTER_PROMETHEUS_PORT=9464 \
DEBUG=SwingSet:ls,SwingSet:vat PATH="$PWD/bin:$PATH" "/home/node/trees/agoric-sdk-lag/bin/agd" --home=t1/n0 start --log_level=warn
Loading slog sender modules: @agoric/telemetry/src/flight-recorder.js
Prometheus scrape endpoint: http://0.0.0.0:9464/metrics
2024-09-26T19:50:55.521Z launch-chain: Launching SwingSet kernel
2024-09-26T19:52:17.813Z launch-chain: Launched SwingSet kernel
2024-09-26T19:52:17.814Z block-manager: block bootstrap
2024-09-26T19:52:19.009Z SwingSet: vat: v1: powerStore: new Promise
2024-09-26T19:52:19.014Z SwingSet: vat: v1: powerStore settled; remaining: []
2024-09-26T19:52:22.101Z SwingSet: vat: v1: namedVat vatSpace: createVatByName(agoricNames)
2024-09-26T19:52:24.278Z SwingSet: vat: v1: agoricNames: new Promise
2024-09-26T19:52:24.279Z SwingSet: vat: v1: agoricNamesAdmin: new Promise
2024-09-26T19:52:24.280Z SwingSet: vat: v1: vatStore: new Promise
[ . . . ]
2024-09-26T20:04:42.281Z block-manager: block 51 begin
2024-09-26T20:04:42.290Z block-manager: block 51 commit
2024-09-26T20:04:47.291Z block-manager: block 52 begin
2024-09-26T20:04:47.300Z block-manager: block 52 commit
2024-09-26T20:04:52.301Z block-manager: block 53 begin
2024-09-26T20:04:52.337Z SwingSet: vat: v47: ----- SCHED.8 10 wake step { absValue: 1_727_381_087n, timerBrand: Object [Alleged: timerBrand] {} }
2024-09-26T20:04:52.344Z SwingSet: vat: v47: ----- SCHED.8 11 clockTick { timerBrand: Object [Alleged: timerBrand] {}, absValue: 1_727_380_802n } { absValue: 1_727_381_087n, timerBrand: Object [Alleged: timerBrand] {} }
2024-09-26T20:04:52.345Z SwingSet: vat: v47: ----- Auction.9 3 reducePriceAndTrade
2024-09-26T20:04:52.353Z SwingSet: vat: v47: ----- SMath.7 4 calc { timerBrand: Object [Alleged: timerBrand] {}, absValue: 1_727_380_802n } { absValue: 1_727_381_087n, timerBrand: Object [Alleged: timerBrand] {} }
2024-09-26T20:04:52.443Z block-manager: block 53 commit
2024-09-26T20:04:57.312Z block-manager: block 54 begin
2024-09-26T20:04:57.321Z block-manager: block 54 commit
2024-09-26T20:05:02.323Z block-manager: block 55 begin
2024-09-26T20:05:02.332Z block-manager: block 55 commit
2024-09-26T20:05:07.332Z block-manager: block 56 begin
2024-09-26T20:05:07.341Z block-manager: block 56 commit
2024-09-26T20:05:12.342Z block-manager: block 57 begin
2024-09-26T20:05:12.351Z block-manager: block 57 commit
[ . . . ]
Captured PIDs of what's running in the container before stopping the scenario2:
$ ps aux > pids.txt
I then inspected strace.txt manually and filtered out irrelevant log entries to see which, if any, files are being opened without O_CLOEXEC
:
$ grep open strace.txt | egrep -v 'CLOEXEC|resumed|\/proc\/self\/|\/dev\/|\/sys\/'
66204 openat(AT_FDCWD, "Makefile", O_RDONLY) = 3
66207 openat(AT_FDCWD, "/home/node/trees/agoric-sdk-lag/bin/agd", O_RDONLY) = 3
66207 openat(AT_FDCWD, "/home/node/trees/agoric-sdk-lag/bin/../scripts/agd-builder.sh", O_RDONLY) = 3
66207 openat(AT_FDCWD, "/home/node/trees/agoric-sdk-lag/scripts/../repoconfig.sh", O_RDONLY) = 3
66227 openat(AT_FDCWD, "/etc/ssl/openssl.cnf", O_RDONLY) = 3
66227 openat(AT_FDCWD, "/etc/resolv.conf", O_RDONLY) = 20
66227 openat(AT_FDCWD, "/etc/nsswitch.conf", O_RDONLY) = 20
66227 openat(AT_FDCWD, "/etc/netsvc.conf", O_RDONLY) = -1 ENOENT (No such file or directory)
66227 openat(AT_FDCWD, "/etc/svc.conf", O_RDONLY) = -1 ENOENT (No such file or directory)
66246 openat(AT_FDCWD, "/etc/ssl/openssl.cnf", O_RDONLY) = 3
66258 openat(AT_FDCWD, "./scripts/get_xsnap_version.sh", O_RDONLY) = 3
66258 openat(AT_FDCWD, "/lib/terminfo/x/xterm", O_RDONLY) = 3
66207 openat(AT_FDCWD, "/lib/terminfo/x/xterm", O_RDONLY) = 3
66264 openat(AT_FDCWD, "/etc/localtime", O_RDONLY) = 8
66207 openat(AT_FDCWD, "/etc/ssl/openssl.cnf", O_RDONLY) = 3
66288 openat(AT_FDCWD, "/etc/localtime", O_RDONLY) = 26
$ for i in $(grep open strace.txt | egrep -v 'CLOEXEC|resumed|\/proc\/self\/|\/dev\/|\/sys\/' | cut -f1 -d' ' | sort -u); do grep ${i} pids.txt ; done;
node 66204 0.0 0.0 2388 1556 pts/0 S+ 19:50 0:00 make scenario2-run-chain
node 66207 51.0 5.8 7299340 467152 pts/0 Sl+ 19:50 7:42 node /home/node/trees/agoric-sdk-lag/packages/cosmic-swingset/src/entrypoint.js --home /home/node/.agoric --home=t1/n0 start --log_level=warn
$ grep open strace.txt | egrep -v 'CLOEXEC|resumed|\/proc\/self\/|\/dev\/|\/sys\/' | grep 66207
66207 openat(AT_FDCWD, "/home/node/trees/agoric-sdk-lag/bin/agd", O_RDONLY) = 3
66207 openat(AT_FDCWD, "/home/node/trees/agoric-sdk-lag/bin/../scripts/agd-builder.sh", O_RDONLY) = 3
66207 openat(AT_FDCWD, "/home/node/trees/agoric-sdk-lag/scripts/../repoconfig.sh", O_RDONLY) = 3
66207 openat(AT_FDCWD, "/lib/terminfo/x/xterm", O_RDONLY) = 3
66207 openat(AT_FDCWD, "/etc/ssl/openssl.cnf", O_RDONLY) = 3
Finally, I've captured and inspected lsof
for each of the workers (they all look essentially the same):
COMMAND PID USER FD TYPE DEVICE SIZE/OFF NODE NAME
xsnap-wor 62812 node cwd DIR 0,36 1088 354943 /home/node/trees/agoric-sdk-lag/packages/cosmic-swingset
xsnap-wor 62812 node rtd DIR 0,61 4096 1601312 /
xsnap-wor 62812 node txt REG 0,36 1699272 362983 /home/node/trees/agoric-sdk-lag/packages/xsnap/xsnap-native/xsnap/build/bin/lin/release/xsnap-worker
xsnap-wor 62812 node mem REG 0,38 362983 /home/node/trees/agoric-sdk-lag/packages/xsnap/xsnap-native/xsnap/build/bin/lin/release/xsnap-worker (path dev=0,36)
xsnap-wor 62812 node mem REG 0,61 1451024 1311065 /lib/aarch64-linux-gnu/libc-2.31.so
xsnap-wor 62812 node mem REG 0,61 160200 1311108 /lib/aarch64-linux-gnu/libpthread-2.31.so
xsnap-wor 62812 node mem REG 0,61 633000 1311086 /lib/aarch64-linux-gnu/libm-2.31.so
xsnap-wor 62812 node mem REG 0,61 14560 1311073 /lib/aarch64-linux-gnu/libdl-2.31.so
xsnap-wor 62812 node mem REG 0,61 145328 1311053 /lib/aarch64-linux-gnu/ld-2.31.so
xsnap-wor 62812 node 0r CHR 1,3 0t0 5 /dev/null
xsnap-wor 62812 node 1u CHR 136,0 0t0 3 /dev/pts/0
xsnap-wor 62812 node 2u CHR 136,0 0t0 3 /dev/pts/0
xsnap-wor 62812 node 3u unix 0x00000000830eeb7d 0t0 5443358 type=STREAM
xsnap-wor 62812 node 4u unix 0x000000003d91ed9b 0t0 5443360 type=STREAM
xsnap-wor 62812 node 7u unix 0x00000000ce46f5df 0t0 5443362 type=STREAM
xsnap-wor 62812 node 8u unix 0x000000003d3b9b9a 0t0 5443364 type=STREAM
As we can see:
node
without O_CLOEXEC
node
PID without O_CLOEXEC
are present in the lsof
output for any of the workersConclusion:
node
and its libraries either use O_CLOEXEC
or close the unneeded files before exec
.
This issue can be closed.
To make sure that we don't regress, we could migrate that strace work from scenario2 to a3p-integration and add a z:acceptance test.
To make sure that we don't regress, we could migrate that strace work from scenario2 to a3p-integration and add a z:acceptance test.
Why a3p? That seems overkill. It doesn't need to be an integration test, just a "unit" test, or at least not a test that requires so much setup ceremony.
Why a3p?
Only because scenario2 was chosen for the manual testing.
To make sure that we don't regress, we could migrate that strace work from scenario2 to a3p-integration and add a z:acceptance test.
Just to be clear: all of that strace work had been done by hand with lots of eyeballing in between. Any sort of "migrate" or "add to automated testing" actually means an automated test has to be developed and tested with reproducible synthetic positive and negative results before it can catch any possible unintended leaks in the future. This sounds like a whole separate issue which has to be filed, prioritized, and scheduled.
Yeah that's definitely not worth it.
... an automated test has to be developed and tested with reproducible synthetic positive and negative results before it can catch any possible unintended leaks in the future. This sounds like a whole separate issue which has to be filed, prioritized, and scheduled.
It's definitely a lot of work. But increasingly, that sort of work is included in the test plan for features / fixes such as this one.
Whether it's cost-effective is always a judgement call, and in this case the call has been made, but in general, I think but we try not to postpone testing to separate issues.
Just to explicitly confirm whether SQLite does use O_CLOEXEC
here is an strace sample from some other tests:
66904 openat(AT_FDCWD, "/home/node/trees/agoric-sdk-lag/node_modules/better-sqlite3/build/Release/better_sqlite3.node", O_RDONLY|O_CLOEXEC) = 19
66904 openat(AT_FDCWD, "/tmp/testdb-66904-loZIJMYrz5gG/swingstore.sqlite", O_RDWR|O_CREAT|O_NOFOLLOW|O_CLOEXEC, 0644) = 19
66904 openat(AT_FDCWD, "/tmp/testdb-66904-loZIJMYrz5gG/swingstore.sqlite-journal", O_RDWR|O_CREAT|O_NOFOLLOW|O_CLOEXEC, 0644) = 20
66904 openat(AT_FDCWD, "/tmp/testdb-66904-loZIJMYrz5gG/swingstore.sqlite-wal", O_RDWR|O_CREAT|O_NOFOLLOW|O_CLOEXEC, 0644) = 20
66904 openat(AT_FDCWD, "/tmp/testdb-66904-loZIJMYrz5gG/swingstore.sqlite-shm", O_RDWR|O_CREAT|O_NOFOLLOW|O_CLOEXEC, 0644) = 21
66904 openat(AT_FDCWD, "/tmp/testdb-66904-loZIJMYrz5gG/swingstore.sqlite", O_RDWR|O_CREAT|O_NOFOLLOW|O_CLOEXEC, 0644) = 22
66904 openat(AT_FDCWD, "/tmp/testdb-66904-loZIJMYrz5gG/swingstore.sqlite-wal", O_RDWR|O_CREAT|O_NOFOLLOW|O_CLOEXEC, 0644) = 23
66904 openat(AT_FDCWD, "/tmp/testdb-66904-kM5TmhfDJGpL/swingstore.sqlite", O_RDWR|O_CREAT|O_NOFOLLOW|O_CLOEXEC, 0644) = 24
66904 openat(AT_FDCWD, "/tmp/testdb-66904-kM5TmhfDJGpL/swingstore.sqlite-journal", O_RDWR|O_CREAT|O_NOFOLLOW|O_CLOEXEC, 0644) = 25
66904 openat(AT_FDCWD, "/tmp/testdb-66904-kM5TmhfDJGpL/swingstore.sqlite-wal", O_RDWR|O_CREAT|O_NOFOLLOW|O_CLOEXEC, 0644) = 25
66904 openat(AT_FDCWD, "/tmp/testdb-66904-kM5TmhfDJGpL/swingstore.sqlite-shm", O_RDWR|O_CREAT|O_NOFOLLOW|O_CLOEXEC, 0644) = 26
66904 openat(AT_FDCWD, "/tmp/testdb-66904-kM5TmhfDJGpL/swingstore.sqlite", O_RDWR|O_CREAT|O_NOFOLLOW|O_CLOEXEC, 0644) = 27
66904 openat(AT_FDCWD, "/tmp/testdb-66904-kM5TmhfDJGpL/swingstore.sqlite-wal", O_RDWR|O_CREAT|O_NOFOLLOW|O_CLOEXEC, 0644) = 19
As can be seen above, SQLite does in fact use O_CLOEXEC
.
Another experiment, courtesy of ChatGPT:
import fs from 'fs';
const path = './example.txt';
try {
// Open the file synchronously in read-write mode (or create it if it doesn't exist)
const fd = fs.openSync(path, 'w+');
console.log(`File opened successfully. File descriptor: ${fd}`);
// Write data to the file
const buffer = Buffer.from('Hello, Node.js!');
fs.writeSync(fd, buffer, 0, buffer.length, 0);
console.log('Data written to file.');
// Read data from the file
const readBuffer = Buffer.alloc(buffer.length);
fs.readSync(fd, readBuffer, 0, buffer.length, 0);
console.log(`Data read from file: ${readBuffer.toString()}`);
// Close the file
fs.closeSync(fd);
console.log('File closed successfully.');
} catch (err) {
console.error(`Error: ${err.message}`);
}
$ strace -e trace=open,openat,openat2 --follow-forks -o strace_fs.txt node fs_test.js
File opened successfully. File descriptor: 17
Data written to file.
Data read from file: Hello, Node.js!
File closed successfully.
67136 openat(AT_FDCWD, "/etc/ld.so.cache", O_RDONLY|O_CLOEXEC) = 3
67136 openat(AT_FDCWD, "/lib/aarch64-linux-gnu/libdl.so.2", O_RDONLY|O_CLOEXEC) = 3
67136 openat(AT_FDCWD, "/usr/lib/aarch64-linux-gnu/libstdc++.so.6", O_RDONLY|O_CLOEXEC) = 3
67136 openat(AT_FDCWD, "/lib/aarch64-linux-gnu/libm.so.6", O_RDONLY|O_CLOEXEC) = 3
67136 openat(AT_FDCWD, "/lib/aarch64-linux-gnu/libgcc_s.so.1", O_RDONLY|O_CLOEXEC) = 3
67136 openat(AT_FDCWD, "/lib/aarch64-linux-gnu/libpthread.so.0", O_RDONLY|O_CLOEXEC) = 3
67136 openat(AT_FDCWD, "/lib/aarch64-linux-gnu/libc.so.6", O_RDONLY|O_CLOEXEC) = 3
67136 openat(AT_FDCWD, "/etc/ssl/openssl.cnf", O_RDONLY) = 3
67136 openat(AT_FDCWD, "/sys/fs/cgroup/memory/memory.limit_in_bytes", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
67136 openat(AT_FDCWD, "/proc/meminfo", O_RDONLY|O_CLOEXEC) = 17
67136 openat(AT_FDCWD, "/home/node/trees/agoric-sdk-lag/packages/SwingSet/package.json", O_RDONLY|O_CLOEXEC) = 17
67146 openat(AT_FDCWD, "/home/node/trees/agoric-sdk-lag/packages/SwingSet/fs_test.js", O_RDONLY|O_CLOEXEC) = 17
67136 openat(AT_FDCWD, "./example.txt", O_RDWR|O_CREAT|O_TRUNC|O_CLOEXEC, 0666) = 17
67136 openat(AT_FDCWD, "/dev/pts/1", O_RDWR|O_NOCTTY|O_CLOEXEC) = 18
67136 openat(AT_FDCWD, "/dev/null", O_RDONLY|O_CLOEXEC) = 19
67136 openat(AT_FDCWD, "/dev/pts/1", O_RDWR|O_NOCTTY|O_CLOEXEC) = 20
67140 +++ exited with 0 +++
67141 +++ exited with 0 +++
67139 +++ exited with 0 +++
67138 +++ exited with 0 +++
67137 +++ exited with 0 +++
67146 +++ exited with 0 +++
67143 +++ exited with 0 +++
67145 +++ exited with 0 +++
67144 +++ exited with 0 +++
67142 +++ exited with 0 +++
67136 +++ exited with 0 +++
This is what we are looking for:
67136 openat(AT_FDCWD, "./example.txt", O_RDWR|O_CREAT|O_TRUNC|O_CLOEXEC
, 0666) = 17
Go:
package main
import (
"fmt"
"os"
)
func main() {
// Define the file path
path := "./example.txt"
// Open the file with read-write permissions, create if it doesn't exist
// O_RDWR allows reading and writing, O_CREATE creates the file if it doesn't exist
file, err := os.OpenFile(path, os.O_RDWR|os.O_CREATE, 0755)
if err != nil {
fmt.Printf("Error opening file: %v\n", err)
return
}
defer file.Close() // Ensure the file is closed at the end
// Write data to the file
message := []byte("Hello, Go!")
_, err = file.Write(message)
if err != nil {
fmt.Printf("Error writing to file: %v\n", err)
return
}
fmt.Println("Data written to file.")
// Read data from the file
// Reset the file offset to the beginning
_, err = file.Seek(0, 0)
if err != nil {
fmt.Printf("Error seeking file: %v\n", err)
return
}
// Create a buffer to hold the data
readBuffer := make([]byte, len(message))
_, err = file.Read(readBuffer)
if err != nil {
fmt.Printf("Error reading from file: %v\n", err)
return
}
fmt.Printf("Data read from file: %s\n", string(readBuffer))
// File will be closed automatically with the defer statement
fmt.Println("File closed successfully.")
}
$ strace -e trace=open,openat,openat2 --follow-forks -o strace_fs_go.txt go run test_fs.go
Data written to file.
Data read from file: Hello, Go!
File closed successfully.
$ grep example.txt strace_fs_go.txt
67461 openat(AT_FDCWD, "./example.txt", O_RDWR|O_CREAT|O_CLOEXEC, 0755) = 3
Node uses O_CLOEXEC
by default:
https://github.com/nodejs/node/blob/main/deps/uv/src/unix/fs.c#L382
Golang does the same: https://github.com/golang/go/blob/master/src/os/file_unix.go#L279
So does SQLite: https://github.com/sqlite/sqlite/blob/master/src/os_unix.c#L724
Describe the bug
Ari and I were looking at
lsof
outputs, and noticed that all of ourxsnap
worker processes are holding open a file descriptor on the LMDBdata.mdb
file (which holds the kvStore).The worker would never open that file itself, which tells me that it was open in the parent process (kernel/Node.js) and inherited by the worker during the
spawn()
. Which tells me that LMDB is not opening that file handle with theCLOEXEC
"please close this fd when youexec
into a new process" flag.xsnap
doesn't know that this fd is open, but nevertheless we should not be allowing xsnap to see it. When we manage to implement #2386 andseccomp
, this extra fd would represent inappropriate authority (an attacker who manages to compromise the worker process could use it to compromise the rest of the kernel, even thoughseccomp
prevents them from opening any new files).I don't remember if you can set
CLOEXEC
on a filehandle after it's been opened. We should look at the LMDB JS bindings and see if it has a way to influence this.Another option is to change
xsnap
and have it conservatively close all fds higher than the command pipes, at startup.