cargo-bins / cargo-quickinstall

pre-compiled binary packages for `cargo install`
Apache License 2.0

Use `sccache` for caching? #142

Closed NobodyXu closed 1 year ago

NobodyXu commented 1 year ago

sccache might be able to help speed up CI by caching common dependencies; however, there are a few caveats:

I am particularly worried about the first limitation, since requiring absolute paths to files might exclude our use case.

I think the second limitation is acceptable, since most crates have a lot of library dependencies that can be cached.

NobodyXu commented 1 year ago

Perhaps we should perform an experiment first to see if sccache is actually effective in our use case.

alsuren commented 1 year ago

I don't think it's possible to use sccache in a way that is compatible with our trust model. See security and trust here: http://alsuren.github.io/2022/07/10/cargo-quickinstall.html

cargo-quickinstall does not trust the author of any package on crates.io. As soon as we have run the crate’s build.rs or any proc macros, we must treat the build box as compromised.

(and the few paragraphs above that. It is the user who decides which package to trust by what they ask for on the command-line)

This means that we can't really upload any build artifacts to any kind of shared cache.

In theory we could have a separate cache for each package, to isolate things. This is likely to be huge though, and only useful for each package ~once per month when upstream does a release.

The architecture of cargo-quickbuild was designed to get around this trust problem. Unfortunately I never got it to work with proc macro crates before I ran out of time and gave up.

It may be that sccache has a way to build untrusted packages safely these days and I don't know about it. If so, it is worth investigating.

NobodyXu commented 1 year ago

I guess the best way to fix this is to add a sandboxing mechanism to rustc itself, running build.rs and proc macros inside the sandbox.

NobodyXu commented 1 year ago

@alsuren One possibly off-topic question: what if a malicious crate uses its build.rs to steal our GHA token and uses that to replace all the binaries on the GitHub release?

alsuren commented 1 year ago

> @alsuren One possibly off-topic question: what if a malicious crate uses its build.rs to steal our GHA token and uses that to replace all the binaries on the GitHub release?

It's an excellent question.

After merging https://github.com/cargo-bins/cargo-quickinstall/pull/86, the runner that does the building only has access to a token with `permissions: {}`.
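For readers unfamiliar with GitHub Actions: a deny-all token policy looks roughly like this at the top of a workflow file (a minimal illustrative fragment, not the actual workflow from the linked PR):

```yaml
# Deny-all: the GITHUB_TOKEN issued to this workflow's jobs carries no
# permissions at all, so even a compromised build.rs that exfiltrates the
# token cannot modify releases or any other repository state.
permissions: {}
```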

NobodyXu commented 1 year ago

> After merging https://github.com/cargo-bins/cargo-quickinstall/pull/86, the runner that does the building only has access to a token with `permissions: {}`.

That's good to hear!

Still, I think we should at least use `env -i` to specify the environment explicitly, and we could use firejail on Linux to sandbox the compilation process, similar to docker or podman.
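As a rough sketch of the `env -i` idea in Rust, using std's `env_clear` (the allowlisted variables here are illustrative assumptions, not an audited list):

```rust
use std::process::Command;

fn main() -> std::io::Result<()> {
    // `env_clear()` is std's equivalent of `env -i`: it drops every
    // inherited variable so only the explicit allowlist is visible.
    let mut build = Command::new("cargo");
    build
        .env_clear()
        .env("PATH", "/usr/bin:/bin") // illustrative allowlist
        .args(["build", "--release"]);
    println!("would run: {:?}", build);

    // Demonstrate the scrubbing with a harmless child instead of cargo:
    let out = Command::new("/usr/bin/env")
        .env_clear()
        .env("CARGO_TERM_COLOR", "never")
        .output()?;
    // With a cleared environment, `env` prints only the variable we set.
    assert_eq!(
        String::from_utf8_lossy(&out.stdout),
        "CARGO_TERM_COLOR=never\n"
    );
    Ok(())
}
```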

If it is possible to build the mac and windows targets with zig cc, then we can run the compilation entirely on Linux (the fastest CI in my experience) and use firejail or podman for sandboxing.

Additionally, we could use something like Firecracker from Amazon, a lightweight VM monitor that uses KVM to reduce the attack surface and is written in Rust.

Or we could use gVisor, similar tech developed by Google, but written in Go...

NobodyXu commented 1 year ago

@alsuren I posted this idea on internals.rust-lang.org

It's not very polished and maybe even too naive, but I still think it's one of the solutions worth discussing.

NobodyXu commented 1 year ago

@alsuren I've investigated firecracker and runsc again and I found two issues with regard to deploying them on GHA.

First, GHA does not seem to support nested virtualisation, so Firecracker is not viable; we would have to use gVisor in ptrace mode, which should be fine given that rustc is computation-heavy.

Second, runsc is designed for container usage, so it needs a base image to work and then uses bind mounts to pass things through.

We could simply use the scratch base image, which is empty, and bind-mount the directories we want into the container, but that feels hacky.

Perhaps the best way to use runsc to sandbox rustc is to pull the official rust image and bind-mount /target/release/... for our use case, but then we would have to keep the rustc in the rust image and the cargo on the GHA runner in sync.

That means it's easier to just use podman + runsc plus a small Rust binary run via RUSTC_WRAPPER (parsing rustc args in bash is painful and fragile, and I would rather write it in Rust).
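A minimal sketch of what such a `RUSTC_WRAPPER` binary might look like. The podman flags, runtime name and image here are illustrative assumptions, and the bind mounts for the source and target directories are deliberately elided:

```rust
use std::env;
use std::process::{exit, Command};

/// Build the podman invocation that would run the real rustc under gVisor's
/// runsc runtime with networking disabled. (Hypothetical flags and image;
/// bind mounts for the source and target directories are omitted.)
fn sandboxed_rustc(rustc: &str, args: &[String]) -> Command {
    let mut cmd = Command::new("podman");
    cmd.args(["run", "--rm", "--runtime=runsc", "--network=none"]);
    cmd.arg("docker.io/library/rust:latest");
    cmd.arg(rustc).args(args);
    cmd
}

fn main() {
    // cargo invokes a RUSTC_WRAPPER as: <wrapper> <path-to-rustc> <args...>
    let mut argv = env::args().skip(1);
    let Some(rustc) = argv.next() else {
        eprintln!("usage: wrapper <rustc> <rustc-args>...");
        return;
    };
    let status = sandboxed_rustc(&rustc, &argv.collect::<Vec<_>>())
        .status()
        .expect("failed to spawn podman");
    exit(status.code().unwrap_or(1));
}
```

Cargo would pick this up via `RUSTC_WRAPPER=/path/to/wrapper cargo build`.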

Alternatively, I am thinking of implementing our own sandbox binary that doesn't need a base image, using seccomp + seccomp_unotify to whitelist and intercept syscalls.

In the MVP, I want to start by using FUSE to intercept most fs-related syscalls, instead of seccomp, for simplicity, robustness, portability and performance: FUSE can run on io_uring and there are crates on lib.rs that support this (e.g. fuse-backend-rs), while seccomp_unotify needs ioctl to work, and even with libseccomp we would still need to manually retrieve the arguments from registers in an architecture-specific way.

The FUSE layer will also intercept getrandom and provide its own ChaCha20 generator behind /dev/random, /dev/urandom and the getrandom syscall, just to be safe that the processes cannot quickly exhaust the system-wide entropy or launch some exotic attack on it.

I will, however, use seccomp_unotify to intercept syscalls like ioctl, and maybe some fcntl commands, since they are very complex and support too many commands.

I also plan to intercept syscalls such as kill, pidfd_open and waitpid to make sure a process can only send signals to its own children, and to intercept syscalls such as sched_getparam, getuid, etc. just to be safe.

Finally, syscalls such as getcpu and clock_gettime I will just let pass through, since there is no harm in letting the process access that information and it's unlikely that the kernel has bugs in such simple syscalls.

All other syscalls will be denied: unshare, clone used for anything other than creating a regular thread/process (no clone3, as it passes its args on the stack), ptrace, io_uring, and the syscalls for setting the global time, uid/gid/groups, the scheduler, etc. Sandboxed builds should not need them anyway, and they are probably the source of the Linux namespace CVEs, along with the fs and network stack CVEs.

I also plan to have networking completely disabled (including unix sockets) in the MVP. After that, I plan to intercept socket and socketpair to allow only TCP and UDP and deny other protocols, and to intercept connect, bind, sendto, sendmsg, sendmmsg, recvfrom, recvmsg and recvmmsg (since for UDP they can specify an arbitrary destination address), as well as setsockopt, getsockname and getpeername.

Also, we will have NO_NEW_PRIVS on, and any setuid/setgid binary will be ignored by the FUSE layer.
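Encoding that three-way classification as a plain Rust lookup (only a policy table for illustration, not an actual seccomp filter; the syscall names are taken from the paragraphs above):

```rust
/// What the sandbox does with a given syscall, per the plan above.
#[derive(Debug, PartialEq)]
enum Action {
    /// Forwarded to the FUSE server / seccomp_unotify supervisor.
    Intercept,
    /// Passed straight through to the kernel.
    Allow,
    /// Everything else is refused.
    Deny,
}

fn policy(syscall: &str) -> Action {
    match syscall {
        // Complex multi-command interfaces: inspect in the supervisor.
        "ioctl" | "fcntl" => Action::Intercept,
        // Process management: signals only to the caller's own children.
        "kill" | "pidfd_open" | "waitpid" => Action::Intercept,
        // Entropy: answered by the sandbox's own ChaCha20 generator.
        "getrandom" => Action::Intercept,
        // Intercepted "just to be safe".
        "sched_getparam" | "getuid" => Action::Intercept,
        // Harmless informational syscalls pass through.
        "getcpu" | "clock_gettime" => Action::Allow,
        // Default-deny: unshare, clone3, ptrace, io_uring setup, etc.
        _ => Action::Deny,
    }
}

fn main() {
    assert_eq!(policy("getcpu"), Action::Allow);
    assert_eq!(policy("unshare"), Action::Deny);
    println!("policy table self-check passed");
}
```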

After the MVP, it could be a good idea to intercept all fs-related syscalls and pass them directly to the FUSE server to reduce the attack surface, though I think using FUSE with seccomp is already quite safe.

Intercepting all fs-related syscalls would also make a network stack easier to implement, but it would be less performant for TCP and the fs, since every syscall would then need at least four context switches, probably more for retrieving fds or reading from/writing into memory, whereas FUSE can run on io_uring.

We would still need FUSE for execve/execveat; otherwise we would have to use seccomp + ptrace (seccomp can trigger a ptrace event) to rewrite execve into execveat: inject the fd of the executable into the process, set pathname to an empty string, set flags to AT_EMPTY_PATH, and put the program args/envs into registers.

I'd prefer not to use ptrace, as it involves a lot of arch-specific code, and rewriting syscall numbers/args like that feels like a hack to me.

I really wish seccomp could be used to rewrite syscalls.

I also thought about completely taking over clone, waitpid and execve using cuid, which would let all process-related info be managed in userspace, but that is very complicated, since we would need to mmap everything from the original process into the new children; execve is simple to do, though, and zombies could then take less space.

NobodyXu commented 1 year ago

GitHub now supports nested virtualization, so we can use gVisor or Firecracker on GHA!

alsuren commented 1 year ago

This all sounds very complicated.

Complicated things tend not to be secure.

Can you draw a diagram of where the information flows from and to, and where the untrusted code execution happens?

(Try to include VM/runner/network boundaries, and also processes like cargo and rustc and the server that holds the cache artifacts. If you use excalidraw, export as PNG with "embed scene" enabled so that I can easily make edits when replying.)

This is also very Linux-specific. It would be good to have a scheme that allows native windows and mac builds to be accelerated too.

On Fri, 24 Feb 2023, 01:17 Jiahao XU wrote:

> github now supports nested virtualization (https://github.blog/changelog/2023-02-23-hardware-accelerated-android-virtualization-on-actions-windows-and-linux-larger-hosted-runners/), so we can use gVisor or firecracker on GHA!


NobodyXu commented 1 year ago

@alsuren I just realized that cargo is the one responsible for running build.rs, not rustc, so sandboxing only rustc via RUSTC_WRAPPER isn't enough.

I think we'd have to wait for cargo and rustc to support compiling proc macros and build.rs to WASI, which is cross-platform and much more secure.

Though I'm still not sure how they would sandbox build.rs, given that it needs to spawn external processes.

NobodyXu commented 1 year ago

After trying sccache in multiple repositories, it doesn't really speed up the CI, possibly because the GHA cache is too slow.

Closing this as not planned for now; if Rust gains support for compiling proc macros and build.rs to WASI, perhaps we can reconsider.