Instrument OS-Lib to block filesystem writes outside of designated `Task.dest` directories (2000USD Bounty)

lihaoyi commented 1 week ago

From the maintainer Li Haoyi: I'm putting a 2000USD bounty on this issue, payable by bank transfer on a merged PR implementing this.

The goal of this bounty is to ensure that users on the "happy path" of accessing the filesystem via OS-Lib do not accidentally do the wrong thing by writing to places on disk they shouldn't. This isn't meant to be a 100% secure sandbox to protect against malicious code, but rather just guardrails for users to bump into when they accidentally do the wrong thing and help nudge them back in the right direction.

In particular, we want to block:

Code inside task writing to paths on disk outside of that task's designated Task.dest folder
Code outside of tasks writing to disk at all, i.e. during module initialization, which is meant to be pure
Code inside of tasks reading from disk outside of the .dest folders of upstream paths or Task.Source/Task.Sources/Task.Inputs

Some subtleties:

We only want to block unexpected writes to disk inside of WorkspaceRoot.
- Outside of the WorkspaceRoot still needs to be writable, due to things like caches in /User or /home and temporary files/folders in /tmp.
- This is a similar approach as tools like Bazel
We only want to block OS-Lib APIs for now.
- People can always use java.io/java.nio/JNI/subprocesses/etc. to work around it, but at least in the common path there'll be something to nudge them when they do something wrong.
- A fancier approach could use the JVM SecurityManager/bytecode-instrumenting-agent to block java-level file access as well, but for the purposes of this "guardrails" approach that is probably not necessary

Milestones

[ ] Instrument the various APIs in https://github.com/com-lihaoyi/os-lib to expose a scala.util.DynamicVariable allowing thread-local gating of reads and writes to specific paths, allowing us to throw errors if a read or write is disallowed (500USD)
[ ] Use the patched OS-Lib to limit writes from code running inside a task to locations within WorkspaceRoot but outside the task's Task.dest folder, with a flag to opt-out if desired (500USD)
[ ] Limit writes from code running during module-initialization/task-resolution to any location within WorkspaceRoot but outside the task's Task.dest folder, with a flag to opt-out if desired (500USD)
[ ] Limit reads from code running inside of tasks accessing locations outside of PathRefs received by that task's direct upstream dependencies. (500USD)

As adding such sandbox restrictions are breaking changes, they should be placed behind a flag in the upcoming Mill 0.12.x series and only turned on by default in 0.13.0

Ichoran commented 1 week ago

Are you sure you want to make os-lib that deeply dependent on a fairly specialized feature? Might it be better to have an API-compatible alternative build so you could depend on os-lib-fenced instead of os-lib if you needed fenced filesystem operations?

Also, everything you mention is JVM-specific but os-lib has a native build as well. Is fencing supposed to work with native?

lihaoyi commented 1 week ago

@Ichoran it seems the most reasonable option. The alternative is we do bytecode instrumentation.

The implementation cost of stuffing this into OS-Lib seems relatively low: just a single DynamicVariable containing a trait Gating{ def allowRead(p: os.Path): Unit; def allowWrite(p: os.Path): Unit } that we can expand upon over time if necessary.

Nothing here seems JVM-specific or Mill-specific. Seems plausible that there would be other use cases for best-effort limiting of filesystem access as well. The JVM baked this stuff in 30 years ago, and although it is now taking it out, I think it's actually a pretty reasonable feature to have as long as expectation are set appropriately (e.g. best-effort guardrails, not a security-sensitive sandbox as the JVM was hoping for)

Ichoran commented 1 week ago

@lihaoyi - Well, the interface change is small, but it's going to touch a lot of code, and make every user of threads add an extra ThreadLocal variable to everything. That's a big deal for Project Loom-style usage. "Don't use os-lib and virtual threads together" isn't a great message.

lihaoyi commented 1 week ago

Why is it a big deal for project loom usage?

Ichoran commented 1 week ago

The ThreadLocal variable has to be duplicated for every use of every thread since otherwise ThreadLocal is broken. So it adds to the per-thread overhead. The Oracle docs all caution against it. I haven't actually run a microbenchmark myself, though.

lihaoyi commented 1 week ago

@Ichoran does that mean println is banned in project loom?

Ichoran commented 1 week ago

println uses ThreadLocal? Huh. Did not know that. Regardless, maybe I should actually do the microbenchmarking and then raise the concern if it's a non-negligible impact.

lihaoyi commented 1 week ago

Yeah, I'm going to say performance is not a problem, until benchmarks demonstrate it is

Ichoran commented 1 week ago

Well, it's measurable, but I guess it's not really important. It seems to be negligible on creation, just usage, and it's only if there's a creation issue that it's really something to avoid. It's a little hard to do benchmarking of threads on a laptop CPU; even if I turn off as many CPU features as I can, it's still a bit inconsistent. Using the benchmark I wrote here, I get the following:

Benchmark                 Mode  Cnt    Score    Error  Units
ThreadLocalLoom.base      avgt   40  641.756 ±  6.503  us/op
ThreadLocalLoom.local10   avgt   40  639.881 ±  9.644  us/op
ThreadLocalLoom.localmod  avgt   40  672.530 ± 17.414  us/op
ThreadLocalLoom.localuse  avgt   40  657.495 ±  9.196  us/op

and if you look through the individual counts, you see the ~640 to ~655 shift as a very common outcome. So, maybe a 3% slowdown above the thread handling overhead. (Most of the time is thread overhead--probably 90% or so. I didn't measure carefully.) Mod vs. Use depends on whether the thread changes the value or just consumes it from a withValue block.

The key point, though is the local10 one, where ten extra DynamicVariable instances are created but they're not used. This has no performance penalty, and that's the one I was worried about.

So, I'm incorrect! A lot of the advice, including from Oracle, also seems to be incorrect or at least too cautious. One can use ThreadLocal with about the usual degree of slowdown for ThreadLocal, probably on the order of 30 ns, because it's about 15 us in a test which uses it 500-odd times.

Sorry about the unwarranted concern! Original plan sounds fine!

ajaychandran commented 1 week ago

@lihaoyi

Instrument the various APIs in https://github.com/com-lihaoyi/os-lib to expose a scala.util.DynamicVariable allowing thread-local gating of reads and writes to specific paths, allowing us to throw errors if a read or write is disallowed (500USD)

Does this extend to get/set ops for permissions/stats?
Should file/folder create/remove ops require write access on the parent folder?
- When creating intermediate parent folders, write access should be required on the closest existing parent?
When moving files/folders, both read/write access should be required on the source parent folder?

lihaoyi commented 1 week ago

Does this extend to get/set ops for permissions/stats?

Only for set operations

Should file/folder create/remove ops require write access on the parent folder? When creating intermediate parent folders, write access should be required on the closest existing parent? When moving files/folders, both read/write access should be required on the source parent folder?

I think we can start with an API as follows

trait Checker{
  def onWrite(path: os.Path): Unit
  def onRead(path: os.Path): Unit
}

where onWrite and onRead are given the fully qualified path where any operation is taking place (not the parent folder). That way it is up to the implementor of Checker (or whatever we call it) how they want to model permissions, deal with nested permissions and so on. OS-Lib does not need to have an opinion on that, it just needs to provide the minimal hooks and the downstream implementor (e.g. Mill) can come up with whatever model is suitable

com-lihaoyi / mill

Instrument OS-Lib to block filesystem writes outside of designated `Task.dest` directories (2000USD Bounty) #3746

Milestones