JuliaLang / libuv

Cross-platform asynchronous I/O
http://libuv.org/
MIT License

Unix: If the `statx` syscall returns EACCES or EPERM, fall back to regular `stat`/`lstat`/`fstat` #7

Closed DilumAluthge closed 4 years ago

DilumAluthge commented 4 years ago

Fixes https://github.com/JuliaLang/julia/issues/34918


If you try to make the statx syscall inside a non-privileged container (e.g. a non-privileged Docker container), you will get EACCES or EPERM. This error does not actually mean that the file/directory that you are stat-ing has a permissions error. Rather, it means that inside the non-privileged container, you are not permitted to make the statx syscall.

If this happens, we should fall back to the regular stat/lstat/fstat, which work perfectly fine inside non-privileged containers.
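The fallback described above can be sketched in C roughly as follows. This is a minimal illustration and not the patch from this PR: the names `try_statx`, `deny_with_eperm`, `should_fall_back`, and `stat_with_fallback` are hypothetical, and the sketch assumes Linux with glibc 2.28 or newer for the `statx()` wrapper. The preferred call is passed as a function pointer only so that a blocked syscall can be simulated without a real seccomp filter.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE /* needed for statx() and struct statx in glibc */
#endif
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <string.h>
#include <sys/stat.h>

/* Hypothetical signature for the "preferred" lookup; lets callers swap in a stub. */
typedef int (*stat_fn)(const char *path, struct stat *out);

/* Preferred path: statx(2). Copies a few basic fields into a struct stat. */
int try_statx(const char *path, struct stat *out) {
    struct statx stx;
    if (statx(AT_FDCWD, path, 0, STATX_BASIC_STATS, &stx) != 0)
        return -1; /* errno was set by statx */
    memset(out, 0, sizeof(*out));
    out->st_mode = stx.stx_mode;
    out->st_nlink = stx.stx_nlink;
    out->st_uid = stx.stx_uid;
    out->st_gid = stx.stx_gid;
    out->st_ino = stx.stx_ino;
    out->st_size = (off_t)stx.stx_size;
    return 0;
}

/* Stub that behaves like a sandbox refusing the syscall with EPERM. */
int deny_with_eperm(const char *path, struct stat *out) {
    (void)path;
    (void)out;
    errno = EPERM;
    return -1;
}

/* Assumed policy: ENOSYS (old kernel) plus the two errors discussed in this PR. */
int should_fall_back(int err) {
    return err == ENOSYS || err == EACCES || err == EPERM;
}

/* Try the preferred call; on a "not allowed / not available" error, use stat(2). */
int stat_with_fallback(stat_fn preferred, const char *path, struct stat *out) {
    assert(preferred != NULL && path != NULL && out != NULL);
    if (preferred(path, out) == 0)
        return 0;
    if (!should_fall_back(errno))
        return -1; /* a real error such as ENOENT: report it unchanged */
    return stat(path, out);
}
```

Calling `stat_with_fallback(try_statx, path, &st)` behaves like plain `statx` when the syscall is permitted. If the file itself is inaccessible, the retry with `stat` fails with the same error, so a genuine permissions problem is still reported to the caller.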

cc: @Keno @vtjnash

DilumAluthge commented 4 years ago

@Keno @vtjnash Is there CI for this repository?

Keno commented 4 years ago

No, I don't think this is right. If the udocker environment doesn't support statx, it should properly return ENOSYS. Unless this is kernel behavior, we shouldn't add an arbitrary fallback path.

DilumAluthge commented 4 years ago

I believe that it’s actually seccomp that is blocking the call to statx. seccomp is a kernel feature.

Keno commented 4 years ago

Well, if you want to be pedantic it is ;). But the user needs to configure seccomp to show this behavior, and configuring it that way is a bug. You can use seccomp to have syscalls return arbitrary error codes.

DilumAluthge commented 4 years ago

What if a user configures their seccomp whitelist to deny statx but allow stat? I would argue that in this situation, the user might still want to use the stat syscall.

But what we have here is that if the user denies statx, then they are unable to use any stat functionality in Julia, even if they have configured seccomp to explicitly allow stat.

Keno commented 4 years ago

What if the user whitelists statx, but only on the second call to that function? There is a proper way to blacklist statx, which is to have it return ENOSYS. Returning EACCES is just a bug.

DilumAluthge commented 4 years ago

https://books.google.com/books?id=jtS-DwAAQBAJ&pg=PA140&lpg=PA140&dq=seccomp+syscall+eacces&source=bl&ots=W6Ugkuqhny&sig=ACfU3U39sRwkxscPyLx4gwy1C4ZJs31ikg&hl=en&sa=X&ved=2ahUKEwjM8IGazf3nAhWThHIEHTkhDogQ6AEwBnoECAsQAQ#v=onepage&q=seccomp%20syscall%20eacces&f=false

EACCESS The caller is not allowed to do the syscall


DilumAluthge commented 4 years ago

What if the user whitelists statx, but only on the second call to that function?

I'm not sure what you mean here, can you elaborate?

Keno commented 4 years ago

I don't believe the situation you reference in that book is applicable here. That refers to permissions checks that happen before seccomp runs. The EACCES here is most definitely generated by the BPF program that is being loaded.

I'm not sure what you mean here, can you elaborate?

Exactly what I said. One can write a seccomp program that denies every other access to statx. That too would be a buggy seccomp filter (as is this one, because the statx syscall is new, so it's falling back to a path it wasn't supposed to use). I was hoping to illustrate that it would be ridiculous to run statx twice because somebody could run a seccomp filter that denies every other access. I'm saying this situation is analogous.

DilumAluthge commented 4 years ago

I was hoping to illustrate that it would be ridiculous to run statx twice because somebody could run a seccomp filter that denies every other access. I'm saying this situation is analogous.

I see what you mean. But, isn't that already kind of the case with the current implementation in libuv? If you are on a system in which the kernel is too old to have statx, then every call to Base.stat in Julia will result in two calls: a call to statx that returns ENOSYS, and then a call to stat.

I think that's maybe the same thing you said here:

(as is this one, because the statx syscall is new, so it's falling back to a path it wasn't supposed to use)

It looks like libuv added the original fallback here: https://github.com/libuv/libuv/pull/2529

Would you instead be willing to remove the use of statx and always use stat? Then we don't need any fallback whatsoever.

Libuv started using statx in https://github.com/libuv/libuv/pull/2184. It seems the only benefit is that you can get the birth time of a file. It seems like a very small benefit, and the cost is that Julia becomes essentially unusable if statx is blocked by seccomp (you can't do anything that calls ispath, isdir, isfile, etc - so e.g. Pkg becomes unusable).

Keno commented 4 years ago

No, I don't want any workarounds here. Udocker can fix their seccomp program and everything will be awesome. One of the benefits of open source is that you don't need to add hacks to work around buggy third-party software :).

StefanKarpinski commented 4 years ago

It seems the only benefit is that you can get the birth time of a file.

It also seems like it should make file operations work better on a variety of distributed/network file systems, which appears to have been a big part of the motivation for introducing statx.

vtjnash commented 4 years ago

I think we could backport https://github.com/libuv/libuv/pull/2529, but it's also just included if we update to the current libuv master.