Closed yamt closed 11 months ago
Looks good to me. I'll leave this open for a while in case there are any other comments.
This is a change of behavior for wasi-common
but I agree this the correct behavior to implement.
Should we specify the behavior for passing pointers with an incorrect alignment?
Do we need to clarify this spec test to include behaviors for chasing pointers that are out-of-bounds?
Do we need to trap before any other observable side effects, e.g. do we need to validate that all memory referred to in a iovec is in bounds before writing any outputs?
Should we specify the behavior for passing pointers with an incorrect alignment?
The Canonical ABI is currently proposed to trap on misaligned pointers, so yes, I'd say we should ideally do that here too.
Do we need to clarify this spec test to include behaviors for chasing pointers that are out-of-bounds?
Which spec test are you referring to?
Do we need to trap before any other observable side effects, e.g. do we need to validate that all memory referred to in a iovec is in bounds before writing any outputs?
If we're going to trap, we should trap before any observable side effects.
Should we trap for an iovec with any buffer out of bounds? The component model doesn't yet have an iovec, so it doesn't have an opinion here yet. POSIX doesn't appear to say anything. Linux's readv
appears to eagerly report an EFAULT
if any pointers are invalid (though it appears to ignore NULL pointers). I think we should trap, because there doesn't seem to be any reason to depend on invalid pointers being ignored.
Do we need to trap before any other observable side effects, e.g. do we need to validate that all memory referred to in a iovec is in bounds before writing any outputs?
If we're going to trap, we should trap before any observable side effects.
Should we trap for an iovec with any buffer out of bounds? The component model doesn't yet have an iovec, so it doesn't have an opinion here yet. POSIX doesn't appear to say anything. Linux's
readv
appears to eagerly report anEFAULT
if any pointers are invalid (though it appears to ignore NULL pointers).
i don't think it's a good idea to require eager checks. that's why i wrote "and the function needs to dereference it" in the text.
I think we should trap, because there doesn't seem to be any reason to depend on invalid pointers being ignored.
some implementations naturally ignore unused iovecs eg. on a short read.
Do we need to clarify this spec test to include behaviors for chasing pointers that are out-of-bounds?
Which spec test are you referring to?
Typo, i meant text
.
It sounds like Linux is eager about returning a fault on invalid iovec pointers. @yamt are you aware of other operating systems which are lazier about this behavior, or is it just other wasi preview 1 implementations?
I am in favor of being eager about that fault in order to align with the component model - in preview 2 it will be eager no matter what.
@yamt are you aware of other operating systems which are lazier about this behavior, or is it just other wasi preview 1 implementations?
eg. netbsd
and as far as i tested with the following test code, linux doesn't seem eager either. https://github.com/yamt/garbage/blob/master/c/iov_badaddr/a.c (or probably i don't understand what @sunfishcode meant by "eagerly report an EFAULT if any pointers are invalid".)
eg. netbsd
and as far as i tested with the following test code, linux doesn't seem eager either. https://github.com/yamt/garbage/blob/master/c/iov_badaddr/a.c (or probably i don't understand what @sunfishcode meant by "eagerly report an EFAULT if any pointers are invalid".)
Oh, interesting. If I modify your testcase to use -1
instead of 1
as the invalid address, I do get an EFAULT
on Linux:
With (void *)1
:
readv(3, [{iov_base="X", iov_len=1}, {iov_base="", iov_len=1}], 2) = 1
With (void *)-1)
:
readv(3, [{iov_base=0x7ffe8fe44817, iov_len=1}, {iov_base=0xffffffffffffffff, iov_len=1}], 2) = -1 EFAULT (Bad address)
So I don't know what logic Linux is using to decide what gets an EFAULT
, but whatever it's doing, it's doing it eagerly.
I think WASI should be eager here.
It's a bug in portable code if it's passing invalid pointers to readv
, even if some OS's in some situations allow it to work. And it's not a harmless bug. NULL is a dereferenceble address in wasm, so if applications are relying on passing in NULL and having it be ignored, wasm is already breaking that. And if they're passing in a dangling address, then the only thing between them and memory corruption is luck. Eager checking protects programs from both of these hazards.
Lazy checking here wouldn't save engines the need to walk the iovec list eagerly. They'll need to do that anyway, to translate wasm addresses to host OS addresses before the readv
call.
eg. netbsd
and as far as i tested with the following test code, linux doesn't seem eager either. https://github.com/yamt/garbage/blob/master/c/iov_badaddr/a.c (or probably i don't understand what @sunfishcode meant by "eagerly report an EFAULT if any pointers are invalid".)
Oh, interesting. If I modify your testcase to use
-1
instead of1
as the invalid address, I do get anEFAULT
on Linux:With
(void *)1
:readv(3, [{iov_base="X", iov_len=1}, {iov_base="", iov_len=1}], 2) = 1
With
(void *)-1)
:readv(3, [{iov_base=0x7ffe8fe44817, iov_len=1}, {iov_base=0xffffffffffffffff, iov_len=1}], 2) = -1 EFAULT (Bad address)
So I don't know what logic Linux is using to decide what gets an
EFAULT
, but whatever it's doing, it's doing it eagerly.
interesting.
I think WASI should be eager here.
* It's a bug in portable code if it's passing invalid pointers to `readv`, even if some OS's in some situations allow it to work. And it's not a harmless bug. NULL is a dereferenceble address in wasm, so if applications are relying on passing in NULL and having it be ignored, wasm is already breaking that. And if they're passing in a dangling address, then the only thing between them and memory corruption is luck. Eager checking protects programs from both of these hazards. * Lazy checking here wouldn't save engines the need to walk the iovec list eagerly. They'll need to do that anyway, to translate wasm addresses to host OS addresses before the `readv` call.
i'm not sure about this "lazy checking wouldn't save" point.
fd_read
is not necessarily implemented with a host readv
call.eg. netbsd
and as far as i tested with the following test code, linux doesn't seem eager either. https://github.com/yamt/garbage/blob/master/c/iov_badaddr/a.c (or probably i don't understand what @sunfishcode meant by "eagerly report an EFAULT if any pointers are invalid".)
Oh, interesting. If I modify your testcase to use
-1
instead of1
as the invalid address, I do get anEFAULT
on Linux: With(void *)1
:readv(3, [{iov_base="X", iov_len=1}, {iov_base="", iov_len=1}], 2) = 1
With
(void *)-1)
:readv(3, [{iov_base=0x7ffe8fe44817, iov_len=1}, {iov_base=0xffffffffffffffff, iov_len=1}], 2) = -1 EFAULT (Bad address)
So I don't know what logic Linux is using to decide what gets an
EFAULT
, but whatever it's doing, it's doing it eagerly.interesting.
I think WASI should be eager here.
* It's a bug in portable code if it's passing invalid pointers to `readv`, even if some OS's in some situations allow it to work. And it's not a harmless bug. NULL is a dereferenceble address in wasm, so if applications are relying on passing in NULL and having it be ignored, wasm is already breaking that. And if they're passing in a dangling address, then the only thing between them and memory corruption is luck. Eager checking protects programs from both of these hazards. * Lazy checking here wouldn't save engines the need to walk the iovec list eagerly. They'll need to do that anyway, to translate wasm addresses to host OS addresses before the `readv` call.
i'm not sure about this "lazy checking wouldn't save" point.
* wasi is not necessarily implemented as host functions. * even when it is, `fd_read` is not necessarily implemented with a host `readv` call.
That's a good point. Some implementations could have some extra cost to this. Fortunately, MAX_IOV
on many platforms is only 1024, and in my experience the number of buffers is often much less than that, so the cost shouldn't be high in practice. And when people get to the point of writing such implementations and benchmarking them, if the bounds checks turn out to be significant, we can add new interfaces that don't require them.
So I'd still say that the security and portable angles justify eager trapping here, especially this time. It will catch some cases where programs might be implicitly depending on readv
implementations and data sources reading under a certain number of bytes in order to avoid hitting invalid buffers.
This has been open for a while (unfortuantely I lost track of it) and we haven't seen any objections, so I am merging
discussion: https://github.com/WebAssembly/WASI/issues/505