WebAssembly / WASI

WebAssembly System Interface
Other
4.72k stars 240 forks source link

Document that an out-of-bounds pointer shall trap #536

Closed yamt closed 11 months ago

yamt commented 1 year ago

discussion: https://github.com/WebAssembly/WASI/issues/505

sunfishcode commented 1 year ago

Looks good to me. I'll leave this open for a while in case there are any other comments.

pchickey commented 1 year ago

This is a change of behavior for wasi-common but I agree this the correct behavior to implement.

Should we specify the behavior for passing pointers with an incorrect alignment?

Do we need to clarify this spec test to include behaviors for chasing pointers that are out-of-bounds?

Do we need to trap before any other observable side effects, e.g. do we need to validate that all memory referred to in a iovec is in bounds before writing any outputs?

sunfishcode commented 1 year ago

Should we specify the behavior for passing pointers with an incorrect alignment?

The Canonical ABI is currently proposed to trap on misaligned pointers, so yes, I'd say we should ideally do that here too.

Do we need to clarify this spec test to include behaviors for chasing pointers that are out-of-bounds?

Which spec test are you referring to?

Do we need to trap before any other observable side effects, e.g. do we need to validate that all memory referred to in a iovec is in bounds before writing any outputs?

If we're going to trap, we should trap before any observable side effects.

Should we trap for an iovec with any buffer out of bounds? The component model doesn't yet have an iovec, so it doesn't have an opinion here yet. POSIX doesn't appear to say anything. Linux's readv appears to eagerly report an EFAULT if any pointers are invalid (though it appears to ignore NULL pointers). I think we should trap, because there doesn't seem to be any reason to depend on invalid pointers being ignored.

yamt commented 1 year ago

Do we need to trap before any other observable side effects, e.g. do we need to validate that all memory referred to in a iovec is in bounds before writing any outputs?

If we're going to trap, we should trap before any observable side effects.

Should we trap for an iovec with any buffer out of bounds? The component model doesn't yet have an iovec, so it doesn't have an opinion here yet. POSIX doesn't appear to say anything. Linux's readv appears to eagerly report an EFAULT if any pointers are invalid (though it appears to ignore NULL pointers).

i don't think it's a good idea to require eager checks. that's why i wrote "and the function needs to dereference it" in the text.

I think we should trap, because there doesn't seem to be any reason to depend on invalid pointers being ignored.

some implementations naturally ignore unused iovecs eg. on a short read.

pchickey commented 1 year ago

Do we need to clarify this spec test to include behaviors for chasing pointers that are out-of-bounds?

Which spec test are you referring to?

Typo, i meant text.

It sounds like Linux is eager about returning a fault on invalid iovec pointers. @yamt are you aware of other operating systems which are lazier about this behavior, or is it just other wasi preview 1 implementations?

I am in favor of being eager about that fault in order to align with the component model - in preview 2 it will be eager no matter what.

yamt commented 1 year ago

@yamt are you aware of other operating systems which are lazier about this behavior, or is it just other wasi preview 1 implementations?

eg. netbsd

and as far as i tested with the following test code, linux doesn't seem eager either. https://github.com/yamt/garbage/blob/master/c/iov_badaddr/a.c (or probably i don't understand what @sunfishcode meant by "eagerly report an EFAULT if any pointers are invalid".)

sunfishcode commented 1 year ago

eg. netbsd

and as far as i tested with the following test code, linux doesn't seem eager either. https://github.com/yamt/garbage/blob/master/c/iov_badaddr/a.c (or probably i don't understand what @sunfishcode meant by "eagerly report an EFAULT if any pointers are invalid".)

Oh, interesting. If I modify your testcase to use -1 instead of 1 as the invalid address, I do get an EFAULT on Linux:

With (void *)1:

readv(3, [{iov_base="X", iov_len=1}, {iov_base="", iov_len=1}], 2) = 1

With (void *)-1):

readv(3, [{iov_base=0x7ffe8fe44817, iov_len=1}, {iov_base=0xffffffffffffffff, iov_len=1}], 2) = -1 EFAULT (Bad address)

So I don't know what logic Linux is using to decide what gets an EFAULT, but whatever it's doing, it's doing it eagerly.

I think WASI should be eager here.

yamt commented 1 year ago

eg. netbsd

and as far as i tested with the following test code, linux doesn't seem eager either. https://github.com/yamt/garbage/blob/master/c/iov_badaddr/a.c (or probably i don't understand what @sunfishcode meant by "eagerly report an EFAULT if any pointers are invalid".)

Oh, interesting. If I modify your testcase to use -1 instead of 1 as the invalid address, I do get an EFAULT on Linux:

With (void *)1:

readv(3, [{iov_base="X", iov_len=1}, {iov_base="", iov_len=1}], 2) = 1

With (void *)-1):

readv(3, [{iov_base=0x7ffe8fe44817, iov_len=1}, {iov_base=0xffffffffffffffff, iov_len=1}], 2) = -1 EFAULT (Bad address)

So I don't know what logic Linux is using to decide what gets an EFAULT, but whatever it's doing, it's doing it eagerly.

interesting.

I think WASI should be eager here.

* It's a bug in portable code if it's passing invalid pointers to `readv`, even if some OS's in some situations allow it to work. And it's not a harmless bug. NULL is a dereferenceble address in wasm, so if applications are relying on passing in NULL and having it be ignored, wasm is already breaking that. And if they're passing in a dangling address, then the only thing between them and memory corruption is luck. Eager checking protects programs from both of these hazards.

* Lazy checking here wouldn't save engines the need to walk the iovec list eagerly. They'll need to do that anyway, to translate wasm addresses to host OS addresses before the `readv` call.

i'm not sure about this "lazy checking wouldn't save" point.

sunfishcode commented 1 year ago

eg. netbsd

and as far as i tested with the following test code, linux doesn't seem eager either. https://github.com/yamt/garbage/blob/master/c/iov_badaddr/a.c (or probably i don't understand what @sunfishcode meant by "eagerly report an EFAULT if any pointers are invalid".)

Oh, interesting. If I modify your testcase to use -1 instead of 1 as the invalid address, I do get an EFAULT on Linux: With (void *)1:

readv(3, [{iov_base="X", iov_len=1}, {iov_base="", iov_len=1}], 2) = 1

With (void *)-1):

readv(3, [{iov_base=0x7ffe8fe44817, iov_len=1}, {iov_base=0xffffffffffffffff, iov_len=1}], 2) = -1 EFAULT (Bad address)

So I don't know what logic Linux is using to decide what gets an EFAULT, but whatever it's doing, it's doing it eagerly.

interesting.

I think WASI should be eager here.

* It's a bug in portable code if it's passing invalid pointers to `readv`, even if some OS's in some situations allow it to work. And it's not a harmless bug. NULL is a dereferenceble address in wasm, so if applications are relying on passing in NULL and having it be ignored, wasm is already breaking that. And if they're passing in a dangling address, then the only thing between them and memory corruption is luck. Eager checking protects programs from both of these hazards.

* Lazy checking here wouldn't save engines the need to walk the iovec list eagerly. They'll need to do that anyway, to translate wasm addresses to host OS addresses before the `readv` call.

i'm not sure about this "lazy checking wouldn't save" point.

* wasi is not necessarily implemented as host functions.

* even when it is, `fd_read` is not necessarily implemented with a host `readv` call.

That's a good point. Some implementations could have some extra cost to this. Fortunately, MAX_IOV on many platforms is only 1024, and in my experience the number of buffers is often much less than that, so the cost shouldn't be high in practice. And when people get to the point of writing such implementations and benchmarking them, if the bounds checks turn out to be significant, we can add new interfaces that don't require them.

So I'd still say that the security and portable angles justify eager trapping here, especially this time. It will catch some cases where programs might be implicitly depending on readv implementations and data sources reading under a certain number of bytes in order to avoid hitting invalid buffers.

pchickey commented 11 months ago

This has been open for a while (unfortuantely I lost track of it) and we haven't seen any objections, so I am merging