briansmith / untrusted

Safe, fast, zero-panic, zero-crashing, zero-allocation parsing of untrusted inputs in Rust.
https://briansmith.org/rustdoc/untrusted/
Other
199 stars 24 forks source link

Clarify what should happen when a `Reader` method returns an error result #21

Open briansmith opened 6 years ago

briansmith commented 6 years ago

Definitely my expectation is that a user of Reader will stop using the reader after a Reader method fails. However, that expectation isn't documented anywhere.

In discussing PR #20 with @oherrala (offline), the point was made that, based on the documentation in the untrusted module, no Reader methods must ever panic even if they are called after a Reader method returned an error.

@oherrala Also suggested that we change things so that, once a Reader method returns an error, then all future calls to all other Reader methods would return an error. I brought up some concerns that this would add an extra check in each method that might hurt performance; we'd need to see how the optimizer copes with this. OTOHO @oherrala brought up the point that most untrusted users probably care much more about safety than absolute performance. Still, it would be useful to try to minimize the performance impact based on the "performance is an aspect of security" mindset. One way to do it would be to set the Reader to the end of the input, and then only check the "previous error" bit when we detect we're at the end of the stream. The more problematic aspect of this idea is that there are methods like mark(), peek(), and skip_to_end() that don't returns a Result. It doesn't make much sense to have this sticky error bit only affect a subset of the methods of the Reader.

Another alternative would be to have every Reader method consume the reader, i.e. take self instead of &mut self as a parameter. Then every method would return Result<(Reader, [current result type]), Error>. We'd need to do some experimenting (e.g. rewrite webpki) to validate such an API design.

briansmith commented 6 years ago

Reader::read_byte() attempts to return one byte and advance the Reader's internal cursor, returning either the byte read or an error indicating that there's no more input. There are some other methods that could be thought of as logically calling read_byte() multiple times and then returning either a bunch of bytes at once or an error indicating that the end of the input was reached. If we think about it that way, then any Reader method that reads, when it fails, could very well set the internal cursor to the end of the input, since that's what would happen if the user attempted to access those same bytes using multiple read_byte() calls.

Still that doesn't seem very satisfying for Reader::at_end(). Are we really at the end of the input because that's the end of a (potentially) valid input, or are we at the end of an input that we know to be too short?

briansmith commented 5 years ago

The best solution for this is probably to change methods that take a &mut self (where self: Reader) to instead take self and return self only on success. For example, a method of type &mut self -> Result<u8, EndOfInput> would change to have signature self -> Result<(self, u8), EndOfInput>.

Of course, that would be a highly disruptive change!

briansmith commented 5 years ago

I prototyped a refactoring of this library where every fallible Reader method was changed from fn(&mut self, ...) -> Result<T, EndOfInput> to fn(self, ...) -> Result<(Self, T), EndOfInput>. This refactoring makes it impossible to use a Reader after it has returned an error. However, I found it to be pretty annoying to use.

I think instead we should investigate changing the semantics to be "whenever a method fails, the Reader is left in the same state as it previously was in." If we do that, we should also consider replacing the use of Result<T, EndOfInput> with Option<T>.

Those changes seem straightforward as far as the implementation of untrusted is concerned. However, I worry that most things built on top of untrusted, will not safely implement those same semantics. More experimentation will be required.