Open briansmith opened 6 years ago
Reader::read_byte()
attempts to return one byte and advance the Reader
's internal cursor, returning either the byte read or an error indicating that there's no more input. There are some other methods that could be thought of as logically calling read_byte()
multiple times and then returning either a bunch of bytes at once or an error indicating that the end of the input was reached. If we think about it that way, then any Reader
method that reads, when it fails, could very well set the internal cursor to the end of the input, since that's what would happen if the user attempted to access those same bytes using multiple read_byte()
calls.
Still that doesn't seem very satisfying for Reader::at_end()
. Are we really at the end of the input because that's the end of a (potentially) valid input, or are we at the end of an input that we know to be too short?
The best solution for this is probably to change methods that take a &mut self
(where self: Reader
) to instead take self
and return self
only on success. For example, a method of type &mut self -> Result<u8, EndOfInput>
would change to have signature self -> Result<(self, u8), EndOfInput>
.
Of course, that would be a highly disruptive change!
I prototyped a refactoring of this library where every fallible Reader
method was changed from fn(&mut self, ...) -> Result<T, EndOfInput>
to fn(self, ...) -> Result<(Self, T), EndOfInput>
. This refactoring makes it impossible to use a Reader
after it has returned an error. However, I found it to be pretty annoying to use.
I think instead we should investigate changing the semantics to be "whenever a method fails, the Reader is left in the same state as it previously was in." If we do that, we should also consider replacing the use of Result<T, EndOfInput>
with Option<T>
.
Those changes seem straightforward as far as the implementation of untrusted
is concerned. However, I worry that most things built on top of untrusted
, will not safely implement those same semantics. More experimentation will be required.
Definitely my expectation is that a user of
Reader
will stop using the reader after aReader
method fails. However, that expectation isn't documented anywhere.In discussing PR #20 with @oherrala (offline), the point was made that, based on the documentation in the
untrusted
module, noReader
methods must ever panic even if they are called after aReader
method returned an error.@oherrala Also suggested that we change things so that, once a Reader method returns an error, then all future calls to all other
Reader
methods would return an error. I brought up some concerns that this would add an extra check in each method that might hurt performance; we'd need to see how the optimizer copes with this. OTOHO @oherrala brought up the point that mostuntrusted
users probably care much more about safety than absolute performance. Still, it would be useful to try to minimize the performance impact based on the "performance is an aspect of security" mindset. One way to do it would be to set the Reader to the end of the input, and then only check the "previous error" bit when we detect we're at the end of the stream. The more problematic aspect of this idea is that there are methods likemark()
,peek()
, andskip_to_end()
that don't returns aResult
. It doesn't make much sense to have this sticky error bit only affect a subset of the methods of the Reader.Another alternative would be to have every
Reader
method consume the reader, i.e. takeself
instead of&mut self
as a parameter. Then every method would returnResult<(Reader, [current result type]), Error>
. We'd need to do some experimenting (e.g. rewritewebpki
) to validate such an API design.