dermesser / integer-encoding-rs

Integer encoding for primitive integer types: Supports varint/varint+zigzag and fixed-length integer encoding and decoding, and provides synchronous and asynchronous Write/Read types for easily writing/reading integers.
Other
66 stars 16 forks source link

Silent downcast overflow #21

Closed crepererum closed 2 years ago

crepererum commented 2 years ago

When using VarintReader with types smaller than 64bits, the current implementation first reads a 64bit integer:

https://github.com/dermesser/integer-encoding-rs/blob/14cd007c6c586e8d87c2a8762bd56c9928ae3fcf/src/reader.rs#L40-L58

and then casts it to a smaller integer:

https://github.com/dermesser/integer-encoding-rs/blob/14cd007c6c586e8d87c2a8762bd56c9928ae3fcf/src/varint.rs#L74-L77

https://github.com/dermesser/integer-encoding-rs/blob/14cd007c6c586e8d87c2a8762bd56c9928ae3fcf/src/varint.rs#L90-L93

Now imagine a data stream w/ 9 bits 0xff followed by 0x00, which would be a pretty large 64bit integer. If you use VarIntReader::read_varint::<i32>(...) to decode that, it will sucessfully read the 64bit varint, consume all the bytes and during the conversion to 32bit will silently truncate the result. What it should do instead is to read only the number of bytes that are at max required for 32bit (5?) and then fail with "Unterminated varint".

dermesser commented 2 years ago

Yes this is an unsatisfactory state. However, with an unterminated varint error, there would be a fragment being left in the input stream which will give a nonsense number on next read (at least as nonsensical as a truncated int).

Now one could write documentation that the next integer after such an error must be ignored... but that also doesn't seem optimal to me.

In any case, I'm open for debate on this topic.

crepererum commented 2 years ago

I think if your read / deserialize data from a stream and get an error, you have to stop reading or re-synchronize the stream (the latter one is mostly only possible for low-level network protocols). I think a user cannot expect a parser to automatically recover from broken data, because there is no way to know which part was broken (in the example case above was it one byte that was broken or was a 32bit value serialized as 64bit? or did we mess up before reading the varint?). It's not only about the "next integer" btw. because likely the protocol the user is trying to deserialize consists of other data types before and after the varint in question.

With your argument, I think you should NEVER return an "unterminated varint error" because you could always argue that there could in theory be a 128bit, 256bit, ... wide varint.

dermesser commented 2 years ago

That's a good point, I will look into it.

dermesser commented 2 years ago

@crepererum would you mind reviewing the change in #22?

dermesser commented 2 years ago

I believe that this issue can be closed with the fix implemented.