Closed 5225225 closed 1 year ago
Yeah, it's legal to encode a 1 like this. I don't think it's a good idea, but it is not forbidden. Also, I don't want the "extra" complexity of adding one on every loop iteration.
But yeah, a clarification on this would be a good addition to the spec.
Is it legal to encode `1` as `0x81, 0x80, 0x80, 0x00`? Nothing in the spec forbids it, but it's just not mentioned. The example code would decode that just fine.

On the other hand, the spec says the number is always between 1 and 5 bytes, so any compliant decoder must forbid `0x80, 0x80, 0x80, 0x80, 0x80, 0x00`, right? That can also be seen in the notes on the 5th byte, which say the upper 4 bits of that byte must be unset, so any decoder must reject that, right? That test vector should be in the docs.

Git's varint encoding looks similar (but slightly more complex), and it apparently has no redundant states and is shorter on average because of it. How wise it would be to make such a change, I have no idea.
https://github.com/git/git/blob/7fb6aefd2aaffe66e614f7f7b83e5b7ab16d4806/varint.c#L4-L18
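
For comparison, here is a rough Python port of the linked `decode_varint`, together with a matching encoder. It is only a sketch: the names `git_encode_varint` and `git_decode_varint` are made up for this example, and the overflow check from the C original is left out because Python integers do not overflow. The "add one every loop" step is what removes the redundant states: each continuation byte shifts the range up, so a longer encoding can never represent a value that a shorter one already covers.

```python
def git_encode_varint(value: int) -> bytes:
    """Encode a non-negative int in Git's offset varint style (most significant group first)."""
    groups = [value & 0x7F]          # last byte: low 7 bits, no continuation bit
    value >>= 7
    while value:
        value -= 1                   # the offset that removes redundant encodings
        groups.append(0x80 | (value & 0x7F))
        value >>= 7
    return bytes(reversed(groups))


def git_decode_varint(data: bytes) -> tuple[int, int]:
    """Decode one varint from the start of data; return (value, bytes_consumed).

    Raises IndexError if the input ends while the continuation bit is still set.
    """
    pos = 0
    c = data[pos]
    pos += 1
    val = c & 0x7F
    while c & 0x80:
        val += 1                     # mirrors the encoder's "value -= 1"
        c = data[pos]
        pos += 1
        val = (val << 7) + (c & 0x7F)
    return val, pos
```

With this scheme `0x80, 0x00` decodes to 128 rather than 0, so padding a value with extra continuation bytes changes the value instead of producing a second encoding of it. Two bytes cover 128 through 16511, slightly more than the 128 through 16383 that a plain 7-bits-per-byte scheme reaches.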
Regardless of what is chosen, it would be good to include an example of it in the docs, showing either that an overlong (but 5 bytes or shorter) encoding is forbidden or that it is mandatory to support.
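
To make the two options concrete, here is a minimal decoder sketch with both behaviours behind a flag. It assumes the format is least-significant-group-first with 7 data bits per byte and the high bit as the continuation flag, which is inferred from the `0x81, 0x80, 0x80, 0x00` example rather than taken from the spec text. `decode_varint`, `VarintError`, and `allow_overlong` are hypothetical names, not part of the spec or its example code.

```python
class VarintError(ValueError):
    """Raised when a byte sequence is not an acceptable varint."""


def decode_varint(data: bytes, allow_overlong: bool = True) -> tuple[int, int]:
    """Decode one varint from the start of data; return (value, bytes_consumed).

    Assumed layout (not confirmed by the spec): little-endian 7-bit groups,
    high bit set means another byte follows, at most 5 bytes.
    """
    value = 0
    for i, byte in enumerate(data[:5]):
        # Only 4 data bits fit in the 5th byte of a 32-bit value, and a set
        # continuation bit there would mean a 6th byte, so reject both.
        if i == 4 and byte & 0xF0:
            raise VarintError("5th byte has one of its upper 4 bits set")
        value |= (byte & 0x7F) << (7 * i)
        if not byte & 0x80:
            # A zero final byte after the first adds nothing, so a shorter
            # encoding of the same value exists.
            if not allow_overlong and i > 0 and byte == 0:
                raise VarintError("overlong encoding")
            return value, i + 1
    raise VarintError("input ended before the final byte")
```

Under these assumptions, `0x81, 0x80, 0x80, 0x00` decodes to 1 in the default mode and is rejected with `allow_overlong=False`, while `0x80, 0x80, 0x80, 0x80, 0x80, 0x00` is rejected in both modes because of the 5th-byte rule. Either pair of results could serve as the test vectors for the docs, depending on which behaviour is chosen.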