BurntSushi / bstr

A string type for Rust that is not required to be valid UTF-8.

Clarify None case in bstr::decode_utf8 #139

Open glts opened 1 year ago

glts commented 1 year ago

Thank you for this useful library.

In bstr 1.0.1, the documentation for bstr::decode_utf8 states:

When unsuccessful, None is returned along with the number of bytes that make up a maximal prefix of a valid UTF-8 code unit sequence. In this case, the number of bytes consumed is always between 0 and 3, inclusive, where 0 is only returned when slice is empty.

bstr::decode_utf8(b"\xFFabc") returns (None, 1). The byte \xFF cannot be decoded, so the result is None. However, the number of bytes that make up a maximal prefix of a valid UTF-8 code unit sequence would be 0, because \xFF is not a prefix of any valid UTF-8 sequence.

Can you confirm, or can you paraphrase the wording for me?

BurntSushi commented 1 year ago

Ah. 1 is indeed correct. The docs need to be updated. Returning 0 wouldn't make sense, because 0 is meant to be the terminal condition of a loop. Returning 0 in any other case would require more complex loop logic that is easy to get wrong, and getting it wrong would lead to an infinite loop in practice.
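To illustrate the loop argument, here is a minimal sketch that uses only the standard library. decode_first is a hypothetical stand-in for bstr::decode_utf8, not bstr's actual implementation, and decode_lossy is likewise an illustrative name. The sketch assumes the same contract discussed above: the consumed count is 0 only for empty input, and an undecodable byte such as \xFF consumes 1.

```rust
/// Hypothetical stand-in for `bstr::decode_utf8`, built on std only.
/// Returns the first decoded char (if any) and the number of bytes consumed.
fn decode_first(slice: &[u8]) -> (Option<char>, usize) {
    if slice.is_empty() {
        // 0 is reserved for "no input left".
        return (None, 0);
    }
    // A UTF-8 encoded scalar value is at most 4 bytes long.
    let window = &slice[..slice.len().min(4)];
    let err = match std::str::from_utf8(window) {
        Ok(s) => {
            let ch = s.chars().next().unwrap();
            return (Some(ch), ch.len_utf8());
        }
        Err(err) => err,
    };
    if err.valid_up_to() > 0 {
        // The window starts with at least one valid char; decode just that.
        let s = std::str::from_utf8(&window[..err.valid_up_to()]).unwrap();
        let ch = s.chars().next().unwrap();
        return (Some(ch), ch.len_utf8());
    }
    // Invalid at offset 0: consume the invalid bytes (always at least 1).
    // `error_len()` is `None` when the input ends mid-sequence, in which
    // case the whole (short) window is the truncated sequence.
    (None, err.error_len().unwrap_or(window.len()))
}

/// Decodes `bytes`, substituting U+FFFD for each invalid sequence.
/// The loop terminates only because `size == 0` means "input exhausted".
fn decode_lossy(mut bytes: &[u8]) -> String {
    let mut out = String::new();
    loop {
        let (ch, size) = decode_first(bytes);
        if size == 0 {
            break;
        }
        out.push(ch.unwrap_or('\u{FFFD}'));
        bytes = &bytes[size..];
    }
    out
}
```

If decode_first returned (None, 0) for b"\xFFabc", this loop would either stop early and drop "abc", or, if written to continue on None, never advance past \xFF. Consuming 1 byte for the invalid \xFF keeps a single size == 0 check sufficient.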