Truncating behavior is confusing and forces allocations

Nullus157 / bs58-rs

Another Rust Base58 codec implementation

Apache License 2.0


Truncating behavior is confusing and forces allocations #77

Closed matklad closed 3 years ago

matklad commented 3 years ago

I expect the following test to pass:

#[test]
fn append() {
    let mut buf = "hello world".to_string();
    bs58::encode(&[92]).into(&mut buf).unwrap();
    assert_eq!("hello world2b", buf.as_str());
}

Instead, it fails, as the buf contains just "2b". That is, encoding discards existing data, rather than appending to it.

There are two problems with it:

It is surprising behavior. Standard library APIs like read_line always append. If overwriting is desired, the caller can call .clear() first.
It can force an allocation if the user actually wants to append data to an existing buffer. This comes up when, for example, building SRI-style hash strings: "<algo-name>-<base58 encoded bytes>".
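
To make the proposed semantics concrete, here is a minimal standalone sketch of an appending encoder. It does not use bs58 at all: `encode_append` and `ALPHABET` are hypothetical names introduced only for illustration, and the scratch `Vec` of digits is a simplification that a real implementation would probably avoid.

```rust
/// The Bitcoin-style Base58 alphabet (no 0, O, I, or l).
const ALPHABET: &[u8; 58] = b"123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz";

/// Hypothetical helper, not part of bs58: Base58-encodes `input` and
/// appends the result to `out`, leaving any existing contents untouched.
/// Returns the number of characters appended.
fn encode_append(input: &[u8], out: &mut String) -> usize {
    // Base-58 digits of the input, least significant digit first.
    let mut digits: Vec<u8> = Vec::new();
    for &byte in input {
        // Multiply the number so far by 256 and add the new byte.
        let mut carry = byte as u32;
        for d in digits.iter_mut() {
            carry += (*d as u32) << 8;
            *d = (carry % 58) as u8;
            carry /= 58;
        }
        while carry > 0 {
            digits.push((carry % 58) as u8);
            carry /= 58;
        }
    }

    let start = out.len();
    // Each leading zero byte is represented by a leading '1'.
    for _ in input.iter().take_while(|&&b| b == 0) {
        out.push('1');
    }
    // Emit the digits most significant first.
    for &d in digits.iter().rev() {
        out.push(ALPHABET[d as usize] as char);
    }
    out.len() - start
}
```

With this shape, the prefix case needs no temporary String: the caller creates the buffer with the algorithm name and a dash, then passes it as `out`. A caller who wants overwriting calls `out.clear()` before encoding.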

Nemo157 commented 3 years ago

Agreed. Should be a pretty easy change and I think worth a breaking release.

Nemo157 commented 3 years ago

Would you expect the same when decoding into a &mut Vec<u8> (vs &mut [u8])?

#[test]
fn append() {
    // to_vec() yields a Vec<u8>; to_owned() on a byte-string literal would give a fixed-size array.
    let mut buf = b"hello world".to_vec();
    bs58::decode("a").into(&mut buf).unwrap();
    assert_eq!(b"hello world!", buf.as_slice());
}

#[test]
fn no_append() {
    let mut buf = b"hello world".to_vec();
    // as_mut_slice() selects the &mut [u8] target unambiguously.
    bs58::decode("a").into(buf.as_mut_slice()).unwrap();
    assert_eq!(b"!ello world", buf.as_slice());
}

matklad commented 3 years ago

For Vec<u8>, I'd expect the same behavior as for String -- append to the end.

For &mut [u8], I'd expect the same behavior as char::encode_utf8 -- overwrite the prefix, return the str slice of the data actually written.
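
For reference, the standard-library precedent mentioned here can be sketched as follows. `overwrite_prefix` is a hypothetical wrapper written for this example; only `char::encode_utf8` is the real std API.

```rust
/// Hypothetical wrapper for illustration: encodes `c` into the front of
/// `buf` and returns how many bytes were written. Bytes past the written
/// prefix are left as they were.
///
/// `char::encode_utf8` panics if `buf` is too small to hold the character.
fn overwrite_prefix(c: char, buf: &mut [u8]) -> usize {
    // encode_utf8 returns a `&mut str` covering only the bytes it wrote.
    let written: &mut str = c.encode_utf8(buf);
    written.len()
}
```

Calling it with '!' on a buffer holding b"hello world" writes one byte and leaves b"!ello world", which matches the no_append test above.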

Nemo157 commented 3 years ago

Returning a &str would require checking or asserting UTF-8 validity at that point. If you're doing more ASCII-only processing on the buffer (or never actually asserting that it is a string), you might want to delay that.

matklad commented 3 years ago

Hm, I think base58 guarantees that the encoded result is UTF-8, so no additional validation is necessary? If this assumption is correct, then returning &mut str allows the calling code to avoid UTF-8 validation and bounds checking. In any case, returning just a usize signifying the number of bytes written would be fine as well. Maybe returning usize is even better: I wager that the main benefit of char's return type is not actually eliding the check, but just basic convenience for cases where you'd want to encode a char to a local [u8; 4] and then do something with the resulting string.

Nemo157 commented 3 years ago

Yeah, it wouldn't need validating since the API guarantees it's ASCII, but I want to minimize the unsafe code here (currently the only unsafe code used is the bare minimum necessary to actually work with &mut str).
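
A minimal sketch of the usize-returning shape discussed above, assuming the caller is happy to pay for a checked conversion when (and only when) it needs a &str. `write_ascii` is a hypothetical stand-in for an encoder writing into a slice, and `written_as_str` is a hypothetical caller-side helper; neither is part of bs58.

```rust
use std::str;

/// Hypothetical stand-in for an encoder targeting `&mut [u8]`: copies the
/// ASCII text `src` over the prefix of `dst` and returns the number of
/// bytes written, or `None` if `src` is not ASCII or does not fit.
fn write_ascii(src: &str, dst: &mut [u8]) -> Option<usize> {
    if !src.is_ascii() || src.len() > dst.len() {
        return None;
    }
    dst[..src.len()].copy_from_slice(src.as_bytes());
    Some(src.len())
}

/// Caller-side conversion using the safe, checked API. The validation cost
/// is paid here, at the point where a `&str` is actually needed, and no
/// `unsafe` is involved.
fn written_as_str(buf: &[u8], written: usize) -> Result<&str, str::Utf8Error> {
    str::from_utf8(&buf[..written])
}
```

The unchecked alternative would be `str::from_utf8_unchecked`, which skips validation but requires an unsafe block, so it trades a small cost for more unsafe code.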