BurntSushi / bstr

A string type for Rust that is not required to be valid UTF-8.
Other
745 stars 51 forks source link

remove `Borrow<BStr> for String` impls (and similar) in a semver compatible release #168

Open Xuanwo opened 7 months ago

Xuanwo commented 7 months ago

Given the following code:

use bstr;

#[test]
fn test_bstr() {
    let mut m = HashMap::new();
    m.insert("hello".to_owned(), 42i8);
    assert!(m.get(bstr::BStr::new("hello")).copied() == Some(42));
}

We expect it to pass, but failed. The root cause is: the hash for identical content differs between a String and bytes.

Considering Borrow's requirement:

In particular Eq, Ord and Hash must be equivalent for borrowed and owned values: x.borrow() == y.borrow() should give the same result as x == y.

Is it safe to implement Borrow<BStr> for String?

BurntSushi commented 7 months ago

Is it safe to implement Borrow<BStr> for String?

It is absolutely safe to do so, yes. Is it correct to do so? Maybe. It certainly appears to be the case that the current configuration is not correct though.

I wonder what the right thing to do here is. I see a few different paths:

  1. Remove the Borrow impls and release bstr 2.0.
  2. Mark this as wontfix.
  3. If I'm correct in understanding the issue here, perhaps we can change the Hash implementation of BStr to match that of str? Is that possible?

If we decide on path (1), then to be clear, it won't happen any time soon. I don't plan on putting out a bstr 2.0 release for at least some number of years. Possibly never if no serious issues arise. With that said, this does strike me as very unfortunate, and so I'm hoping there is a way to fix this in a compatible manner.

thomcc commented 7 months ago

3 is possible for sure. Something like this would match the Hash impl for str.

impl core::hash::Hash for BStr {
    fn hash<H: core::hash::Hasher>(&self, h: &mut H) {
        h.write(self.as_bytes());
        h.write_u8(0xff);
    }
}

Unfortunately this would still be different from the Hash impl for [u8]. IOW, I don't think it's possible to have a correct Borrow impl for both [u8] and str, although I had never considered it before.

BurntSushi commented 7 months ago

Yeah, indeed, we have Borrow impls for both [u8] and str. This seems like quite an unfortunate mistake on my part.

BurntSushi commented 7 months ago

@Xuanwo For you, I think the work-around is to define a newtype with the semantics you need. (And this is likely what The Real Solution would be anyway, since it seems like Borrow<BStr> for String was not correct to add.)

XeCycle commented 7 months ago

A newtype sounds correct within the current HashMap interface, but I'm afraid that how str is hashed in std may be an implementation detail. I'm not really confident with adding it myself, only to find that std changed impl Hash for str years later; however I'd be more confident if we add it in bstr, maybe BStr::new("hello").hash_as_str(), because bstr is reasonably popular and (I guess) should be part of crater tests. No idea whether the libs team feel good at being locked, though...

XeCycle commented 7 months ago

Another thought: can we declare String: Borrow<BStr> as broken, as a bug, and simply drop it without bumping version? Apparently indexing a HashMap<String, _> with BStr is wrong today, so potential users should have been using some workaround.

However I agree that dropping it is technically a breaking change, so we can't be more careful on this...

BurntSushi commented 7 months ago

@XeCycle That is perhaps defensible given that it's incorrect and leads to incorrect results in practice. I think it's likely that's the route I'll go. We don't have crater for crates though, so it's hard to know the impact. One thing I can perhaps do is issue a release with a deprecation warning that it's going to be removed so that folks not following the issue tracker have a chance to act before the breakage is upon them.

Xuanwo commented 7 months ago

It's quite hard to find use cases as described. I searched for get(BStr::new(, but only found one instance: https://github.com/search?q=%22get%28BStr%3A%3Anew%22&type=code

Besides, there are 243 reverse dependencies listed at https://crates.io/crates/bstr/reverse_dependencies. I'm considering writing some scripts to patch bstr (by removing Borrow<BStr>) and test whether they can successfully build.

BurntSushi commented 7 months ago

If you want to do that, that would be most appreciated. Although regardless of the result of that, I'm likely to at least push this through a deprecation period to give folks time to migrate away from the broken API.