ParkMyCar / compact_str

A memory efficient string type that can store up to 24* bytes on the stack
MIT License
638 stars 46 forks source link

Remove invalid UTF8 check from `Repr::from_utf8_unchecked` #379

Closed overlookmotel closed 1 month ago

overlookmotel commented 4 months ago

This PR removes the following code from Repr::from_utf8_unchecked:

https://github.com/ParkMyCar/compact_str/blob/d4798639c24cacf528770d83e8842336b79fbc39/compact_str/src/repr/mod.rs#L126-L137

In my opinion, this code should be removed for 2 reasons:

  1. It is part of the safety contract for from_utf8_unchecked that buf is the bytes of a valid UTF8 string. This check penalizes users who follow the safety contract (which everyone should), in order to support users who don't (and therefore have opted in to UB).
  2. This check alone is not enough to avoid UB anyway. If buf ends with the first byte of a multi-byte Unicode character, for example calling .as_str().chars().collect::<Vec<char>>() on the resulting CompactString will cause an out of bounds read.
overlookmotel commented 4 months ago

CI fails are I believe caused by tests/fuzz which are breaking the safety constraint of utf8_unchecked. If intention is to merge this, those tests should be removed/adapted.

I've fixed the proptest, but I don't know how to alter the fuzzer.

ParkMyCar commented 1 month ago

Sorry for the delay here @overlookmotel, and thank you for the PR!

This is a very good point, I would like to add more documentation explaining what might occur if the user does not provide valid UTF-8. This PR has sat open for long enough though, I'll merge as-is and add the docs myself :)

overlookmotel commented 1 month ago

No worries at all. Open source! Everyone is busy... appreciate you merging.