Open Lucretiel opened 2 years ago
Thanks for the report. It is indeed unsound.
I think this functionality could be implemented without the invalid transmute. Serde doesn't support specialization directly, but it is possible to hack it by abusing newtype serialization with a custom name. This is hack is already used for ExtSerializer
injecting _ExtStruct((_,_))
. So similarly, the library could tell serde to serialize invalid UTF-8 string as _InvalidStringStruct(&[u8])
and have msgpack serializer react to this appropriately.
I don't have time to implement this workaround, and I also don't want to break existing users even if they rely on the bad implementation, so for now I'll deprecate the Raw
/RawRef
newtypes.
Any plan for a fix?
The
unsafe
behavior ofRaw
is trivially shown to be unsound.str
is required to be valid UTF-8, and its methods assume that it contains valid UTF-8 for the purpose of, for example, scanning for code points. I'm able to produce UB by creating a simple Serializer that serializes strings as an array of chars, and then feeding it malformed UTF-8 data:When I run this normally, I get a crash:
And when I run with
cargo miri run
, it immediately detects the undefined behavior:This type should be either:
serialize_bytes
unsafe
and changed to require valid utf-8.To be frank, it's not at all clear to me why this type exists in the first place? It isn't used by RMP anywhere, and I don't really understand why a user would ever be allowed to call
serialize_str
on non-UTF8 data, when the entire point of that method is that valid UTF-8 can be assumed. It appears to exist for no other purpose than to create this kind of soundness hole.My full reproduction code is available as a gist.