Open chmp opened 2 hours ago
Sorry, I'm not understanding the problem. Does this lead to an actual issue in your code? Like it makes sense that these would be different at a conceptual level. Serialization is guaranteed to produce valid UTF-8, so it serializes a string. But deserialization can be done on a &[u8]
, which may not be valid UTF-8.
Not really an issue: I simply need to add deserialize_bytes
impls that hand the bytes of the string to to the visitor.
I was merely surprised: so far I only encountered types that use deserialize_str
in their Deserialize
impl, when using collect_str
in their Serialize
impl. I always interpreted the deserialize_*
methods used to indicate the "natural" type of the value and I expected that to be the type serialized to (string in this case).
Do you agree that it makes sense given that parsing works on &[u8]
and therefore, deserialization accepting either &[u8]
or &str
is maximally flexible? Like, if the Deserialize
impl didn't accept &[u8]
, then you couldn't deserialize data types from bytes. It would have to be string.
Now if you're layering stuff on top of this where only strings make sense, then it should be fine if you just use deserialize_str
, right? It won't be maximally flexible, but maybe it doesn't need to be. It depends on your use case.
(It's possible I have a misunderstanding here.)
Oh sorry. I think I did not explain myself clearly. I completely agree that accepting both &[u8]
and &str
is a good idea.
The only thing that would change would be:
impl<'de> serde::Deserialize<'de> for Zoned {
fn deserialize<D: serde::Deserializer<'de>>(deserializer: D) -> Result<Zoned, D::Error> {
use serde::de;
struct ZonedVisitor;
// ...
- deserializer.deserialize_bytes(ZonedVisitor)
+ deserializer.deserialize_str(ZonedVisitor)
}
}
E.g., in serde_arrow
I use the deserializer.deserialize_*
call to detect the "natural" / "preferred" type of a value. serde_reflect
is doing the same. The deserializer
is still free to hand your visitor a &[u8]
.
As I said most types I encountered were symmetric in the Serde type they serialized to and the type they requested from the deserializer. Not sure whether this is a general good practice in the implementation of the Serde traits. Hence, I wanted to give you a quick heads up.
You're saying that if I used deserialize_str
instead of deserialize_bytes
, then it could still accept bytes? Maybe I have a misunderstanding of Serde itself. What is the actual behavioral difference between deserialize_str
and deserialize_bytes
?
Puh. I have to admit the Serde API with its asymmetry between Serialize and Deserialize is still somewhat confusing. Re. this issue:
The deserialize_* methods are hints to the deserializer for how to interpret the next piece of input [...] Note that a Deserializer will not necessarily follow the type hint, so the call to deserialize_i32 does not necessarily mean the Deserializer will call I32Visitor::visit_i32
(source).
So in this case, the fact that you are calling deserialize_bytes
does not all guarantee that you get &[u8]
, in principle the deserializer could also hand you back a i64
.
I always interpreted the call to deserialize_*
to indicate the "preferred" type.
Interesting. I'll leave this open for now since it seems deserving of a deeper dive on my part. The visitors in this crate do also accept both strings and bytes.
I agree that a string is, I guess, the "preferred" type in this context.
I guess, this change would also be a visible, hence breaking, one. So not sure it's worth the effort.
Well I guess that gets to my question: what is the actual behavior difference?
In either case, my plan with Jiff is to do 1-2 breaking changes leading up to Summer 2025, and then publish Jiff 1.0. So if it's really a breaking change, that's fine, it can be included in jiff 0.2
.
jiff
uses inconsistent Serde types in serialization / deserialization.jiff
typesserializer.collect_str()
) (example)deserializer.deserialize_bytes(visitor)
) (example)It's not a huge issue, but I wanted to give a heads up: this inconsistency tripped me up in a couple of places, while adding support for
jiff
to my crate. I am now making my own impl more robust, by adding adeserialize_bytes
impl to everydeserialize_str
impl :)