Open lucatrv opened 11 months ago
If this is possible to do in a small patch, like adding a new invalid_result
free function, then I'd be happy to take a PR.
I could create a PR when I have some time... do you see any other need apart from adding the invalid_result
function and document it?
@BurntSushi, for now I implemented the following function:
pub fn invalid_result<'de, D, T>(
de: D,
) -> result::Result<std::result::Result<T, String>, D::Error>
where
D: Deserializer<'de>,
T: Deserialize<'de>,
{
Ok(T::deserialize(de).or_else(|err| Err(err.to_string())))
}
This works but reports the error message, not the original invalid string value as it should be. So for instance for good values, empty strings, or invalid strings, I am getting respectively:
Record { latitude: 60.5544444, longitude: -151.2583333, population: Ok(7610), city: "Kenai", state: "AK" }
Record { latitude: 65.2419444, longitude: -165.2716667, population: Err("field 2: cannot parse integer from empty string"), city: "Davidsons Landing", state: "AK" }
Record { latitude: 37.3433333, longitude: -86.7136111, population: Err("field 2: invalid digit found in string"), city: "Flint Springs", state: "KY" }
Do you know how to get the original invalid string value from within the function?
@BurntSushi, sorry to bother you but I've been trying to implement this for a while without success, and I can't really figure out how to solve this. I'm now wondering if it is feasible at all... should the Error
type be extended to report the original csv string? If you can give me some hints on how to modify the function that I reported above (if feasible) then I will complete my PR and issue it. Thanks
@BurntSushi I finally got two working functions. The first one is simpler and relies on Content.as_str()
for String
conversion, thus it can fail. In the second function instead I manually managed each Content
variant, applying lossy string conversion when appropriate, so it never fails. However I'm not sure if I managed correctly the None
/ Some
/ Unit
/ Newtype
/ Seq
/ Map
variants.
pub fn invalid_result<'de, D, T>(
de: D,
) -> result::Result<result::Result<T, String>, D::Error>
where
D: Deserializer<'de>,
T: Deserialize<'de>,
{
// Need to use `Content` and `ContentRefDeserializer`, which currently are in
// Serde private API but may become public in the future, see:
// https://github.com/serde-rs/serde/issues/741
// https://github.com/arcnmx/serde-value/issues/16
let content = serde::__private::de::Content::deserialize(de)?;
let de = serde::__private::de::ContentRefDeserializer::<D::Error>::new(
&content,
);
if let Ok(t) = T::deserialize(de) {
Ok(Ok(t))
} else if let Some(s) = content.as_str() {
Ok(Err(s.to_owned()))
} else {
Err(serde::de::Error::custom("input data is not a valid UTF-8 string"))
}
}
pub fn invalid_result<'de, D, T>(
de: D,
) -> result::Result<result::Result<T, String>, D::Error>
where
D: Deserializer<'de>,
T: Deserialize<'de>,
{
// Need to use `Content` and `ContentRefDeserializer`, which currently are in
// Serde private API but may become public in the future, see:
// https://github.com/serde-rs/serde/issues/741
// https://github.com/arcnmx/serde-value/issues/16
let content = serde::__private::de::Content::deserialize(de)?;
let de = serde::__private::de::ContentRefDeserializer::<D::Error>::new(
&content,
);
let result = if let Ok(t) = T::deserialize(de) {
Ok(t)
} else {
Err(match content {
serde::__private::de::Content::Bool(b) => b.to_string(),
serde::__private::de::Content::U8(u) => u.to_string(),
serde::__private::de::Content::U16(u) => u.to_string(),
serde::__private::de::Content::U32(u) => u.to_string(),
serde::__private::de::Content::U64(u) => u.to_string(),
serde::__private::de::Content::I8(i) => i.to_string(),
serde::__private::de::Content::I16(i) => i.to_string(),
serde::__private::de::Content::I32(i) => i.to_string(),
serde::__private::de::Content::I64(i) => i.to_string(),
serde::__private::de::Content::F32(f) => f.to_string(),
serde::__private::de::Content::F64(f) => f.to_string(),
serde::__private::de::Content::Char(c) => c.to_string(),
serde::__private::de::Content::String(s) => s,
serde::__private::de::Content::Str(s) => s.to_owned(),
serde::__private::de::Content::ByteBuf(buf) => {
String::from_utf8_lossy(&buf).into_owned()
}
serde::__private::de::Content::Bytes(bytes) => {
String::from_utf8_lossy(bytes).into_owned()
}
serde::__private::de::Content::None => String::new(),
serde::__private::de::Content::Some(some) => {
format!("{:?}", some)
}
serde::__private::de::Content::Unit => String::new(),
serde::__private::de::Content::Newtype(newtype) => {
format!("{:?}", newtype)
}
serde::__private::de::Content::Seq(seq) => format!("{:?}", seq),
serde::__private::de::Content::Map(m) => format!("{:?}", m),
})
};
Ok(result)
}
I prefer the second function, but please let me know your preference and advice, or otherwise if I should use serde-value
as reported in the following docstring. After I get your answer I can create a PR.
/// Used from generated code to buffer the contents of the Deserializer when
/// deserializing untagged enums and internally tagged enums.
///
/// Not public API. Use serde-value instead.
#[derive(Debug, Clone)]
pub enum Content<'de> {
@BurntSushi, I think I found a better implementation. I hope you like it, the only drawback is that it requires Serde's derive
feature, but I cannot find other valid solutions (either relying on Serde's private API, or on serde-value
, or on Serde's derive
feature). I am going to create a PR with this implementation.
pub fn invalid_result<'de, D, T, F>(
de: D,
) -> result::Result<result::Result<T, F>, D::Error>
where
D: Deserializer<'de>,
T: Deserialize<'de>,
F: Deserialize<'de>,
{
#[derive(Deserialize)]
#[serde(untagged)]
enum DeFallback<T, F> {
Expected(T),
Fallback(F),
}
DeFallback::<T, F>::deserialize(de).map(|d| match d {
DeFallback::Expected(t) => Ok(t),
DeFallback::Fallback(f) => Err(f),
})
}
Never mind, the above code using Serde's derive
feature does not work as expected. For instance if a column contains integers, strings and floats, and one tries to deserialize it to Result<i64, String>
to retain all invalid values (floats and strings) as String
s, the deserialization would error out at the first float.
So at the moment the only two options that I could find are either relying on Serde's private API or on serde-value
:
pub fn invalid_result_private<'de, D, T>(
de: D,
) -> result::Result<result::Result<T, String>, D::Error>
where
D: Deserializer<'de>,
T: Deserialize<'de>,
{
let content = serde::__private::de::Content::deserialize(de)?;
let de = serde::__private::de::ContentRefDeserializer::<D::Error>::new(
&content,
);
let result = T::deserialize(de).map_err(|_| match content {
serde::__private::de::Content::Bool(b) => b.to_string(),
serde::__private::de::Content::U8(u) => u.to_string(),
serde::__private::de::Content::U16(u) => u.to_string(),
serde::__private::de::Content::U32(u) => u.to_string(),
serde::__private::de::Content::U64(u) => u.to_string(),
serde::__private::de::Content::I8(i) => i.to_string(),
serde::__private::de::Content::I16(i) => i.to_string(),
serde::__private::de::Content::I32(i) => i.to_string(),
serde::__private::de::Content::I64(i) => i.to_string(),
serde::__private::de::Content::F32(f) => f.to_string(),
serde::__private::de::Content::F64(f) => f.to_string(),
serde::__private::de::Content::Char(c) => c.to_string(),
serde::__private::de::Content::String(s) => s,
serde::__private::de::Content::Str(s) => s.to_owned(),
serde::__private::de::Content::ByteBuf(buf) => {
String::from_utf8_lossy(&buf).into_owned()
}
serde::__private::de::Content::Bytes(bytes) => {
String::from_utf8_lossy(bytes).into_owned()
}
serde::__private::de::Content::None => String::new(),
serde::__private::de::Content::Some(some) => {
format!("{:?}", some)
}
serde::__private::de::Content::Unit => String::new(),
serde::__private::de::Content::Newtype(newtype) => {
format!("{:?}", newtype)
}
serde::__private::de::Content::Seq(seq) => format!("{:?}", seq),
serde::__private::de::Content::Map(map) => format!("{:?}", map),
});
Ok(result)
}
pub fn invalid_result_serdevalue<'de, D, T>(
de: D,
) -> result::Result<result::Result<T, String>, D::Error>
where
D: Deserializer<'de>,
T: Deserialize<'de>,
{
let value = serde_value::Value::deserialize(de)?;
let result = T::deserialize(value.clone()).map_err(|_| match value {
serde_value::Value::Bool(b) => b.to_string(),
serde_value::Value::U8(u) => u.to_string(),
serde_value::Value::U16(u) => u.to_string(),
serde_value::Value::U32(u) => u.to_string(),
serde_value::Value::U64(u) => u.to_string(),
serde_value::Value::I8(i) => i.to_string(),
serde_value::Value::I16(i) => i.to_string(),
serde_value::Value::I32(i) => i.to_string(),
serde_value::Value::I64(i) => i.to_string(),
serde_value::Value::F32(f) => f.to_string(),
serde_value::Value::F64(f) => f.to_string(),
serde_value::Value::Char(c) => c.to_string(),
serde_value::Value::String(s) => s,
serde_value::Value::Unit => String::new(),
serde_value::Value::Option(option) => {
format!("{:?}", option)
}
serde_value::Value::Newtype(newtype) => {
format!("{:?}", newtype)
}
serde_value::Value::Seq(seq) => format!("{:?}", seq),
serde_value::Value::Map(map) => format!("{:?}", map),
serde_value::Value::Bytes(bytes) => {
String::from_utf8_lossy(&bytes).into_owned()
}
});
Ok(result)
}
When dealing with invalid data with serde, sometimes it is useful to get also invalid fields when present. The current
invalid_option
deserializer discards all invalid fields. It would be convenient to have a similarinvalid_result
deserializer, which returnsOk(value)
for all valid fields, andErr(string_field)
for all invalid fields.