Closed dtoniolo closed 9 months ago
This is almost certainly a bug, I agree. There are many issues on this tracker related to serde and its flatten
directive (and many issues about just serde
integration in general). I basically agree that your code snippet should work. The problem is that serde(flatten)
is a somewhat leaky directive. It does not behave identically to code where the structs are manually flattened into one another. You should be able to see this by running cargo expand
on your program. For example, the derived Deserialize
impl for your Nested
type deserializes as a map instead of a struct. Note the number of issues in Serde related to flatten
. flatten
is an amazing feature, but there are definitely some things wrong with it. Even worse, I don't actually know whether the bug you've reported here is something fundamental about serde-and-csv interaction, or if it's just something bogus with how I implemented Serde support. (I'd bet on the latter personally.)
It has been a long time since I wrote the Serde code in this crate, and I no longer remember what the essential challenges are here. IMO, the Serde support in this crate needs to be re-thought from first principles while taking existing issues into account. If something can't be supported because of how Serde (and CSV) works, then we should document it clearly with proposed work-arounds.
Thanks for your prompt answer. I'm looking both at the code produced by cargo expand
and rust-csv
's source code, hope I can find the issue.
I think I found the problem: it's a consequence of a long standing bug in serde related to flatten
.
Basically, when deserializing Flat
serde simply iterates over the input map and
"first_field"
and "second_field"
identifiers as Strings
.When deserializing Nested
, however, it first searches for "first_field"
, storing everything else it encounters in a temporary data struture called __collect
. It then passes this data structure to Inner::deserialize
and its related methods.
The data that is memorized into __collect
is not a simple copy of (or a reference to) the raw input bytes, but instances of a private enum
called Content
, which performs type inference during the deserialization from the input data. This is type inference is what incorrectly detects the input to be an integer. Inner::desrialize
then (correctly) produces an error because it recieves as input a map in which "second_field"
is associated with an integer and not a String
.
I'm closing this issue given that it's not actually specific rust-csv
. Thanks again for your support, it really helped me in finding the root cause of the problem.
Oof, that's frustrating. I still wonder whether we can do something better here, but I'm not going to have time to work on it any time soon.
First of all, thanks for your work on this crate. It has very convenient API and allows me to process large amounts of data quickly.
I'm opening this issue because I might have stubled on a bug in version
1.3.0
.Description
I'm parsing data from the rows of a csv file into custom data structures. The fields of each row are grouped into multiple structs in the Rust code, both because they logically belong to different groups and because the various groups are then used in distinct ways. Each row is then modelled by
CsvRow
, a struct that combines all of the others. All its fields are marked with#[serde(flatten)]
.I'm encountering the problem when I'm attempting to parse strings that contain only digits into fields that:
CsvRows
.String
as their type.When these conditions are met, the deserialization code incorrectly detects the input to be an integer. This then creates a type mismatch between the datum (some kind of integer) and the field (a
String
), which results in a runtime error.Reproducible Example
I've created a minimum reproducible example to show the problem. Here's my
Cargo.toml
:and the
main.rs
:Expected Behaviour
csv
s are concerned,Flat
andNested
should be equivalent. Specifically, attempting to deserialize the rows of the input files into these structs should result in the same errors and data.1234
could be interpreted as an integer, it's also a valid string. Given that all fields areString
s, it should be parsed as such even when using nested structs.Actual Behaviour
Building and running this code with
cargo run
results inThis means that:
first_field
is always parsed correctly as aString
.Nested
the data corresponding tosecond_field
is incorrectly detected as an integer. I guess that the deserializer is performing some type inference on the input datum, even though it has no need to do so.