Open tyilo opened 11 months ago
This does indeed look like a bug. I don't know when I'll have time to look into it, so patches are welcome if you can find a fix yourself.
I'll note that the presence of serde(flatten)
concerns me. IIRC, it required some hacky things to make it work and it may not have been done correctly.
It seems like serde
is calling deserialize_string
for str
, but deserialize_any
for inner_str
. Is this a bug in serde
?
I don't know. I don't know much about serde internals. I do know that the serde(flatten)
thing is a bit of a tortured feature though. It's one of those things that folks don't have to actually use, but it affords some very nice conveniences. But it just doesn't work in all cases.
I won't have time to look into this any time soon I'm afraid.
Yeah, it seems to be unsupported in serde
: https://github.com/serde-rs/serde/issues/1881
That's unfortunate :/
I believe the issue here is that CSV is not actually a "self-describing" format, and therefore should not support deserialize_any()
. A single element in a CSV does not encode it's type; you can't tell if 12345
is an integer or a string. The deserializer needs to rely on the the type to tell it how to interpret the data, it can't determine it itself.
#[serde(flatten)]
is only supported for self-describing formats because it needs to parse the tokens ahead of time using deserialize_any()
, since it doesn't know what types the nested fields contain yet. Since the deserialization then simply "guesses" the type, it might guess wrong and you get issues like this.
As far as I can tell, there's no way to actually fix this without fundamentally changing the csv format to be actually self-describing. IMO there shouldn't be a deserialize_any()
implementation for this format at all, although that's probably hard to walk back at this point.
See also: serde.rs's documentation for using deserialize_any()
.
@Anders429 That is a very insightful comment, thank you. I don't think I realized myself at all that flatten support depended on deserialize_any
.
It's definitely surprising, because you could easily implement most #[serde(flatten)]
use cases by hand without needing to call deserialize_any()
. I found a couple issues on serde
's repository that ask questions about this.
Seems like flattening is hard to support for serde
's derive macro, because the macro can't know anything about nested types, and the nested types can't be partially deserialized. It's a limitation that makes flattening pretty cursed; it seems that quick-xml
has the same type of issue: https://github.com/tafia/quick-xml/issues/286.
I ran into the exact same issue today, here's my code:
#[derive(Debug, serde::Deserialize)]
#[serde(rename_all(deserialize = "PascalCase"))]
pub struct PartMappingRecord {
eda: CSVEdaToolValue,
manufacturer: String,
mpn: String,
#[serde(flatten)]
fields: HashMap<String, String>,
}
the csv file:
"Eda","Manufacturer","Mpn","Name","Value"
"DipTrace","Molex","0479480001","0479480001",""
the error:
Error: Deserializing part mapping record
Caused by:
CSV deserialize error: record 1 (line: 1, byte: 42): invalid type: integer `479480001`, expected a string
In the above example, the "Mpn" column's value is ok, but the "Name" column, which is flattened into fields
is produces an error.
the source: https://github.com/MakerPnP/makerpnp/blob/master/src/stores/part_mappings.rs#L20
For my use-case, I need to support any number of arbitrarily named csv columns, since each different EDA tool that generates placement CSV files will always have their own column names and the part mappings file has to use the same column names too.
Additionally, the 'line' counter seems to be 0 based, which is confusing to humans, since most text editors and IDEs display line numbers starting from 1, for example:
Suggestions on workarounds or solutions greatly appreciated!
I think something folks sometimes forget is that serde is an optional convenience feature. You don't actually need to use it! You can use csv::StringRecord
and it will handle literally everything and give you infinite flexibility.
Additionally, the 'line' counter seems to be 0 based, which is confusing to humans, since most text editors and IDEs display line numbers starting from 1,
It's not. Or it's not intended to be: https://docs.rs/csv/latest/csv/struct.Position.html#method.line
If you are experiencing the opposite, please open a new issue with an MRE.
I think something folks sometimes forget is that serde is an optional convenience feature. You don't actually need to use it! You can use
csv::StringRecord
and it will handle literally everything and give you infinite flexibility.
My project uses serde with serde-derive for other serialization tasks so it made sense to use it for CSV files too. I'm just about to start investigating a solution/work-around to this issue and will keep that in mind. thanks for the timely reminder!
Additionally, the 'line' counter seems to be 0 based, which is confusing to humans, since most text editors and IDEs display line numbers starting from 1,
It's not. Or it's not intended to be: https://docs.rs/csv/latest/csv/struct.Position.html#method.line
If you are experiencing the opposite, please open a new issue with an MRE.
ok, I will check again and future errors and will open issues as required.
What version of the
csv
crate are you using?1.3.0
Briefly describe the question, bug or feature request.
When deserializing to a
String
field using serde,csv
can handle a field containing123
. However, when doing the same to a flattened field,csv
can no longer deserialize123
to theString
field and I get the error:Include a complete program demonstrating a problem.
What is the observed behavior of the code above?
Output:
What is the expected or desired behavior of the code above?
Output: