BurntSushi / rust-csv

A CSV parser for Rust, with Serde support.
The Unlicense

unexpected behavior (bug?) when using serde untagged with an enum to deserialize csv data #357

Closed: klebs6 closed this issue 8 months ago

klebs6 commented 8 months ago

What version of the csv crate are you using?

[[package]]
name = "csv"
version = "1.3.0"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "ac574ff4d437a7b5ad237ef331c17ccca63c46479e5b5453eb8e10bb99a759fe"
dependencies = [
 "csv-core",
 "itoa",
 "ryu",
 "serde",
]

Briefly describe the question, bug or feature request.

Unexpected behavior (bug?) when using #[serde(untagged)] on an enum to deserialize CSV data.

Include a complete program demonstrating a problem.

mod example {

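    // Imports for this module; the tests below pick them up through `use super::*`.
    use csv::{ReaderBuilder, Trim};
    use serde::{Deserialize, Serialize};
    use std::fmt::Debug;

    #[derive(Clone, Debug, Serialize, Deserialize)]
    pub struct MyStruct {
        a: usize,
        b: String,
        c: i32,
    }

    #[derive(Clone, Debug, Serialize, Deserialize)]
    #[serde(untagged)]
    pub enum MyUntaggedEnum {
        V1 {
            a: usize,
            b: String,
            c: i32,
        }
    }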

    #[cfg(test)]
    mod tests {

        use super::*;

        pub const CSV_DATA: &str = "\
            a,b,c
            1,cat,-4
            0,dog,4
            1,mouse,19";

        pub fn csv_reader() -> csv::Reader<&'static [u8]> {

            ReaderBuilder::new()
                .trim(Trim::All)
                .has_headers(true)
                .flexible(true)
                .terminator(csv::Terminator::Any(b'\n'))
                .from_reader(CSV_DATA.as_bytes())
        }

        pub fn deserialize_csv<RecordType>(rdr: &mut csv::Reader<&'static [u8]>)
        where
            RecordType: Debug + serde::de::DeserializeOwned,
        {

            let mut rows = vec![];

            for result in rdr.deserialize::<RecordType>() {
                match result {
                    Ok(row) => {
                        println!("{:?}", row);
                        rows.push(row);
                    },
                    Err(e) => eprintln!("Error: {}", e),
                }
            }

            assert!(!rows.is_empty());
        }

        // this test passes
        #[test]
        fn test_my_struct() {

            let mut rdr = csv_reader();

            deserialize_csv::<MyStruct>(&mut rdr);
        }

        // this test fails
        #[test]
        fn test_my_untagged_enum() {

            let mut rdr = csv_reader();

            deserialize_csv::<MyUntaggedEnum>(&mut rdr);
        }
    }
}

What is the observed behavior of the code above?

running 3 tests
test example_serde_bug::example::tests::test_my_struct ... ok
test example_serde_bug::example::tests::test_my_untagged_enum ... FAILED

failures:

---- example_serde_bug::example::tests::test_my_untagged_enum stdout ----
Error: CSV deserialize error: record 1 (line: 2, byte: 6): data did not match any variant of untagged enum MyUntaggedEnum
Error: CSV deserialize error: record 2 (line: 3, byte: 28): data did not match any variant of untagged enum MyUntaggedEnum
Error: CSV deserialize error: record 3 (line: 4, byte: 49): data did not match any variant of untagged enum MyUntaggedEnum
thread 'example_serde_bug::example::tests::test_my_untagged_enum' panicked at src/example_serde_bug.rs:62:13:
assertion failed: !rows.is_empty()

What is the expected or desired behavior of the code above?

Both tests should pass, and every row should deserialize into the V1 variant. Even if variants with different field layouts are added to MyUntaggedEnum, CSV_DATA should still deserialize into V1.

klebs6 commented 8 months ago

Fixed with this workaround: wrap the untagged enum in a struct and mark that field with #[serde(flatten)]. For example:

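use chrono::NaiveDate;
use serde::{Deserialize, Serialize};

// Outer struct: csv deserializes it from a header-keyed map, and the
// flattened field passes those entries on to the untagged enum.
#[derive(Clone, Debug, Serialize, Deserialize)]
pub struct Transaction {
    #[serde(flatten)]
    inner: TransactionInner,
}

#[derive(Clone, Debug, Serialize, Deserialize)]
#[serde(untagged)]
enum TransactionInner {
    // naive_date_format is a custom serde helper module from the reporter's project (not shown).
    V1 {
        #[serde(rename = "Date")]
        #[serde(with = "naive_date_format")]
        date: NaiveDate,
    },
    V2 {
        #[serde(rename = "Posted Date")]
        #[serde(with = "naive_date_format")]
        posted_date: NaiveDate,
    },
}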