BurntSushi / rust-csv

A CSV parser for Rust, with Serde support.
The Unlicense

Serializing `None` vs serializing empty string #358

Closed lennartkloock closed 7 months ago

lennartkloock commented 7 months ago

What version of the csv crate are you using?

Using csv-core version 0.1.11 through csv-async

Briefly describe the question, bug or feature request.

I'm currently writing a program that imports data via CSV into a PostgreSQL database. This crate seems to serialize `Option::None` and empty strings in the same way. This creates a problem because they are not the same. PostgreSQL differentiates between the two: it parses an unquoted empty value (`,,`) as NULL and a quoted empty value (`,"",`) as an empty string.

Include a complete program demonstrating a problem.

#[derive(serde::Serialize)]
pub struct Test {
    a: String,
    b: Option<String>,
}

What is the observed behavior of the code above?

Test {
    a: "".to_string(),
    b: None,
}

is serialized as:

a,b
,

What is the expected or desired behavior of the code above?

Should be serialized as this (at least for my use case):

a,b
"",

BurntSushi commented 7 months ago

This is a PostgreSQL-ism and not really something that CSV itself cares about. CSV doesn't distinguish between "empty" and "null." That is, `"",` and `,` are semantically identical. The only way for you to achieve this is probably to not use Serde and write out the data yourself. (Or use a different import/export format for PostgreSQL.) The problem is that this library doesn't support granular control of quote styles. So you'll probably need to write out the CSV format yourself. Thankfully, writing CSV data is easier than reading it.
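For readers who want to try the "write it out yourself" route, here is a minimal sketch using only the standard library. It is not part of the csv crate and not code from this thread. The helper names `quoted`, `nullable`, and `write_rows` are hypothetical, and the `Test` struct simply mirrors the one from the question. The sketch assumes the convention described above: an unquoted empty field stands for NULL, and every non-null value is quoted, so an empty string becomes `""`.

```rust
use std::io::{self, Write};

/// Row type mirroring the `Test` struct from the question.
pub struct Test {
    pub a: String,
    pub b: Option<String>,
}

/// Hypothetical helper: always quote the field and double any embedded quotes.
fn quoted(value: &str) -> String {
    format!("\"{}\"", value.replace('"', "\"\""))
}

/// Hypothetical helper: `None` becomes an unquoted empty field,
/// `Some(s)` is always quoted (so `Some("")` becomes `""`).
fn nullable(value: Option<&str>) -> String {
    match value {
        Some(s) => quoted(s),
        None => String::new(),
    }
}

/// Writes a header line followed by one CSV line per row.
fn write_rows<W: Write>(mut out: W, rows: &[Test]) -> io::Result<()> {
    writeln!(out, "a,b")?;
    for row in rows {
        writeln!(out, "{},{}", quoted(&row.a), nullable(row.b.as_deref()))?;
    }
    Ok(())
}

fn main() -> io::Result<()> {
    let rows = vec![
        Test { a: String::new(), b: None },
        Test { a: "x".to_string(), b: Some(String::new()) },
    ];
    write_rows(io::stdout(), &rows)
}
```

Quoting every non-null value keeps the rule simple: the only unquoted empty field that can appear is the one produced for `None`, and embedded commas, quotes, or newlines in a value are covered by the same quoting step.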

lennartkloock commented 7 months ago

I see. Still thanks for the fast reply.

lennartkloock commented 7 months ago

For people who face the same problem, this is how I solved it: instead of using CSV as an intermediate format, I used the `BinaryCopyInWriter` to write binary data directly to Postgres. It's way easier than converting to CSV first, and probably also a little faster.

BurntSushi commented 7 months ago

@lennartkloock Yeah that looks like a much better solution! Thanks for sharing it!