golang / go

The Go programming language
https://go.dev
BSD 3-Clause "New" or "Revised" License

encoding/csv: add column level introspection capabilities #22590

Open silviucm opened 6 years ago

silviucm commented 6 years ago

As briefly mentioned in some of the csv.Writer comments, there are frequent use cases, particularly Postgres bulk COPY TO / FROM, where it is useful to differentiate between unquoted and quoted empty strings.

In particular, for csv.Reader, by the time Read() returns a slice of strings, it is too late to know whether a given empty string was quoted or unquoted in the input.
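
To illustrate the point, here is a minimal sketch using only the existing encoding/csv API. The helper name readRecords and the sample input are assumptions made for the example; both input lines come back as identical records.

```go
package main

import (
	"encoding/csv"
	"fmt"
	"strings"
)

// readRecords is a hypothetical helper: it parses input with the
// standard csv.Reader and returns every record.
func readRecords(input string) ([][]string, error) {
	return csv.NewReader(strings.NewReader(input)).ReadAll()
}

func main() {
	// The first record holds a quoted empty field, the second an unquoted one.
	records, err := readRecords("a,\"\",b\na,,b\n")
	if err != nil {
		fmt.Println("read error:", err)
		return
	}
	for _, rec := range records {
		// Both records print the same way: the quoting is no longer visible.
		fmt.Printf("%q\n", rec)
	}
}
```

Once Read() or ReadAll() has returned, nothing in the []string distinguishes the two middle fields.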

My particular use case deals with determining the nullability of an empty unquoted string but, more generally, I just wanted to see if there is any appetite for adding such column level introspection capabilities to the standard library csv.Reader and csv.Writer, without breaking compatibility.

For Reader, this would expose an additional surface perhaps similar to:

  type CsvValue interface {
      Quoted() bool // returns true if the value is quoted, false otherwise
      String() string
      // ... whatever else comes to mind
  }

  func (r *Reader) ReadValues() (record []CsvValue, err error) {
      // ...
  }

Regards, silviu

ianlancetaylor commented 6 years ago

I think there is very little appetite for adding any new API to encoding/csv.

sandan commented 4 years ago

The problem seems to be differentiating between a NULL value and an empty string for string database types. RFC 4180 doesn't seem to have any rules regarding this. How about this proposal instead:

  1. Any fields that have quoted empty strings in a record will be considered empty strings when read in csv.Reader. For example the record: a,"",b returns ["a", "", "b"]

  2. Any fields that are unquoted empty strings in a record will be considered nil when read in csv.Reader. For example, the record: a,,b returns ["a", nil, "b"]

And vice versa for csv.Writer.
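
The reading side of the two rules above could be sketched roughly as follows. This is not an encoding/csv API. A Go []string cannot hold nil, so the sketch assumes a []*string result, and parseNullableLine is a hypothetical hand-rolled helper that handles one comma-separated record. It is simplified: no custom delimiter, no comment handling, and no strict checking of stray quotes inside unquoted fields.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// parseNullableLine is a hypothetical helper, not part of encoding/csv.
// It splits one CSV record into fields: an unquoted empty field becomes
// nil, while a quoted empty field becomes a pointer to "".
func parseNullableLine(line string) ([]*string, error) {
	var fields []*string
	i := 0
	for {
		var sb strings.Builder
		quoted := false
		if i < len(line) && line[i] == '"' {
			quoted = true
			closed := false
			i++ // skip the opening quote
			for i < len(line) {
				if line[i] == '"' {
					if i+1 < len(line) && line[i+1] == '"' {
						sb.WriteByte('"') // "" inside quotes is an escaped quote
						i += 2
						continue
					}
					i++ // skip the closing quote
					closed = true
					break
				}
				sb.WriteByte(line[i])
				i++
			}
			if !closed {
				return nil, errors.New("unterminated quoted field")
			}
			if i < len(line) && line[i] != ',' {
				return nil, errors.New("unexpected character after closing quote")
			}
		} else {
			for i < len(line) && line[i] != ',' {
				sb.WriteByte(line[i])
				i++
			}
		}
		value := sb.String()
		if value == "" && !quoted {
			fields = append(fields, nil) // rule 2: unquoted empty means nil
		} else {
			fields = append(fields, &value) // rule 1 and all non-empty fields
		}
		if i >= len(line) {
			return fields, nil
		}
		i++ // skip the comma
	}
}

func main() {
	for _, line := range []string{`a,"",b`, `a,,b`} {
		fields, err := parseNullableLine(line)
		if err != nil {
			fmt.Println("parse error:", err)
			return
		}
		for _, f := range fields {
			if f == nil {
				fmt.Print("nil ")
			} else {
				fmt.Printf("%q ", *f)
			}
		}
		fmt.Println()
	}
}
```

A pointer slice is only one possible representation. A struct with a Quoted flag, along the lines of the CsvValue interface suggested earlier in the thread, would carry the same information.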