golang / go

The Go programming language
https://go.dev
BSD 3-Clause "New" or "Revised" License

encoding/csv: Provide a means to change the quote character #8458

Closed gopherbot closed 9 years ago

gopherbot commented 10 years ago

by fuzxxl:

The package encoding/csv recognizes quoted fields. Sadly, it's not possible to change
the quote character to something different, as required for some use cases [1].

I request that encoding/csv be expanded to allow users to select a custom quote
character.

[1]: http://stackoverflow.com/questions/25062281/golang-enclosure-rule-for-csv-parsing

ianlancetaylor commented 10 years ago

Comment 1:

Labels changed: added repo-main, release-none.

adg commented 10 years ago

Comment 2:

This could be done by adding a SingleQuote boolean field to Reader and Writer. False,
its default, would indicate the usual double-quoted behaviour.

Labels changed: added suggested.

Status changed to Accepted.

gopherbot commented 10 years ago

Comment 3 by fuzxxl:

I think this would be a bad fix. The next time someone with an entirely different quote
character appears, this approach won't work anymore.
Why not add a field
    QuoteCharacter rune
which defaults to the double quote if set to 0?

minux commented 10 years ago

Comment 4:

What if people then want different quote characters for start and end
quotes?

satran commented 9 years ago

I've submitted a fix: https://go-review.googlesource.com/#/c/1576/. The fix covers only the reader. I was wondering whether the writer should also have an option to specify the quote character.

bradfitz commented 9 years ago

Why?

Is there a standard that says the quote character can differ?

If you have a file format that uses a different quote character, is it really CSV?

I don't think so.

I'm compelled to close this without action. The encoding/csv package is small and easily forkable elsewhere for PQCSV ("pipe-quoted CSV") or whatever.
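For readers wondering what handling a non-standard quote outside encoding/csv might involve, here is a minimal sketch. The helper name `splitLine` and its parameters are hypothetical and are not part of any standard package. It handles a single line only, assumes a doubled quote character escapes a literal one, and is deliberately lenient about text that follows a closing quote.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// splitLine splits one record on comma, treating quote as the enclosure
// character. Inside a quoted field, a doubled quote stands for one literal
// quote. Hypothetical helper for illustration; not part of encoding/csv.
func splitLine(line string, comma, quote rune) ([]string, error) {
	var fields []string
	var cur strings.Builder
	inQuotes := false
	rs := []rune(line)
	for i := 0; i < len(rs); i++ {
		r := rs[i]
		switch {
		case inQuotes && r == quote:
			if i+1 < len(rs) && rs[i+1] == quote {
				cur.WriteRune(quote) // doubled quote: keep one literal quote
				i++
			} else {
				inQuotes = false // closing quote
			}
		case inQuotes:
			cur.WriteRune(r) // delimiters are ordinary text inside quotes
		case r == quote && cur.Len() == 0:
			inQuotes = true // opening quote at the start of a field
		case r == comma:
			fields = append(fields, cur.String())
			cur.Reset()
		default:
			cur.WriteRune(r)
		}
	}
	if inQuotes {
		return nil, errors.New("unterminated quoted field")
	}
	return append(fields, cur.String()), nil
}

func main() {
	// Pipe-delimited input that uses single quotes as the enclosure.
	fields, err := splitLine("a|'b|c'|'it''s'", '|', '\'')
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	for i, f := range fields {
		fmt.Printf("%d: %q\n", i, f)
	}
}
```

A real replacement would also need multi-line fields, line and column tracking for errors, and a matching writer, which is roughly the work a fork of the package would take on.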

satran commented 9 years ago

I believe the standard does not specify it. You are right in pointing out that the package is small and easily forkable. Just saw this issue open and thought of fixing it :)

bradfitz commented 9 years ago

The standard specifies it as double quote. It's not a tweakable parameter:

https://tools.ietf.org/html/rfc4180 says

   escaped = DQUOTE *(TEXTDATA / COMMA / CR / LF / 2DQUOTE) DQUOTE
...
   DQUOTE =  %x22 ;as per section 6.1 of RFC 2234 [2]

We'll follow the standard.

People needing wacky formats can use wacky packages.
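To make the grammar above concrete, the following sketch feeds encoding/csv a record that uses the RFC 4180 conventions: a quoted field containing a comma, and a quoted field containing doubled double quotes. The helper name `parseRecord` is only for illustration.

```go
package main

import (
	"encoding/csv"
	"fmt"
	"strings"
)

// parseRecord reads a single record using the package defaults:
// comma as the delimiter and the double quote as the enclosure.
func parseRecord(line string) ([]string, error) {
	r := csv.NewReader(strings.NewReader(line))
	return r.Read()
}

func main() {
	// The second field contains a comma; the third contains escaped quotes.
	rec, err := parseRecord(`a,"b,c","say ""hi"""` + "\n")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	for i, field := range rec {
		fmt.Printf("%d: %s\n", i, field)
	}
}
```

The three fields come back as `a`, `b,c`, and `say "hi"`: the enclosing quotes are stripped and each `""` pair collapses to a single `"`.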

clausecker commented 9 years ago

CSV is well known for being a format that every program implements somewhat differently, like HTML in the early years of the internet. Yes, the standard says that the quote character is ", but there are many programs out there that expect differently formatted data. Having a CSV package that is flexible in the way it generates output is a very useful thing. I am not sure that telling every program which needs to generate CSV files that do not exactly match the standard to fork the encoding/csv package is a good idea, in terms of either code reuse or reliability.

bradfitz commented 9 years ago

Can you provide a few examples of such programs, ideally popular ones?

Absent a compelling reason, I see no reason to introduce complexity for theoretical uses.

Even with examples, I'm tempted to say no just as a minor encouragement to those programs' authors and users to do something more normal.

khuderm commented 8 years ago

I know this is old, but I would like to add an example of data that desperately needs this change in order to be read. I am currently working on some sentiment analysis that uses the SentiWordNet list. A row from the list:

a
00001740
0.125
0
able#1
(usually followed by `to') having the necessary means or skill or know-how or authority to do something; "able to swim"; "she was able to program her computer"; "we were at last able to buy a car"; "able to get a grant for the project"

I put each column on its own line so the columns are easy to distinguish. The columns are tab (\t) delimited and the long text has no enclosure. Because the long text already contains double quotes, the csv package gives me an error when it tries to parse them. Hope what I wrote makes sense lol.
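For data shaped like this row, one possible workaround with the existing package is to set the reader's `Comma` to a tab and enable `LazyQuotes`, which lets a double quote appear inside an unquoted field. This is a sketch only: the gloss in the sample is shortened from the row above, the helper name `parseTSV` is made up for illustration, and a field that begins with a double quote would still be treated as a quoted field.

```go
package main

import (
	"encoding/csv"
	"fmt"
	"strings"
)

// parseTSV reads tab-delimited records and tolerates bare double quotes
// inside unquoted fields.
func parseTSV(data string) ([][]string, error) {
	r := csv.NewReader(strings.NewReader(data))
	r.Comma = '\t'      // fields are separated by tabs
	r.LazyQuotes = true // allow " inside a field that is not quoted
	return r.ReadAll()
}

func main() {
	// One row in the SentiWordNet layout, with the gloss shortened.
	row := "a\t00001740\t0.125\t0\table#1\t" +
		"(usually followed by `to') having the necessary means; \"able to swim\"\n"
	recs, err := parseTSV(row)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	for i, field := range recs[0] {
		fmt.Printf("%d: %s\n", i, field)
	}
}
```

With these settings the row parses into six fields and the quotes in the gloss are kept verbatim.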

gopherbot commented 8 years ago

CL https://golang.org/cl/23401 mentions this issue.

hasnickl commented 8 years ago

RFC 4180 also says the delimiter has to be a comma (","), yet encoding/csv supports changing it.
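As a small illustration of that asymmetry, the sketch below sets `Writer.Comma` to a semicolon. The delimiter changes, but fields that need quoting are still enclosed in double quotes. The helper name `writeRecord` is only for illustration.

```go
package main

import (
	"bytes"
	"encoding/csv"
	"fmt"
)

// writeRecord encodes one record with a caller-chosen delimiter.
// The enclosure character is not configurable and stays the double quote.
func writeRecord(rec []string, comma rune) (string, error) {
	var buf bytes.Buffer
	w := csv.NewWriter(&buf)
	w.Comma = comma
	if err := w.Write(rec); err != nil {
		return "", err
	}
	w.Flush()
	return buf.String(), w.Error()
}

func main() {
	out, err := writeRecord([]string{"a;b", `say "hi"`}, ';')
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Print(out) // prints: "a;b";"say ""hi"""
}
```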

ianlancetaylor commented 8 years ago

@hasnickl This issue is closed. If you want to discuss this, please use the golang-dev mailing list. Thanks.