Closed: gopherbot closed this 9 years ago
Comment 2 by borman@google.com:
This is unfortunately working correctly. Your input, "Col1\t \tCol3\tCol4", produces 4 fields with TrimLeadingSpace set to false, the second one being " ". With TrimLeadingSpace set to true, the remaining input at the start of the second field is " \tCol3\tCol4". The leading " \t" is removed because both are space characters, so the second field is "Col3" and only 3 fields are produced. What should the following input produce if TrimLeadingSpace is true and Comma is a space (. == space): "a.b..c...d"? Using a white space character as Comma while also setting TrimLeadingSpace is not well defined. It would be better for you to call strings.TrimLeftFunc(field, unicode.IsSpace) on your fields after having the CSV reader parse them without trimming the leading space.
There is no standard* defining either the use of delimiter characters other than comma or how to trim leading spaces, so I think saying "working correctly" is just to say "it is doing what the code tells it to do", and that is true. However, it is common to use other delimiters (as well as enclosures). In some countries other characters are more or less standard (; in Sweden). In LibreOffice there are 5 standard choices for delimiters when saving to CSV, two of them white space characters: , ; : {tab} {space}. In these cases, non-enclosed delimiters are always considered delimiters, never white space. Your example a.b..c...d should of course result in: a b {empty} c {empty} {empty} d
* Referring to: http://www.ietf.org/rfc/rfc4180.txt
I think we should probably leave this as is. Trying to mix TrimLeadingSpace with Comma = a space has multiple conflicting definitions, so we have to pick one. We might as well pick the one that Go 1 used and avoid having different behaviors in different versions. You can always turn off TrimLeadingSpace and then apply it to the fields yourself after the parse.
Status changed to Unfortunate.
by accipiter: