br1ghtyang / asterixdb

Automatically exported from code.google.com/p/asterixdb

Escape separator within the string when loading csv file #616

Closed GoogleCodeExporter closed 8 years ago

GoogleCodeExporter commented 8 years ago
Currently the delimited parser cannot handle a separator that appears inside a string 
field. We should handle this properly. There are two options:

- support escaping the separator, like \,
- recognize quoted strings, like "hello, world"

Original issue reported on code.google.com by jarod...@gmail.com on 21 Aug 2013 at 4:44
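A minimal sketch of the first option, where a backslash makes the next character literal so that "\," stays inside the field. This is an illustrative Python function, not AsterixDB's delimited parser; the name split_backslash_escaped is hypothetical, and treating "\\" as a literal backslash is an assumption.

```python
from typing import List


def split_backslash_escaped(line: str, sep: str = ",") -> List[str]:
    """Split one line on `sep`; a backslash makes the next character literal."""
    fields: List[str] = []
    current: List[str] = []
    i = 0
    while i < len(line):
        ch = line[i]
        if ch == "\\" and i + 1 < len(line):
            # Escaped character (e.g. "\," or "\\") is kept as field content.
            current.append(line[i + 1])
            i += 2
            continue
        if ch == sep:
            # Unescaped separator ends the current field.
            fields.append("".join(current))
            current = []
        else:
            current.append(ch)
        i += 1
    fields.append("".join(current))
    return fields


print(split_backslash_escaped(r"1,hello\, world,3"))  # ['1', 'hello, world', '3']
```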

GoogleCodeExporter commented 8 years ago
The proper answer is:
 - Ignore separators inside quoted strings (i.e., blindly swallow the string until it closes).
 - Allow escaping of the string terminator if it's supposed to be inside the string too.
    (But there is no need to escape the separator: the parser should skim on by and include it in the string's content.) 

Original comment by dtab...@gmail.com on 21 Aug 2013 at 9:16
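A sketch of this proposal as a small state machine: once a quote opens a string, separators are swallowed as content until the closing quote. The comment does not say how the terminator is escaped, so this sketch assumes a backslash before the quote character (\"); split_quoted is a hypothetical name, not part of AsterixDB.

```python
from typing import List


def split_quoted(line: str, sep: str = ",", quote: str = '"') -> List[str]:
    """Split one line; separators inside quoted strings are kept as content.

    Assumption: inside a quoted string, a backslash before the quote character
    escapes it (the thread leaves the escape mechanism open at this point).
    """
    fields: List[str] = []
    current: List[str] = []
    in_quotes = False
    i = 0
    while i < len(line):
        ch = line[i]
        if in_quotes:
            if ch == "\\" and i + 1 < len(line) and line[i + 1] == quote:
                current.append(quote)  # escaped terminator stays in the string
                i += 2
                continue
            if ch == quote:
                in_quotes = False  # string closed
            else:
                current.append(ch)  # separators are included verbatim
        elif ch == quote:
            in_quotes = True
        elif ch == sep:
            fields.append("".join(current))
            current = []
        else:
            current.append(ch)
        i += 1
    if in_quotes:
        raise ValueError("unterminated quoted string")
    fields.append("".join(current))
    return fields


print(split_quoted('1,"hello, world",3'))  # ['1', 'hello, world', '3']
```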

GoogleCodeExporter commented 8 years ago
Excel seems to add quotes around strings that contain a comma when saving as 
CSV,
and quotes that are used inside quoted strings are escaped using another quote.
So the string
  "a,b","c,d"
is stored as
  """a,b"",""c,d"""
in the CSV file.
I think it would be a big step forward if we could decode this encoding when 
reading CSVs.

Original comment by westm...@gmail.com on 21 Aug 2013 at 9:22
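The encoding described in this comment can be sketched as a pair of field-level helpers: quote a field that contains the separator, a quote, or a line break, and double every embedded quote. This is an illustration only; excel_quote and excel_unquote are hypothetical names, and the set of characters that triggers quoting is an assumption modeled on Python's csv module with its default "excel" dialect, not on verified Excel behavior.

```python
def excel_quote(field: str, sep: str = ",") -> str:
    """Encode one field: wrap it in quotes if needed and double embedded quotes."""
    if any(c in field for c in (sep, '"', "\n", "\r")):
        return '"' + field.replace('"', '""') + '"'
    return field


def excel_unquote(token: str) -> str:
    """Decode one complete field token produced by excel_quote."""
    if len(token) >= 2 and token[0] == '"' and token[-1] == '"':
        # Strip the outer quotes, then collapse each "" pair back to one quote.
        return token[1:-1].replace('""', '"')
    return token


encoded = excel_quote('"a,b","c,d"')
print(encoded)  # """a,b"",""c,d"""
```

The round trip reproduces the example above: the string "a,b","c,d" encodes to """a,b"",""c,d""" and decodes back unchanged. Note that excel_unquote works on a field that has already been isolated; splitting a whole row still requires a quote-aware scan.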

GoogleCodeExporter commented 8 years ago
I like this proposal (but don't quote me on that :-)).

Original comment by dtab...@gmail.com on 21 Aug 2013 at 9:32

GoogleCodeExporter commented 8 years ago

Original comment by jarod...@gmail.com on 18 Nov 2013 at 11:02

GoogleCodeExporter commented 8 years ago
Let it be so!  (Now you can quote me on that.)
JUST TO CLARIFY: suppose the desired string field is:
    an example string "a,b","c,d" with nesting
This would be expected to be encoded in the incoming CSV file as:
    "an example string ""a,b"",""c,d"" with nesting"
So, to be clear, if there were a 3-column CSV row with that string in the middle, we'd 
see:
    1,"an example string ""a,b"",""c,d"" with nesting",3
(This is what Excel does.)

Original comment by dtab...@gmail.com on 7 Sep 2014 at 6:38
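A sketch of a row decoder for the agreed encoding: inside a quoted field, "" stands for one literal quote, a lone quote closes the field, and separators are swallowed as content. This is an illustrative Python state machine, not AsterixDB code (parse_csv_row is a hypothetical name). It is cross-checked against Python's standard csv module, whose default "excel" dialect uses the same doubled-quote convention.

```python
import csv
from typing import List


def parse_csv_row(line: str, sep: str = ",", quote: str = '"') -> List[str]:
    """Split one CSV row using doubled quotes as the escape inside quoted fields.

    Simplification: a quote anywhere outside a quoted field opens one.
    """
    fields: List[str] = []
    current: List[str] = []
    in_quotes = False
    i = 0
    while i < len(line):
        ch = line[i]
        if in_quotes:
            if ch == quote:
                if i + 1 < len(line) and line[i + 1] == quote:
                    current.append(quote)  # "" is one literal quote
                    i += 2
                    continue
                in_quotes = False  # a lone quote closes the field
            else:
                current.append(ch)  # separators are kept as content
        elif ch == quote:
            in_quotes = True
        elif ch == sep:
            fields.append("".join(current))
            current = []
        else:
            current.append(ch)
        i += 1
    if in_quotes:
        raise ValueError("unterminated quoted field")
    fields.append("".join(current))
    return fields


row = '1,"an example string ""a,b"",""c,d"" with nesting",3'
parsed = parse_csv_row(row)
assert parsed == next(csv.reader([row]))  # stdlib csv agrees on this row
print(parsed)  # ['1', 'an example string "a,b","c,d" with nesting', '3']
```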

GoogleCodeExporter commented 8 years ago

Original comment by wangs...@gmail.com on 13 Nov 2014 at 3:21