goldmansachs / gs-collections

GS Collections has been migrated to the Eclipse Foundation, re-branded as Eclipse Collections. https://www.eclipse.org/collections/

StringIterate.csvTokensTo* methods lack support for quoted field #5

Closed ShijunK closed 9 years ago

ShijunK commented 12 years ago

CSV is not as simple as StringIterate.tokensToList(string, ","). A CSV parser should support quoted fields, which means a comma inside a double-quoted field should not be treated as a delimiter.

test case:

String str = "a,b,\"c,d\"";
Verify.assertListsEqual(FastList.newListWith("a", "b", "c,d"), StringIterate.csvTokensToList(str));

dOngithub commented 12 years ago

Hey ShijunK! Can we replace the delimiter "," with some other symbol? Then "," would not be treated as a delimiter.

ShijunK commented 12 years ago

Hi dOngithub, I am talking about escaping a delimiter inside quoted content. In CSV, wrapping a field in double quotes is a common way to escape a comma, even though there is no formal industry standard. Yes, we can work around this issue by using a different delimiter. In the real world, however, you often have little or no choice about what you receive from an upstream or outside partner. Instead of an ad hoc hack, you really want a framework like gs-collections to do the right thing when it claims it can (csvTokensToList).

TWiStErRob commented 10 years ago

@dOngithub http://tools.ietf.org/html/rfc4180

goldmansachs commented 9 years ago

Implementing full csv parsing is beyond the scope of GS Collections. We have deprecated the StringIterate.csv* methods since they don’t handle cases such as quoted strings, embedded commas and quotes, newlines, etc. Instead, we’d suggest using a separate library for proper csv parsing. If that library returns an array of tokens, you can use ArrayAdapter to get the MutableList API.
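For readers wondering what quote-aware splitting involves, here is a minimal sketch in plain Java. It is not part of GS Collections and not a full RFC 4180 parser: the class name QuotedCsvSketch and the method splitLine are hypothetical, it handles a single record only, and it assumes a comma delimiter with doubled quotes ("") as the escape for a literal quote inside a quoted field. It is also lenient: a quote appearing in the middle of an unquoted field simply toggles quoted mode. For real data a dedicated CSV library is still the safer choice.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration only; not a GS Collections API.
public final class QuotedCsvSketch {
    private QuotedCsvSketch() {}

    // Splits one CSV record into fields, honoring double-quoted fields.
    public static List<String> splitLine(String line) {
        List<String> fields = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        boolean inQuotes = false;
        for (int i = 0; i < line.length(); i++) {
            char c = line.charAt(i);
            if (inQuotes) {
                if (c == '"') {
                    if (i + 1 < line.length() && line.charAt(i + 1) == '"') {
                        current.append('"'); // doubled quote is a literal quote
                        i++;                 // skip the second quote
                    } else {
                        inQuotes = false;    // closing quote
                    }
                } else {
                    current.append(c);       // commas are kept while quoted
                }
            } else if (c == '"') {
                inQuotes = true;             // opening quote
            } else if (c == ',') {
                fields.add(current.toString()); // delimiter ends the field
                current.setLength(0);
            } else {
                current.append(c);
            }
        }
        fields.add(current.toString());      // last field has no trailing comma
        return fields;
    }

    public static void main(String[] args) {
        System.out.println(splitLine("a,b,\"c,d\"")); // prints [a, b, c,d]
    }
}
```

With the input from the test case above, splitLine returns the three fields a, b and c,d. Whitespace around fields is preserved, so trim the results if your data has padding after commas.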