DaveAKing / guava-libraries

Automatically exported from code.google.com/p/guava-libraries
Apache License 2.0

Add support in Splitter to ignore separator inside of quotes (or brackets, etc.) #1615

Closed GoogleCodeExporter closed 9 years ago

GoogleCodeExporter commented 9 years ago
There's a fairly common use case of wanting to split on a separator, but not 
when that separator is inside of a quoted string. For example, the source 
String:

String value = "a,b,c",b,c,"d,e"

when split using something like

Splitter.on(',').exceptWhenSurroundedBy('"').split(value)

would result in:

"a,b,c"
b
c
"d,e"

A quick search turns up about 20 StackOverflow questions on this topic. I 
know that I could use a regex for the separator pattern, but that can get hard 
to read and isn't as intention-revealing. I can also use a mutable CharMatcher 
as described in 
http://stackoverflow.com/questions/5746230/create-a-string-capable-guava-splitter, 
but it'd be nice to have this built-in.
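
For reference, the requested behavior can be sketched in plain Java without Guava. This is only an illustration of the semantics described above, not a proposed implementation: the class name QuoteAwareSplit and its split method are hypothetical, and the sketch assumes a single quote character with no escaping or nesting.

```java
import java.util.ArrayList;
import java.util.List;

public class QuoteAwareSplit {
    /**
     * Splits input on separator, treating any separator that appears between
     * a pair of quote characters as ordinary text. Quotes are kept in the output.
     * Hypothetical helper: assumes no escaped quotes and no nesting.
     */
    public static List<String> split(String input, char separator, char quote) {
        List<String> parts = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        boolean inQuotes = false;
        for (int i = 0; i < input.length(); i++) {
            char c = input.charAt(i);
            if (c == quote) {
                // Toggle quoted state and keep the quote character itself.
                inQuotes = !inQuotes;
                current.append(c);
            } else if (c == separator && !inQuotes) {
                // An unquoted separator ends the current segment.
                parts.add(current.toString());
                current.setLength(0);
            } else {
                current.append(c);
            }
        }
        // The text after the last separator is the final segment.
        parts.add(current.toString());
        return parts;
    }

    public static void main(String[] args) {
        String value = "\"a,b,c\",b,c,\"d,e\"";
        for (String part : split(value, ',', '"')) {
            System.out.println(part);
        }
    }
}
```

Run on the example string, this yields the four segments listed above, with the quotes left in place.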

Original issue reported on code.google.com by tedyo...@gmail.com on 18 Dec 2013 at 12:20

GoogleCodeExporter commented 9 years ago

Original comment by cpov...@google.com on 18 Dec 2013 at 12:23

GoogleCodeExporter commented 9 years ago
I don't think this is a duplicate of #813. #813 is specifically for CSV, whereas 
this issue would not only be used for CSV processing, e.g.,

String value = "[a][b][c]["this[one]"][d]"

Splitter.on("[]").exceptWhenSurroundedBy('"').split(value)

result: [a], [b], [c], ["this[one]"], [d]

Also, this issue is _much_ smaller in scope than CSV support, which seems 
unlikely to happen any time soon, so I'd hate to see this grouped together with 
a 2-year-old issue.

Original comment by tedyo...@gmail.com on 18 Dec 2013 at 3:07

GoogleCodeExporter commented 9 years ago
The CSV parser is indeed a larger undertaking, one that would cover text with 
non-comma separators. I do think that that's the right thing here. We'd prefer 
to give users a way to remove quotes automatically, and we'd prefer to give 
them a way to escape any quotes that might appear in the string.

The other example you give will also need support beyond a simple 
exceptWhenSurroundedBy:

  String value = "[a][b][c][\"this[one]\"][d]";
  Splitter.on(anyOf("[]")).split(value) =>
, a, , b, , c, , this[one], , d, 

We could add in an omitEmptyStrings call, but then it becomes impossible to 
express a "true" empty segment with "[]".
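
To make the empty-segment problem concrete, here is a small plain-Java sketch. It is a stand-in for splitting on any of a set of characters, not Guava's Splitter itself: the class EmptySegmentDemo, the method splitOnAny, and the input "[a][][b]" are all hypothetical names and data chosen for illustration.

```java
import java.util.ArrayList;
import java.util.List;

public class EmptySegmentDemo {
    /**
     * Splits input at every character found in separators. When omitEmpty is
     * true, empty pieces are dropped, mimicking an omit-empty-strings option.
     */
    public static List<String> splitOnAny(String input, String separators, boolean omitEmpty) {
        List<String> parts = new ArrayList<>();
        int start = 0;
        for (int i = 0; i <= input.length(); i++) {
            boolean atEnd = i == input.length();
            if (atEnd || separators.indexOf(input.charAt(i)) >= 0) {
                String piece = input.substring(start, i);
                if (!(omitEmpty && piece.isEmpty())) {
                    parts.add(piece);
                }
                start = i + 1;
            }
        }
        return parts;
    }

    public static void main(String[] args) {
        String value = "[a][][b]"; // the middle "[]" is a deliberately empty segment

        // Keeping empties: the real empty segment is buried among the empty
        // strings produced by adjacent brackets.
        System.out.println(splitOnAny(value, "[]", false));

        // Omitting empties: prints [a, b], so the empty "[]" segment is lost.
        System.out.println(splitOnAny(value, "[]", true));
    }
}
```

Either way, the caller cannot tell the intentional empty segment apart from the artifacts of splitting on both bracket characters, which is the limitation described above.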

There are surely many users who need only one or two additions to Splitter. The 
catch is that they're often different features. Whether our eventual CSV parser 
is built atop Splitter or not, we'd like for it to cover as many cases as 
possible, and we'd like for it to go the whole way where it can -- automated 
quoting and so forth.

Original comment by cpov...@google.com on 18 Dec 2013 at 4:57

GoogleCodeExporter commented 9 years ago
This issue has been migrated to GitHub.

It can be found at https://github.com/google/guava/issues/<issue id>

Original comment by cgdecker@google.com on 1 Nov 2014 at 4:10

GoogleCodeExporter commented 9 years ago

Original comment by cgdecker@google.com on 3 Nov 2014 at 9:08