dnrajugade / guava-libraries

Automatically exported from code.google.com/p/guava-libraries
Apache License 2.0
0 stars 0 forks source link

It seems wrong that Splitter.fixedLength(n).split("") returns [""], not [] #1178

Closed GoogleCodeExporter closed 9 years ago

GoogleCodeExporter commented 9 years ago
So all separator-based splitters that don't use omitEmptyStrings have the 
property that split("") yields an Iterable containing just the empty string 
[""].  This follows from the fact that (a) if the separator doesn't exist 
anywhere in the input, splitting should yield only the entire input, and (b) we 
reject any separator regex that matches the empty string.

But one type of Splitter isn't separator-based: Splitter.fixedLength.  Here 
also, splitting the empty string yields an Iterable containing only the empty 
string.  (Again, unless omitEmptyStrings is requested, which is just a strange 
thing to have to request of a fixed-length Splitter.)

But in this case, this seems weird.  It violates the otherwise-invariant that 
split().size() == divide(input.length(), fixedLength, CEILING). For 
Splitter.fixedLength(4), there are 4 different input sizes that can yield an 
Iterable of size 2, or 3, etc., but there are 5 different input sizes that can 
yield an Iterable of size 1. This is not how Lists.partition() behaves, so why 
Splitter?

I tried to rationalize this by thinking of fixedLength as though imaginary 
zero-width separators were jammed into the input at positions n, 2n, etc. (But 
not at position 0 since that would violate the rule that the separator must not 
match the empty string).  However, that's no help, as that implies that 
fixedLength(4).split("abcd") would yield ["abcd", ""].

I could find no record that we discussed this question when reviewing the code 
initially. A reviewer did remark on how quirky it was that 
fixedLength().omitEmptyString() could actually be useful, but that didn't raise 
a red flag.

It's tough to know what could get subtly broken by changing this now, so I'm 
attempting to convince myself that this little quirk is either (a) good or (b) 
at least tolerable.

Original issue reported on code.google.com by kevinb@google.com on 25 Oct 2012 at 11:38

GoogleCodeExporter commented 9 years ago
If we decide not to change this, we will at least document this quirk clearly.

Original comment by kevinb@google.com on 25 Oct 2012 at 11:56

GoogleCodeExporter commented 9 years ago
Quirk has been documented.

Original comment by kevinb@google.com on 29 Oct 2012 at 7:59

GoogleCodeExporter commented 9 years ago
Then this should be marked fixed, no?

Original comment by lowas...@google.com on 30 Oct 2012 at 7:25

GoogleCodeExporter commented 9 years ago
You're misreading comment 1 as "If _and only if_."  The decision hasn't been 
made.

Original comment by kevinb@google.com on 30 Oct 2012 at 8:13

GoogleCodeExporter commented 9 years ago
We have decided it would be correct to change this. Though I'm not sure we have 
yet thought through what problems we might cause by having libraries out there 
in the wild that might do this or do that depending on what version of Guava 
they get run against. I suppose I still feel hesitant.

Original comment by kevinb@google.com on 1 Nov 2012 at 8:03

GoogleCodeExporter commented 9 years ago
I have a slogan for this: "Terminators, not separators!".

If one only uses separators then [] and [""] encode to the same thing.

Original comment by ray.j.gr...@gmail.com on 19 Dec 2012 at 12:47

GoogleCodeExporter commented 9 years ago
Kevin, want to take a stab at this one?

Original comment by kak@google.com on 22 Aug 2013 at 11:06

GoogleCodeExporter commented 9 years ago
This issue has been migrated to GitHub.

It can be found at https://github.com/google/guava/issues/<id>

Original comment by cgdecker@google.com on 1 Nov 2014 at 4:13

GoogleCodeExporter commented 9 years ago

Original comment by cgdecker@google.com on 1 Nov 2014 at 4:18

GoogleCodeExporter commented 9 years ago

Original comment by cgdecker@google.com on 3 Nov 2014 at 9:08