DaveAKing / guava-libraries

Automatically exported from code.google.com/p/guava-libraries
Apache License 2.0

Retrieve all indices of a CharMatcher or a String in a CharSequence #1674

Closed GoogleCodeExporter closed 9 years ago

GoogleCodeExporter commented 9 years ago
String.indexOf and CharMatcher.indexIn both return values that may be actual 
indices or -1.

Handling that -1 can be awkward in loops. A typical example looks like the 
following code:

for (int index = line.indexOf(','); index != -1; index = line.indexOf(',', 
index + 1)) {
  // do stuff.
}

In the "increment" part, you can see that I have to write "+ 1" to move past 
the current index. The "condition" part of the for loop is also a bit 
unnatural, since it has to check every index against -1.

I feel like there's a hole to be filled if we are able to simplify the loop 
somehow. For instance, if we could loop like this, it would be much more 
readable:

for (int index: Strings.indicesOf(',')) {
  // do stuff.
}

The same is applicable for CharMatcher:

CharMatcher matcher = CharMatcher.anyOf(";,");
for (int index: matcher.indicesIn(line)) {
  // do stuff.
}

Ideally I'd say this method should return an int[], but a List<Integer> or 
Iterable<Integer> would be fine too.
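
To make the request concrete, here is a minimal sketch of what such a helper could look like using only the JDK. The class name CharIndices and both indicesOf overloads are hypothetical (they exist in neither Guava nor the JDK), and an IntPredicate stands in for CharMatcher so the example has no dependencies.

```java
import java.util.Arrays;
import java.util.function.IntPredicate;

/** Hypothetical helper; not part of Guava or the JDK. */
final class CharIndices {
  private CharIndices() {}

  /** Returns, in ascending order, every index in seq whose char satisfies matcher. */
  static int[] indicesOf(CharSequence seq, IntPredicate matcher) {
    int[] result = new int[seq.length()]; // worst case: every char matches
    int count = 0;
    for (int i = 0; i < seq.length(); i++) {
      if (matcher.test(seq.charAt(i))) {
        result[count++] = i;
      }
    }
    return Arrays.copyOf(result, count); // trim to the number of matches
  }

  /** Convenience overload for a single character. */
  static int[] indicesOf(CharSequence seq, char c) {
    return indicesOf(seq, ch -> ch == c);
  }
}
```

With this in place, the loop becomes for (int index : CharIndices.indicesOf(line, ',')) { ... }, and an empty array naturally replaces the -1 check. Returning an int[] avoids boxing, at the cost of scanning the whole sequence up front.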

Original issue reported on code.google.com by ogregoire on 19 Feb 2014 at 2:32

GoogleCodeExporter commented 9 years ago
Everyone will have corrected my first suggestion for Strings:

for (int index: Strings.indicesOf(line, ',')) {
  // do stuff.
}

Original comment by ogregoire on 19 Feb 2014 at 2:36

GoogleCodeExporter commented 9 years ago
Is there a general pattern for what these loops do? I wonder if there's an even 
higher level API that could operate on all matching indexes just as Splitter, 
CharMatcher.replaceFrom, etc. do.

Original comment by cpov...@google.com on 19 Feb 2014 at 2:38

GoogleCodeExporter commented 9 years ago
Yes, there is actually. I have to check that my Strings match the expected 
patterns and reject the lines where the separators are misplaced. So 
technically I'm comparing the result of the now-imaginary Strings.indicesOf() 
to specific values I was given. I don't handle the data themselves. I just 
check the format.

The alternative with Splitter would be to check the lengths of all the 
"fields", but from a business standpoint that would actually make the code 
harder to understand, as all those loops happen while handling old mainframe 
records where "character positions" matter rather than "fields".

Original comment by ogregoire on 19 Feb 2014 at 2:48

GoogleCodeExporter commented 9 years ago
Interesting. Would you end up writing a loop, then, or would you just check the 
returned array/list for equality with the expected values? This seems like a 
relatively unusual case (no instances in the Google codebase, at least for my 
naive search), so I'm also wondering about alternatives. (The only one that 
comes to mind is regular expressions, but constructing a regex would be messier 
than operating on the indexes directly.)
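
For comparison, a rough sketch of the regex route, assuming a purely illustrative format in which commas must appear at positions 2 and 7 and nowhere else. The class name, the pattern, and the positions are all made up for this example.

```java
import java.util.regex.Pattern;

/** Hypothetical check; the format (commas at indices 2 and 7 only) is illustrative. */
final class RegexFormatCheck {
  private RegexFormatCheck() {}

  // Two non-comma chars, a comma, four non-comma chars, a comma, then no further commas.
  private static final Pattern COMMAS_AT_2_AND_7 = Pattern.compile("[^,]{2},[^,]{4},[^,]*");

  static boolean matchesFormat(String line) {
    // matches() requires the whole line to fit the pattern.
    return COMMAS_AT_2_AND_7.matcher(line).matches();
  }
}
```

The positions end up encoded indirectly as field widths (2, then 7 - 2 - 1 = 4), so each expected index has to be translated into a repetition count by hand.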

Original comment by cpov...@google.com on 19 Feb 2014 at 2:55

GoogleCodeExporter commented 9 years ago
Currently all my indices for the various formats are stored in int arrays. I'm 
currently looping as I look for the indices, but I would strongly prefer to 
compare arrays.
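
As a sketch of that array comparison with what the JDK offers today: the snippet below collects the separator indices with String.indexOf and compares them to the expected positions with Arrays.equals. The class and method names, and the sample positions in the usage note, are hypothetical.

```java
import java.util.Arrays;

/** Hypothetical record-format check; names and positions are illustrative only. */
final class RecordFormat {
  private RecordFormat() {}

  /** Collects every index of separator in line, in ascending order. */
  static int[] separatorIndices(String line, char separator) {
    int[] found = new int[line.length()]; // upper bound on the number of matches
    int count = 0;
    for (int i = line.indexOf(separator); i != -1; i = line.indexOf(separator, i + 1)) {
      found[count++] = i;
    }
    return Arrays.copyOf(found, count);
  }

  /** True only if the separators sit at exactly the expected positions. */
  static boolean hasSeparatorsAt(String line, char separator, int[] expected) {
    return Arrays.equals(separatorIndices(line, separator), expected);
  }
}
```

For example, RecordFormat.hasSeparatorsAt("AB,1234,XYZ", ',', new int[] {2, 7}) accepts the line, while "ABC,123,XYZ" is rejected because its first comma sits at index 3.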

By the way, I also considered a regex and quickly discarded it for the same 
reasons you mention.

Original comment by ogregoire on 19 Feb 2014 at 3:01

GoogleCodeExporter commented 9 years ago
This issue has been migrated to GitHub.

It can be found at https://github.com/google/guava/issues/<issue id>

Original comment by cgdecker@google.com on 1 Nov 2014 at 4:10
