DaveAKing / guava-libraries

Automatically exported from code.google.com/p/guava-libraries
Apache License 2.0

Utility method for regex matching, returning String[][] #1718

Closed GoogleCodeExporter closed 9 years ago

GoogleCodeExporter commented 9 years ago
    import java.util.ArrayList;
    import java.util.List;
    import java.util.regex.Matcher;
    import java.util.regex.Pattern;

    /**
     * Matches a string against a regex and returns all matches as a String[][].
     * Each row is one match; column 0 is the entire match and column i is
     * capturing group i (null if that group did not participate in the match).
     * @param source string to be searched
     * @param regex regular expression to search for
     * @return one row per match; an empty array if there are no matches
     */
    public static String[][] getMatches(String source, String regex) {
        Pattern pattern = Pattern.compile(regex);
        Matcher m = pattern.matcher(source);
        List<String[]> matches = new ArrayList<>();
        int numGroups = m.groupCount();
        while (m.find()) {
            // group 0 is the entire match
            String[] groups = new String[numGroups + 1];
            for (int i = 0; i <= numGroups; i++) {
                groups[i] = m.group(i);
            }
            matches.add(groups);
        }
        // The rows come from the list, so only the outer array type is needed here.
        return matches.toArray(new String[0][]);
    }

Original issue reported on code.google.com by manojmok...@gmail.com on 9 Apr 2014 at 7:43

GoogleCodeExporter commented 9 years ago
Is there a particular use case where you want each of the capturing groups to 
be treated homogeneously, mixed into one untyped String[]?  (And why would you 
prefer a String[][] to a List<List<String>>?)

Original comment by lowas...@google.com on 10 Apr 2014 at 9:19
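
For comparison, the List<List<String>> shape raised in the question above could look roughly like the sketch below. The class name MatchLists and the method name allMatches are hypothetical, chosen only for illustration; nothing here is an existing Guava or JDK API beyond java.util and java.util.regex.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper; not part of Guava or the JDK.
final class MatchLists {
    private MatchLists() {}

    /** Returns one inner list per match; index 0 is the whole match, index i is group i. */
    static List<List<String>> allMatches(String source, String regex) {
        Matcher m = Pattern.compile(regex).matcher(source);
        List<List<String>> matches = new ArrayList<>();
        while (m.find()) {
            List<String> groups = new ArrayList<>(m.groupCount() + 1);
            for (int i = 0; i <= m.groupCount(); i++) {
                groups.add(m.group(i)); // null when the group did not participate
            }
            matches.add(Collections.unmodifiableList(groups));
        }
        return Collections.unmodifiableList(matches);
    }
}
```

The inner lists are wrapped as unmodifiable rather than copied with List.copyOf, because a group that did not participate in a match is reported as null and List.copyOf rejects null elements.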

GoogleCodeExporter commented 9 years ago
@lowas, I did not understand your comment about homogeneity. I have always 
needed to access the groups as strings, though they may require some conversion 
later. The method is meant to make usage easier (less code), especially when 
the number of matches is known in advance, e.g. extracting the server name from 
a URL. In such cases, a String[] is easier to access than a List.
e.g. servername = getMatches( "mm@gg.com", "(.*)@(.*)")[0][2];

Original comment by manojmok...@gmail.com on 11 Apr 2014 at 6:23

GoogleCodeExporter commented 9 years ago
I'm not sure about the usefulness of this. I would like to see numbers on how 
often people actually match a pattern multiple times against the same string 
to get multiple groups ... of captured groups.

However, I'm certain that everyone who has ever worked with regexes has written 
the code to get the captured groups of the pattern matched once - as seen e.g. 
in the post #2. That code could indeed use some librarification as it's used 
quite often, not sure whether it's Guava-worthy, though.

It's almost an inverted Splitter, isn't it? But instead of splitting on the 
matched parts, we want to split on everything else and retain the matched parts.

Also, what should the behaviour for no matches be? Exception or empty result? 
If an empty result, then calling getMatches("blabla", "(.*)@(.*)")[0][2] would 
result in a cryptic ArrayIndexOutOfBoundsException. What if the pattern doesn't 
contain a grouping?

Original comment by JanecekP...@seznam.cz on 11 Apr 2014 at 8:43
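
One way to read the single-match case and the no-match question together is to return an Optional, so that "no match" is an explicit empty result rather than an ArrayIndexOutOfBoundsException. This is only a sketch under that assumption; FirstMatch and firstMatchGroups are hypothetical names, not an existing Guava or JDK API.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper; not part of Guava or the JDK.
final class FirstMatch {
    private FirstMatch() {}

    /**
     * Returns the groups of the first match (index 0 is the whole match),
     * or Optional.empty() if the regex does not match anywhere in source.
     */
    static Optional<List<String>> firstMatchGroups(String source, String regex) {
        Matcher m = Pattern.compile(regex).matcher(source);
        if (!m.find()) {
            return Optional.empty();
        }
        List<String> groups = new ArrayList<>(m.groupCount() + 1);
        for (int i = 0; i <= m.groupCount(); i++) {
            groups.add(m.group(i)); // null when the group did not participate
        }
        return Optional.of(Collections.unmodifiableList(groups));
    }
}
```

With this shape, FirstMatch.firstMatchGroups("mm@gg.com", "(.*)@(.*)").map(g -> g.get(2)) yields an Optional containing "gg.com", while the same call on "blabla" yields an empty Optional. A pattern without capturing groups still produces a one-element list holding the whole match.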

GoogleCodeExporter commented 9 years ago
@Janecek, if the pattern has no groups, each row would still contain column 0 
holding the entire match. Currently, no matches returns an empty array. In any 
case, the length of the returned array needs to be checked before [i] can be 
accessed safely. 

Original comment by manojmok...@gmail.com on 11 Apr 2014 at 9:43

GoogleCodeExporter commented 9 years ago
We could have a version with a Pattern parameter, to allow the Pattern object 
to be reused.

Original comment by manojmok...@gmail.com on 11 Apr 2014 at 10:07
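
A Pattern-taking overload along those lines might look like the following sketch. The enclosing class name PatternMatches is hypothetical; the method body mirrors the getMatches proposal from the original post, with the String overload delegating to the Pattern one.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper; the class name is illustrative only.
final class PatternMatches {
    private PatternMatches() {}

    /** Overload that takes a precompiled Pattern so callers can reuse it. */
    static String[][] getMatches(String source, Pattern pattern) {
        Matcher m = pattern.matcher(source);
        int numGroups = m.groupCount();
        List<String[]> matches = new ArrayList<>();
        while (m.find()) {
            // group 0 is the entire match
            String[] groups = new String[numGroups + 1];
            for (int i = 0; i <= numGroups; i++) {
                groups[i] = m.group(i);
            }
            matches.add(groups);
        }
        return matches.toArray(new String[0][]);
    }

    /** Convenience overload that compiles the regex on every call. */
    static String[][] getMatches(String source, String regex) {
        return getMatches(source, Pattern.compile(regex));
    }
}
```

A caller that applies the same expression to many inputs can compile it once, for example into a static final Pattern field, and pass that to the first overload.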

GoogleCodeExporter commented 9 years ago
This issue has been migrated to GitHub.

It can be found at https://github.com/google/guava/issues/<id>

Original comment by cgdecker@google.com on 1 Nov 2014 at 4:09

GoogleCodeExporter commented 9 years ago

Original comment by cgdecker@google.com on 1 Nov 2014 at 4:17

GoogleCodeExporter commented 9 years ago

Original comment by cgdecker@google.com on 3 Nov 2014 at 9:07