Regex group name support

txdv commented 8 years ago

I would like to add a mapping from regex group names to function parameters. Let me explain in detail:

Ruby regexes support named groups, like this:

/(?<name>TEST\S+)/.match("TESThello")[:name]

This returns TESThello.

With a bit of magic we can even get a hash with all the named groups, like this:

class MatchData
  def to_hash(symbols=true)
    self.names.map(&:to_sym).inject({}) { |hash, key| hash.merge(key => self[key]) }
  end
end

/(?<number>\d+) (?<name>\S+)/.match("92 HelloWorld").to_hash

This now returns {:number=>"92", :name=>"HelloWorld"}
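For reference, Ruby 2.4 and later ship MatchData#named_captures, which returns the same information with string keys, so the monkey patch may not be needed on those versions. A minimal sketch, assuming Ruby 2.4+; named_groups is a hypothetical helper name and is not part of Cinch:

```ruby
# Assumes Ruby 2.4+ for MatchData#named_captures.
# named_groups is a hypothetical helper name, not part of Cinch.
def named_groups(match)
  # named_captures uses string keys; convert them to symbols
  # to mirror the to_hash helper above.
  match.named_captures.map { |key, value| [key.to_sym, value] }.to_h
end

match = /(?<number>\d+) (?<name>\S+)/.match("92 HelloWorld")
p named_groups(match)
```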

Now the idea is to use this in match like this:

match /command (?<number>\d+) (?<name>\S+)/, method: :command
def command(m, number, name)
  # ...
end

If I switch the order of the arguments in the command function to name, number, it will still map them correctly.

For simple regexes this has no real benefit, because positional mapping already puts the first group in the second argument and the second group in the third. But consider a more complex regex like:

/kick ("(?<pattern>.+)" (?<reason>.+)|(?<pattern>.+)|(?<pattern>\S+) (?<reason>.+)|(?<pattern>\S+))/

The number of unnamed groups is just too high. Mapping named groups to argument names would make complex regexps easier to deal with: there would be only 2 arguments after 'm' instead of many.
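To illustrate the point, here is a small sketch that runs the regexp above unchanged and collects its named groups. Whichever alternative matches, the result only ever has the two keys pattern and reason. It assumes Ruby 2.4+ for named_captures; KICK and kick_args are hypothetical names used only for this illustration:

```ruby
# The regexp from the post, unchanged. Ruby allows duplicate group names,
# and unnamed parentheses do not capture once named groups are present.
KICK = /kick ("(?<pattern>.+)" (?<reason>.+)|(?<pattern>.+)|(?<pattern>\S+) (?<reason>.+)|(?<pattern>\S+))/

# Hypothetical helper: returns the named groups as a hash, or nil if the
# text does not match. Assumes Ruby 2.4+ for MatchData#named_captures.
def kick_args(text)
  match = KICK.match(text)
  match && match.named_captures
end

p kick_args('kick "foo bar" spamming')
p kick_args('kick foo')
```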

Maybe this functionality already exists, I just wasn't able to find it.

txdv commented 8 years ago

In the channel it was mentioned that this would be a new API and there might be problems when interacting with the old API. I suggest that this named group mapping to parameters of a function would be disabled by default and only enabled with a special parameter passed to the match function:

match /command (?<number>\d+) (?<name>\S+)/, type: :named, method: :command
def command(m, number, name)
  # ...
end

This will ensure that positional group mapping works like it used to and that there is no overlapping of functionality. If the user desires to use named group mapping then only an additional parameter is needed to switch to the proposed mode.

dominikh commented 8 years ago

Here are the reasons I'd rather not work on this idea:

1) What happens with a regexp like /(.+) (?<arg>.+)/ and a method meth(m, arg, other_arg) – what variable will the unnamed capture group be assigned to? What happens if someone calls the capture group m?

2) Introducing a new toggle, like type, would mean that it's no longer possible to dynamically pass a regexp to match (which, really, is a wrapper around Bot#on) without also needing to convey the type of the regexp and modify the other arguments. It also makes the API surface even larger.

Also, Cinch is really in maintenance mode, i.e. bug fixes only.

It should be rather easy for you to write your own function nmatch, that executes a lambda that takes care of the argument mapping and then calls the actual method, though.
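A rough sketch of what such a lambda could look like, kept independent of Cinch so it runs on its own. named_dispatcher is a hypothetical name, and the choice to let the message object always win over a capture group called m is an assumption, not something Cinch defines. Wiring it into match or Bot#on is left out:

```ruby
# Hypothetical helper, not part of Cinch: wraps a callable in a lambda that
# maps named captures onto the callable's parameters by name.
def named_dispatcher(callable)
  # Parameter names in declaration order, e.g. [:m, :name, :number].
  names = callable.parameters.map { |_kind, name| name }
  lambda do |m, match|
    captures = match.names.map { |n| [n.to_sym, match[n]] }.to_h
    # Assumption: the message object always fills the parameter called m,
    # even if the regexp also has a group named m.
    captures[:m] = m
    callable.call(*captures.values_at(*names))
  end
end

# The handler lists name before number; the groups appear in the other order.
handler = ->(m, name, number) { "#{m}: #{name} x#{number}" }
dispatch = named_dispatcher(handler)
puts dispatch.call("msg", /(?<number>\d+) (?<name>\S+)/.match("92 HelloWorld"))
```

Parameters with no matching group receive nil, since values_at returns nil for missing keys.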

petertseng commented 8 years ago

One question I have is similar to @dominikh's: how can Cinch detect the names of positional arguments? For example, given the below, how would Cinch know that the parameter in second position is called number and the third is name?

def command(m, number, name)
  # ...
end

The plugin writer would have to declare that mapping, right? That seems like an unfortunate burden to have to impose on the would-be users of this potential API, and there are many questions on how that would even be declared. Unless there is a way in Ruby to introspect the names of positional arguments. A quick search from me has not turned up a way to detect that, but maybe I was simply not searching for the right thing. Please help if you know how.

Now, one could consider something that uses keyword arguments in Ruby 2.0, as with the below code snippet:

def command(m, number: 1, name: 'user')
  # ...
end

However, the desire to maintain compatibility with Ruby 1.9 would complicate this, as you cannot use keyword arguments in Ruby 1.9. The best Ruby 1.9 can do is something like

def command(m, opts = {})
  number = opts[:number] || 1
  name = opts[:name] || 'user'
end

So if this is to be implemented, it would need to be compatible with both of these approaches.
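As a sketch of the keyword-argument variant only (it does not cover the Ruby 1.9 opts-hash style), the named groups could be passed as keywords, limited to the keywords the method declares. call_with_keywords is a hypothetical helper name, and skipping groups that did not match so that the method's defaults apply is an assumption about the desired behaviour:

```ruby
# Hypothetical helper, not part of Cinch: passes named captures as keyword
# arguments, keeping only the keywords the target declares.
def call_with_keywords(callable, m, match)
  accepted = callable.parameters
                     .select { |kind, _name| kind == :key || kind == :keyreq }
                     .map { |_kind, name| name }
  keywords = {}
  accepted.each do |name|
    next unless match.names.include?(name.to_s)
    value = match[name.to_s]
    # A nil value means the group did not take part in the match,
    # so leave the keyword out and let the default apply.
    keywords[name] = value unless value.nil?
  end
  callable.call(m, **keywords)
end

def command(m, number: 1, name: 'user')
  "#{number} #{name}"
end

puts call_with_keywords(method(:command), :m, /(?<number>\d+) (?<name>\S+)/.match("92 HelloWorld"))
puts call_with_keywords(method(:command), :m, /(?<number>\d+)/.match("92"))
```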

dominikh commented 8 years ago

Getting the argument names is actually not a problem, since Ruby 1.9.3:

irb(main):003:0> def command(m, number, name);end
=> :command
irb(main):004:0> method(:command).parameters
=> [[:req, :m], [:req, :number], [:req, :name]]
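The same call also reports what kind each parameter is, which is what would let a mapper tell the three method styles discussed above apart. A small sketch (the method names are made up for illustration; the keyword form needs Ruby 2.0+):

```ruby
# Three hypothetical handler signatures, one per style discussed above.
def positional(m, number, name); end
def with_defaults(m, opts = {}); end
def with_keywords(m, number: 1, name: 'user'); end

# Each entry is a [kind, name] pair: :req, :opt or :key here.
p method(:positional).parameters
p method(:with_defaults).parameters
p method(:with_keywords).parameters
```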

txdv commented 8 years ago

Here is a simple example:

class MatchData
  def to_hash(symbols=true)
    self.names.map(&:to_sym).inject({}) { |hash, key| hash.merge(key => self[key]) }
  end
end

def command(number, name)
  puts "number: #{number}"
  puts "name: #{name}"
end

match = /(?<number>\d+) (?<name>\S+)/.match("92 HelloWorld")

params = method(:command).parameters.map { |param| param.last }
values = match.to_hash.values_at(*params)
method(:command).call(*values)

txdv commented 8 years ago

Here is what I have come up with:

class SomePlugin
  include Cinch::Plugin
  @@patterns = []

  def self.nmatch(pattern, options = { })
    @@patterns << { pattern: pattern, options: options }
  end

  listen_to :message, method: :on_message
  def on_message(m)
    @@patterns.each do |entry|
      pattern = entry[:pattern]
      options = entry[:options]
      case pattern
      when Regexp
        match = pattern.match(m.message)
        next if match.nil?
        # Look up the handler and its parameter names, e.g. [:m, :one, :two].
        handler = method(options[:method])
        params = handler.parameters.map { |param| param.last }
        # Relies on the MatchData#to_hash patch shown earlier; m fills the m parameter.
        values = match.to_hash.merge({ m: m }).values_at(*params)
        handler.call(*values)
      end
    end
  end

  nmatch /!on (?<one>\d+)( (?<two>\d+))?/, method: :on
  def on(m, one, two)
    m.reply "#{two} #{one}"
  end
end

@dominikh do you have any comments on this approach?

cinchrb / cinch

Regex group name support #216