davidchambers / tutor

JavaScript interface for the Gatherer card database
https://gatherer.wizards.com/
Do What The F*ck You Want To Public License
149 stars 18 forks source link

fixes #71 - changed the regular expression parsing mana costs in the full set spoiler #72

Closed robdennis closed 11 years ago

robdennis commented 11 years ago

the original one didn't handle multiple digits in a row of mana cost (think Eldrazi)

it's certainly possible that the modified regular expression can be even further tweaked

davidchambers commented 11 years ago

The regular expression is getting unwieldy. It'd be great if you could investigating other approaches for this section of code. One approach that might be clearer is to use String::split to tokenize the raw value, then format the resulting array as a separate operation:

"1(G/W)(G/W)"  ------->  ["1", "G/W", "G/W"]  ------->  "{1}{G/W}{G/W}"
               tokenize                        format

We currently go from one string representation to another in one step, but I think using a two-step process might result in simpler code.

davidchambers commented 11 years ago

This may be overkill, but we could use a state machine:

tokenize = (mana_cost) ->
  tokens = []
  level = 0
  s = mana_cost
  while s
    if s[0] is '('
      throw new SyntaxError 'Unexpected "("' if level++ > 0
      token = ''
    else if s[0] is ')'
      throw new SyntaxError 'Unexpected ")"' if level-- is 0
      tokens.push token
    else if level > 0
      token += s[0]
    else
      [token, s] = /^(\d+|.)(.*)$/.exec(s)[1..]
      tokens.push token
      continue
    s = s.substr 1
  throw new SyntaxError 'Unmatched "("' if level > 0
  tokens

I'm going to go ahead and merge this pull request, then give some thought to whether replacing an unwieldy regular expression with a 20-line state machine is sensible.

robdennis commented 11 years ago

I agree that it would be simpler, but I think the complexity of the existing regex is to account for costs like "2WW" as well as the hybrid/phyrexian forms that have parentheticals.

I'm playing around with it, but I'm not seeing a super simple way to tokenize without another ugly regex, but given the tokenized form I could do something like

card.mana_cost = '{'+tokens.join('}{')+'}';
davidchambers commented 11 years ago

I'm playing around with it, but I'm not seeing a super simple way to tokenize without another ugly regex, but given the tokenized form I could do something like

card.mana_cost = '{'+tokens.join('}{')+'}';

Yes, or:

_.reduce tokens, ((memo, token) -> "#{memo}{#{token}}"), ''
robdennis commented 11 years ago

heh, I forget there's more than just raw, libraryless js sometimes.

if you punt on handling "anything in parantheses" which is what it kind of was before, and just assume it's one of (in order since regexes match left to right):

won't that get there? see crazy mana cost below

tokens = "21WW(U/P)(2/W)".match(/.\/.|\d+|\w/g)
["21", "W", "W", "U/P", "2/W"]

that's certainly a simpler regex

it'd look like:

# 1(G/W)(G/W) -> {1}{G/W}{G/W} | 11 -> {11}
card.mana_cost = '{'+val.match(/.\/.|\d+|\w/g).join('}{')+'}'      #  I feel a bit foolish, but I kept getting parse errors using your _.reduce code, so maybe you can edit this
card.converted_mana_cost = to_converted_mana_cost card.mana_cost

edit: since I'm heading to bed (I'm EDT so it's almost 1am), I might push this with the modified regex approach, with the comment issue number, and with the non-_.reduce and that puts you in the best position to merge edit I think

davidchambers commented 11 years ago

Very nice! I like the simplicity of your solution.

I can think of a couple of ways to potentially improve the pattern:

  1. Use /// to avoid having to escape the literal "/":

    /// ./. | \d+ | \w ///g
  2. Add comments:

    /// ./.   # Phyrexian mana symbol, e.g. "G/W"
     | \d+   # colorless mana symbol, e.g. "11"
     | \w    # other mana symbol, e.g. "U"
    ///g

Do you think the \w in the regular expression should simply be .? There's no rule which dictates that mana symbol characters must be alphanumeric (and \w matches "_", at any rate).

davidchambers commented 11 years ago

Do you think the \w in the regular expression should simply be .? There's no rule which dictates that mana symbol characters must be alphanumeric (and \w matches "_", at any rate).

Oops. That'd match "(" and ")". We'd need to use something like [^()].

davidchambers commented 11 years ago

Closed via ea0e7867246dfdc93446aa0ca4456e6899c0d01e.

robdennis commented 11 years ago

I like the proposed (already happened?) changes to the regex. Aces.

davidchambers commented 11 years ago

Cool. I tried adding comments to the regular expression, but they impeded rather than aided reading. The comment on the line above does a good job of explaining what the ./. and \d+ are there to match.

I enjoyed collaborating with you on this pull request last night. :)

robdennis commented 11 years ago

Same :) although I felt 3 hours dumber due to time zones. yaaaay