acuminous / yadda

A BDD javascript library
412 stars 73 forks source link

Can not support the back reference of the regular expression in the dictionary definition. #229

Closed snowyu closed 7 years ago

snowyu commented 7 years ago
str_convert = (delimiter, value, next)->
  value = value.replace /\\(.)/g, '$1'
  next null, value

  # the string is like the 'sdsdd', or "asassf". support the "\" to escape char.
aDictionary.define 'string', /(['"])([^\1\\]*(?:\\.[^\1\\]*)*)\1/, str_convert

# this can not match always, because \1 is the `$num`.
lib.define.given '$num, $string', ->
cressie176 commented 7 years ago

If I understand correctly you are using back references to ensure the same character (either a " or ') is used to delimit the beginning and end of a string, e.g.

1, "the quick brown fox"

You are defining a step,

lib.define("$num, $string", function(n, s)

And a dictionary term similar to

dict.define("string", /(["'](.*)\1/, converter)

However when you come to interpret the step Yadda is erroring with

     Error: Scenario cannot be interpreted
1, "the quick brown fox" <-- Undefined Step

because when expanded into a regular expression

$num $string

becomes

(.+) (["'])(.*)\1

And the back reference \1 now refers to the number matching group (.+) rather than delimiter matching group (["']). In the above example everything works if I were to use \2 instead of \1, but that's obviously only a very brittle workaround.

A more robust workaround would be for Yadda to reindex back references, but considering how error prone and complicated this is likely to be, for quite a small edge case I'm reluctant to take this approach.

Another solution might be to have a slightly dumber regex, e.g.

dict.define('string', /(".+"|'.+')/, str_convert)

and remove the first and last characters from the string in str_convert. Would this solve the problem?

snowyu commented 7 years ago

Yes,exactly. But your workaround can not reslove the escaped char like: "string\"it\"".

I thinks reindex the back reference in the step should be perfect. What's the edge effects about this? It should be more easy if using the Regexp lexer to do so.

cressie176 commented 7 years ago

But your workaround can not reslove the escaped char like: "string\"it\"".

True. What are you actually trying to achieve with Yadda? In all the situations I've hit this problem before, I have enough influence over the steps that it's been OK to use a delimiter that won't appear naturally in the step text.

I thinks reindex the back reference in the step should be perfect. What's the edge effects about this? It should be more easy if using the Regexp lexer to do so.

I think it's dangerous to re-index the back references regex for two reasons.

  1. I think the code to do it would be complicated and likely to contain bugs (I'm happy to consider a PR written in JavaScript proving me wrong).

  2. I've no way of knowing that the users intention for \1 wasn't to always match the first group. By re-indexing I could break someones code.

Can you get this working with the back references, but by not using the dictionary, e.g.

lib.define(/given (\d+), (['"])([^\2\\]*(?:\\.[^\2\\]*)*)\2/, function(num, value) {
  num = parseInt(num, 10)
  value = value.replace(/\\(.)/g, '$1')
})
snowyu commented 7 years ago

Actually the general problem is the regex merge. once the final regex is determined, then the index of the group is determined too. So the re-index should be calculated on the final regex(not on dict).

Some pseudo code:

# seperate the RegExp string to an string array. 
regExpStrs = seperateFrom aStep
finalRegExpStr = concatRegExpStr regExpStrs 

concatRegExpStr = (aRegExpStrs)->
  result = ''
  # keep this to re-index
  currentGroup = 0
  for aRegExpStr in aRegExpStrs
     result += reIndexBackRef currentGroup, aRegExpStr 
     currentGroup += getGroupCount aRegExpStr 
  result

But Maybe I missing sth...

cressie176 commented 7 years ago

Actually the general problem is the regex merge. once the final regex is determined, then the index of the group is determined too. So the re-index should be calculated on the final regex(not on dict).

I agree this is what would need to be done to resolve your issue, but once again I don't think it's feasible. I can imagine all sorts of edge cases cropping up when trying to re-index the back references. e.g. you'd have to be sure to not reindex \\1. You'd also have to be sure that the user wanted the regex reindexed. Maybe they want the first match no matter what.

All things considered I'm not going to implement this. You can still use back references provided you don't use a dictionary by putting the full regex in the library definition, or if you want to implement it yourself then I suspect you could use a custom dictionary with modified expand method, or a custom library which reindexed the back references after expand was called. But it's not something I think Yadda should do.