cucumber / language-service

Cucumber Language Service
MIT License

Cucumber Expressions with optionals interpreted as Regular Expressions #191

Open kieran-ryan opened 1 month ago

kieran-ryan commented 1 month ago

👓 What did you see?

No matches with optionals

✅ What did you expect to see?

📦 Which tool/library version are you using?

🔬 How could we reproduce it?

Steps to reproduce the behavior:

  1. Open Visual Studio Code

  2. Install the official Cucumber extension

  3. Create a feature file inside the features directory containing

    Feature: Colour selection
    
      Scenario:
        Given I select the theme colour "red"
  4. Create a step definition inside the features/steps directory containing

    from behave import given
    
    @given('I select the theme colo(u)r "{color}"')
    def step_when(context, color):
        ...
  5. Observe the step in the feature file is highlighted as 'undefined'

📚 Any additional context?

The Language Service Python implementation prioritises Regular Expressions (checks first) over Cucumber Expressions.

One criterion for determining whether a pattern is a Regular Expression is whether it contains brackets (), which is checked through specialCharsMatch.

https://github.com/cucumber/language-service/blob/6a35176c10828812e6fc86b0a5ab3dc774852906/src/language/pythonLanguage.ts#L135-L142

As a result, any Cucumber Expression containing an optional will be treated as a Regular Expression and the optional will instead be considered a capture group.

0: r {expression: 'I am on the profile customisation/settings page', parameterTypeRegistry: Aa, parameterTypes: Array(0), ast: ti, treeRegexp: r}
1: Us {regexp: /I select the theme colo(u)r "{color}"/, parameterTypeRegistry: Aa, treeRegexp: r}
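
The effect can be reproduced with Python's re module alone. The following is a minimal sketch, not the language service's code: looks_like_regex is a hypothetical stand-in for the brackets criterion described above, and the real specialCharsMatch check may look at more characters.

```python
import re

STEP_TEXT = 'I select the theme colour "red"'
PATTERN = 'I select the theme colo(u)r "{color}"'


def looks_like_regex(pattern: str) -> bool:
    """Hypothetical stand-in: any bracket marks the pattern as a regex."""
    return "(" in pattern or ")" in pattern


# The optional (u) trips the check, so the pattern is handled as a regex.
print(looks_like_regex(PATTERN))  # True

# As a regex, (u) is a capture group and {color} is literal text,
# so the step text from the feature file cannot match.
as_regex = re.compile(PATTERN)
print(as_regex.fullmatch(STEP_TEXT))  # None

# The only text this regex accepts still contains the literal placeholder.
print(as_regex.fullmatch('I select the theme colour "{color}"') is not None)  # True
```

This is consistent with the step being reported as 'undefined': the pattern is never evaluated with the semantics its author intended.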

A challenge is that in some languages a Regular Expression can be denoted by special prefix and suffix characters, whereas in Python both kinds of pattern are written as ordinary strings. See Java implementation:

https://github.com/cucumber/language-service/blob/6a35176c10828812e6fc86b0a5ab3dc774852906/src/language/javaLanguage.ts#L20-L24

Brackets usage in Regular Expressions with Python

The official Python documentation on regular expressions outlines the use of brackets as follows:

(...)

Matches whatever regular expression is inside the parentheses, and indicates the start and end of a group; the contents of a group can be retrieved after a match has been performed, and can be matched later in the string with the \number special sequence, described below. To match the literals '(' or ')', use \( or \), or enclose them inside a character class: [(], [)].

(?...)

This is an extension notation (a '?' following a '(' is not meaningful otherwise). The first character after the '?' determines what the meaning and further syntax of the construct is. Extensions usually do not create a new group; (?P<name>...) is the only exception to this rule. Following are the currently supported extensions.
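
To make the difference from a Cucumber Expression optional concrete, here is a short sketch of how Python's re module treats brackets. It uses only the standard library; the colour patterns are illustrative and not taken from the language service.

```python
import re

# (...) is a capturing group: the "u" is required and can be retrieved.
match = re.fullmatch(r"colo(u)r", "colour")
print(match.group(1))  # u
print(re.fullmatch(r"colo(u)r", "color"))  # None

# A regex needs an explicit "?" after the group to make it optional.
print(re.fullmatch(r"colo(u)?r", "color") is not None)  # True

# (?:...) groups without capturing; (?P<name>...) creates a named group.
print(re.fullmatch(r"colo(?:u)?r", "colour").groups())  # ()
print(re.fullmatch(r"colo(?P<letter>u)r", "colour").group("letter"))  # u

# Escaped brackets match the literal characters.
print(re.fullmatch(r"colo\(u\)r", "colo(u)r") is not None)  # True
```

In a Cucumber Expression, by contrast, colo(u)r is intended to match both "color" and "colour", which is why treating the same string as a regex changes its meaning.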

Further References

kieran-ryan commented 1 month ago

@mpkorstanje, wondering if you by any chance have any guidance on this one - either as an intermediate or as a long-term solution?

In essence - at least in the Python implementation - due to an invalid regular expression check, Cucumber Expressions containing optionals are being incorrectly treated as Regular Expressions. Thus, they are being considered 'undefined'.

mpkorstanje commented 1 month ago

With https://github.com/cucumber/vscode/issues/125 in mind, this looks like a pretty tricky problem. Cucumber and regular expressions have considerable overlap. Consider:

Hello(.+)\?
Hello( world)?
Hello world?

For Java, cucumber-expressions uses a simple heuristic to determine what we're dealing with. This is implemented in the ExpressionFactory. (Note: The ^ and $ characters aren't anything special, they're the regex start of input and end of input markers).

I do not see a Python equivalent of the ExpressionFactory, so unfortunately users of the cucumber-expressions library have to decide what is a regex and what is a cucumber expression. And the language service would have to duplicate that logic. So I do think it would be a good idea to implement an ExpressionFactory for Python to get rid of at least some ambiguity by providing a canonical solution.
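
As a rough illustration of what such a heuristic could look like in Python, here is a sketch that treats a pattern as a regular expression only when it carries start or end of input anchors, or is wrapped in /.../ delimiters. The function name classify_expression and the exact rules are assumptions made for illustration; this is not the Java ExpressionFactory's code and not an existing Python API.

```python
import re

# Assumption: anchors and /.../ delimiters are the only regex signals.
BEGIN_ANCHOR = re.compile(r"^\^")
END_ANCHOR = re.compile(r"\$$")
SCRIPT_STYLE = re.compile(r"^/(.*)/$")


def classify_expression(text: str) -> str:
    """Hypothetical helper: return "regular" or "cucumber" for a pattern."""
    if BEGIN_ANCHOR.search(text) or END_ANCHOR.search(text):
        return "regular"
    if SCRIPT_STYLE.search(text):
        return "regular"
    return "cucumber"


for pattern in [
    r"^Hello(.+)\?$",
    r"Hello(.+)\?",
    "Hello( world)?",
    "Hello world?",
    'I select the theme colo(u)r "{color}"',
]:
    print(f"{pattern!r}: {classify_expression(pattern)}")
```

Under these assumed rules only the anchored pattern is classified as a regular expression. The three overlapping examples above all fall through to "cucumber", so the heuristic removes some ambiguity but still relies on authors anchoring their regular expressions.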

But that won't solve the problem. Where Cucumber JVM only supports regular and cucumber expressions, Behave currently only supports regular and parse expressions, while PyTestBDD-NG supports regular-, parse-, and cucumber-expressions, in addition to a heuristic.

So what kind of expression an expression is, depends entirely on the context. Is that something we can access from within vscode or the language service? If not, it might have to become a configuration flag.

kieran-ryan commented 1 month ago

This is great!

So what kind of expression an expression is, depends entirely on the context. Is that something we can access from within vscode or the language service? If not, it might have to become a configuration flag.

We could extract this information, though how it's configured varies quite a bit based on the framework and may be a challenge to maintain. The configuration option sounds like a great shout: minimal implementation and easily replicable across languages and frameworks.
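
A configuration option of this kind could be as small as a lookup from a framework name to the expression types that framework supports. The sketch below encodes the support described earlier in this thread; the setting values ("cucumber-jvm", "behave", "pytest-bdd-ng") and the names ExpressionKind and allowed_kinds are hypothetical and do not exist in the language service.

```python
from enum import Enum


class ExpressionKind(Enum):
    REGULAR = "regular"
    PARSE = "parse"
    CUCUMBER = "cucumber"


# Support as described in this thread; the keys are hypothetical setting values.
SUPPORTED_KINDS = {
    "cucumber-jvm": {ExpressionKind.REGULAR, ExpressionKind.CUCUMBER},
    "behave": {ExpressionKind.REGULAR, ExpressionKind.PARSE},
    "pytest-bdd-ng": {
        ExpressionKind.REGULAR,
        ExpressionKind.PARSE,
        ExpressionKind.CUCUMBER,
    },
}


def allowed_kinds(framework: str) -> set:
    """Return the expression kinds to consider for the configured framework."""
    try:
        return SUPPORTED_KINDS[framework]
    except KeyError:
        raise ValueError(f"Unknown framework setting: {framework!r}") from None


# With behave configured, brackets would never be read as a Cucumber optional.
print(ExpressionKind.CUCUMBER in allowed_kinds("behave"))  # False
print(sorted(kind.value for kind in allowed_kinds("pytest-bdd-ng")))
```

The language service could then restrict its matching to the returned kinds instead of guessing from the pattern text alone.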

Will look into the ExpressionFactory to gain an understanding and think about this further. Thanks a million!