Behat / Gherkin

Gherkin parser, written in PHP for Behat project
MIT License
1.05k stars, 92 forks

The Keywords API is based on cucumber gherkin 2 #203

Open stof opened 3 years ago

stof commented 3 years ago

The syntax using | delimiters and < to decide whether a space is necessary or not at the end of the keyword is a legacy of cucumber/gherkin 2. In version 3.0, they changed their language system to use a list of keywords instead of a string that needs to be split, and they include the space itself in the translation when it is necessary (i.e. in most languages).
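To illustrate the difference between the two formats, here is a minimal sketch of turning a gherkin 2 style step keyword string into a gherkin 3 style list. The function name is hypothetical and does not exist in the codebase, and the sketch only covers step keywords (Given, When, Then, And, But), which are the ones affected by the trailing-space rule.

```php
<?php

/**
 * Converts a legacy (gherkin 2 style) step keyword string such as "*|Given"
 * into a gherkin 3 style list where the trailing space is part of the keyword.
 *
 * Hypothetical helper for illustration only.
 *
 * @return string[]
 */
function legacyStepKeywordsToList(string $legacy): array
{
    $keywords = [];

    foreach (explode('|', $legacy) as $keyword) {
        if (substr($keyword, -1) === '<') {
            // A trailing "<" means no space separates the keyword from the step text.
            $keywords[] = substr($keyword, 0, -1);
        } else {
            // Otherwise the space is implicit in the legacy format and explicit in the new one.
            $keywords[] = $keyword . ' ';
        }
    }

    return $keywords;
}

$english = legacyStepKeywordsToList('*|Given');   // ['* ', 'Given ']
$japanese = legacyStepKeywordsToList('*|前提<');  // ['* ', '前提']
```

In the new format the list can be matched against the input as is, without any knowledge of the | and < conventions.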

Our update_i18n script currently converts the data back to the cucumber/gherkin 2 format to create our own i18n.php, so that it stays compatible with our own Keywords API. The Lexer then needs to undo most of these changes (and might not actually be compliant with upstream cucumber gherkin regarding whether a space is needed after the token, due to the way this is implemented).

I suggest building a new dialect API (Dialect is the upstream name for that) relying on the modern format (an array of strings for keywords). We would then have a compat layer which would implement the DialectProvider API on top of the existing Keywords API (basically doing the conversion from gherkin 2 format to gherkin 3 format again), and which would be used to wrap the Keywords instance passed to the lexer. But we would also support passing a dialect provider directly to the lexer instead, and we would ship an implementation consuming a file that contains the new format directly (either a JSON provider loading the upstream gherkin-languages.json file as is, or a provider loading a file returning a PHP array, as today). The usage of the Keywords API and its compat layer would of course be deprecated.
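A rough sketch of what this could look like, to make the proposal more concrete. Every name here (DialectProvider, ArrayDialectProvider, KeywordsDialectProvider, LegacyKeywords) is hypothetical, and LegacyKeywords is a deliberately simplified stand-in for the existing Keywords API that only covers the Given keywords. The actual design may well end up different.

```php
<?php

/** New-style API (hypothetical): keywords are plain lists, trailing space included where needed. */
interface DialectProvider
{
    /** @return array<string, string[]> map of keyword type (e.g. "given") to its keywords */
    public function getDialect(string $language): array;
}

/** Serves data already in the new format, e.g. a decoded JSON file or a PHP array file. */
final class ArrayDialectProvider implements DialectProvider
{
    private $dialects;

    public function __construct(array $dialects)
    {
        $this->dialects = $dialects;
    }

    public function getDialect(string $language): array
    {
        if (!isset($this->dialects[$language])) {
            throw new InvalidArgumentException("Unknown language: $language");
        }

        return $this->dialects[$language];
    }
}

/** Simplified stand-in for the existing Keywords API, which returns "a|b<" style strings. */
interface LegacyKeywords
{
    public function setLanguage(string $language): void;

    public function getGivenKeywords(): string;
}

/** Compat layer (hypothetical): exposes a legacy Keywords instance through the new API. */
final class KeywordsDialectProvider implements DialectProvider
{
    private $keywords;

    public function __construct(LegacyKeywords $keywords)
    {
        $this->keywords = $keywords;
    }

    public function getDialect(string $language): array
    {
        $this->keywords->setLanguage($language);

        // Only "given" is shown; the other step keyword types would be converted the same way.
        return ['given' => $this->convert($this->keywords->getGivenKeywords())];
    }

    /** Converts "*|Given" to ['* ', 'Given '] and "*|前提<" to ['* ', '前提']. */
    private function convert(string $legacy): array
    {
        return array_map(static function (string $keyword): string {
            return substr($keyword, -1) === '<' ? substr($keyword, 0, -1) : $keyword . ' ';
        }, explode('|', $legacy));
    }
}

// Usage with a fake legacy implementation, for illustration only.
$legacy = new class implements LegacyKeywords {
    private $language = 'en';

    public function setLanguage(string $language): void
    {
        $this->language = $language;
    }

    public function getGivenKeywords(): string
    {
        return $this->language === 'ja' ? '*|前提<' : '*|Given';
    }
};

$wrapped = new KeywordsDialectProvider($legacy);
$direct = new ArrayDialectProvider(['en' => ['given' => ['* ', 'Given ']]]);
```

With this shape, the lexer would only ever depend on the DialectProvider side, and the wrapper could be removed once the Keywords API is dropped.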

What do you think, @ciaranmcnulty?

ciaranmcnulty commented 3 years ago

Seems to make sense to me, though it's not something I'd be too confident about doing myself.