Finding the right way to provide the PHP implementation

stof commented 9 years ago

This issue is meant to continue the discussion started in https://twitter.com/everzet/status/575285360403611648 in a better place than Twitter /cc @everzet @cyaranmcnulty

The current Gherkin parser available in PHP is https://github.com/Behat/Gherkin/. I see 2 different ways to provide the gherkin3-compatible parser for PHP:

keeping behat/gherkin as the parser for PHP, but making it run the same acceptance tests as the cucumber/gherkin3 parsers (see #3, which was opened for this reason)
generating a PHP parser in this repo (using a different PHP namespace than Behat, given that behat/gherkin is already at version 4 anyway and the generated library would probably not be a drop-in replacement)

The first solution means that all the maintenance of the PHP parser stays on the shoulders of the Behat team, but the acceptance tests would ensure consistency. The rest of this issue concerns the second solution.

Composer support

The PHP parser MUST be available as a Composer package registered on Packagist (otherwise everyone will hate the library, starting with the Behat team). However, Composer does not play well with repositories containing code for multiple languages:

it expects to have the composer.json file at the root of the repository (not that annoying given that it is a single file, but still inconsistent with other languages)
Composer will then download the GitHub archive (or clone the repo, depending on the user's preference) to get the PHP code, meaning PHP users will download lots of code for other languages that they have no use for

A way to handle this is to have the generated PHP parser in a repo containing only the PHP library. There are 2 ways to achieve this:

generating it in this repo and then maintaining a subtree repo, which will be used by Packagist. This requires some external server responsible for updating the subtree split automatically on each push to the main repo
generating it only in the gherkin-php repo. This requires remembering to regenerate it whenever the parser definitions are updated in the main repo, or having an automated update of the gherkin-php repo. It increases the maintenance work when updating parser definitions here

Impact on Behat

Currently, the Behat\Gherkin node classes (representing the AST) are exposed to userland code (especially the PyStringNode and the TableNode). So Behat itself would have 2 choices to use the new parser:

break backward compatibility by dropping behat/gherkin entirely and making userland code use the AST objects generated by the new parser (and then hoping that Cucumber follows semver properly on the gherkin parser so that changes here don't break BC for Behat users by mistake).
keep the node classes of behat/gherkin and implement a converter from the new AST objects to the behat/gherkin ones, to use it after parsing a file. This solution is much better IMO, because it means that Behat users would not have to know about the parser change at all (except in case the current behat/gherkin parser has a different parsing behavior of course, but this would happen in any case)
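
To make the second option concrete, here is a minimal sketch of such a converter. Every class name in it is hypothetical: `ParsedDataTable` stands in for whatever table node the new parser would produce (its real shape is not defined in this thread), and `LegacyTableNode` stands in for the node class userland code already depends on, the role TableNode plays in behat/gherkin. It only illustrates the idea of converting after parsing, not the actual API of either library.

```php
<?php
declare(strict_types=1);

// Hypothetical stand-in for a table node produced by the new parser.
final class ParsedDataTable
{
    /** @var string[][] */
    private $rows;

    public function __construct(array $rows)
    {
        $this->rows = $rows;
    }

    public function getRows(): array
    {
        return $this->rows;
    }
}

// Hypothetical stand-in for the node class already exposed to userland code.
final class LegacyTableNode
{
    /** @var string[][] */
    private $table;

    public function __construct(array $table)
    {
        $this->table = $table;
    }

    // Returns one associative array per body row, keyed by the header row.
    public function getHash(): array
    {
        $rows = $this->table;
        $header = array_shift($rows);
        $hash = [];
        foreach ($rows as $row) {
            $hash[] = array_combine($header, $row);
        }
        return $hash;
    }
}

// The converter is the only place that knows about both object models.
final class TableConverter
{
    public function convert(ParsedDataTable $table): LegacyTableNode
    {
        return new LegacyTableNode($table->getRows());
    }
}

$parsed = new ParsedDataTable([
    ['name', 'role'],
    ['stof', 'maintainer'],
]);
$legacy = (new TableConverter())->convert($parsed);
var_dump($legacy->getHash());
```

With this layout, step definitions keep receiving the node type they already know, and a change in the parser's AST only requires touching the converter.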

@everzet @aslakhellesoy What would be the preferred approach for the parser itself? (For the impact on Behat itself, I think the best solution is quite clear.)

aslakhellesoy commented 9 years ago

My preference would be a cucumber/gherkin-php repo which could be linked as a git submodule by the cucumber/gherkin3 repo.

(Later we'll rename cucumber/gherkin3 to cucumber/gherkin and the old cucumber/gherkin to cucumber/gherkin2).

This means cucumber/gherkin-php could be consumed (using composer) completely independently of cucumber/gherkin3. The main reason for linking cucumber/gherkin-php as a submodule would be:

If we make a change we can easily run tests for all platforms
The master dialects.json lives in one place (copied to gherkin-php submodule and committed there as part of the build).

Now, in terms of how to integrate it with Behat (or any other Cucumber implementation for that matter): None of the internals of Gherkin3 are intended to be exposed to the user-visible API. To be specific - the Scenario, DataTable and DocString AST nodes from Gherkin3 should not be exposed to users. Nor should the TestCase and TestStep nodes resulting from the compiler.

Each Cucumber implementation should only expose its own API, wrapping whatever objects come out of Gherkin3.

stof commented 9 years ago

OK, so we would keep Behat/Gherkin as being our own wrapper around the Gherkin parser.

aslakhellesoy commented 9 years ago

Well, at least the parts of the Behat/Gherkin API you have already exposed and people are using. The implementation would probably be quite different. There are significant simplifications to be made by basing Cucumber/Behat on test cases (the output from the Gherkin3 compiler).

aslakhellesoy commented 9 years ago

Longer term - I think you should try to migrate users toward an API where they don't depend on any Gherkin APIs (whether it's Gherkin3 or the old Behat/Gherkin). Gherkin is an implementation detail.

In Cucumber, we're planning to support other formats than Gherkin (Markdown for example), and just have a different compiler that would compile that down to the same kind of test cases that Cucumber understands.
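
As a rough illustration of that idea (not Cucumber's actual design), the sketch below has two toy compilers producing the same kind of test case from different source formats. All names here (`CompiledTestCase`, `SourceCompiler`, `ToyGherkinCompiler`, `ToyMarkdownCompiler`) are hypothetical, and the line-based "parsing" is a deliberate simplification that drops step keywords and ignores almost all of the real Gherkin grammar.

```php
<?php
declare(strict_types=1);

// Hypothetical format-independent test case: a name plus a list of step texts.
final class CompiledTestCase
{
    public $name;
    public $steps;

    public function __construct(string $name, array $steps)
    {
        $this->name = $name;
        $this->steps = $steps;
    }
}

interface SourceCompiler
{
    /** @return CompiledTestCase[] */
    public function compile(string $source): array;
}

// Shared toy logic: a line matching the case pattern starts a test case,
// and a line matching the step pattern adds a step to the latest one.
abstract class LineBasedCompiler implements SourceCompiler
{
    abstract protected function casePattern(): string;
    abstract protected function stepPattern(): string;

    public function compile(string $source): array
    {
        $cases = [];
        foreach (preg_split('/\R/', $source) as $line) {
            $line = trim($line);
            if (preg_match($this->casePattern(), $line, $m)) {
                $cases[] = new CompiledTestCase($m[1], []);
            } elseif ($cases && preg_match($this->stepPattern(), $line, $m)) {
                $cases[count($cases) - 1]->steps[] = $m[1];
            }
        }
        return $cases;
    }
}

final class ToyGherkinCompiler extends LineBasedCompiler
{
    protected function casePattern(): string { return '/^Scenario:\s*(.+)$/'; }
    protected function stepPattern(): string { return '/^(?:Given|When|Then|And|But)\s+(.+)$/'; }
}

final class ToyMarkdownCompiler extends LineBasedCompiler
{
    protected function casePattern(): string { return '/^##\s+(.+)$/'; }
    protected function stepPattern(): string { return '/^[-*]\s+(.+)$/'; }
}

$gherkin = "Scenario: Login\n  Given a registered user\n  When they sign in\n";
$markdown = "## Login\n- a registered user\n- they sign in\n";

$fromGherkin = (new ToyGherkinCompiler())->compile($gherkin);
$fromMarkdown = (new ToyMarkdownCompiler())->compile($markdown);

// Loose comparison checks that both sources compiled to equivalent test cases.
var_dump($fromGherkin == $fromMarkdown);
```

The runner would depend only on `CompiledTestCase`, so adding or replacing a source format would not touch execution code.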

Not exposing Gherkin also gives us much more freedom in changing that part of the stack without breaking user code.

The parsing and execution are two different bounded contexts, using DDD terms :-)

aslakhellesoy commented 9 years ago

Here is a good example of how I modified 3 parsers in one commit: 32e8e6e276eb5bfe59919ff7e58e13bb5a3c0564

So - being able to do that with a future gherkin-php would be great (even if it would be an additional commit in a submodule).

aslakhellesoy commented 9 years ago

@stof (/cc @everzet) - IIRC you wanted better documentation of the AST. I added it to the README.

aslakhellesoy commented 9 years ago

I'm closing this ticket - I think we've figured out what needs to be done. If you choose to contribute a PHP implementation it would live under cucumber/gherkin-php and use subtrees - ref #13.

If you have specific concerns, please create a new ticket.

everzet commented 9 years ago

I believe subtrees resolve the potential issue :)

