Consider inverting the messages dependency

mpkorstanje commented 3 years ago

Currently everything depends on messages.

This happens partially because messages contains the entire domain.
And partially because we use messages as the Gherkin AST.

This is not great because:

We have a serialization format at the core of our domain rather then in an adaptor.
Because the Gherkin AST is the serialized format we can not utilize any patterns that make working with ASTs easier.
When anything in the domain changes, everything has to be updated.
Messages depends on Jackson. Due to it's nature as a de-serialization library Jackson gets frequent security updates.

As a visual aid (note note quite accurate, I was sketching out something else):

As such I would propose inverting the dependency and splitting messages into several smaller submodules. The modules are already hinted at by the comments in the proto file.

https://github.com/cucumber/common/blob/f96daddbedd679969bee1609ac9a71e857752720/messages/messages.proto#L13-L43

By inverting the dependency:

We push the serialization format out to the edges of the architecture. This allows for proper ports and adaptors design. Only root dependents (e.g. the Cucumber CLI) would have a dependency on messages.
The impact of updating external dependency changes is limited to the root dependents (e.g. the Cucumber CLI).
Changes that impact Gherkin won't impact execution related code.and vice-versa.

Now I imagine the biggest perceived obstacle will be additional copying the domain objects into messages. However for example if the Gherkin AST was modelled as a tree of nodes using the visitor pattern it would be relatively straightforward to convert it to a message object.

mattwynne commented 3 years ago

I agree with the thrust of these observations and the proposal. I want to think a bit more about whether I agree that we need to model this as multiple subdomains (parser, compiler, glue, execution, results) in separate modules or whether those stable types at the core of our domain (and, as you say, not depend on things like Jackson) could live in a single module.

laeubi commented 3 years ago

But isn't it the idea of Jackson and similar frameworks to have domain objects that could be (optionally) be serialized, just to prevent creating additional domain objects for serialization? The consequence of your proposal would be that we have some kind of layer above messages? I'm just asking because Cucumber Plugins already has such a layer but it does not contains all information there and changes seem to be hard because of the discussion about what should be API or not, so in the end I ended up using the raw messages as they are offering the greatest flexibility. This leads me to another point where I think if this is done and should be useful for different consumers (IDE, CLI, ...) the parsers and converter has all to be public accessible.

mpkorstanje commented 3 years ago

@mattwynne a single library would work too for the messages. As long as we don't treat the messages as our model.

mattwynne commented 3 years ago

@mpkorstanje I agree about not treating the messages as our model, they're just DTOs. What your post has given me pause for thought about is whether we should try to define a shared core domain model/models for Cucumber though. Like that maybe messages is just filling a gap right now, that should actually be filled by a proper domain model.

mpkorstanje commented 3 years ago

Ah. Yes you are definitely right that the messages are filling in a gap of domain objects. However we should not introduce a monolithic domain. Rather each module has it's own public API and only uses another modules public API. The domain objects can be part of this public API. For example:

In this case we have two modules. A compiler and parser module. The Parser, Compiler, AST and Pickle classes are all part of it's public API. The AST and Pickle are also domain objects.

Internally each module can use patterns such as "Ports and Adapters" but I don't think this pattern is efficient for small libraries that do not have any input/output. Rather this is something you'd apply to utilities that use these libraries such as the Gherkin CLI because these have to deal with input and output where as the parser and compiler modules do not.

For example:

Now this brings us to the next problem. The current implementation has to deal with input and output just to pass the shared test suite. This was a pragmatic choice, but it also means that as a module gherkin drags in more then needed.

cucumber / common

Consider inverting the messages dependency #1614