cucumber / common

A home for issues that are common to multiple cucumber repositories
https://cucumber.io/docs
MIT License

Split monorepo up into several repos #1724

Closed aslakhellesoy closed 2 years ago

aslakhellesoy commented 3 years ago

Is your feature request related to a problem? Please describe.

There are several problems with the cucumber/common monorepo:

Describe the solution you'd like

I want to split cucumber/common up into multiple repos:

The current build system has complex functionality that we'd have to replace:

Also see discussion in Slack

Describe alternatives you've considered

We could probably push ahead with #1720 and make the current monorepo serial build run in a few seconds (by leveraging a cache in the cloud), but the build process would still be complex and brittle. Newcomers would still be intimidated by the size and complexity of this huge repo.

Additional context

In 2015 the Cucumber implementations had diverged and behaved inconsistently. Each release made them more inconsistent. To mitigate this we decided to bring all the Gherkin implementations into one repository, using a shared acceptance test suite.

This worked well, so we continued with the same approach for new libraries such as Cucumber Expressions and Tag Expressions - in the same repo.

Building and in particular releasing libraries in 10 or so languages is complicated, so we built an "orchestration" build system with Make that makes the build process consistent across the increasing number of libraries.

Fast forward six years, and we have a build system with fangs, tentacles and warts. The build system wasn't designed with parallelism in mind, which is why it takes 1h.

TODO

mpkorstanje commented 3 years ago

I'll move datatable over to cucumber-jvm.

aurelien-reeves commented 3 years ago

Maybe we could do that in conjunction with https://github.com/cucumber/common/issues/1614?

aslakhellesoy commented 3 years ago

While it is possible to bring commit history when moving from one repo to another, I suggest we don't do it because it's tedious to do. People who need to look at the history will find it in this repo.

jamietanna commented 3 years ago

We can retain the history for the new repos, for just the subtree that's being imported. I'd much prefer, as a consumer of Cucumber, to be able to view the history in the new repo, rather than jumping around.

I've done it before using Option 3 in https://stackoverflow.com/a/30386041 and it works nicely.

aurelien-reeves commented 3 years ago

Thanks for the info @jamietanna!

If we have the possibility to keep the history, that would be great indeed!

aslakhellesoy commented 3 years ago

Ok, I'm sure we can do that :-)

aslakhellesoy commented 3 years ago

I'm proposing we start by creating the following new repositories:

This will hollow out about half of the monorepo. Eventually I would like to have everything moved out so we can retire the monorepo, but let's start with this.

aurelien-reeves commented 3 years ago

cucumber-expressions/test-data may be tricky to move. Do we have a plan for shared test-data, CCK, and related?

Besides that, looks good 👍

aslakhellesoy commented 3 years ago

cucumber-expressions/test-data may be tricky to move

Why? I'm proposing we move it along with all the implementations so the directory structure will be like this:

.
├── go
├── java
├── javascript
├── ruby
└── testdata

aurelien-reeves commented 3 years ago

Oh, the testdata here is not synced from another package 😅

Sorry for that. So yes, looks good 👍

aslakhellesoy commented 3 years ago

Do we have a plan for shared test-data, CCK, and related?

We have two kinds:

mpkorstanje commented 3 years ago

@aslakhellesoy I'll move datatable to cucumber-jvm.

mpkorstanje commented 3 years ago

git remote add common git@github.com:cucumber/common.git
git fetch common
git checkout -b merge-datatable common/main 
git filter-branch --subdirectory-filter datatable
git merge origin/main --allow-unrelated-histories 
git push 

Looks like this worked for me. But notice the big disclaimer. Probably good to follow it.
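
For anyone who wants to try the history-preserving move on a throwaway copy first, here is a minimal sketch of the same `git filter-branch --subdirectory-filter` technique. The function name `extract_subdir` and its arguments are hypothetical, not part of any Cucumber tooling. It works on a fresh clone, so the source repo is never rewritten. Recent git versions print a warning before `filter-branch` runs and recommend `git filter-repo` instead; the sketch only skips the pause that comes with that warning.

```shell
# Sketch: clone a repo and rewrite the clone so that <subdir> becomes the
# repository root, keeping only the history that touched it.
# extract_subdir is a hypothetical helper name.
# Usage: extract_subdir <source-repo> <subdir> <work-dir>
extract_subdir() {
  src=$1
  subdir=$2
  work=$3
  # Work on a fresh clone so the source repository stays untouched.
  git clone --quiet "$src" "$work" || return 1
  (
    cd "$work" || exit 1
    # Newer git versions warn and pause before filter-branch starts;
    # this variable skips the pause.
    FILTER_BRANCH_SQUELCH_WARNING=1 \
      git filter-branch --subdirectory-filter "$subdir" HEAD >/dev/null 2>&1
  )
}
```

Something like `extract_subdir git@github.com:cucumber/common.git datatable /tmp/datatable-history` would then leave a clone whose root is the old `datatable` directory, ready to be merged with `--allow-unrelated-histories` as in the commands above.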

aurelien-reeves commented 3 years ago

git remote add common git@github.com:cucumber/common.git
git fetch common
git checkout -b merge-datatable common/main 
git filter-branch --subdirectory-filter datatable
git merge origin/main --allow-unrelated-histories 
git push 

Looks like this worked for me. But notice the big disclaimer. Probably good to follow it.

Which disclaimer?

aslakhellesoy commented 3 years ago

@mattwynne and I experimented a bit today, trying to create a new (local) repo for cucumber-expressions. We used this:

brew install git-filter-repo
mkdir cucumber-expressions
cd cucumber-expressions
git init
git remote add common git@github.com:cucumber/common.git
git fetch common
git checkout -b tmp-migrate common/main
git filter-repo --subdirectory-filter cucumber-expressions --force
git branch -m main

We also talked about work to do after that:

Cleanup

Not doing

Elsewhere

...to be continued...

aslakhellesoy commented 3 years ago

I wrote a gist based on the experiments @mattwynne and I did a couple of days ago: https://gist.github.com/aslakhellesoy/3cb73d9b69c28b497710b78baf0d3ec5

It seems to work well creating a new cucumber-expressions polyglot repo:

curl -s https://gist.githubusercontent.com/aslakhellesoy/3cb73d9b69c28b497710b78baf0d3ec5/raw/8ff5651126ae6eb7ae5240bcac39ec01744a6cc5/make-polyglot-repo.sh |
  bash /dev/stdin cucumber-expressions

Any suggestions/feedback before we push this as a new repo and remove cucumber-expressions from cucumber/common?

aurelien-reeves commented 3 years ago

As far as I can tell, it looks good 👌

aslakhellesoy commented 3 years ago

I have created https://github.com/cucumber/cucumber-expressions

Here are some more notes on what needs to be done to finish the work (we can reuse this checklist for other moves)

Push new repo

After creating a local repo:

Configure Renovate

Configure Repo

Set up CI

Cleanup

Migrate documentation

Make a release

This will require some more work since the release scripts are not migrated over from the common repo, and we need to rethink how it's done. It should be simpler! /cc @mattwynne

aurelien-reeves commented 3 years ago

For the CI, I did not use any Makefiles. I wrote the GitHub workflows directly.

That seems fine for cucumber-expressions as all the tests are easily executed from commands like npm test, bundle exec rspec, go test ./... and mvn test.
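
As a rough illustration of how little orchestration that needs, here is a sketch that maps each language directory to the test command named above and runs them in turn. The function names are hypothetical, and the directory names assume the polyglot layout proposed earlier in this thread (`go`, `java`, `javascript`, `ruby`); it is not the actual workflow configuration.

```shell
# Map a language directory to its conventional test command.
# Directory names assume the layout proposed earlier in this thread;
# both function names are hypothetical.
test_command_for() {
  case "$1" in
    javascript) echo "npm test" ;;
    ruby)       echo "bundle exec rspec" ;;
    go)         echo "go test ./..." ;;
    java)       echo "mvn test" ;;
    *)          echo "unknown language directory: $1" >&2; return 1 ;;
  esac
}

# Run the test command inside each language directory that exists,
# stopping at the first failure.
run_all_tests() {
  for dir in go java javascript ruby; do
    [ -d "$dir" ] || continue
    cmd=$(test_command_for "$dir") || return 1
    echo "==> $dir: $cmd"
    (cd "$dir" && sh -c "$cmd") || return 1
  done
}
```

Calling `run_all_tests` from the repo root would run every implementation's suite locally. In CI, each language would more likely get its own job so that they run in parallel.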

And we are actually working on a release process through GitHub workflows too.

So, maybe the Makefile could be greatly simplified and used only to run the Docker container, and possibly some global clean tasks?

aurelien-reeves commented 3 years ago

On debian linux, we had to tweak the script (https://gist.github.com/aslakhellesoy/3cb73d9b69c28b497710b78baf0d3ec5) a little bit:

# Delete all other tags
- git tag | grep --invert-match -E '^go/v\d|^v\d' | \
+ git tag | grep --invert-match -E '^go/v[0-9]|^v[0-9]' | \
  xargs -n1 git tag -d
# Modify CHANGELOG.md links. Remove the '' if not on MacOS. 
- sed -i '' "s|${name}-||g" CHANGELOG.md
- sed -i '' "s|${name}/||g" CHANGELOG.md
- sed -i '' "s|https://github.com/cucumber/common/compare|https://github.com/cucumber/${name}/compare|" CHANGELOG.md
- sed -i '' "s|https://github.com/cucumber/cucumber/compare|https://github.com/cucumber/${name}/compare|" CHANGELOG.md
+ sed -i "s|${name}-||g" CHANGELOG.md
+ sed -i "s|${name}/||g" CHANGELOG.md
+ sed -i "s|https://github.com/cucumber/common/compare|https://github.com/cucumber/${name}/compare|" CHANGELOG.md
+ sed -i "s|https://github.com/cucumber/cucumber/compare|https://github.com/cucumber/${name}/compare|" CHANGELOG.md

We also had to make sure to use git version >= 2.22
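
A possible way to avoid maintaining two variants of the script is to hide the platform differences behind small helpers. This is only a sketch: the helper names are hypothetical, the `sed --version` probe assumes that GNU sed accepts that flag and BSD sed rejects it, and `version_at_least` assumes a `sort` that supports `-V`.

```shell
# Run sed in place on both GNU sed (Linux) and BSD sed (macOS).
# Helper names in this block are hypothetical.
sed_in_place() {
  if sed --version >/dev/null 2>&1; then
    sed -i "$@"        # GNU sed: -i takes no separate argument
  else
    sed -i '' "$@"     # BSD sed: -i needs a backup suffix, '' means none
  fi
}

# Read tag names on stdin and print those that are NOT release tags
# (vN... or go/vN...). [0-9] is used because \d is not portable in grep -E.
non_release_tags() {
  grep --invert-match -E '^go/v[0-9]|^v[0-9]'
}

# Succeed when dotted version $1 is at least $2 (assumes `sort -V`).
version_at_least() {
  [ "$(printf '%s\n%s\n' "$2" "$1" | sort -V | head -n 1)" = "$2" ]
}
```

With these, the tag cleanup becomes `git tag | non_release_tags | xargs -n1 git tag -d`, each CHANGELOG rewrite becomes `sed_in_place "s|${name}-||g" CHANGELOG.md`, and the script can bail out early unless `version_at_least "$(git --version | awk '{print $3}')" 2.22` succeeds.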

mattwynne commented 3 years ago

Create-meta is done!

mattwynne commented 3 years ago

@aslakhellesoy any thoughts on which package to tackle next? @aurelien-reeves and I discussed this a bit today on voice, but we don't have a clear plan as yet.

It feels like the CCK, gherkin and messages are the big ones that remain.

aslakhellesoy commented 3 years ago

I propose gherkin, then messages, then cck.

When we move out Gherkin we should get rid of make too, which means replacing the make-based tests with tests written for each language's unit testing tool. These tests will be much faster (no executable to launch for each doc), and also easier for contributors to run (they'll use the conventional testing tool).

The cucumber-expressions and create-meta repos already use this technique, have a look at that. The gherkin/elixir tests already use this approach.

davidjgoss commented 2 years ago

Have we discussed how to deal with formatters in this split yet?

I was thinking about splitting out html-formatter. I think we've previously discussed the idea of having formatters (or at least terminal-focused formatters) together, but the html one is definitely an outlier: one implementation (javascript) is depended on by the others, which is not the normal pattern, so it should perhaps be its own repo.

It does have a dependency on @cucumber/react but I think the API surface used by the formatter doesn't change very often so that should be okay. What do we think? We can also look at switching from webpack to esbuild while we're at it :)

mattwynne commented 2 years ago

@davidjgoss yeah I hadn't thought about it too hard yet, but I agree that the html formatter is definitely something that could do with standing alone. I wouldn't be averse to moving the @cucumber/react module along with it if that will make it easier to change.

mattwynne commented 2 years ago

Let's use this project to track progress from now on.