Monorepo clusters / Testing integration of connected libraries

s1na commented 5 years ago

Due to the tight coupling of libraries, modifications in one could surface errors in other dependent libraries. Currently errors are found only after the library in question is released and other libraries update their dependencies. I think it'd help a lot if integration errors would surface during development and they'd become easier to find/fix.

Maintaining a monorepo of some of the highly interconnected libraries (VM, blockchain, block, etc.), as it is being discussed in https://github.com/ethereumjs/organization/issues/27, would enable running tests across libraries during development and would also facilitate cross-library changes.

Another approach would be to write a more complex circleci workflow which fetches all relevant libraries and tests them against each PR. Not sure how such a workflow would look like.

I'd be happy to hear your thoughts on these two approaches, or other potential ones.

holgerd77 commented 5 years ago

I have over time shifted to a position where I think it makes sense to create some well-selected monorepo clusters to combine some repositories where there is lot's of interoperability and need for integration tests.

Cluster Candidates might be:

VM (in `ethereumjs-vm`)

VM
Blockchain
Block
Transaction

Maybe in the future also some separation of the whole VM and just the execution engine.

State Cluster (in `merkle-patricia-tree` ?)

Thought about proposing a monorepo for the VM for some time now with the state manager and account repos, but maybe this makes more sense to cluster with the merkle tree lib like:

Merkle Tree
State Manager
Account

Utility Libraries

This would be more to have this in one place, not completely sure if the combination makes sense though:

ethereumjs-util
RLP
units
icap

Not sure, what is your general stance on this? And does these clusters make sense? Should very much be some longer discussion, would like to hears as many opinions as possible on this.

//cc @vpulim @mattdean-digicatapult @whymarrh @axic @alextsg @danjm @krzkaczor

axic commented 5 years ago

I have the following four quick comments:

I don't have a strong position on the VM. It already includes a lot of non-VM code.
I don't think it is a good idea to merge the lower-layer merkle tree with the upper layer representation (state, account).
One of the reasons utilities were split was due to our inexperience in how to efficiently split up code when bundling for the web. We wanted to avoid pulling in a lot of dependencies for the web. One of the goals was to have Metamask in <2Mb for Chrome extension limits. Now we know that by loading parts of libraries solves a lot of this (e.g. require('ethereumjs-wallet/thirdparty').
I'm not fully sold RLP should be merged in, but since it is really Ethereum specific it may make sense. Otherwise I'd really keep it separate.

mattdean-digicatapult commented 5 years ago

Sorry for the slow reply @holgerd77.

I'm honestly not convinced by the mono-repo argument in this case. To me the best reason for a mono-repo structure is when you have modules that you might want to make separate modules at some point, but structurally you don't want to commit to a specific structure of modules early on in development.

I'm not sure I understand the testing argument here. There's nothing to stop us adding integration tests for some of the modules which have many dependencies and where coupling is maybe tighter than ideal.

On the specific ideas you've outlined

On the VM @axic is right, there's a lot of non-VM stuff there now and it probably needs a bit of a reorg at some point. I'm just not sure adding more stuff there is really going to help.
I'm not a fan of merging the lower-level state representation with the higher level Account either
The utilities I'd be fine with being merged.

s1na commented 5 years ago

@mattdean-digicatapult Thanks for the comments.

I'd first like to point out that testing is not the only argument, there are some other in #27, and various others in the wild [1, 2].

Regarding testing: yes, it's possible to write integration tests for a library which tests it against its dependencies. What I was referring to was the reverse situation. E.g. a PR modifies merkle-patricia-tree and all tests pass, but when released, it's found out that VM breaks when using the newest release. My argument is we need a method to test the dependents (not dependencies) of a library after modifications.

The goal with a mono-repo would not be to have a monolith library, but rather to break it into smaller packages (each published separately on npm), but they reside in the same git repository (E.g. truffle). I agree with you completely that VM needs a re-org, I just think the reorg doesn't need to introduce even more git repos, which complicates development, testing, cross-project changes, etc.

alcuadrado commented 5 years ago

I've been a big fan of monorepos since I spent some time at Google, and convinced multiple projects I worked on to migrate to one, and the result has always been positive.

None of those migrations happened without some resistance, as it initially looks like a TERRIBLY BAD idea. The notion that a repo represents a single project/package is rooted very deeply in the software development folklore, and even some products (e.g. github) are somewhat developed around that.

A repository is just a tool for share files and coordinate changes to them. A project/package is part of the semantics of those files, so you are free to set them up as you want. There's no need to merge all the packages into a single one to have them in the same repo.

With that being clear, one should analyze the pros and cons of using a monorepo in the context of ethereumjs. These are the once I see:

Advantages:

@s1na's point on testing is very real. A monorepo lets you spot breaking changes as soon as they happen by running all of the org's tests. If I make a breaking change in package A, and package B, which uses A, breaks because of this change its tests should fail. This can be implemented in travis/circle without a monorepo, but that's hard to maintain, and having the ability to run that locally is also super useful.
With a monorepo is WAAAY easier to make cross-organization changes (e.g. changes in tooling/config). This takes multiple PRs now.
It's much much easier to share code and config files. There's ethereumjs-config now, but with a monorepo it's easier to do it and keep it in sync.
Easier to get the whole picture of ethereumjs. It's not easy at all now, I've recently struggled with it. Having the entire thing cloned at once and being able to read different parts is very valuable. Cloning everything now is painful.
Easier to get what's going on in the whole organization. For example, it's hard to know what to work on now.
Easier to keep dependencies in sync. For example, you can run a super simple script that ensures that on the CI. I recently implemented this here.

Disadvantages:

The issues can get messy, as all of them would live in the same repo. It should be manageable by using tags or prefixing their titles.
Merging a breaking change into a monorepo breaks everyone's build, so you have to be very strict about it. One way of handling it is by fixing (possible in the same PR) the projects that are consuming the one with breaking changes.
The initial clone & setup takes more time, especially if you are only interested in a single package.
A monorepo needs a little more documentation about how to work with it.

Finally, I recently migrated Buidler to a monorepo with lerna, and I'm happy to share my experience doing that.

holgerd77 commented 5 years ago

Hi @alcuadrado thanks for this additional write-up, for the moment we are relatively settled with our current setup. We had this monorepo discussion for quite some time now so please bear with me if I don't pick this up again for now.

We recently introduced the ethereumjs-config repo as some result of the discussion process. We will also likely use targeted monorepo approaches, a main one being the split up of the current VM modules (vm, evm, bloom, statemanager) into different packages (and run it as a lerna repo). This might be part of the v4 release process, but this is not decided yet.

alcuadrado commented 5 years ago

We had this monorepo discussion for quite some time now so please bear with me if I don't pick this up again for now.

No problem. This is not a minor decision, and the migration would be a whole lot of work. Just wanted to share my opinion in case it could be useful.

We recently introduced the ethereumjs-config repo as some result of the discussion process.

I didn't know why nor when ethereumjs-config was added. This makes sense.

holgerd77 commented 4 years ago

Main monorepo transition has been decided upon and executed along https://github.com/ethereumjs/ethereumjs-vm/issues/561 with the creation of the VM monorepo. If other demands for one or more additional monorepo integrations occur it will likely be more productive and targeted to discuss along a newly created issue. I also regard my suggestions from the comment above and outdated to a greater extend (may still serve as some thought inspiration though).

Will close.

ethereumjs / organization