feat: `gethclone` binary

ARR4N commented 2 weeks ago

Why this should be merged

From x/gethclone/README.md:

This is an experimental module for tracking upstream go-ethereum changes. The approach of git mergeing the upstream branch into subnet-evm is brittle as it relies on purely syntactic patching. gethclone is intended to follow a rebase-like pattern, applying a set of semantic patches (e.g. AST modification, Uber's gopatch, etc.) that (a) should be more robust to refactoring; and (b) act as explicit documentation of the diffs.

How this works

A subset of geth packages is specified via the --packages flag, retrieved with Go's native package resolution. The import graph is traversed for all other ethereum/go-ethereum packages, and (as yet undefined) AST patches are applied. The --output_go_mod flag specifies the go.mod file of the module to which the transformed files are to be written.

See examples of intended future use in comment below.

How this was tested

The astpatch package has tests to demonstrate proper AST traversal. As no real patches exist yet, this only records and confirms visited nodes.

The gethclone binary itself is not yet tested. The intention is that the cloned files themselves will have tests to demonstrate proper transformation. All *_test.go files will be cloned and run. Modifications (e.g. addition of struct fields) will have tests defined in the destination module. Ideas for more specific tests are welcome.

How is this documented

READMEs and Go comments.

marun commented 2 weeks ago

Interesting. For mechanical changes like imports or struct modifications this seems like a very promising approach. Might this approach also be appropriate for more complex fork maintenance (i.e. maintaining our changes at the AST level rather than as git commits)?

ARR4N commented 2 weeks ago

Interesting. For mechanical changes like imports or struct modifications this seems like a very promising approach. Might this approach also be appropriate for more complex fork maintenance (i.e. maintaining our changes at the AST level rather than as git commits)?

I think it would form part of a broader strategy. Changes like you describe, as well as deletions and other "simple" modifications can be easily implemented as an AST change. Additions of entire methods/functions should just be done as new files already in the destination module (i.e. regular Go code). More complex code changes will be addressed as we encounter them.

Cloned+transformed files should probably still be checked in so others can import the package, but with pre-merge checks to confirm that they're up to date with gethclone output (e.g. with check-clean-branch.sh).

Advantages of maintaining the changes as AST and in-situ files:

Explicit record of the differences compared to geth
Robust (hopefully) to refactoring of geth

The primary disadvantage is the complexity of writing an AST patch although I plan on writing some generators; that would be used something like:

patches.AddStructField(geth("core/types"), "StateAccount", &ast.Field{..."IsMultiCoin"...})
patches.RemoveMethod(geth("common"), "Address", "Cmp")

marun commented 2 weeks ago

No argument that the current approach of merging changes without shared history is not ideal, but how would you compare/contrast the AST approach proposed in this PR with maintenance of a proper fork with shared git history for which sets of persistent and transient carry patches are maintained?

ARR4N commented 2 weeks ago

No argument that the current approach of merging changes without shared history is not ideal, but how would you compare/contrast the AST approach proposed in this PR with maintenance of a proper fork with shared git history for which sets of persistent and transient carry patches are maintained?

Context for others as I didn't know about transient/carry patches: https://github.com/openshift/kubernetes/blob/master/REBASE.openshift.md

As for an answer: I'll get back to you @marun because I don't know yet.

ARR4N commented 2 weeks ago

@tsachiherman:

I think that I would prefer to have the config struct defined in it's own file and leave the main.go to be the light-weight entry point for the executable.

Moved into gethclone.go

ava-labs / subnet-evm