dave / forky

A tool to automate forking and modifying codebases

Converting the compiler to a library: Progress as of 19th July #1

Open dave opened 6 years ago

dave commented 6 years ago

I'll give you a little update on my project to turn the Go compiler into a library... I've made a lot of progress but it still seems like an impossible task to get it working. Here are some of the details as of today:

It reads from $GOPATH/src/go.googlesource.com/go (I'll probably change this later because having a copy of the whole stdlib in my gopath is playing havoc with my editor code completion). It performs the mutations listed here, then saves the output at $GOPATH/src/dave/golib.

First here it deletes all the code that we don't need... I'll probably make this a bit more intelligent at some point so you don't have to list so many items.

Next here it changes all references to the package paths we're forking - this works great and operates on all string literals in the codebase - not just import statements.

Next here we disable a bunch of tests that don't work (for various reasons).

Next here there's a little kludge to delete a node that is relying on a go1.11 feature... I think I can remove this kludge when 1.11 drops.

If you stop at this point and save the codebase, then all the tests pass! But we really haven't done that much. Next here is the Libify step which is where most of the juicy stuff happens.

Libify does a bunch of things. The most obvious is that it collects all the package level vars and adds them to a PackageSession struct. So this becomes this and this.
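To give an idea of the collection step, this sketch lists the package-level variable names in a file with `go/ast`. It only shows how the candidates for a PackageSession could be found, under the assumption that one file is processed at a time; `packageVars` is a hypothetical name, not part of forky.

```go
package main

import (
	"fmt"
	"go/ast"
	"go/parser"
	"go/token"
)

// packageVars (hypothetical helper) returns the names of all package-level
// variables declared in src. Local variables are ignored because only the
// file's top-level declarations are visited.
func packageVars(src string) ([]string, error) {
	fset := token.NewFileSet()
	file, err := parser.ParseFile(fset, "input.go", src, 0)
	if err != nil {
		return nil, err
	}
	var names []string
	for _, decl := range file.Decls {
		gd, ok := decl.(*ast.GenDecl)
		if !ok || gd.Tok != token.VAR {
			continue
		}
		for _, spec := range gd.Specs {
			for _, name := range spec.(*ast.ValueSpec).Names {
				if name.Name != "_" {
					names = append(names, name.Name)
				}
			}
		}
	}
	return names, nil
}

func main() {
	src := `package p

var debug bool
var (
	count, total int
	name         = "x"
)

func f() { var local int; _ = local }
`
	names, err := packageVars(src)
	if err != nil {
		panic(err)
	}
	fmt.Println(names) // each of these would become a PackageSession field
}
```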

Next it scans the bodies of all functions and methods, and works out which need access to the PackageSession. Any function that needs access gets a receiver added and becomes a method of the PackageSession type, so this becomes this. Any reference to one of these functions (or vars) gets converted into a selector using the local psess variable, like this.
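A hand-written before/after may make the shape of this transformation clearer. The functions and the `NewPackageSession` constructor below are invented for illustration; only the `PackageSession` type name and the `psess` receiver name come from the description above.

```go
package main

import "fmt"

// Before the transformation (package-level state):
//
//	var counter int
//	func next() int       { counter++; return counter }
//	func label() string   { return fmt.Sprint("item-", next()) }
//	func double(n int) int { return n * 2 }

// PackageSession holds what used to be package-level variables.
type PackageSession struct {
	counter int
}

// NewPackageSession is a hypothetical constructor for this example.
func NewPackageSession() *PackageSession { return &PackageSession{} }

// next touches counter, so it becomes a method of PackageSession.
func (psess *PackageSession) next() int {
	psess.counter++
	return psess.counter
}

// label calls next, so it needs the session too; the call becomes a
// selector on the local psess variable.
func (psess *PackageSession) label() string {
	return fmt.Sprint("item-", psess.next())
}

// double needs no package state, so it stays a plain function.
func double(n int) int { return n * 2 }

func main() {
	a, b := NewPackageSession(), NewPackageSession()
	// Each session has independent state.
	fmt.Println(a.label(), a.label(), b.label(), double(21))
}
```

The point of the exercise is visible in `main`: two sessions can be used side by side without sharing any state.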

Any method of another type that needs access to PackageSession has it added as a parameter, like this. This needs a re-think, because changing the signature of methods means we stop satisfying interfaces (more about this later).
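The interface problem can be reproduced in a few lines. In this sketch the `Describer` interface and both types are made up for the example; it only demonstrates that adding a parameter to a method removes the type from the interface's implementers.

```go
package main

import "fmt"

// PackageSession stands in for the generated session type.
type PackageSession struct{ prefix string }

// Describer is a hypothetical interface that callers rely on.
type Describer interface {
	Describe() string
}

// Before has the original method signature and satisfies Describer.
type Before struct{ name string }

func (n Before) Describe() string { return n.name }

// After has the session injected as a parameter. The method name is the
// same but the signature differs, so After no longer satisfies Describer.
type After struct{ name string }

func (n After) Describe(psess *PackageSession) string {
	return psess.prefix + n.name
}

// isDescriber reports whether v implements Describer at run time.
func isDescriber(v interface{}) bool {
	_, ok := v.(Describer)
	return ok
}

func main() {
	fmt.Println(isDescriber(Before{name: "x"})) // true
	fmt.Println(isDescriber(After{name: "x"}))  // false
}
```

In real code the failure would usually show up earlier, as a compile error wherever an `After` value is assigned to a `Describer`.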

Calling a public function or accessing a public variable in another package is accomplished by keeping the PackageSession for all imported packages in the local PackageSession, like this. Any time they are accessed, they are wired up like here and here.

There's plenty more work needed before this will even compile. As I mentioned, we can't rely on injecting the PackageSession into methods because we break interfaces. See this issue for more details.

One big optimization that I'm currently missing... Most of these package-level variables are never modified after initialisation. If we're sure they're never modified after initialisation, they can stay as package-level and don't need to be stored in the PackageSession. This is something that could possibly be detected using the ssa/pointer analysis packages... Not something I've used before so would love some help. I posted a stackoverflow question about this yesterday here. If you have any input, discuss in this issue.
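For comparison, here is a much cruder approach than ssa/pointer analysis: a purely syntactic scan that reports package-level variables which are never assigned to, incremented, or have their address taken. This is a swapped-in heuristic, not the analysis proposed above. It matches by name without type information, so shadowing makes it over-report writes, and it misses writes through pointer-receiver methods or through slices and maps passed to other functions. `neverWritten` and `rootIdent` are hypothetical names.

```go
package main

import (
	"fmt"
	"go/ast"
	"go/parser"
	"go/token"
	"sort"
)

// rootIdent unwraps x.f, x[i], *x and (x) down to the identifier x, if any.
func rootIdent(e ast.Expr) *ast.Ident {
	for {
		switch v := e.(type) {
		case *ast.Ident:
			return v
		case *ast.SelectorExpr:
			e = v.X
		case *ast.IndexExpr:
			e = v.X
		case *ast.StarExpr:
			e = v.X
		case *ast.ParenExpr:
			e = v.X
		default:
			return nil
		}
	}
}

// neverWritten returns package-level vars with no syntactic write in src.
func neverWritten(src string) ([]string, error) {
	fset := token.NewFileSet()
	file, err := parser.ParseFile(fset, "input.go", src, 0)
	if err != nil {
		return nil, err
	}
	written := map[string]bool{} // var name -> a write was seen
	for _, decl := range file.Decls {
		if gd, ok := decl.(*ast.GenDecl); ok && gd.Tok == token.VAR {
			for _, spec := range gd.Specs {
				for _, name := range spec.(*ast.ValueSpec).Names {
					if name.Name != "_" {
						written[name.Name] = false
					}
				}
			}
		}
	}
	mark := func(e ast.Expr) {
		if id := rootIdent(e); id != nil {
			if _, ok := written[id.Name]; ok {
				written[id.Name] = true
			}
		}
	}
	ast.Inspect(file, func(n ast.Node) bool {
		switch v := n.(type) {
		case *ast.AssignStmt:
			for _, lhs := range v.Lhs {
				mark(lhs)
			}
		case *ast.IncDecStmt:
			mark(v.X)
		case *ast.RangeStmt:
			mark(v.Key)
			mark(v.Value)
		case *ast.UnaryExpr:
			if v.Op == token.AND { // &x may be written through later
				mark(v.X)
			}
		}
		return true
	})
	var out []string
	for name, w := range written {
		if !w {
			out = append(out, name)
		}
	}
	sort.Strings(out)
	return out, nil
}

func main() {
	src := `package p

var a = 1
var b = 2
var c = []int{1}
var d int

func f() { b++; c[0] = 2; p := &d; _ = p; _ = a }
`
	names, err := neverWritten(src)
	if err != nil {
		panic(err)
	}
	fmt.Println(names) // candidates that could stay package-level
}
```

Because of the blind spots listed above, a result from this scan should be treated as a list of candidates to check, not as proof that a variable is immutable.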

hajimehoshi commented 6 years ago

Awesome work!

One question: does forky.FilterFiles remove the matched files, or permit only the matched files?

dave commented 6 years ago

Thanks!

FilterFiles deletes anything that doesn't match... https://github.com/dave/forky/blob/ex1/forky.go#L97-L103

hajimehoshi commented 6 years ago

Thanks. "Filter" can be interpreted either way, so I was confused.