Closed ivu-mawi closed 2 weeks ago
Hey, thanks for the issue.
This has already been asked by someone else before, but at the time I could not think of a good way to consistently do this. If we can ensure that timestamp data for protobuf sources remains consistent when being unpacked from sources like existing archives or dependencies, then this sounds like a good idea and something we should definitely support. We can also hash the files instead as suggested potentially, although we'd need to also keep some kind of persistent file listing so that we know if new sources have been added.
Incremental compilation of subsets of the sources may be a little more complex and require more thought.
Would this be something you'd be interested in working on? If not, it might take me a couple of weeks to be able to pick this up as my schedule is a bit packed at the moment.
I'll see if I can get my employer to allow me to work on this :)
@ivu-mawi I have some time this afternoon, so will crack on with this if that is okay with you!
Just an update that I am still working on this. I've had to move some stuff around which cascaded slightly but it should make this a little easier to look into now on my side.
Sorry for the wait!
@ivu-mawi hey. I've pushed some new code to support this on the feature/GH-438-incremental-compilation
branch.
Right now, it is using SHA-256 digests of all source and dependency files. I don't have a big enough dataset to see what difference it makes between setting <incrementalCompilation>
to true
and false
though. I'm somewhat wary that it may be too slow on large file sets, so I could do with some benchmarking from your side to see what sort of difference it makes.
If all goes well, I should be able to merge this once I've tidied up after myself. If it doesn't create a meaningful improvement for you then it may be time to go back to the drawing board...
Wow, thanks for your effort! I'll try to test this as soon as I can.
In the meantime I dug a bit deeper into how maven compilation works. One problem is that as soon as maven detects any changed java
-source-file, it recompiles all sources files (per-module I think), instead of just those that have changed. I think this is because the maven developers thought it would not be worth the effort to come up with an algorithm to determine which files need to be recompiled, as the java compiler is "fast enough"...
However, the protobuf-to-java compilation is a separate step, and I think skipping that when possible is still good.
Going to merge this for now and any issues can be followed up in a subsequent PR!
@ivu-mawi I've had a couple of other reports saying this is working for some people as expected, so for now I'm going to release this since it is sitting on a lot of internal changes.
If you get any issues, please can you raise a new issue? Thanks! 🙂
Running both the protoc compiler and then the java compiler takes a non-negligible amount of time, depending on the size and amount of your
.proto
files. I would like to have the fastest possible edit-rebuild-debug cycle.Even when not actually changing any code, this plugin always recompiles all proto files, so we can not take advantage of incremental rebuilding as needed.
We currently use https://github.com/os72/protoc-jar-maven-plugin and it provides a config property very non-aptly named
optimizeProtogen
which is true by default and skips runningprotoc
if the last build time is more recent than any of the source .proto file last-modified-timestamps. This in turn allows maven to skip runningjavac
in many cases.I'd love some similar mechanism in this project. I'm not sure if the timestamp comparison from the linked project is the best way. Maybe a hash over the .proto file contents and the used protoc compiler version would be the better. (In contrast to the is-timestamp-newer-check, a hash-check would correctly detect downgrade of a .proto file to an older version as needing a recompilation. It would also keep working if the original timestamp of any proto file gets lost somewhere along the way. But the hash could potentially be much slower than timestamp checking.)