com-lihaoyi / mill

Mill is a fast JVM build tool that supports Java and Scala. 2-3x faster than Gradle and 5-10x faster than Maven for common workflows, Mill aims to make your project’s build process performant, maintainable, and flexible
https://mill-build.org/
MIT License
2.06k stars 337 forks source link

Target-level incremental re-building when build.sc changes #2348

Closed lihaoyi closed 1 year ago

lihaoyi commented 1 year ago

The goal of this ticket would be to make it so that changing build.sc code does not aggressively invalidate all target caches, and instead can invalidate only the ones downstream of the code change. This would make Mill more suitable for larger projects and monorepos, where invalidating the whole world becomes an increasingly expensive operation as the size of the codebase grows.

This global invalidation is somewhat mitigated by Zinc's incremental compilation, but that only applies to Scala targets and not any of the other things Mill builds. Duplicating it for all target types, even ad-hoc user defined targets, is probably unfeasible

This is probably not as difficult as it seems:

  1. SBT already has this functionality for re-running tests on changes, making use of the analysis files generated by the Zinc incremental compiler.

  2. The com.lihaoyi:::acyclic plugin takes another approach, but also involves dumping the fine-graph code-level dependency graph during compilation

It is unlikely we will be able to catch all dependencies, e.g. if someone is mucking around with shared mutable state we can't see that, but that's already a limitation in the current Mill approach. Assuming the code lacks shared mutable state, we should be able to invalidate things as precisely in response to code changes as we currently do in response to T.input changes

Roughly, that would involve the following steps:

  1. Replace Ammonite script running with a MillBuildModule, to compile the build.sc using Mill itself, to unlock access to Zinc and incremental compilation. This would also put everything into one compilation unit, simplifying things if we choose to go with the compiler plugin approach, although not strictly necessary

  2. Using either approach above (SBT or Acyclic), make the classpath hash that we feed into the target cache keys only dependent on the external classpath, and use the internal graph/hashes computer from Zinc or our own compiler plugin to provide fine-grained invalidation for changes

  3. (Optional) do bytecode analysis to allow fine grained invalidation of external classpath changes too (probably not worth the hassle, but it's an option if we choose to do so)

CC @lefou @lolgab, WDYT?

lefou commented 1 year ago

Does that mean, a changed build.sc wide variable will only affect those targets who use it (as well as their downstream users), all other targets do not need to recompute? This would be great!

lihaoyi commented 1 year ago

that's right. Even if the change was in a helper function or something, we could crawl the call graph and figure it out. After all, that's what SBT does for testquick, and what acyclic does to find inter-file circular dependencies

lolgab commented 1 year ago

We already have caching which works at build file level. What we do is to get the class where a target is defined and we get an identifier for the build file from that class name. Then we use the file imports tree from ammonite to build a hash for the script and its dependencies.

Since we don't know dependencies between classes from Ammonite (haven't investigated if it's already possible to do that), we remove any inner classes from the identifier so we can treat a script as an atomic thing.

Having said this, I hope that we can build on top of the existing functionality, maybe by increasing the granularity of the current working solution from file-level to class-level. To do that we need to stop removing inner classes from identifiers and have a way to construct the importTree that keeps dependencies between classes instead of just dependencies between files.

lefou commented 1 year ago

@lolgab It's not clear to me, whether you argue against this proposal or just try to provide an implementation idea based on Ammonite.

I'd also like to use existing mechanisms to deduce code dependencies.

I haven't inspected async in this context, so I can't tell much about it, but I assume Haoyi is knowing it very well. The idea to use zinc analysis sounds even nicer to me, as there is already some API and tooling to produce it. But I have zero experience using said API to get the information we might want.

The use of zinc analysis requires a different build process though, hence @lihaoyi suggested the MillBuildModule approach, which isn't ready to use yet. But a switch to build Mill with Mill is kind of eating the own dog food, we better control what we need without much additional cost, as we maintain a ScalaModule anyways. We would immediately profit from incremental compilation. And when included files are disentangled from each other (in contrast to current Ammonite script compiler approach), the whole foreign module story would profit, as we even might be able to use the cache locations of the foreign modules directly, which would reduce the amount of redundant build work and cache storage even further.

lihaoyi commented 1 year ago

I think long term Ammonite scripts are a dead end: they support a lot of dynamism nobody actually wants, and in exchange cannot support incremental-compilation/pre-compilation/packaging/etc. that people actually need. They also duplicate a lot of what Mill does: dependency analysis, file hashing, invalidation, etc. but in an ad-hoc manner.

Long term I believe we should replace them with Mill's own bootstrap scriptrunner or with ScalaCLI. From a dependency hygiene perspective, I'd try the Mill approach first. Not having to debug two different build servers, deal with two separate configuration frameworks, two sets of business logic libraries, etc. would be a huge advantage over both the status quo or going with ScalaCLI

lolgab commented 1 year ago

@lefou I'm all in for this proposal! I should have said clearly! What I wanted to say is that we don't need to start from scratch but provide a class-level dependency tree (which might come from the MillBuildModule and zinc, but I don't know how it would work) instead of the file-level dependency tree taken from Ammonite today. Once we do that, most of the existing implementation can probably be reused.

lefou commented 1 year ago

I had trouble in the past with undercompilation in zinc. This is especially relevant when macros are used, which is the case in Mill. Beside using zinc in MillBuildModule, I think if we can get some kind of source dependency tracking in pure Scala code (like it is done in async) I would have more confidence in it's accuracy. This also means we are less coupled to zinc internals.

When building projects, it's easy to just drop incremental compilation results, but it may be a bit harder to even detect when Mill uses cached values based on incorrect code path dependencies.

lihaoyi commented 1 year ago

Doing this in userland Scala code is unlikely to work. Scala code is opaque unless lifted into some M[T] the way we do Task[T], and lifting every single method call into a Task is unfeasible. It would also defeat the purpose of letting people write build logic in plain Scala.

We'd need to go one level deeper and do it at the language/runtime level: either a Scala compiler plugin (which is what Zinc does) or a JVM bytecode analyser. Not sure which one is easier to implement, but both should be doable

lefou commented 1 year ago

Thanks for clarifying. I guess I triggered you by saying "pure Scala", while I had some macro or compiler code in mind, sorry. You are the expert here. IMHO, we should use whatever is the easiest path to success. I have no clear picture of how complex it is, but I hope I can understand it in principle once it arrives.

lihaoyi commented 1 year ago

@lefou you mentioned MillBuildModule isnt ready to use. What's the status of that? Is there a PR or something I can commandeer if I wish to push it forward?

lefou commented 1 year ago

@lefou you mentioned MillBuildModule isnt ready to use. What's the status of that? Is there a PR or something I can commandeer if I wish to push it forward?

The current version of MillBuildModule in tree does know all included files and it's dependencies. It's used in the BSP server to represent the synthetic mill-build module.

https://github.com/com-lihaoyi/mill/blob/c9220b402f22ab0d6d35ee55d0f4e3349314688c/scalalib/src/mill/scalalib/bsp/MillBuildModule.scala

What is missing is the Ammonite-specific wrapping of the .sc files.

I had some experiments in PR https://github.com/com-lihaoyi/mill/pull/1972, but never finished it up. The wrapping was rather lame and probably incorrect. It could compile a build script, but I think the packages (sub-package of ammonite) and other stuff wasn't correct. I also think that I messed something up in the latest rebase of that PR.

lihaoyi commented 1 year ago

Thanks, I think I understand.

We should be able to drop the Ammonite-specific idiosyncracies right? Since we're aiming to drop Ammonite, as long as we can compile and run the build file, it shouldn't matter if we don't e.g. follow the Ammonite package prefix convention

lefou commented 1 year ago

Thanks, I think I understand.

We should be able to drop the Ammonite-specific idiosyncracies right? Since we're aiming to drop Ammonite, as long as we can compile and run the build file, it shouldn't matter if we don't e.g. follow the Ammonite package prefix convention

I think working with default package (no package at all) has issues (you simply can't) once you try to import from another script which belongs to a package (because it is in a subdir). But choosing the same top-level package as the tool is probably not a good idea, too. The build script has not the same origin, and mechanisms like access restrictions with private[package] may not be effective.

lefou commented 1 year ago

Best is to come with some cryptic-generated package name, e.g. mill$$yolo.

lefou commented 1 year ago