com-lihaoyi / mill

Mill is a fast JVM build tool that supports Java and Scala. 2-4x faster than Gradle and 4-10x faster than Maven for common workflows, Mill aims to make your project's build process performant, maintainable, and flexible.
https://mill-build.org/
MIT License

Watch mode on multiple commands should only run commands downstream of changed files (1000USD Bounty) #3534


nafg commented 2 months ago

From the maintainer Li Haoyi: I'm putting a 1000USD bounty on this issue, payable by bank transfer on a merged PR implementing this.

The goal is to make -w/--watch only run tasks/commands that are downstream of changed Task.Source files or Task.Input values, so that if you watch multiple commands (e.g. webserver.runBackground and webclient.deploy), only the tasks relevant to a change get re-run.


Background

I think my use case is a pretty common need. I'm developing in full-stack Scala: some JVM modules, some ScalaJS modules, and some shared dependencies of both. I want a fast feedback loop: when code in a JS or shared module changes, the build needs to rebundle the JavaScript, and when code in a JVM or shared module changes, it needs to recompile and restart the server.

I don't need hot reload of the JavaScript code (that would be even better, but let's get to first base first). I do need to reload the page after the JS rebundles or the server reboots, but I'm accomplishing that outside of the build tool, so let's ignore that aspect of things.

In my case, the target to bundle the JS and put it in the right place is app_js.getScalaJs, and restarting the backend is done with app_lrbcol.runBackground.

I've been doing this with two separate Mill processes. tmuxp is useful for this; my config contained

  panes:
  - mill -j 16 -w app_lrbcol.runBackground
  - docker compose up db
  - sleep 5; mill -j 16 -w app_js.getScalaJs

However, at the moment Mill isn't designed for multiple processes like this (see e.g. #3454). #3519 may fix this, but I was recently reminded that Mill supports multiple parallel targets.

The issue

So I tried this command instead:

mill -j 0 -w app_lrbcol.runBackground + app_js.getScalaJs

However, this doesn't do what I want: if I make a change to ScalaJS code, even if it doesn't compile, it still causes runBackground to run.

It would be better IMO if watch mode could somehow apply to each target independently, instead of what it seems to do now: aggregate a single watch list of files, where a change to any of them causes the whole compound target to run.

arturaz commented 3 weeks ago

I'll grab this, as this is a feature I directly need.

lihaoyi commented 3 weeks ago

@arturaz go for it. I just updated the issue description with some other ideas on how to implement this.

lihaoyi commented 3 weeks ago

Possible Approaches:

  1. Start with runner/src/mill/runner/Watching.scala.

    • We need to make watchLoop's evaluate callback able to take an optional list of changed task IDs

    • watchAndWait and statWatchWait would need to return the task IDs that changed so we can pass them to evaluate. We need to make sure we include any tasks whose methodCodeHashSignatures(methodName) changed, indicating a code change rather than an input file change

    • The evaluate callback in MillMain.scala currently runs new MillBuildBootstrap(...).evaluate(); we need to make it take the list of changed task IDs and only evaluate tasks which are downstream of the changed tasks.

    • This ability to filter a -w/--watch evaluation based on selected files or inputs will also be valuable in other contexts, e.g. selectively running tests based on git diff in CI.

  2. Another way to do this would be to add an optional configuration somewhere to make evaluateGroupCached handle Commands similarly to Targets, i.e. to check the upstream inputs cache to see if they changed before running them.

    • That would probably need to change here to allow working with commands despite readWriterOpt being None, e.g. by just returning null

    • This flag can be set automatically on non-first runs of -w/--watch to satisfy this ticket; in future it could be used elsewhere as desired, and it would be less invasive than the original approach proposed above

  3. Add an optional configuration to make EvaluatorCore skip parts of its task graph that are not downstream of union(changed_input_tasks + tasks_whose_code_sig_changed).

    • EvaluatorCore already has the task graph, sorts it in topological order, and knows where the inputs are and which tasks had their code signature changed.

    • We would (a) up-front scan all inputs for file changes and tasks for code signature changes, then (b) do a BFS downstream from those to find all tasks that could be affected, (c) do a BFS upstream from the affected tasks to find all tasks required to be run, and then (d) only bother evaluating those required tasks in topological order
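
For illustration, here is a minimal, self-contained sketch of the graph work behind approach (3). The Node type is a hypothetical stand-in for Mill's real Task classes, and the two BFS passes correspond to steps (b) and (c):

  object WatchFilterSketch {
    // Hypothetical stand-in for Mill's task graph; like Task#inputs,
    // the edges stored on each node point upstream.
    case class Node(id: String, inputs: Seq[Node])

    def requiredNodes(all: Seq[Node], changed: Set[Node]): Set[Node] = {
      // Look-aside table of downstream edges, built by reversing `inputs`
      val downstream: Map[Node, Seq[Node]] =
        all.flatMap(n => n.inputs.map(up => up -> n)).groupMap(_._1)(_._2)

      def bfs(start: Set[Node], edges: Node => Seq[Node]): Set[Node] = {
        var seen = start
        var frontier = start
        while (frontier.nonEmpty) {
          val next = frontier.flatMap(edges).diff(seen)
          seen ++= next
          frontier = next
        }
        seen
      }

      val affected = bfs(changed, n => downstream.getOrElse(n, Nil)) // (b) downstream
      bfs(affected, _.inputs)                                        // (c) upstream
    }
  }

Evaluating only the returned set, in the topological order EvaluatorCore already computes, covers step (d); step (a), detecting the changed inputs and code signatures, is discussed further down the thread.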

lihaoyi commented 3 weeks ago

@arturaz I think approach (3) above is the most promising (fewest edge cases) and easiest to implement (most localized change), so I would suggest trying that first.

arturaz commented 3 weeks ago

Could you elaborate more on the 3rd approach?

EvaluatorCore already has the task graph

Where is that located? Are you referring to goals: Agg[Task[_]]?

knows where the inputs are and which tasks had their code signature changed.

How does it know that?

lihaoyi commented 3 weeks ago

Could you elaborate more on the 3rd approach?

EvaluatorCore already has the task graph

Where is that located? Are you referring to goals: Agg[Task[_]]?

Yes, Task[_] has a def inputs: Seq[Task[_]], and that forms a directed acyclic graph. You can see the existing code doing graph operations here https://github.com/com-lihaoyi/mill/blob/c1f4793187a77df8bb520c6b50288e3fa137dd1b/main/eval/src/mill/eval/EvaluatorCore.scala#L74-L76

The thing you probably care about most is terminals0/terminals. These are the named tasks, which may contain an Input, a Target, a Command, etc. Each named task is part of a "Group" that contains one named task and zero or more Task.Anon anonymous tasks.
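
For orientation, the rough shape being described (deliberately simplified and hypothetical, not Mill's real definitions):

  trait Task[+T] { def inputs: Seq[Task[_]] } // edges point upstream, forming the DAG
  // A Terminal wraps one *named* task (an Input, a Target, a Command, ...);
  // its "Group" is that named task plus zero or more Task.Anon helpers.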

knows where the inputs are and which tasks had their code signature changed.

How does it know that?

Currently, EvaluatorCore#evaluate0 splits the terminals into two groups: exclusive commands (which run serially at the end) and everything else (which runs in parallel earlier), each of which is separately passed to evaluateTerminals. What you can do here is split it into three groups using the following steps:

  1. union(changed_input_tasks + tasks_whose_code_sig_changed), which runs first to find the tasks that have potentially changed

    1. changed_input_tasks can be found by finding all input tasks in the terminals0 list, finding their upstream tasks, and passing them to evaluateTerminals to get their results. We will likely need to make loadCachedJson return the previous cached.valueHash in addition to the cached.inputsHash even in the case of a cache miss (which is always the case for input tasks), changing its return type from Option[(Int, Option[(Val, Int)])] to Option[(Int, Int, Option[Val])]
    2. tasks_whose_code_sig_changed can be found by taking all the tasks and joining them with methodCodeHashSignatures. You can see how this is done in GroupEvaluator; we can move that logic into a helper function and re-use it in EvaluatorCore
  2. Then we do a downstream breadth-first-search to find all the tasks downstream of those changed tasks

    1. you might need to generate your own look-aside table of downstream edges, since the default Task#inputs edges point upstream
  3. Now that we have the set of tasks downstream of union(changed_input_tasks + tasks_whose_code_sig_changed), you can use that to filter terminals0 before it gets partitioned in (tasks, leafExclusiveCommands) = terminals0.partition

  4. We now pass the filtered tasks and leafExclusiveCommands to evaluateTerminals one after the other, like we do now.
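
Put together, steps 3 and 4 might look roughly like this (a sketch with hypothetical stand-in types, not the real evaluate0 code):

  object Evaluate0Sketch {
    case class Task(id: String)
    case class Terminal(task: Task, isExclusiveCommand: Boolean)

    // `affected` is the downstream-BFS result over
    // union(changed_input_tasks + tasks_whose_code_sig_changed).
    def selectTerminals(
        terminals0: Seq[Terminal],
        affected: Set[Task]
    ): (Seq[Terminal], Seq[Terminal]) = {
      val filtered = terminals0.filter(t => affected.contains(t.task))
      // Both halves are then passed to evaluateTerminals one after the
      // other, exactly as evaluate0 does today.
      filtered.partition(t => !t.isExclusiveCommand)
    }
  }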

Let me know if you have any other questions or things you are unsure about and I can help answer. The internals of the Evaluator are definitely a bit gnarly, but the steps I gave above should work, and I'm happy to help you get them working

arturaz commented 3 weeks ago

changed_input_tasks can be found by finding all input tasks in the terminals0 list

Do by "finding all input tasks" you mean terminals0.map(_.task.inputs) or something like terminals0.filter(_.task.asInput) (whereas asInput currently does not exist)?

finding their upstream tasks

The upstream task is a task that is in task.inputs? Do we have to track it back to the task roots (tasks whose task.inputs is empty)?

and passing them to evaluateTerminals to get their results.

Finding upstream tasks seems like it would give me a Seq[Task[_]]; how should one turn that into a Seq[Terminal]?

tasks_whose_code_sig_changed can be found by taking all the tasks and joining them with methodCodeHashSignatures. You can see how this is done in GroupEvaluator; we can move that logic into a helper function and re-use it in EvaluatorCore

Are you talking about this code block?

https://github.com/com-lihaoyi/mill/blob/c1f4793187a77df8bb520c6b50288e3fa137dd1b/main/eval/src/mill/eval/GroupEvaluator.scala#L80-L125

lihaoyi commented 3 weeks ago

changed_input_tasks can be found by finding all input tasks in the terminals0 list

Do by "finding all input tasks" you mean terminals0.map(_.task.inputs) or something like terminals0.filter(_.task.asInput) (whereas asInput currently does not exist)?

I mean terminals0.filter(_.task.asInput), or whatever equivalent you come up with

finding their upstream tasks

The upstream task is a task that is in task.inputs? Do we have to track it back to the task roots (tasks whose task.inputs is empty)?

Task.Inputs in Mill are not necessarily roots, but can in fact have other upstream tasks. This is a bit unusual and not commonly used, but it is the case for now. Thus, in order to evaluate the Inputs, you have to run the whole top-sorted-evaluation workflow on their transitive upstream graph, so you do have to track it back to the task roots. I think Plan does some of that, or you can write your own breadth/depth-first-search
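
For concreteness, a sketch of that "track it back to the roots" traversal, with a hypothetical Node type again standing in for Task[_]:

  case class Node(id: String, inputs: Seq[Node])

  // Transitive upstream closure: everything the given Input tasks depend on,
  // all the way back to the roots, so the top-sorted evaluation can run on it.
  def transitiveUpstream(start: Seq[Node]): Set[Node] = {
    val seen  = scala.collection.mutable.Set.empty[Node]
    val stack = scala.collection.mutable.Stack(start: _*)
    while (stack.nonEmpty) {
      val n = stack.pop()
      if (seen.add(n)) stack.pushAll(n.inputs)
    }
    seen.toSet
  }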

and passing them to evaluateTerminals to get their results.

Finding upstream tasks seems like it would give me a Seq[Task[_]]; how should one turn that into a Seq[Terminal]?

Take the terminals0: Seq[Terminal] you already have; terminals0.filter(term => upstreamTasks.contains(term.task)) should do the job, I think

tasks_whose_code_sig_changed can be found by taking all the tasks and joining them with methodCodeHashSignatures. You can see how this is done in GroupEvaluator; we can move that logic into a helper function and re-use it in EvaluatorCore

Are you talking about this code block?

https://github.com/com-lihaoyi/mill/blob/c1f4793187a77df8bb520c6b50288e3fa137dd1b/main/eval/src/mill/eval/GroupEvaluator.scala#L80-L125

Yes that's the one!

arturaz commented 3 weeks ago

Am I right to assume .asInput should be private[mill] def asInput: Option[InputImpl[T]]?

There is this, but it's deprecated:

  @deprecated("Use mill.define.Target instead.", "Mill after 0.11.0-M8")
  type Input[T] = Target[T]

lihaoyi commented 3 weeks ago

That's right, you need to check if it is an InputImpl
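
A minimal sketch of such a helper; the two-line hierarchy here is hypothetical (Mill's real InputImpl lives in mill.define):

  trait Task[+T]
  class InputImpl[T] extends Task[T]

  // Pattern-match rather than cast; yields None for anything that isn't an Input.
  def asInput[T](task: Task[T]): Option[InputImpl[T]] = task match {
    case i: InputImpl[T @unchecked] => Some(i)
    case _                          => None
  }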

arturaz commented 3 weeks ago

tasks_whose_code_sig_changed can be found by taking all the tasks and joining them with methodCodeHashSignatures.

Can you elaborate on what you mean by "joining them"?

And by "all the tasks" do you mean terminals0, goals or something else?

I can do something like this:

    terminals0
      .flatMap(_.asLabeled)
      .map(t => t -> calculateNamedTaskHashes(t.task, classToTransitiveClasses, allTransitiveClassMethods).sum)

But then what should I do with the result?

lihaoyi commented 2 weeks ago

Can you elaborate on what you mean by "joining them"?

Looking up the signature of each task's method in that Map and comparing it to the previous signature. We may need to store the previous signature in Evaluator.Cached so it is accessible on the next run

And by "all the tasks" do you mean terminals0, goals or something else?

terminals0
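
A sketch of that lookup-and-compare, assuming (hypothetically) that Evaluator.Cached grows a codeSignature field persisted between runs:

  case class Cached(inputsHash: Int, valueHash: Int, codeSignature: Int)

  def codeSigChanged(
      methodName: String,
      methodCodeHashSignatures: Map[String, Int],
      previous: Option[Cached]
  ): Boolean =
    (previous.map(_.codeSignature), methodCodeHashSignatures.get(methodName)) match {
      case (Some(prev), Some(cur)) => prev != cur // signature moved between runs
      case _                       => true        // no record: conservatively assume changed
    }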

arturaz commented 2 weeks ago

I discovered a suspicious place.

Here, it takes either the inputsHash or the ._2 of a (Val, Int): https://github.com/com-lihaoyi/mill/blob/c1f4793187a77df8bb520c6b50288e3fa137dd1b/main/eval/src/mill/eval/GroupEvaluator.scala#L171-L175

Where the Int is a valueHash: https://github.com/com-lihaoyi/mill/blob/c1f4793187a77df8bb520c6b50288e3fa137dd1b/main/eval/src/mill/eval/GroupEvaluator.scala#L450

Is that intended?

lihaoyi commented 2 weeks ago

@arturaz I'm not sure, what's your issue with it? What do you find suspicious about those snippets?

arturaz commented 2 weeks ago

That one branch is using the inputs hash, whilst the other branch is using the value hash. These two seem unrelated to each other semantically?

lihaoyi commented 2 weeks ago

@arturaz good question haha, honestly I'm not sure. Does the blame show anything? Otherwise I suspect just looking things up using the input hash should be enough

lihaoyi commented 2 weeks ago

So it looks like we're taking the hash code in that snippet and returning it in GroupResults as the TaskResult hash, representing the hash of this task's output, which is used for invalidating downstream tasks

I think what's happening is that workers and non-worker cached tasks are cached in different ways.

In theory we should be able to use the worker code path for non-workers as well, with one caveat: the worker logic is probably slightly incorrect, since we should also be including the methodCodeHashSignatures of the worker method in the value hash it returns. This would be a conservative over-estimation, since it's possible for a Task's inputs to change and/or its code to change without affecting its output value, a case the current code path for normal Tasks can handle but the code path for workers cannot. But because workers are not generally serializable/hashable, we have to go with a conservative over-approximation
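
A sketch of that conservative worker hash (hypothetical names; the point is only that the worker method's code signature gets mixed into the reported value hash):

  // Workers aren't generally serializable/hashable, so their output value
  // can't be hashed directly. Instead, derive the "value hash" from everything
  // that could have changed it: the inputs hash plus the worker method's
  // signature from methodCodeHashSignatures.
  def workerValueHash(inputsHash: Int, workerCodeSig: Int): Int =
    (inputsHash, workerCodeSig).hashCode()

  // Over-approximation: inputs or code may change without changing the output,
  // which the normal Task path can detect but this scheme cannot.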