Support for skipping up-to-date tasks

eatdrinksleepcode commented 9 years ago

I just found this project while searching for a Gradle-like build system for .NET. I can see Bau being the foundation for such a system. However, a major feature of Gradle is strong support for skipping up-to-date tasks, in order to make the build as fast as possible. It does this by declaring the dependencies of each task (which are usually files, but can also be other things such as script properties). I don't see any support for such a feature. Are there plans for it in the future?

adamralph commented 9 years ago

Hi @eatdrinksleepcode, yes, file tasks and dependencies is something I had in mind from early on but I've not got round to doing it yet. I'm not familiar with Gradle but I guess the mechanism is similar to Rake? I.e. you declare a file task with the name of an output file and give it pre-requisites of the names of input files and it only executes when the output file does not exist or is older than the input files?

eatdrinksleepcode commented 9 years ago

Essentially yes, although Gradle also does file content hashing compared to previous runs in addition to timestamps. The key thing that Gradle (and Rake to a lesser degree, I think) has that most .NET build systems don't have - but Bau could - is that the inputs and outputs for a task are implicit as part of the task configuration, so you get the up-to-date checking for free. For example, for a file copy task, the inputs are the files to be copied and the outputs are the destinations for those files. This information is part of the configuration of the task, so it doesn't have to be declared separately.

adamralph commented 9 years ago

Yes, I agree, it would be a very natural fit for the Bau API and I definitely want to do this.

Initially I think we should release a simple version which supports only the timestamp comparison.
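For illustration, a minimal sketch of what a timestamp-only check could look like, following the Rake-style rule described earlier in the thread (run when an output is missing or older than an input). `TimestampCheck` and `IsUpToDate` are hypothetical names, not part of Bau, and treating a missing input as "not up to date" is an assumption.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

// Hypothetical helper for illustration; not part of Bau.
public static class TimestampCheck
{
    // True when every output exists and no input is newer than the oldest output.
    public static bool IsUpToDate(IEnumerable<string> inputs, IEnumerable<string> outputs)
    {
        var outputList = outputs.ToList();

        // No declared outputs, or a missing output: the task has to run.
        if (outputList.Count == 0 || !outputList.All(path => File.Exists(path)))
        {
            return false;
        }

        var oldestOutput = outputList.Min(path => File.GetLastWriteTimeUtc(path));

        // Assumption: a missing input also counts as "not up to date".
        return inputs.All(path =>
            File.Exists(path) && File.GetLastWriteTimeUtc(path) <= oldestOutput);
    }
}
```

A task runner would call something like `TimestampCheck.IsUpToDate(inputs, outputs)` just before executing the task and skip it when the result is true.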

The file hashing is nice in that it is actually based on content, but I'm a bit wary about the additional complexity introduced by having to store state, i.e. hashes from last run.
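To make the extra state concrete, here is a rough sketch of a content-based check. It is not how Gradle implements it; `ContentHashCheck`, `HashFile` and `InputsChanged` are hypothetical names, and persisting the dictionary of hashes between runs (the part that adds the complexity) is deliberately left out.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Security.Cryptography;

// Hypothetical helper for illustration; not part of Bau or Gradle.
public static class ContentHashCheck
{
    // Returns the SHA-256 hash of the file content as a hex string.
    public static string HashFile(string path)
    {
        using (var sha = SHA256.Create())
        using (var stream = File.OpenRead(path))
        {
            return BitConverter.ToString(sha.ComputeHash(stream));
        }
    }

    // "previous" maps each input path to the hash recorded on the last run.
    // An input with no recorded hash, or with a different hash, counts as changed.
    public static bool InputsChanged(IEnumerable<string> inputs, IDictionary<string, string> previous)
    {
        return inputs.Any(path =>
        {
            string oldHash;
            return !previous.TryGetValue(path, out oldHash) || oldHash != HashFile(path);
        });
    }
}
```

Unlike the timestamp comparison, this needs somewhere to keep `previous` between builds, which is the state mentioned above.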

eatdrinksleepcode commented 9 years ago

Agreed, timestamps are a good first step; comparing contents to previous runs could come later, and there is certainly quite a bit of extra complexity to it (although while working with the MSBuild scripts in CoreFx, I have come across several scenarios where only having timestamp comparison was a limiting factor that required working around).

If you are amenable, I can start spiking a potential implementation this weekend.

adamralph commented 9 years ago

That would be great! What syntax are you thinking of? My guess would be something like:

Require<Bau>().File("out.txt").DependsOn("in1.txt", "in2.txt").Do(file => ...);

Where file is an instance of something like:

public class File : BauTask
{
    public string Name { get; }
    public IReadOnlyCollection<string> Prerequisites { get; }
}

The above would be a syntactical port of the Rake file task.

Prerequisites is a little long. I was thinking of Inputs instead but then Name should really be Output. I like Name, however.

eatdrinksleepcode commented 9 years ago

I am studying Rake's implementation of these concepts, as I am much more familiar with Gradle, and I want to make sure I understand both before I make a recommendation for Bau. My thoughts so far:

Rake uses the "prerequisites" attribute both for prerequisite tasks and for input files. Gradle uses distinct properties for prerequisite tasks and file inputs (and other kinds of inputs). I prefer that distinction.

Regarding names, Gradle allows any property on a Task class to be annotated with Input or Output, which will then be read by the upToDate method. This allows Task classes to use semantically meaningful names for inputs and outputs.

Rake doesn't seem to have an explicit concept of a single task that produces multiple files. This can be worked around by using either the "rule" method or normal Ruby techniques to produce multiple tasks, one for each output file.

adamralph commented 9 years ago

I guess the distinction is probably better. In that case, if I had something like:

bau.Task("foo")
    .Do(...);

bau.Task("bar")
    .DependsOn("foo")
    .Inputs("path/to/input1.txt", "path/to/input2.txt")
    .Outputs("path/to/output1.txt", "path/to/output2.txt")
    .Do(...);

Then a call to run "bar" would first check whether the inputs pre-date the outputs, skipping "bar" if they do, and "foo" would only run if "bar" needs to be run?

The idea of annotating (I presume you mean with attributes) properties on the task is interesting, with the only downside being the attributes themselves (I like to avoid them if possible). So I guess that would lead to a syntax like:

bau.CustomTask("bar")
    .DependsOn("foo")
    .Do(ct =>
    {
        ct.MyInput = "path/to/input.txt";
        ct.MyOutput = "path/to/output.txt";
    });

and I guess if a fluent (chaining) API is provided then the values passed as arguments would have to be exposed as properties, e.g.

bau.CustomTask("bar")
    .DependsOn("foo")
    .Do(ct => ct
        .WithMyInput("path/to/input.txt")
        .WithMyOutput("path/to/output.txt"));

public class CustomTask : BauTask
{
    [Input]
    public string MyInput { get; set; }

    [Output]
    public string MyOutput { get; set; }

    // Return the task itself so that the calls can be chained.
    public CustomTask WithMyInput(string fileName)
    {
        this.MyInput = fileName;
        return this;
    }

    public CustomTask WithMyOutput(string fileName)
    {
        this.MyOutput = fileName;
        return this;
    }
    ...
}

Is this what you have in mind?

eatdrinksleepcode commented 9 years ago

Yeah I think that reflects pretty well what I am thinking of.

> and "foo" would only run if "bar" needs to be run

Typically prerequisites run first, and then the task is checked to see if it is up to date. The reason for this is that prerequisites may intentionally change the inputs to the task (imagine the "run unit tests" task has a dependency on "compile unit test project"). If you need a prerequisite task to only run if the current task needs to be run, then instead of making it a prerequisite, you can explicitly invoke it as the first action of the current task.
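The ordering described above can be sketched as a tiny runner: prerequisites always run first, and only then is the task's own up-to-date check evaluated. `SketchTask` and `SketchRunner` are hypothetical types for illustration and do not reflect Bau's actual task model.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical task shape for illustration; not Bau's BauTask.
public class SketchTask
{
    public string Name { get; set; }
    public List<SketchTask> Prerequisites { get; } = new List<SketchTask>();
    public Func<bool> IsUpToDate { get; set; } = () => false;
    public Action Action { get; set; } = () => { };
}

public static class SketchRunner
{
    public static void Run(SketchTask task, ISet<string> visited, IList<string> log)
    {
        // Each task is considered at most once per build.
        if (!visited.Add(task.Name))
        {
            return;
        }

        // Prerequisites run first, because they may change this task's inputs.
        foreach (var prerequisite in task.Prerequisites)
        {
            Run(prerequisite, visited, log);
        }

        // Only now is the task itself checked.
        if (task.IsUpToDate())
        {
            log.Add("skip " + task.Name);
            return;
        }

        task.Action();
        log.Add("run " + task.Name);
    }
}
```

With a "test" task that depends on "compile", the compile step runs (or is itself skipped) before "test" decides whether its inputs have changed.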

An alternative would be to provide a mechanism for indicating whether a prerequisite should be run before or after the up-to-date check; but I feel like that is surfacing too much complexity into the API.

> The idea of annotating (I presume you mean with attributes) properties on the task is interesting, with the only downside being the attributes themselves (I like to avoid them if possible).

(Yes I mean with attributes.) An alternative to the attribute approach would be to create an overridable method that allows the Task itself to explicitly add to the Inputs and Outputs collection based on its own named properties. This method would run after configuration but just prior to the up-to-date check. A downside of that is that it would not be possible to inspect the Inputs and Outputs collection during configuration, but I don't know if that would be a limitation in practice.

I am quite comfortable with the attribute approach, but I could go either way. The consumer of the task does not have to know about the attributes, only the task creator, so conceptual overhead for typical users is not affected.
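As a rough sketch of how the attribute approach could work on the runner side, the annotated properties can be collected with reflection just before the up-to-date check. All names here (`InputAttribute`, `OutputAttribute`, `AnnotatedPaths`, `SampleCopyTask`) are hypothetical, and limiting the scan to `string` properties is an assumption made to keep the example short.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Reflection;

// Hypothetical marker attributes; not part of Bau.
[AttributeUsage(AttributeTargets.Property)]
public sealed class InputAttribute : Attribute { }

[AttributeUsage(AttributeTargets.Property)]
public sealed class OutputAttribute : Attribute { }

public static class AnnotatedPaths
{
    // Returns the values of all public string properties marked with TAttribute.
    public static IReadOnlyList<string> Collect<TAttribute>(object task)
        where TAttribute : Attribute
    {
        return task.GetType()
            .GetProperties(BindingFlags.Public | BindingFlags.Instance)
            .Where(p => p.PropertyType == typeof(string) && p.IsDefined(typeof(TAttribute), true))
            .Select(p => (string)p.GetValue(task))
            .Where(value => value != null)
            .ToList();
    }
}

// Example task: only the task author sees the attributes.
public class SampleCopyTask
{
    [Input]
    public string Source { get; set; }

    [Output]
    public string Destination { get; set; }

    public string Label { get; set; } // not annotated, so it is ignored
}
```

Someone configuring `SampleCopyTask` only sets `Source` and `Destination`; the runner would call `AnnotatedPaths.Collect<InputAttribute>(task)` and `AnnotatedPaths.Collect<OutputAttribute>(task)` on its own.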

adamralph commented 9 years ago

Ah, of course, you are correct regarding the pre-requisites. I agree that control of pre-requisite running before/after up to date check is unnecessary complexity.

Yes, the idea of overridable methods occurred to me too. I was thinking of something like this:

public class CustomTask : BauTask
{
    public string MyInput{ get; set; }

    public string MyOutput{ get; set; }

    protected override IEnumerable<string> Inputs // or public, if protected doesn't work
    {
        get { yield return this.MyInput; }
    }

    protected override IEnumerable<string> Outputs
    {
        get { yield return this.MyOutput; }
    }
    ...
}

I'll continue to think about it. Right now neither is a clear winner to me.

However, in terms of terseness, this does seem to win out:

public class CustomTask : BauTask
{
    [Input]
    public string MyInput{ get; set; }

    [Output]
    public string MyOutput{ get; set; }
    ...
}

Perhaps I just need to shake off my attribute allergy :stuck_out_tongue_closed_eyes:

eatdrinksleepcode commented 9 years ago

A slight tangent...

In Gradle, a task is allowed to specify inputs in multiple ways: a call to "inputs.files" could pass a raw string (representing a file path), a collection of strings, a File object, a collection of File objects, another Task (which would be interpreted as using the Task's outputs), etc. The API for handling these diverse possibilities is trivial: inputs are specified as Objects, and Gradle figures out what to do based on what type of Object it is. If the Object you add can't be interpreted as a file dependency, a runtime exception is thrown. Annotations are handled the same way. This untyped API approach is one of the things that I find the most frustrating about Gradle; as a fan of static typing, I hate not knowing what I am supposed to pass to an API without going to the documentation or waiting for a runtime exception. Given that Bau is built on a statically-typed language and seems to be building a statically-typed API, this dynamic approach seems inappropriate.

However, it does have some advantages. APIs can take a wide variety of inputs without resorting to excessive overloads (see Project.files(...) for an example of a seemingly simple API that would require a lot of overloads to express its functionality with static typing). Also, an untyped API can easily take a Callable (equivalent to an untyped Delegate) instead of a raw value. This is a subtle point, but I have found it to be quite useful. Using a Callable (which is not executed until the moment the value is needed during the execution phase) allows the decision about which value to use to be deferred until the last moment. While this has trade-offs (Gradle's strong conventions cannot infer as much when the value is not known at configuration time), it allows a level of flexibility which can be extremely powerful. However, using delegates in C# typed APIs would essentially double the number of overloads required (string, Func<string>, FileInfo, Func<FileInfo>, etc.).

I don't have a solution for this yet. Just something I am considering.

adamralph commented 9 years ago

It is an interesting aspect to consider. It's also leaning more towards the gulp model, i.e. a task produces things for another task and the task runner itself carries the responsibility of transferring those things from one task to the other. In gulp those things just happen to be streams. This is something I was thinking about for Bau at one stage but I never got as far as projecting these thoughts into code, or even a proposed API. I do wonder if it takes the product into a completely different direction though, or whether this model can somehow live side by side or be integrated into the current model.

eatdrinksleepcode commented 9 years ago

I have definitely had the need to pass data between tasks. I have always worked around it by using the project properties of the build system (essentially global static data), or using temporary files. I am not familiar with gulp, but I can definitely see the appeal of the stream concept. Whether or not the two models can live together, I am not sure. I would tend not to go that far until there was a proven need for it.

aarondandy commented 9 years ago

As I have been watching this thread and thinking about things like async and passing things between tasks, I wonder if there should be a complete redesign of what a BauTask is, how they work, and how we can interact with them. That definitely goes outside the scope of this issue and would be a massive change, but I think this should be looked at sooner rather than later. I would love to see the two of you chat about this on Skype, Google, JabbR or whatever and to have us come up with a set of requirements that could be looked at in a new light and as a new direction for BauTask.

eatdrinksleepcode commented 9 years ago

Regarding strongly-typed APIs, I think I would be inclined to use explicit overloads, even if we end up with a lot of them. The value of static typing for efficiently communicating to the developer is too big to throw away.
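A small sketch of what the explicit-overload approach might look like, including the deferred (`Func`) variants discussed earlier in the thread. `InputSet` is a hypothetical type, not part of Bau; it stores every input as a deferred lookup so that eager and lazy values go through the same path.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

// Hypothetical type for illustration; nothing here is part of Bau.
public class InputSet
{
    // Every input is kept as a delegate and only evaluated in Resolve().
    private readonly List<Func<string>> sources = new List<Func<string>>();

    public InputSet Add(string path)
    {
        return this.Add(() => path);
    }

    public InputSet Add(FileInfo file)
    {
        return this.Add(() => file.FullName);
    }

    public InputSet Add(Func<FileInfo> file)
    {
        return this.Add(() => file().FullName);
    }

    public InputSet Add(Func<string> path)
    {
        this.sources.Add(path);
        return this;
    }

    // Intended to be called at execution time, just before the up-to-date check.
    public IReadOnlyList<string> Resolve()
    {
        return this.sources.Select(source => source()).ToList();
    }
}
```

The four overloads illustrate the doubling mentioned above: each value type needs a matching `Func` overload, but the compiler tells the caller exactly what is accepted.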

eatdrinksleepcode commented 9 years ago

@aarondandy I am open to having the conversation, although I am not certain what the goal of such a conversation would be. FWIW the current definition of BauTask is pretty similar to both Gradle and Rake, which I think are reasonable places to take inspiration. There is plenty of room for improvement, but it sounds like you are looking for a more radical change. Do you want to open an issue to start a conversation on what that change might look like?

aarondandy commented 9 years ago

I may make some issues later or revive some I find.

adamralph / bau

Support for skipping up-to-date tasks #196