```EarlyTask``` plugins

felixfontein commented 9 years ago

Hi,

I'd like to have a plugin category EarlyTask, for tasks which are executed before the site is rendered (i.e. an analogue to LateTask). I need this for a plugin (or rather, a combination of plugins) I wrote. Currently I use the Task plugin class, but some of its tasks end up running after page compilation, while my page compiling plugin needs their results, so it fails.

Does anyone mind if I add something like that? Or would it be better to have a general priority system, so you can assign a task a priority (usual tasks could get 10, and late tasks 100, so you could add a task with priority 2 and one with priority 7 to ensure that the one with priority 2 appears in the task list before the one with priority 7 and before all regular rendering tasks and late tasks)?
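The priority idea can be sketched in a few lines of Python. This is only an illustration under assumptions: `PrioritizedTask`, `order_by_priority`, and the task names are hypothetical and not part of Nikola or doit; the numbers 10 and 100 are the ones suggested above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PrioritizedTask:
    """Hypothetical task record; not part of Nikola or doit."""
    name: str
    priority: int = 10  # assumed default for regular tasks


def order_by_priority(tasks):
    """Return tasks sorted by ascending priority.

    sorted() is stable, so tasks with equal priority keep their
    original relative order.
    """
    return sorted(tasks, key=lambda task: task.priority)


tasks = [
    PrioritizedTask("render_pages"),             # regular task, priority 10
    PrioritizedTask("sitemap", priority=100),    # late task
    PrioritizedTask("second_early", priority=7),
    PrioritizedTask("first_early", priority=2),
]
ordered_names = [task.name for task in order_by_priority(tasks)]
print(ordered_names)
```

A plain sort like this only fixes the order in which tasks are listed; it says nothing about dependencies between them.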

(I have the vague feeling that I already read something about EarlyTasks somewhere here, but I cannot remember where. So it's probably not my own idea :) )

Cheers, Felix

felixfontein commented 9 years ago

(I now squashed the commits.)

Kwpolska commented 9 years ago

> Would it be ok to make the assumption that plugin.name is the only task name callable via nikola build <taskname>?

nikola build foo generates and executes (if necessary) foo tasks and all their dependencies. In other words, nikola build sitemap ≈ nikola build in standard sites.

You can also call any task basename from nikola build, including those which do not have the same name as their providing plugin.

ralsina commented 9 years ago

I adhere to schettino's question:

"I am not 100% convinced about this usage of "stages"... why not just directly depend on task/plugin name?"

We could make all plugins emit one task with the plugin name depending on all its previous tasks, if that helps.

schettino72 commented 9 years ago

> A second thing: what happens if you use doit run bla, where bla is a target of a task which is generated on the fly?

You are right, specifying the target of a DelayedTask won't work. This is a hard problem to solve. I see a few options, but I am not sure they are worth the trouble:

- when the specified argument is not a known task or target, doit would load delayed tasks (as done by the list command) to try to find the target. This would not be 100% reliable and might bring other problems...
- query the DB looking for the task that generated the target. This would require loading and searching the whole DB, because as of now the DB has efficient lookup only by task name.
- have some kind of rule/regex that would be able to find a task for a given target path.
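The third option could look roughly like the sketch below. It is an assumption-laden illustration, not a doit feature: `TARGET_RULES`, `task_for_target`, the paths, and the task names are all hypothetical.

```python
import re

# Hypothetical rules: each maps a pattern over target paths to the name
# of the task assumed to produce matching targets. Order matters, the
# first matching rule wins.
TARGET_RULES = [
    (re.compile(r"^output/galleries/"), "render_galleries"),
    (re.compile(r"^output/.*\.html$"), "render_pages"),
]


def task_for_target(path, rules=TARGET_RULES):
    """Return the first task name whose pattern matches path, or None."""
    for pattern, task_name in rules:
        if pattern.search(path):
            return task_name
    return None


print(task_for_target("output/index.html"))
print(task_for_target("output/galleries/trip/index.html"))
print(task_for_target("README.txt"))
```

Such rules would have to be maintained by hand (or by each plugin), which is part of why this option may not be worth the trouble.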

felixfontein commented 9 years ago

@ralsina: but what if you want to build only a task whose name is not equal to the plugin's name, and not everything belonging to that plugin? That won't be possible with that approach.

felixfontein commented 9 years ago

@schettino72: Actually, I like the first idea a lot. I mean, it won't be 'worse' than it is now (concerning the order of execution and task generation).

We could also restrict the syntax of nikola build <...> to only allow specifying plugin names or something of the form plugin_name:target.

schettino72 commented 9 years ago

> @schettino72: Actually, I like the first idea a lot. I mean, it won't be 'worse' than it is now (concerning the order of execution and task generation).

@felixfontein created https://github.com/pydoit/doit/issues/20 with some further thoughts about it. Hopefully you can assign yourself to implement it :)

felixfontein commented 9 years ago

I'll try :) Though not today anymore...

felixfontein commented 9 years ago

Ok, I now rewrote parts of the code to have each task plugin's tasks generated by one delayed task loader. Also, the delayed task's name equals the task plugin's name, whence nikola build <task_name> works again.

One thing I noticed: since all tasks of stage 2 (for example) depend on the waiting task of stage 1, and that waiting task depends on all tasks of stage 1, building one specific task in stage 2 via nikola build <task_name> triggers a build of all tasks of stage 1. This could be helped by adding a modified version of task_dep to doit, which is only used to determine the order of execution, but not which tasks also have to be built before a specified task can be built. @schettino72: what do you think about this?

schettino72 commented 9 years ago

> This could be helped by adding a modified version of task_dep to doit, which is only used to determine the order of execution, but not which tasks also have to be built before a specified task can be built. @schettino72: what do you think about this?

Do you mean a setup-task? uhmm. The docs need an example without a teardown.

Maybe a delayed task should create an implicit setup-task instead of a task_dep... It is a trivial change, can you try it?

felixfontein commented 9 years ago

No, a setup-task will be executed when this task is executed. A wait-for dependency should not be executed (unless of course it is manually specified on the command line, or it also appears as a proper dependency of another task to be executed); it should only participate in determining the execution order, that is, when to start executing a task.

schettino72 commented 9 years ago

@felixfontein give me an example please: the dodo.py and what happens when you run it. Better to create an issue on the doit tracker, or we're going to hijack this issue (again).

felixfontein commented 9 years ago

Take the following dodo.py file:

```python
def task_a_start():
    return {
        'basename': 'a_start',
        'actions': None,
    }

def task_a1():
    return {
        'basename': 'a1',
        'task_dep': ['a_start'],
        'actions': ['echo A1'],
    }

def task_a2():
    return {
        'basename': 'a2',
        'task_dep': ['a_start'],
        'actions': ['echo A2'],
    }

def task_a_wait():
    return {
        'basename': 'a_wait',
        'task_dep': ['a1', 'a2'],
        'actions': None,
    }

def task_b_start():
    return {
        'basename': 'b_start',
        'task_dep': ['a_wait'],
        'actions': None,
    }

def task_b1():
    return {
        'basename': 'b1',
        'task_dep': ['b_start'],
        'actions': ['echo B1'],
    }

def task_b2():
    return {
        'basename': 'b2',
        'task_dep': ['b_start', 'a2'],
        'actions': ['echo B2'],
    }

def task_b_wait():
    return {
        'basename': 'b_wait',
        'task_dep': ['b1', 'b2'],
        'actions': None,
    }
```

There are two stages, a and b. To ensure that b is executed when a is done, a_wait depends on all a tasks, all b tasks depend on b_start, and b_start depends on a_wait. There's also a dependence between b2 and a2.

I would like this last dependence (of b_start on a_wait) to be a wait-for dependence, so that if I run doit b1, only b1 (and b_start) are executed. (And if I run doit b2, only a2 and b2 and the corresponding _start tasks are executed.)
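The effect of such a wait-for edge on task selection can be sketched in plain Python. This is not doit code: `select_tasks` and the dictionaries are hypothetical, and the wait-for edge is modeled simply by leaving b_start's dependency on a_wait out of the graph that is used for selection.

```python
def select_tasks(requested, task_dep):
    """Return the requested tasks plus everything reachable via task_dep."""
    selected = set()
    stack = list(requested)
    while stack:
        name = stack.pop()
        if name not in selected:
            selected.add(name)
            stack.extend(task_dep.get(name, []))
    return selected


# The dodo.py above as it stands: b_start has a task_dep on a_wait.
TASK_DEP = {
    "a_start": [],
    "a1": ["a_start"],
    "a2": ["a_start"],
    "a_wait": ["a1", "a2"],
    "b_start": ["a_wait"],
    "b1": ["b_start"],
    "b2": ["b_start", "a2"],
    "b_wait": ["b1", "b2"],
}

# Proposed variant: b_start -> a_wait becomes a hypothetical wait-for edge,
# which would order execution but not pull a_wait into the selection.
WITH_WAIT_FOR = dict(TASK_DEP, b_start=[])

print(sorted(select_tasks(["b1"], TASK_DEP)))       # today: all of stage a too
print(sorted(select_tasks(["b1"], WITH_WAIT_FOR)))  # with wait-for
print(sorted(select_tasks(["b2"], WITH_WAIT_FOR)))  # real task_dep on a2 stays
```

The sketch covers only which tasks get selected; the ordering between selected tasks is a separate question.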

felixfontein commented 9 years ago

I think it's ok to first discuss here how to do this (because it has a lot to do with this feature), but as soon as we know what we want we can continue the discussion in the doit tracker. Hope that's ok for you :)

schettino72 commented 9 years ago

@felixfontein thanks for the example. I guess I understand your problem

In my opinion this problem only arises when using "phases" that doit has really no support for, so maybe a patch on Nikola is more appropriate.

Can you define better what triggers the change of behaviour for this wait-for dependency? Is it when any task is specified on the command line? Sounds too tricky to me...

And how can you test/trigger this before https://github.com/pydoit/doit/issues/20 is implemented?

Anyway, I gave it a try here: https://github.com/schettino72/nikola/compare/getnikola:earlytask_impl...earlytask?expand=1 Luckily I added pos_args to the signature of load_tasks, even though I didn't know of any use for it up to now :)

felixfontein commented 9 years ago

Hmm, a wait-for instead of task_dep could also be of interest if you want to process tasks in parallel, but some tasks need a resource which cannot be used in parallel (maybe some external device, like a DVD writer). For such a setup, you need a mechanism that keeps a second task from being executed until a first task is done, but you don't want an explicit dependency, so that you can still build each one individually.

Yes, I know that this sounds a bit far fetched, but at least it shows such a feature could in theory be used in a more general setting.

Anyway, wait-for does not behave differently in special situations; it should always behave the same way: if two tasks a and b are scheduled to be executed, and b wait-for a, then b is not executed before a is done. So if b is specified as a task to be executed (either via the command line or as a default task), this does not trigger a check of a's dependencies (to determine whether it should be executed) the way a task_dep does. It only ensures that if a is actually executed, b will only be executed once a is done.

felixfontein commented 9 years ago

(Your try is a hack which works fine if all tasks specified on the command line are within one stage, but if they are not, tasks from a later stage might be generated before an earlier stage finished execution.)

felixfontein commented 9 years ago

Ok, I got an idea where this could be quite useful. Assume that you want to record audio samples, maybe for a study. Every sample (recorded as a .wav file) should be converted to different formats (say .ogg and .mp4) afterwards. So you create a recording task for every sample to record, and tasks to create .ogg and .mp4 files (which depend on the recording task). Since the encoding can be done in parallel, you want to run doit with -n2. But you cannot record two things at the same time, so you need to introduce dependencies between the sample recordings.

If you have three recordings, a, b and c, you could use task_dep to get a chain a -> b -> c. But now, if you only want to do recording b (for example because you noticed the recording has too much background noise), you want to run doit run b -n2. But since there's a task dependence, doit will by default also execute task a. So you end up doing two recordings, even though you needed only one.

Here you would prefer to use a wait-for dependency between a, b and c, and not a task_dep.
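The intended ordering behaviour for the recordings can be sketched as follows. This is a hypothetical model, not doit's scheduler: `reachable`, `execution_order`, and the `WAIT_FOR` dictionary are made up for illustration, and the set of selected tasks is assumed to be known up front.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def reachable(start, edges):
    """Return every task reachable from start by following edges."""
    seen = set()
    stack = list(edges.get(start, []))
    while stack:
        name = stack.pop()
        if name not in seen:
            seen.add(name)
            stack.extend(edges.get(name, []))
    return seen


def execution_order(selected, task_dep, wait_for):
    """Order the selected tasks so each runs after its predecessors.

    Both task_dep and wait_for edges constrain the order, including
    through tasks that are not selected, but nothing is added to the
    selection here.
    """
    combined = {}
    for edges in (task_dep, wait_for):
        for name, preds in edges.items():
            combined.setdefault(name, []).extend(preds)
    sorter = TopologicalSorter()
    for name in sorted(selected):
        sorter.add(name, *sorted(reachable(name, combined) & set(selected)))
    return list(sorter.static_order())


# Recordings chained with hypothetical wait-for edges: a before b before c.
WAIT_FOR = {"b": ["a"], "c": ["b"]}
print(execution_order({"b"}, {}, WAIT_FOR))       # only b is run
print(execution_order({"a", "c"}, {}, WAIT_FOR))  # a is ordered before c
```

Note that the sketch relies on knowing the full selection before anything runs, which is a strong assumption for a tool that discovers dependencies at run time.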

(Even if you don't want to do the encoding part and thus don't need parallel execution, having such a wait-for dependency makes sense as protection against failures should the tasks ever be executed in parallel, which might happen if your dodo.py is included in a larger dodo.py which does a lot more.)

schettino72 commented 9 years ago

@felixfontein the audio sample example is a different case... it is an example where you need contention based on resource utilization for parallel scheduling. This has been raised before, and it would be feasible to implement in doit. But using wait-for would be a poor solution, because it can handle only one shared resource.
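One way to picture resource-based contention, as opposed to a chain of wait-for edges, is one lock per named resource. The following is a rough sketch using only the standard library; `run_with_resources`, the task tuples, and the resource name "microphone" are made up for illustration and say nothing about how doit would implement this.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor


def run_with_resources(tasks, workers=2):
    """Run (name, resources, action) tuples on a thread pool.

    A task holds one lock per named resource while its action runs, so
    two tasks that share any resource never overlap. Locks are taken in
    sorted order to avoid deadlocks between tasks needing several.
    """
    locks = {res: threading.Lock() for _, resources, _ in tasks for res in resources}

    def run(task):
        name, resources, action = task
        held = [locks[res] for res in sorted(set(resources))]
        for lock in held:
            lock.acquire()
        try:
            return name, action()
        finally:
            for lock in reversed(held):
                lock.release()

    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(run, tasks))


# Bookkeeping to observe how many recordings run at the same time.
usage = {"active": 0, "max_active": 0}
usage_lock = threading.Lock()


def record():
    """Stand-in for a recording; a real one would use the audio device."""
    with usage_lock:
        usage["active"] += 1
        usage["max_active"] = max(usage["max_active"], usage["active"])
    time.sleep(0.01)
    with usage_lock:
        usage["active"] -= 1
    return "recorded"


tasks = [
    ("record_a", ["microphone"], record),
    ("record_b", ["microphone"], record),
    ("encode_a", [], lambda: "encoded"),
]
results = run_with_resources(tasks)
print(results, usage["max_active"])
```

Each recording can still be requested on its own, and a task may name several resources, which a single wait-for chain cannot express.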

> if two tasks a and b are scheduled to be executed, and b wait-for a, then b is not executed before a is done.

I guess you need to understand a bit more about how doit works internally. A task is scheduled to be executed in 2 situations:

1) the task is specified on the command line or in default_tasks
2) the task is a dependency of a previously scheduled task

The problem is that in 2) this happens at run time, while tasks are being executed. In other words, doit does not pre-compute the whole task dependency tree before it starts its execution. This has some advantages: being fast (it doesn't compute parts of the tree that are not used), and allowing some dynamic modification of the "tree" (like calc_dep and delayed task creation).
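The run-time discovery described above can be illustrated with a toy scheduler. It is a deliberately simplified sketch and not doit's actual implementation: `run_lazily`, `deps_of`, and the graph are hypothetical, and the graph is assumed to be acyclic.

```python
def run_lazily(requested, deps_of):
    """Run tasks depth-first, asking for dependencies only when needed.

    deps_of(name) is called when a task is first scheduled, so parts of
    the graph that are never reached are never computed. Assumes the
    dependency graph has no cycles.
    """
    executed = []
    looked_up = []

    def schedule(name):
        if name in executed:
            return
        looked_up.append(name)
        for dep in deps_of(name):
            schedule(dep)
        executed.append(name)

    for name in requested:
        schedule(name)
    return executed, looked_up


GRAPH = {"a": [], "b": ["a"], "c": ["b"], "unrelated": ["a"]}
executed, looked_up = run_lazily(["b"], lambda name: GRAPH[name])
print(executed)   # tasks in the order they ran
print(looked_up)  # tasks whose dependencies were ever computed
```

Because `deps_of` is consulted only as tasks are reached, the scheduler never learns about "c" or "unrelated" here, which is also why it cannot know in advance whether some later task will need a skipped wait-for task.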

To implement A wait-for B, doit would have to finish all other scheduled tasks first, to make sure that none of them has a real task_dep on B. But since you might have multiple uses of wait-for, even that would not guarantee that no task scheduled later has a task_dep on B. That's more or less the same problem you pointed out in my hacky patch.

The other option would be to pre-compute the whole DAG, but again given the very dynamic nature of doit you would still have no guarantee that a "skipped" wait-for task would not be scheduled later by a third task.

So I guess this wait-for could only be implemented if there was no support for dynamic changes in the task dependency-tree. Or do you have an idea on how an implementation would work?

I guess it does not solve your problem but doit has a "--single" flag for the run command (or build in Nikola) that ignores task_dep. Sometimes useful to avoid rebuilding a lot of stuff when trying some changes.

felixfontein commented 9 years ago

Having --single is probably enough for most use-cases in Nikola (the need to keep earlier stages from checking for tasks to run is somewhat special anyway, I think). So maybe we can just ignore this thing. If I get a good idea how this could be solved/implemented while working on #20 in the doit tracker, I'll try that out; if not, let's leave it as it is. (Or does anyone object to this?)

Kwpolska commented 7 years ago

That’s not going to happen any time soon.

getnikola / nikola

```EarlyTask``` plugins #1562