Closed felixfontein closed 7 years ago
(I now squashed the commits.)
Would it be ok to make the assumption that plugin.name is the only task name callable via nikola build <taskname>?
nikola build foo generates and executes (if necessary) foo tasks and all their dependencies. In other words, nikola build sitemap ≈ nikola build in standard sites.
You can also call any task basename from nikola build, including those which do not have the same name as their providing plugin.
I second schettino's question:
"I am not 100% convinced about this usage of "stages"... why not just directly depend on task/plugin name?"
We could make all plugins emit one task with the plugin name depending on all its previous tasks, if that helps.
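A minimal sketch of that idea as a dodo.py fragment. The plugin name render_pages, its two sub-tasks, and the helper plugin_group_task are hypothetical and only serve as illustration; they are not taken from Nikola.

```python
def plugin_group_task(plugin_name, subtask_names):
    """Build a doit group task named after the plugin (hypothetical helper).

    The group task has no actions of its own; it only depends on every
    task the plugin generated, so requesting the plugin name builds them all.
    """
    return {
        'basename': plugin_name,
        'actions': None,
        'task_dep': list(subtask_names),
    }


# Two made-up sub-tasks standing in for whatever the plugin generates.
def task_render_pages_html():
    return {'basename': 'render_pages_html', 'actions': ['echo html']}


def task_render_pages_index():
    return {'basename': 'render_pages_index', 'actions': ['echo index']}


def task_render_pages():
    # One task carrying the plugin name, depending on all of the above.
    return plugin_group_task(
        'render_pages', ['render_pages_html', 'render_pages_index'])
```

With such a fragment, doit render_pages would run both sub-tasks, while each sub-task stays callable on its own.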
A second thing: what happens if you use doit run bla, where bla is a target of a task which is generated on the fly?
You are right, specifying the target of a DelayedTask won't work.
This is a hard to solve problem. I see a few options but I am not sure they are worth the trouble:
task that generated the target; this would require loading and searching the whole DB, because as of now the DB has efficient lookup only by task name.
@ralsina: but what if you want to build only a task which is not equal to the plugin's name? And not everything belonging to that plugin? That won't be possible with that approach.
@schettino72: Actually, I like the first idea a lot. I mean, it won't be 'worse' than it is now (concerning the order of execution and task generation).
We could also restrict the syntax of nikola build <...> to only allow specifying plugin names or something of the kind plugin_name:target.
@schettino72: Actually, I like the first idea a lot. I mean, it won't be 'worse' than it is now (concerning the order of execution and task generation).
@felixfontein created https://github.com/pydoit/doit/issues/20 with some further thoughts about it. Hopefully you can assign yourself to implement it :)
I'll try :) Though not today anymore...
Ok, I now rewrote parts of the code to have each task plugin's tasks generated by one delayed task loader. Also, the delayed task's name equals the task plugin's name, whence nikola build <task_name> works again.
One thing I noticed: since all tasks of stage 2 (for example) depend on the waiting task of stage 1, and that waiting task depends on all tasks of stage 1, building one specific task in stage 2 via nikola build <task_name> triggers a build of all tasks of stage 1. This could be helped by adding a modified version of task_dep to doit, which is only used to determine the order of execution, but not which tasks also have to be built before a specified task can be built. @schettino72: what do you think about this?
This could be helped by adding a modified version of task_dep to doit, which is only used to determine the order of execution, but not which tasks also have to be built before a specified task can be built. @schettino72: what do you think about this?
Do you mean a setup-task?
uhmm. The docs need an example without a teardown.
Maybe a delayed task should create an implicit setup-task instead of a task_dep... It is a trivial change, can you try it?
No, a setup-task will be executed when this task is executed. A wait-for dependency should not be executed (except, of course, if it is manually specified on the command line, or if it also appears as a proper dependency of another task to be executed); it should only participate in determining the execution order, i.e. when to start executing a task.
@felixfontein give me an example please. dodo.py format and what happens when you run it. better create an issue on doit tracker or we gonna hijack this issue (again).
Take the following dodo.py file:
def task_a_start():
    return {
        'basename': 'a_start',
        'actions': None,
    }

def task_a1():
    return {
        'basename': 'a1',
        'task_dep': ['a_start'],
        'actions': ['echo A1'],
    }

def task_a2():
    return {
        'basename': 'a2',
        'task_dep': ['a_start'],
        'actions': ['echo A2'],
    }

def task_a_wait():
    return {
        'basename': 'a_wait',
        'task_dep': ['a1', 'a2'],
        'actions': None,
    }

def task_b_start():
    return {
        'basename': 'b_start',
        'task_dep': ['a_wait'],
        'actions': None,
    }

def task_b1():
    return {
        'basename': 'b1',
        'task_dep': ['b_start'],
        'actions': ['echo B1'],
    }

def task_b2():
    return {
        'basename': 'b2',
        'task_dep': ['b_start', 'a2'],
        'actions': ['echo B2'],
    }

def task_b_wait():
    return {
        'basename': 'b_wait',
        'task_dep': ['b1', 'b2'],
        'actions': None,
    }
There are two stages, a and b. To ensure that b is executed when a is done, a_wait depends on all a tasks, all b tasks depend on b_start, and b_start depends on a_wait. There's also a dependence between b2 and a2.
I would like the dependence of b_start on a_wait to be a wait-for dependence, so that if I run doit b1, only b1 (and b_start) are executed. (And if I run doit b2, only a2 and b2 and the corresponding _start tasks are executed.)
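To make the difference concrete, here is a small stand-alone sketch that computes which tasks get pulled in for a requested target. It does not use doit; the function tasks_pulled_in and the order_only parameter are hypothetical names that model the proposed wait-for edge by simply not following it.

```python
# task_dep edges copied from the dodo.py above.
TASK_DEP = {
    'a_start': [],
    'a1': ['a_start'],
    'a2': ['a_start'],
    'a_wait': ['a1', 'a2'],
    'b_start': ['a_wait'],
    'b1': ['b_start'],
    'b2': ['b_start', 'a2'],
    'b_wait': ['b1', 'b2'],
}

# The one edge that should only influence ordering (assumption of this sketch).
ORDER_ONLY = frozenset({('b_start', 'a_wait')})


def tasks_pulled_in(target, deps, order_only=frozenset()):
    """Return the target plus everything reachable through real dependencies.

    Edges listed in order_only, given as (task, dependency) pairs, are not
    followed, which is how a wait-for edge would behave.
    """
    seen = set()
    stack = [target]
    while stack:
        task = stack.pop()
        if task in seen:
            continue
        seen.add(task)
        for dep in deps[task]:
            if (task, dep) not in order_only:
                stack.append(dep)
    return seen


print(sorted(tasks_pulled_in('b1', TASK_DEP)))              # whole stage a is included
print(sorted(tasks_pulled_in('b1', TASK_DEP, ORDER_ONLY)))  # only b1 and b_start
print(sorted(tasks_pulled_in('b2', TASK_DEP, ORDER_ONLY)))  # a2, b2 and both _start tasks
```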
I think it's ok to first discuss here how to do this (because it has a lot to do with this feature), but as soon as we know what we want we can continue the discussion in the doit tracker. Hope that's ok for you :)
@felixfontein thanks for the example. I guess I understand your problem.
In my opinion this problem only arises when using "phases" that doit has really no support for, so maybe a patch on Nikola is more appropriate.
Can you define better what triggers the change of behaviour in this wait-for dependency? Is it when any task is specified in the command line? Sounds too tricky to me...
And how can you test/trigger this before https://github.com/pydoit/doit/issues/20 is implemented?
Anyway I gave it a try here:
https://github.com/schettino72/nikola/compare/getnikola:earlytask_impl...earlytask?expand=1
Luckily I added pos_args in the signature of load_tasks even though I didn't know of any use for it up to now :)
Hmm, a wait-for instead of task_dep could also be of interest if you want to process tasks in parallel, but some tasks need a resource which cannot be used in parallel (maybe some external device, like a DVD writer). For such a setup, you need a mechanism that keeps a second task from being executed until a first task is done, but you don't want an explicit dependency, so that you can build each one individually.
Yes, I know that this sounds a bit far fetched, but at least it shows such a feature could in theory be used in a more general setting.
Anyway, there's no behavior difference for wait-for in special situations; it should always behave the same way: if two tasks a and b are scheduled to be executed, and b wait-for a, then b is not executed before a is done. So if b is specified as a task to be executed (either via command line or as a default task), this does not trigger a check of a's dependencies (to determine whether it should be executed) like a task_dep does. It only ensures that if a is actually executed, b will only be executed when a is done.
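The intended semantics can be sketched without doit: given the set of tasks that are already scheduled, a wait-for edge only constrains the order among them and never adds a task. The function execution_order is a hypothetical name, and the sketch ignores the dynamic scheduling that doit actually performs.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def execution_order(scheduled, wait_for):
    """Order the scheduled tasks so each one runs after its wait-for targets.

    wait_for maps a task to the tasks it waits for. Targets that are not
    scheduled are ignored; they are never pulled in.
    """
    graph = {
        task: {dep for dep in wait_for.get(task, ()) if dep in scheduled}
        for task in sorted(scheduled)
    }
    return list(TopologicalSorter(graph).static_order())


WAIT_FOR = {'b': ['a']}  # b wait-for a

print(execution_order({'a', 'b'}, WAIT_FOR))  # a is ordered before b
print(execution_order({'b'}, WAIT_FOR))       # a is not scheduled, so only b runs
```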
(Your try is a hack which works fine if all tasks specified on the command line are within one stage, but if they are not, tasks from a later stage might be generated before an earlier stage finished execution.)
Ok, I got an idea where this could be quite useful. Assume that you want to record audio samples, maybe for a study. Every sample (recorded as a .wav file) should be converted to different formats (say .ogg and .mp4) afterwards. So you create a recording task for every sample to record, and tasks to create .ogg and .mp4 files (which depend on the recording task). Since the encoding can be done in parallel, you want to run doit with -n2. But you cannot record two things at the same time, so you need to introduce dependencies between the sample recordings.
If you have three recordings, a, b and c, you could use task_dep to get a chain a -> b -> c. But now, if you only want to do recording b (for example because you noticed the recording has too much background noise), you want to run doit run b -n2. But since there's a task dependence, doit will by default also execute task a. So you end up doing two recordings, even though you needed only one.
Here you would prefer to use a wait-for dependency between a, b and c, and not a task_dep.
(Even if you don't want to do the encoding part and thus don't need parallel execution, having such a wait-for dependency makes sense to protect against failures if the tasks are ever executed in parallel -- which might happen if your dodo.py is included in a larger dodo.py which does a lot more.)
@felixfontein the audio sample example is a different case... this is an example where you need contention based on resource utilization for parallel scheduling. This has been raised before, and it would be feasible to implement in doit. But using wait-for would be a poor solution because it can handle just 1 resource being shared.
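Resource-based contention can be pictured with a small stand-alone sketch. It does not use doit and does not describe an existing doit feature; the resource name microphone and the function record are made up for illustration. Each shared resource gets its own lock, so any number of resources can be modelled, and tasks that do not need a resource still run in parallel.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor

# One lock per shared resource (hypothetical resource name).
RESOURCE_LOCKS = {'microphone': threading.Lock()}

active = 0        # recordings running right now
max_active = 0    # highest number of simultaneous recordings observed
counter_lock = threading.Lock()


def record(sample):
    """Pretend to record a sample while holding the microphone."""
    global active, max_active
    with RESOURCE_LOCKS['microphone']:
        with counter_lock:
            active += 1
            max_active = max(max_active, active)
        time.sleep(0.01)  # stand-in for the actual recording
        with counter_lock:
            active -= 1
    return sample + '.wav'


# Two workers, comparable to running with -n2.
with ThreadPoolExecutor(max_workers=2) as pool:
    recorded = list(pool.map(record, ['a', 'b', 'c']))

print(recorded, max_active)
```

Because the ordering comes from the lock and not from a dependency, recording b alone would not involve a or c at all.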
if two tasks a and b are scheduled to be executed, and b wait-for a, then b is not executed before a is done.
I guess you need to understand a bit more how doit works internally. A task is scheduled to be executed in 2 situations:
1) the task is specified in the command line or default_tasks
2) the task is a dependency of a previously scheduled task
The problem is that in 2) this happens at run time while tasks are being executed. In other words, doit does not pre-compute the whole task dependency tree before it starts its execution. This has some advantages: being fast (it doesn't compute parts of the tree that are not used), and allowing some dynamic modification of the "tree" (like calc_dep and delayed task creation).
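As a toy model of this lazy behaviour (not doit's actual implementation), the runner below looks up a task's dependencies only when it reaches that task. The names DEPS, get_deps and run are hypothetical.

```python
# A few task_dep edges in the style of the earlier example.
DEPS = {
    'a_start': [],
    'a1': ['a_start'],
    'a2': ['a_start'],
    'a_wait': ['a1', 'a2'],
}

looked_up = []  # records which tasks had their dependencies inspected


def get_deps(task):
    """Look up dependencies on demand and remember that we did."""
    looked_up.append(task)
    return DEPS[task]


def run(selected):
    """Execute the selected tasks depth-first, discovering deps as we go."""
    executed = []

    def visit(task):
        if task in executed:
            return
        for dep in get_deps(task):  # only now do we learn about this task's deps
            visit(dep)
        executed.append(task)

    for task in selected:
        visit(task)
    return executed


order = run(['a1'])
print(order)      # dependencies first, then the requested task
print(looked_up)  # a2 and a_wait were never inspected
```

In this model nothing is known about a2 or a_wait when a1 is requested, which is why an ordering-only constraint that refers to tasks not yet discovered is hard to honour.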
To implement A wait-for B, doit would have to finish all other scheduled tasks to make sure that none of the scheduled tasks has a real task_dep on B. But since you might have multiple uses of wait-for, even that would not guarantee that no further scheduled task has a task_dep on B. That's kind of the same problem as you pointed out in my hacky patch.
The other option would be to pre-compute the whole DAG, but again given the very dynamic nature of doit you would still have no guarantee that a "skipped" wait-for task would not be scheduled later by a third task.
So I guess this wait-for could only be implemented if there was no support for dynamic changes in the task dependency-tree. Or do you have an idea on how an implementation would work?
I guess it does not solve your problem but doit has a "--single" flag for the run command (or build in Nikola) that ignores task_dep. Sometimes useful to avoid rebuilding a lot of stuff when trying some changes.
Having --single is probably enough for most use-cases in Nikola (the need to keep earlier stages from checking for tasks to make is somewhat special anyway, I think). So maybe we can just ignore this thing. If I get a good idea how this could be solved/implemented while working on #20 in the doit tracker I'll try that out; if not let's leave it as it is. (Or does anyone object to this?)
That’s not going to happen any time soon.
Hi,
I'd like to have a plugin category EarlyTask, for tasks which are executed before the site is rendered (i.e. an analogue to LateTask). I personally need that for a plugin (or better, a combination of plugins) I wrote. Currently I use the Task plugin class, but it happens that some tasks are run after page compiling, while my page compiling plugin needs their result, and so it fails.
Does anyone mind if I add something like that? Or would it be better to have a general priority system, so you can assign a task a priority (usual tasks could get 10, and late tasks 100, so you could add a task with priority 2 and one with priority 7 to ensure that the one with priority 2 appears in the task list before the one with priority 7 and before all regular rendering tasks and late tasks)?
(I have the vague feeling that I already read something about EarlyTasks somewhere here, but I cannot remember where. So it's probably not my own idea :) )
Cheers, Felix
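The priority idea can be sketched in a few lines. The plugin names, the constants and the function order_by_priority are hypothetical; only the values 10, 100, 2 and 7 come from the proposal above.

```python
DEFAULT_PRIORITY = 10   # regular tasks
LATE_PRIORITY = 100     # late tasks


def order_by_priority(priorities):
    """Return plugin names sorted by ascending priority.

    sorted() is stable, so plugins with equal priority keep the order
    in which they were registered.
    """
    return [name for name, _ in sorted(priorities.items(), key=lambda item: item[1])]


# Made-up plugins: two early ones, a regular one and a late one.
priorities = {
    'render_pages': DEFAULT_PRIORITY,
    'sitemap': LATE_PRIORITY,
    'fetch_data': 2,
    'build_index': 7,
}

print(order_by_priority(priorities))
```

An EarlyTask category would then amount to a fixed priority below the default, just as LateTask corresponds to one above it.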