Closed felixfontein closed 7 years ago
Yes, that was it. Thanks for linking it!
So the main question is probably: just add another category, or add a simple priority system so that there's only one category (`Task`, with `LateTask` being a special case of `Task` with higher priority)?
(Ah, I forgot the fine print: the `EarlyTask` plugins should run before any posts are scanned. For my case, that doesn't matter, but maybe someone wants something to run after posts are scanned but before posts are rendered. A general priority system could run everything with negative priority numbers before posts are scanned, and everything else afterwards, and rendering the site could have priority 10 or so, so that it is possible to squeeze something between post scanning and site rendering.)
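The priority idea described above can be sketched as follows. This is only a toy model under stated assumptions: `SCAN_PRIORITY`, `RENDER_PRIORITY` and `run_in_priority_order` are hypothetical names, not part of Nikola, and the plugin names are made up for illustration.

```python
# Toy model of the proposed priority system; none of these names exist in Nikola.
SCAN_PRIORITY = 0      # assumption: post scanning sits at priority 0
RENDER_PRIORITY = 10   # "rendering the site could have priority 10 or so"


def run_in_priority_order(plugins, scan_posts):
    """Run (priority, name) plugins in ascending priority order.

    Posts are scanned exactly once, right before the first plugin whose
    priority is not negative.
    """
    log = []
    scanned = False
    for priority, name in sorted(plugins):
        if not scanned and priority >= SCAN_PRIORITY:
            scan_posts()
            log.append('scan_posts')
            scanned = True
        log.append(name)
    if not scanned:  # only negative priorities were registered
        scan_posts()
        log.append('scan_posts')
    return log


plugins = [
    (RENDER_PRIORITY, 'render_site'),
    (-5, 'generate_posts'),        # has to run before posts are scanned
    (5, 'tweak_scanned_posts'),    # squeezed between scanning and rendering
    (100, 'post_render'),
]
order = run_in_priority_order(plugins, scan_posts=lambda: None)
print(order)
```

Most plugins could share the same priority in such a scheme; only the few that care about ordering would pick a different number.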
Creating a category is a total of five lines of code across two files (`class EarlyTask(Task): pass` in `plugin_categories.py`; a modified import and plugin load directive in `nikola.py`).
A priority system would be a huge mess: it would require a lot of changes everywhere, and each and every task would need an actual priority assigned to make it flexible.
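For illustration, the "new category" approach might look roughly like this. It is a sketch only: `Task` here is a stand-in for Nikola's real base class, and `CATEGORIES`, `category_of` and the two example plugins are hypothetical helpers, not Nikola's actual plugin loading code.

```python
class Task:
    """Stand-in for Nikola's Task plugin base class (assumption)."""
    name = 'dummy'


class EarlyTask(Task):
    """The new category: an empty subclass is all that is needed."""
    pass


# Most specific category first, because every EarlyTask is also a Task.
CATEGORIES = [('EarlyTask', EarlyTask), ('Task', Task)]


def category_of(plugin):
    """Return the label of the most specific category a plugin belongs to."""
    for label, cls in CATEGORIES:
        if isinstance(plugin, cls):
            return label
    raise TypeError('not a task plugin: %r' % (plugin,))


class GeneratePosts(EarlyTask):   # hypothetical plugin
    name = 'generate_posts'


class RenderPages(Task):          # hypothetical plugin
    name = 'render_pages'


grouped = {}
for plugin in (RenderPages(), GeneratePosts()):
    grouped.setdefault(category_of(plugin), []).append(plugin.name)
print(grouped)
```

The category itself carries no ordering information; whoever loads the plugins still has to decide what to do with each group.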
I’m pretty sure `scan_posts` is called by the first `Task` that gets loaded and wants it.
And I just created `EarlyTask`.
Shouldn't `EarlyTask`s be added to the default tasks (https://github.com/getnikola/nikola/blob/master/nikola/__main__.py#L243) loaded by the `NikolaTaskLoader`?
@punchagan Probably. I'll do it tomorrow.
@Kwpolska Sure. I was trying to use this feature, and have something that works. You can probably review, (fix) and merge it. (Tomorrow).
Happy New Year! :fireworks:
Where is it?
The loader should have been fixed. I managed to do it in 2014 and on my phone.
Woops, didn't see your question. I had referenced the PR in this commit, but no email notifications for them. :)
Thanks!
You should've posted the commit sha here, that way I would notice. Either way, it's now solved (and our solutions are the same in code with just a small difference when it comes to English)
Yep. Thanks!
Cool, everything's already done! :)
A priority system won't be exactly complicated, since most things can have the same priority -- there are still doit's dependencies to handle that. But then, that was just a suggestion, I'm perfectly happy with `EarlyTask` as it is now. Thanks!
Well, and of course, a Happy New Year to you all!
Probably too late, but I will chime in anyway :)
doit has two distinct phases: task-creation and task-execution.
Task-creation (generating task metadata) is usually done in doit using the `task_xxx` functions in a `dodo.py` module. But in Nikola it is done through the Nikola site object. Since in Nikola you can add more task-creators through plugins, `LateTask` was created to make sure the Nikola site object was set up before executing task-creators from plugins...
So `LateTask` means late-generated-task, not late-executed-task. `LateTask` is a Nikola concept, and `doit` has no idea about it.
Task-execution ordering must be handled by doit using the task property `task_dep`.
A "priority system" may make sense for Nikola, since the global Nikola site object is modified by many plugins, but it doesn't make sense for `doit` itself.
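To make the distinction between the two phases concrete, here is a minimal toy model. It does not use doit at all: `create_tasks` and `execute_tasks` are hypothetical stand-ins, the task dictionaries only borrow the `task_dep` key from doit, and there is no cycle detection.

```python
def create_tasks(creators):
    """Phase 1 (task creation): call every creator and collect task metadata."""
    tasks = {}
    for creator in creators:
        for task in creator():
            tasks[task['name']] = task
    return tasks


def execute_tasks(tasks):
    """Phase 2 (task execution): run actions, dependencies first."""
    done = []

    def run(name):
        if name in done:
            return
        for dep in tasks[name].get('task_dep', []):
            run(dep)
        for action in tasks[name]['actions']:
            action()
        done.append(name)

    for name in tasks:
        run(name)
    return done


log = []


def render_creator():
    # Created first, but must be executed after 'copy_assets'.
    yield {'name': 'render',
           'actions': [lambda: log.append('render')],
           'task_dep': ['copy_assets']}


def assets_creator():
    yield {'name': 'copy_assets',
           'actions': [lambda: log.append('copy_assets')]}


tasks = create_tasks([render_creator, assets_creator])
created = list(tasks)
executed = execute_tasks(tasks)
print(created, executed)
```

In this model the order in which creators are called says nothing about execution order; only `task_dep` does, which mirrors the point made about `LateTask` above.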
What do you think? Would it be hard to modify `DoitNikola` (which inherits from `DoitMain`) and `NikolaTaskLoader` (inherited from `TaskLoader`) to get the following behavior:
- For all commands (except `build`), the task loader will behave as it is now, i.e. generate tasks for early, then for usual, then for late, and return a big list containing all tasks.
- The `build` command behaves differently, by first creating only the early tasks and running them all, then creating the usual tasks, running them all, and finally creating the late tasks, and running them all.
This could be done either by
- calling the `run` method of `DoitMain` three times, each time with a different task loader (one only giving the early tasks, one only giving the usual tasks, etc. -- it could also be the same task loader with an enum `ALL`, `EARLY`, `USUAL`, `LATE`), or
- using three `DoitMain` objects, one for each set of tasks.
What do you think?
Cheers, Felix
@felixfontein it's perhaps cleaner and less work to create three "metatasks": have mt1 list all EarlyTasks in its task_dep, then have mt2 list all Tasks and mt1 in its task_dep, and then have mt3 list all LateTasks and mt2 in its task_dep.
It can be trivially extended to an enumeration etc.
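The metatask idea could be sketched with plain doit-style task dictionaries as below. `chain_groups` is a hypothetical helper, not Nikola or doit code. One assumption goes beyond the comment above: each member of a group also gets the previous metatask in its own `task_dep`, since a metatask depending on `mt1` does not by itself force its other dependencies to wait for `mt1`.

```python
def chain_groups(groups):
    """Build action-less metatasks that chain groups of tasks.

    groups: ordered list of (metatask_name, [task dicts with 'basename']).
    Returns copies of all task dicts plus one metatask per group.
    """
    result = []
    previous = None
    for meta_name, tasks in groups:
        for task in tasks:
            deps = list(task.get('task_dep', []))
            if previous is not None:
                # Assumption: members must wait for the previous group.
                deps.append(previous)
            result.append(dict(task, task_dep=deps))
        meta_deps = [task['basename'] for task in tasks]
        if previous is not None:
            meta_deps.append(previous)
        # A task without actions only groups its dependencies.
        result.append({'basename': meta_name, 'actions': None,
                       'task_dep': meta_deps})
        previous = meta_name
    return result


all_tasks = chain_groups([
    ('mt1', [{'basename': 'generate_posts', 'actions': []}]),
    ('mt2', [{'basename': 'render_pages', 'actions': []}]),
    ('mt3', [{'basename': 'sitemap', 'actions': []}]),
])
by_name = {task['basename']: task for task in all_tasks}
print(by_name['mt2']['task_dep'])
```

This only orders execution. All tasks are still created up front, so it does not help when later tasks can only be created after earlier ones have run.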
I’m reverting the current failed implementation of EarlyTask in 0ce8d72.
@ralsina: That does not solve the problem that during creation of the usual tasks, the generated posts haven't been created yet (and so the tasks to handle them, i.e. to render their pages, include them in indices and tag pages, etc., either cannot be created or are created incorrectly). For that, the tasks can only be created when the previous tasks have been completely processed.
@felixfontein if a task needs another task to be executed to get the information it needs, then it should get that information at runtime, not at creation time. It can even obtain that information from the earlier task using something like http://pydoit.org/dependencies.html#calculated-dependencies
Or maybe something like http://pydoit.org/dependencies.html#getargs
The problem is that at creation time of the page compilation tasks, they have no idea if (or which) new posts are created. But the tasks (just to compile the post to .html, for example) must be created at that point, for the specific posts which will be created (and whose names are not known to Nikola at that point).
Or, maybe, what should happen is that the post scanning should be a task, which it's not.
Even if post scanning is a task, you have a problem since after running that task, new tasks have to be created.
Yes, there is a large semantic hole here. What we need is to figure out a way (which probably involves some largish changes in Nikola) so that tasks can affect "the future".
IOW:
1) a task modifies the timeline by creating a post
2) that means post scanning has to happen
3) that means more tasks have to be created for the new timeline
If we can work out how to do that in a clean manner, the rest will just work. Of course this is not trivial or it would be there already :-)
An ideal workflow would look like:
1) create tasks for `EarlyTask` plugins,
2) execute tasks of `EarlyTask` plugins,
3) create tasks for `Task` plugins,
4) create tasks for `LateTask` plugins,
5) execute tasks of `Task` and `LateTask` plugins.
(The last three steps can also be split into two runs: first create and execute `Task` tasks, then create and execute `LateTask` tasks.)
Well, by having more than one "create and then execute tasks" steps, this can be done in a quite simple way.
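A toy comparison may help show why several "create and then execute" steps matter here. Everything in this sketch is hypothetical (`site`, `early_creator`, `render_creator`, `run_staged`, `run_all_at_once`); it models neither doit nor Nikola, only the ordering problem discussed above.

```python
def early_creator(site):
    """Early stage: one task whose action creates a new post."""
    def write_post():
        site['posts'].append('generated.rst')
    return [('generate_post', write_post)]


def render_creator(site):
    """Regular stage: one task per post known at creation time."""
    return [('render:' + post, lambda: None) for post in site['posts']]


def run_staged(creators, site):
    """Create the tasks of one stage, execute them, then move on."""
    executed = []
    for creator in creators:
        for name, action in creator(site):
            action()
            executed.append(name)
    return executed


def run_all_at_once(creators, site):
    """Create every task first, then execute them all."""
    tasks = []
    for creator in creators:
        tasks.extend(creator(site))
    executed = []
    for name, action in tasks:
        action()
        executed.append(name)
    return executed


creators = [early_creator, render_creator]
staged = run_staged(creators, {'posts': ['hello.rst']})
at_once = run_all_at_once(creators, {'posts': ['hello.rst']})
print(staged)
print(at_once)
```

With staged runs, the render stage sees the generated post and creates a task for it. When all tasks are created up front, no render task exists for the new post.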
Until we need to have a task that runs between EarlyIshTasks and NotSoEarlyTasks
Ok, I give up, let's do that separation and multiple runs.
Well, that's why I proposed having a priority system, where priority indicates in which batch the tasks are created and executed. The current system (including `EarlyTask`s) would have three priorities.
I think Nikola running doit three (or more) times will be too messy, and will create many other problems. I suggest to do some work on doit before trying to implement this on nikola.
First let's try to come up with a simple example that does not use the Nikola code base but has a similar workflow. Then change doit to handle that, and finally use it in Nikola...
Can you give me some hints what might go wrong? Are there certain things in doit that should not be used/initialized more than once?
Ok, I created a small test program: https://github.com/getnikola/nikola/blob/earlytask_experiments/test.py
It creates (and destroys pre-existing) files '1' and '2' and a directory 'dest', so be a bit cautious when running it.
At first glance, it seems to work well.
> Can you give me some hints what might go wrong?
- try executing only one task. you will need to do some pre-analysis of available tasks or error handling.
- task_dep: you will need to make sure to only use tasks on the same level (on different levels they will be managed by nikola itself)
- clean command: also will need some changes because it is executed in reverse order.
- auto command (original one from doit): i guess would need a lot of work
- output: as of today doit and nikola output is very simple so no problem. but it will be an issue if there is a summary of execution at the end (It is on my TODO list)
- task parameters `getargs`, `setup`, `teardown` will all have issues (it seems nikola doesn't use any of these, but it is not nice to make them permanently unavailable.)
- doit at the moment has very poor tooling but that may change in the future, this approach will probably make nikola not be able to use those.
> Ok, I created a small test program: https://github.com/getnikola/nikola/blob/earlytask_experiments/test.py
It is cool you could do that given doit's limitations... I also want to give it a try by changing doit, but it won't be as fast as you :) Do you want to help me?
@felixfontein so I created a proof of concept. (https://github.com/pydoit/doit/compare/delayed-task-creation)
I created an issue on doit tracker for this: pydoit/doit#14 . It is still WIP but can you take a look at it and check if this would solve your problem with nikola.
If we group tasks into "execution groups" (or however that should be called) (they should be totally ordered, so it is clear which to execute first, etc.), as I did in that test implementation, I think it should be not too complicated to also integrate that into doit.
> task_dep: you will need to make sure to only use tasks on the same level (on different level they will be managed by nikola itself)
Here it would be simplest to assume that each execution group is independent of each other: then the user has to take care of that by himself.
(Of course, one could make this more comfortable later on, by checking for dependencies between groups, and warning when they are not met.)
> clean command: also will need some changes because it is executed in reverse order.
That shouldn't be complicated: by concatenating the list of files to clean for all execution groups, one could use this list (or its reverse) for cleaning.
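As a small illustration of that idea, assuming each execution group can report its targets in execution order (`clean_order` and the file names below are hypothetical):

```python
def clean_order(stage_targets):
    """Concatenate the targets of all execution groups (given in execution
    order) and reverse the result, so files created last are cleaned first."""
    all_targets = [target for targets in stage_targets for target in targets]
    return list(reversed(all_targets))


to_clean = clean_order([
    ['posts/generated.rst'],                            # early group
    ['output/index.html', 'output/generated.html'],     # regular group
    ['output/sitemap.xml'],                             # late group
])
print(to_clean)
```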
> auto command (original one from doit): i guess would need a lot of work
One could either start with the smallest execution group affected by the file change, or (more simply) just run everything (`doit run`) if any dependency changed.
> output: as of today doit and nikola output is very simple so no problem. but it will be an issue if there is a summary of execution at the end (It is on my TODO list)
That depends on how this will be implemented in doit. If the execution groups are handled by doit, doit could either do this per execution group, or over everything.
> task parameters getargs, setup, teardown will all have issues (it seems nikola doesnt use any of this but not nice to make it permanently unavailable.)
Again, by first restricting everything to one execution group, it should be not too hard to implement this. One could still extend this later.
> doit at the moment has very poor tooling but that may change in the future, this approach will probably make nikola not be able to use those.
That's a quite serious point, and the main reason why I'd prefer support from doit for this :) I assume that this is the more general framework into which things like displaying summary at the end fall?
I guess the main question is: how complicated do we/you want to make it? :) When limiting everything to adding execution groups, it should be not too complicated to implement support for them. (Essentially what I did in the test, with some more stuff here and there.)
The approach you want to choose is a bit more complicated, but should be doable. I'll take a closer look at your branch during the next days...
> It is cool you could do that given doit limitations... I also want to give it a try changing doit, but it wont be as fast as you :) do you want to help me?
I'll try to help!
> The approach you want to choose is a bit more complicated, but should be doable. I'll take a closer look at your branch during the next days...
Maybe my approach is more complicated to implement, but it is the Right Thing :tm:
And the most complicated part was already done a long time ago, when doit added support for `calc_dep`...
With the code base as it is now, I am sure it will be easier to implement in doit than in Nikola. I also expected to add this feature sooner or later, so regardless of whether Nikola uses it or not, it will be added to doit.
It also brings the possibility of some extra benefits, as this is the foundation for avoiding task-creation of all tasks when they are not required. I.e. Nikola implemented several things (serve, check, deploy...) as commands to avoid the costly scan-posts.
Yes, it's the Right Thing :-)
Nikola's `check` does need to scan the posts to know which files to inspect or which files shouldn't be there (because they're not created by a task) -- even though it does so only by calling `nikola list --all` internally.
doit part to handle this is ready :) already merged to master... https://github.com/pydoit/doit/commit/b06eb8ebe0d5265a6221900f80905aaf0868d217
The change above includes docs, but since Nikola doesn't use dodo.py, it is required to use some undocumented internal API...
Where is the plugin that requires this change? I guess the best way to integrate it in Nikola is:
1) create a task with no actions, like "pre_build"
2) the creation of build tasks should be delayed until after "pre_build" is executed
3) plugins will need to inject themselves as a task_dep of "pre_build".
Cool!
As a first experiment, I'd suggest using this to actually generate `LateTask` tasks after `Task` tasks have been executed. When adding `EarlyTask`s (or whatever else), they can be added in exactly the same way.
The main part happens in `nikola/__main__.py`, in the class `NikolaTaskLoader`:
```python
class NikolaTaskLoader(TaskLoader):
    """custom task loader to get tasks from Nikola instead of dodo.py file"""

    def __init__(self, nikola, quiet=False):
        self.nikola = nikola
        self.quiet = quiet

    def load_tasks(self, cmd, opt_values, pos_args):
        if self.quiet:
            DOIT_CONFIG = {
                'verbosity': 0,
                'reporter': 'zero',
            }
        else:
            DOIT_CONFIG = {
                'reporter': ExecutedOnlyReporter,
                'outfile': sys.stderr,
            }
        DOIT_CONFIG['default_tasks'] = ['render_site', 'post_render']
        tasks = generate_tasks(
            'render_site',
            self.nikola.gen_tasks('render_site', "Task", 'Group of tasks to render the site.'))
        latetasks = generate_tasks(
            'post_render',
            self.nikola.gen_tasks('post_render', "LateTask", 'Group of tasks to be executed after site is rendered.'))
        signal('initialized').send(self.nikola)
        return tasks + latetasks, DOIT_CONFIG
```
So I guess that we have to do the following:
- declare all of `tasks` to be dependencies of a new task, `late_build`, which does nothing;
- once `late_build` is executed (i.e. everything else is done), `latetasks` is generated and its tasks are executed.
So something along these lines:
```
...
tasks = generate_tasks(
    'render_site',
    self.nikola.gen_tasks('render_site', "Task", 'Group of tasks to render the site.'))
tasks.extend(
    generate_tasks('late_build',
                   {'basename': 'late_build',
                    'task_dep': tasks}))
tasks.extend(
    generate_tasks('late_build',
                   {'basename': 'late_build_create',
                    'create_after': 'late_build',
                    ...
                    # latetasks = generate_tasks(
                    #     'post_render',
                    #     self.nikola.gen_tasks('post_render', "LateTask", 'Group of tasks to be executed after site is rendered.'))
                    ...}))
signal('initialized').send(self.nikola)
return tasks, DOIT_CONFIG
```
Just a heads up for whoever tries: This may break "nikola check"
Actually, `nikola check` should not be affected, since for all commands except `build`/`run`, simply all tasks will be created. Or at least that's what I remember from schettino72's code.
If that's so, even better, just a thing to check :-)
I started implementing here: https://github.com/getnikola/nikola/tree/earlytask_impl So far I'm mostly preparing stuff, the aim is to use the new doit features similarly as in https://github.com/getnikola/nikola/blob/earlytask_experiments/test.py.
In the first step, I also moved post scanning into its own plugin. I've also changed what I called 'level' in that test to 'stage'; I hope that's less confusing.
The basic stuff works, I think: `build`, `clean`, `list` (and thus `check`), ...
What does not work yet: specifying a specific task/target with `build` which isn't created in the first stage. Also, it is not clear when the `initialized` signal should be emitted, since there is no longer a point at which all tasks are created but none is executed.
For `initialized`: what about having a signal for every stage? I.e. when the tasks for stage n have been created, emit the signal `initialized_n` (with nicer names for special n, like -10, 1, 10, 100, corresponding to the phases early, post scanning, site rendering, and late). The classic `initialized` signal could be emitted for n = 10, i.e. after generation of the usual site rendering tasks.
Is anyone actually using that signal at the moment?
For the other problem (specifying tasks/targets on command line): I don't see a good solution here which is not somewhat hacky.
The simplest solution would be to create tasks for all stages until all targets and tasks mentioned in the command line are created, and then let doit decide which tasks to execute. (default_tasks isn't of relevance anymore at this time.)
Has anyone an opinion on this? @schettino72 maybe?
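The "create stages until everything requested is known" idea might be sketched like this. `create_until_found` and the stage callables are hypothetical, and the sketch models task creation only: whether an earlier stage also has to be executed before the next one can be created is left out on purpose.

```python
def create_until_found(stages, wanted):
    """Create tasks stage by stage until every requested name is known.

    stages: ordered list of callables, each returning {task_name: [targets]}.
    wanted: task names or target paths given on the command line.
    Returns the created tasks and the number of stages that were needed.
    """
    created = {}
    missing = set(wanted)
    stages_used = 0
    for create in stages:
        if not missing:
            break
        created.update(create())
        stages_used += 1
        known = set(created)
        for targets in created.values():
            known.update(targets)
        missing -= known
    if missing:
        raise KeyError('unknown tasks/targets: %s' % sorted(missing))
    return created, stages_used


stages = [
    lambda: {'scan_posts': []},
    lambda: {'render_pages': ['output/index.html']},
    lambda: {'sitemap': ['output/sitemap.xml']},
]
created, stages_used = create_until_found(stages, ['output/index.html'])
print(sorted(created), stages_used)
```

Here the late stage is never created, because the requested target already shows up after the second stage.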
Also, another thing: running in parallel doesn't work anymore out of the box, since multiprocessing doesn't work well with delayed task creation. But if you use threads instead of processes (by specifying `-P thread`), it will run fine in parallel.
link to felix's branch: https://github.com/getnikola/nikola/compare/earlytask_impl
Felix, can you squash your commits? It makes it easier to write comments. Right now I need to find which commit made which change, and the comments would be spread out...
> For the other problem (specifying tasks/targets on command line): I don't see a good solution here which is not somewhat hacky.
I guess the problem is that `Nikola.gen_tasks()` generates all tasks from a given plugin stage. It should be modified so that you have a function to create tasks from a given plugin instance (not from a plugin stage). So in `NikolaTaskLoader._gen_stage()` you would create a task for every plugin instance and not only one task for each stage.
I am not 100% convinced about this usage of "stages"... why not just directly depend on task/plugin name?
> Also, another thing: running in parallel doesn't work anymore out of the box, since multiprocessing doesn't work well with delayed task creation.
The problem is the usage of closures (or anything else that cannot be pickled) with multiprocessing. There is no real reason to use closures (apart from the small convenience of code location). Someone could just refactor the Nikola code to avoid closures, and multiprocessing would work fine with DelayedTasks.
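The pickling issue can be reproduced with the standard library alone. In this sketch, `closure_action`, `partial_action` and `can_pickle` are hypothetical helpers; the only claim is the general Python behaviour that nested functions cannot be pickled, while a `functools.partial` over an importable function can.

```python
import functools
import os.path
import pickle


def closure_action(folder):
    """Return a nested function that captures `folder` in a closure."""
    def action(name):
        return os.path.join(folder, name)
    return action


def partial_action(folder):
    """Same behaviour, built from an importable function plus bound arguments."""
    return functools.partial(os.path.join, folder)


def can_pickle(obj):
    """Report whether `obj` survives pickle.dumps()."""
    try:
        pickle.dumps(obj)
        return True
    except (pickle.PicklingError, AttributeError, TypeError):
        return False


closure_ok = can_pickle(closure_action('output'))
partial_ok = can_pickle(partial_action('output'))

# The partial can be sent to another process and still works afterwards.
restored = pickle.loads(pickle.dumps(partial_action('output')))
print(closure_ok, partial_ok, restored('index.html'))
```

Replacing closures by an importable function plus explicit arguments is one common way to make task actions usable with process-based parallelism.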
It would be nice to create tasks for individual task names, but the problem is that you don't know which basenames a plugin will emit before asking it to create its tasks. (The galleries plugin usually outputs `render_galleries` tasks, but also `_render_galleries_clean` tasks; the sitemap plugin outputs `sitemap` (expected) and `_scan_locs` tasks, and the gzip plugin outputs names depending on which task it processes -- that's somewhat different though, since it's a `TaskMultiplier` plugin. And then there might be 3rd party plugins which might rely on that, too.)
@Kwpolska, @ralsina, what do you think? Would it be ok to make the assumption that `plugin.name` is the only task name callable via `nikola build <taskname>`?
A second thing: what happens if you use `doit run bla`, where `bla` is a target of a task which is generated on the fly? It doesn't work with doit (I tested with your example: https://gist.github.com/schettino72/9868c27526a6c5ea554c), and so won't work with Nikola either. This could only be fixed by some kind of hack.
Regarding avoiding pickling/enabling multiprocessing: the main culprit here is `config_changed`, I think. Avoiding that will be hard to impossible without rewriting a lot of code, including 3rd party code, I think.
Hi,
I'd like to have a plugin category `EarlyTask`, for tasks which are executed before the site is rendered (i.e. an analogue to `LateTask`). I personally need that for a plugin (or better, combination of plugins) I wrote, currently I used the `Task` plugin class but it happens that some tasks are run after page compiling, while my page compiling plugin needs their result -- and so it fails.
Does anyone mind if I add something like that? Or would it be better to have a general priority system, so you can assign a task a priority (usual tasks could get 10, and late tasks 100, so you could add a task with priority 2 and one with priority 7 to ensure that the one with priority 2 appears in the task list before the one with priority 7 and before all regular rendering tasks and late tasks)?
(I have the vague feeling that I already read something about `EarlyTask`s somewhere here, but I cannot remember where. So it's probably not my own idea :) )
Cheers, Felix