getnikola / nikola

A static website and blog generator
https://getnikola.com/
MIT License

EarlyTask plugins #1562

Closed: felixfontein closed this issue 7 years ago

felixfontein commented 9 years ago

Hi,

I'd like to have a plugin category EarlyTask, for tasks which are executed before the site is rendered (i.e. an analogue to LateTask). I personally need that for a plugin (or rather, a combination of plugins) I wrote. Currently I use the Task plugin class, but some of its tasks end up running after page compiling, while my page compiling plugin needs their result, and so it fails.

Does anyone mind if I add something like that? Or would it be better to have a general priority system, so you can assign a task a priority (usual tasks could get 10, and late tasks 100, so you could add a task with priority 2 and one with priority 7 to ensure that the one with priority 2 appears in the task list before the one with priority 7 and before all regular rendering tasks and late tasks)?
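A minimal sketch of the proposed priority ordering, assuming each task creator carries a hypothetical integer `priority` attribute (regular tasks 10, late tasks 100, as in the proposal). `TaskCreator` and `ordered_creators` are illustrative names, not Nikola code; a stable sort keeps creators with equal priority in their load order.

```python
from dataclasses import dataclass


@dataclass
class TaskCreator:
    # Hypothetical stand-in for a task plugin; Nikola has no `priority` attribute.
    name: str
    priority: int = 10  # regular tasks 10, late tasks 100, as proposed


def ordered_creators(creators):
    """Sort creators by priority; sorted() is stable, so ties keep load order."""
    return sorted(creators, key=lambda creator: creator.priority)


creators = [
    TaskCreator('render_pages'),
    TaskCreator('sitemap', priority=100),
    TaskCreator('generate_posts', priority=2),
    TaskCreator('render_tags'),
    TaskCreator('fetch_data', priority=7),
]
print([creator.name for creator in ordered_creators(creators)])
# ['generate_posts', 'fetch_data', 'render_pages', 'render_tags', 'sitemap']
```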

(I have the vague feeling that I already read something about EarlyTasks somewhere here, but I cannot remember where. So it's probably not my own idea :) )

Cheers, Felix

Kwpolska commented 9 years ago

#1553

felixfontein commented 9 years ago

Yes, that was it. Thanks for linking it! So the main question is probably: just add another category, or add a simple priority system so that there's only one category (Task, with LateTask being a special case of Task with higher priority)?

felixfontein commented 9 years ago

(Ah, I forgot the fine print: the EarlyTask plugins should run before any posts are scanned. For my case that doesn't matter, but maybe someone wants something to run after posts are scanned but before they are rendered. A general priority system could run everything with negative priority numbers before posts are scanned and everything else afterwards, and site rendering could have priority 10 or so, making it possible to squeeze something in between post scanning and site rendering.)

Kwpolska commented 9 years ago

Creating a category is a total of five lines of code across two files (class EarlyTask(Task): pass in plugin_categories.py; a modified import and plugin load directive in nikola.py).
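The class itself is the whole category; the plugin manager then has to be told about it. A hedged sketch: `BasePlugin`, the `CATEGORIES` mapping and the `category_of` helper are hypothetical simplifications, not Nikola's or its plugin manager's actual code. Because `EarlyTask` subclasses `Task`, a plain `issubclass` check matches both categories, so the sketch picks the most specific one.

```python
class BasePlugin:
    """Simplified stand-in for Nikola's plugin base class."""
    name = 'dummy'


class Task(BasePlugin):
    """Existing category: plugins that generate site-rendering tasks."""


class EarlyTask(Task):
    """New category, defined as described above."""
    pass


# Hypothetical mapping in the style of a plugin manager's category filter.
CATEGORIES = {'Task': Task, 'EarlyTask': EarlyTask}


def category_of(plugin_cls):
    """Return the most specific registered category for a plugin class."""
    matches = [name for name, base in CATEGORIES.items() if issubclass(plugin_cls, base)]
    # EarlyTask subclasses Task, so both match; prefer the one deepest in the MRO.
    return max(matches, key=lambda name: len(CATEGORIES[name].__mro__))


class GeneratePosts(EarlyTask):
    name = 'generate_posts'


class RenderPages(Task):
    name = 'render_pages'


print(category_of(GeneratePosts), category_of(RenderPages))  # EarlyTask Task
```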

A priority system would be a huge mess: it would require a lot of changes everywhere, plus assigning an actual priority to each and every task to make it flexible.

I’m pretty sure scan_posts is called by the first Task that gets loaded and wants it.
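The lazy pattern described here can be sketched as follows, assuming a hypothetical, heavily simplified `Site` object (not Nikola's real class): whichever task creator first needs the post list triggers the scan, and later callers reuse the result.

```python
class Site:
    """Hypothetical, heavily simplified site object."""

    def __init__(self, sources):
        self.sources = sources
        self.posts = []
        self._scanned = False
        self.scan_count = 0

    def scan_posts(self):
        # Only the first caller pays for the scan; later calls return immediately.
        if self._scanned:
            return
        self.posts = sorted(self.sources)
        self.scan_count += 1
        self._scanned = True


def render_posts_tasks(site):
    site.scan_posts()  # first plugin that needs posts triggers the scan
    return ['render:' + post for post in site.posts]


def render_tags_tasks(site):
    site.scan_posts()  # no-op: posts were already scanned
    return ['tags:' + post for post in site.posts]


site = Site(['b.rst', 'a.rst'])
print(render_posts_tasks(site), render_tags_tasks(site), site.scan_count)
```

It also shows the catch discussed further down in this thread: a post written by a task after the first scan is never picked up, because the scan does not run again.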

And I just created EarlyTask.

punchagan commented 9 years ago

Shouldn't EarlyTasks be added to the default tasks loaded by the NikolaTaskLoader?

Kwpolska commented 9 years ago

@punchagan Probably. I'll do it tomorrow.

Chris Warrick https://chriswarrick.com/ Sent from my Galaxy S3. On Dec 31, 2014 10:08 PM, "Puneeth Chaganti" notifications@github.com wrote:

Shouldn't EarlyTasks be added to the default tasks https://github.com/getnikola/nikola/blob/master/nikola/__main__.py#L243 loaded by the NikolaTaskLoader?

— Reply to this email directly or view it on GitHub https://github.com/getnikola/nikola/issues/1562#issuecomment-68469306.

punchagan commented 9 years ago

@Kwpolska Sure. I was trying to use this feature and have something that works. You can probably review it, fix it if needed, and merge it (tomorrow).

Happy New Year! :fireworks:

Kwpolska commented 9 years ago

Where is it?


Kwpolska commented 9 years ago

The loader should be fixed now. I managed to do it in 2014, and on my phone.


punchagan commented 9 years ago

Whoops, didn't see your question. I had referenced the PR in a commit, but those references don't trigger email notifications. :)

Thanks!

Kwpolska commented 9 years ago

You should have posted the commit SHA here; that way I would have noticed. Either way, it's now solved (and our solutions are identical in code, differing only slightly in the English).


punchagan commented 9 years ago

Yep. Thanks!

felixfontein commented 9 years ago

Cool, everything's already done! :) A priority system wouldn't be that complicated, since most things can have the same priority; doit's dependencies still handle the ordering within a priority. But that was just a suggestion, and I'm perfectly happy with EarlyTask as it is now. Thanks! And of course, a Happy New Year to you all!

schettino72 commented 9 years ago

Probably too late, but I will chime in anyway :)

doit has 2 distinct phases. task-creation and task-execution.

task-creation (generating task metadata) is usually done in doit using the task_xxx functions in a dodo.py module. But in Nikola it is done through the Nikola site object. Since in Nikola you can add more task-creators through plugins, LateTask was created to make sure the Nikola site object was set up before executing task-creators from plugins...
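For comparison, this is what task creation looks like in a plain dodo.py (standard doit usage; the task itself is a made-up example). Calling `task_hello()` is the creation phase and only returns metadata; the shell command in `actions` runs later, during execution.

```python
# dodo.py: doit treats every module-level function named task_* as a task creator.
DOIT_CONFIG = {'default_tasks': ['hello']}


def task_hello():
    """Task creation: only returns metadata, nothing is executed here."""
    return {
        'actions': ['echo hello > hello.txt'],  # run later, in the execution phase
        'targets': ['hello.txt'],
    }
```

Running `doit` in that directory first creates the task, then executes it.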

So LateTask means late-generated-task, not late-executed-task. LateTask is a Nikola concept, and doit has no idea about it.

task-execution ordering must be handled by doit using the task property task_dep.
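A small dodo.py illustration of that point (the task names are made up; `task_dep` is doit's real key): whatever order the creators are written or called in, `task_dep` is what makes doit execute `scan` before `render`.

```python
# dodo.py sketch: creation order is irrelevant, task_dep decides execution order.
def task_render():
    return {
        'actions': ['echo rendering'],
        'task_dep': ['scan'],  # doit executes 'scan' before 'render'
        'verbosity': 2,
    }


def task_scan():
    return {
        'actions': ['echo scanning'],
        'verbosity': 2,
    }
```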

A "priority system" may make sense for Nikola, since the global Nikola site object is modified by many plugins, but it doesn't make sense for doit itself.

felixfontein commented 9 years ago

What do you think? Would it be hard to modify DoitNikola (which inherits from DoitMain) and NikolaTaskLoader (inherited from TaskLoader) to get the following behavior:

What do you think?

Cheers, Felix

ralsina commented 9 years ago

@felixfontein it's perhaps cleaner and less work to create three "metatasks" and have mt1 have all EarlyTasks in its task_dep, then have mt2 have all Tasks and mt1 on its task_dep and then have mt3 have all LateTasks and mt2 in its task_dep

It can be trivially extended to an enumeration etc.
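A sketch of the metatask chain, generalized to an ordered list of phases. The names `mt1`..`mt3` come from the comment above; the member task names and the `metatasks` helper are hypothetical. Each dict is an action-less group task; yielded from a doit task creator, each `basename` becomes its own task.

```python
def metatasks(groups):
    """Build one action-less group task per phase, chained through task_dep.

    `groups` is an ordered list of (metatask_name, [task names in that phase]).
    """
    result = []
    previous = None
    for name, members in groups:
        deps = list(members)
        if previous is not None:
            deps.append(previous)  # chain to the preceding metatask
        result.append({'basename': name, 'actions': None, 'task_dep': deps})
        previous = name
    return result


phases = [
    ('mt1', ['early_a', 'early_b']),  # all EarlyTasks
    ('mt2', ['render_pages']),        # all Tasks
    ('mt3', ['sitemap']),             # all LateTasks
]
for task in metatasks(phases):
    print(task['basename'], task['task_dep'])
```

As written, the chain orders the metatasks themselves; an individual Task that must not start before the EarlyTasks finish would additionally need `mt1` in its own `task_dep`.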

Kwpolska commented 9 years ago

I’m reverting the current failed implementation of EarlyTask in 0ce8d72.

felixfontein commented 9 years ago

@ralsina: That does not solve the problem that, when the usual tasks are created, the generated posts haven't been created yet, so no tasks to handle them (to render their pages, include them in indices and tag pages, etc.) can be created, or they are created incorrectly. To fix that, tasks can only be created once the previous tasks have been completely processed.

ralsina commented 9 years ago

@felixfontein if a task needs another task to be executed to get the information it needs, then it should get that information on runtime, not on creation time. It can even obtain that information from the earlier task using something like http://pydoit.org/dependencies.html#calculated-dependencies
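A minimal dodo.py sketch of doit's calculated dependencies, modelled on the pattern in doit's documentation; the post paths and task names are made up. The `find_posts` task's python-action returns a dict, and doit adds its `file_dep` entries to `render_index` at run time.

```python
# dodo.py sketch of doit's calculated dependencies (calc_dep).
# POST_FILES is made-up data; in practice it would be computed at run time.
POST_FILES = ['posts/a.rst', 'posts/b.rst']


def find_post_files():
    # A python-action returning a dict; with calc_dep, doit reads keys such
    # as 'file_dep' from this result and adds them to the dependent task.
    return {'file_dep': POST_FILES}


def task_find_posts():
    return {'actions': [find_post_files]}


def task_render_index():
    return {
        'actions': ['echo rendering index'],
        'calc_dep': ['find_posts'],  # dependencies known only at run time
    }
```

Note that this adds dependencies to a task that already exists; it does not create new tasks.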

ralsina commented 9 years ago

Or maybe something like http://pydoit.org/dependencies.html#getargs
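A comparable sketch for `getargs`, again following the shape of doit's documented examples with made-up task names: the dict returned by `scan_posts`'s python-action is stored, and `show_posts` receives the `posts` value as a parameter.

```python
# dodo.py sketch of doit's getargs: one task's returned values feed another.
def list_posts():
    # Python-action values (a dict) are stored by doit after execution.
    return {'posts': 'a.rst b.rst'}


def task_scan_posts():
    return {'actions': [list_posts]}


def task_show_posts():
    return {
        # %(posts)s is filled from the value obtained via getargs
        'actions': ['echo posts: %(posts)s'],
        # parameter name -> (task name, key in that task's returned dict)
        'getargs': {'posts': ('scan_posts', 'posts')},
        'verbosity': 2,
    }
```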

felixfontein commented 9 years ago

The problem is that at creation time of the page compilation tasks, they have no idea if (or which) new posts are created. But the tasks (just to compile the post to .html, for example) must be created at that point, for the specific posts which will be created (and whose names are not known to Nikola at that point).

ralsina commented 9 years ago

Or, maybe, what should happen is that the post scanning should be a task, which it's not.

felixfontein commented 9 years ago

Even if post scanning is a task, you have a problem since after running that task, new tasks have to be created.

ralsina commented 9 years ago

Yes, there is a large semantic hole here. What we need is to figure out a way (which probably involves some largish changes in Nikola) so that tasks can affect "the future".

IOW:

1) a task modifies the timeline by creating a post
2) that means post scanning has to happen
3) that means more tasks have to be created for the new timeline

If we can work out how to do that in a clean manner, the rest will just work. Of course this is not trivial or it would be there already :-)

felixfontein commented 9 years ago

An ideal workflow would look like:

(The last three steps can also be split into two runs: first create and execute Task tasks, then create and execute LateTask tasks.)

felixfontein commented 9 years ago

Well, by having more than one "create and then execute tasks" steps, this can be done in a quite simple way.

ralsina commented 9 years ago

Until we need to have a task that runs between EarlyIshTasks and NotSoEarlyTasks

ralsina commented 9 years ago

Ok, I give up, let's do that separation and multiple runs.

felixfontein commented 9 years ago

Well, that's why I proposed having a priority system, where the priority indicates in which batch the tasks are created and executed. The current system (including EarlyTasks) would have three priorities.
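The batch idea can be sketched in plain Python. `run_in_stages` and the creators are hypothetical illustrations, not Nikola or doit code; the stage numbers -10, 10 and 100 stand in for the three priorities. The point is that a creator is only called once its stage is reached, so it sees what earlier stages produced.

```python
from collections import defaultdict


def run_in_stages(creators, execute):
    """Create, then execute, tasks one stage at a time (hypothetical helper).

    `creators` is a list of (stage, creator) pairs. A creator is only called
    once its stage is reached, so it sees the results of all earlier stages.
    """
    by_stage = defaultdict(list)
    for stage, creator in creators:
        by_stage[stage].append(creator)
    for stage in sorted(by_stage):
        tasks = [task for creator in by_stage[stage] for task in creator()]
        for task in tasks:
            execute(task)


posts = []
log = []


def early_creator():
    # An early task that writes a new post when it executes.
    return [lambda: posts.append('generated.rst')]


def render_creator():
    # Created after the early stage ran, so the generated post is visible.
    return [lambda post=post: log.append('render ' + post) for post in posts]


def late_creator():
    return [lambda: log.append('sitemap')]


run_in_stages([(100, late_creator), (10, render_creator), (-10, early_creator)],
              execute=lambda task: task())
print(log)  # ['render generated.rst', 'sitemap']
```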

schettino72 commented 9 years ago

I think Nikola running doit three (or more) times will be too messy, and will create many other problems. I suggest to do some work on doit before trying to implement this on nikola.

First, let's try to come up with a simple example that does not use the nikola code base but has a similar workflow. Then change doit to handle that, and finally use it in nikola...

felixfontein commented 9 years ago

Can you give me some hints what might go wrong? Are there certain things in doit that should not be used/initialized more than once?

felixfontein commented 9 years ago

Ok, I created a small test program: https://github.com/getnikola/nikola/blob/earlytask_experiments/test.py

It creates (and destroys pre-existing) files '1' and '2' and a directory 'dest', so be a bit cautious when running it.

On the first glance, seems to work well.

schettino72 commented 9 years ago

Can you give me some hints what might go wrong?

  • try executing only one task: you will need to do some pre-analysis of the available tasks, or error handling.
  • task_dep: you will need to make sure to only use tasks on the same level (on different level they will be managed by nikola itself)
  • clean command: also will need some changes because it is executed in reverse order.
  • auto command (original one from doit): I guess it would need a lot of work
  • output: as of today doit and nikola output is very simple so no problem. but it will be an issue if there is a summary of execution at the end (It is on my TODO list)
  • task parameters getargs, setup, teardown will all have issues (it seems nikola doesn't use any of this, but it's not nice to make it permanently unavailable.)
  • doit at the moment has very poor tooling but that may change in the future, this approach will probably make nikola not be able to use those.

Ok, I created a small test program: https://github.com/getnikola/nikola/blob/earlytask_experiments/test.py

It is cool you could do that given doit's limitations... I also want to give changing doit a try, but it won't be as fast as you :) Do you want to help me?

schettino72 commented 9 years ago

@felixfontein so I created a proof of concept. (https://github.com/pydoit/doit/compare/delayed-task-creation)

I created an issue on doit tracker for this: pydoit/doit#14 . It is still WIP but can you take a look at it and check if this would solve your problem with nikola.

felixfontein commented 9 years ago

If we group tasks into "execution groups" (or however that should be called) (they should be totally ordered, so it is clear which to execute first, etc.), as I did in that test implementation, I think it should be not too complicated to also integrate that into doit.

task_dep: you will need to make sure to only use tasks on the same level (on different level they will be managed by nikola itself)

Here it would be simplest to assume that each execution group is independent of each other: then the user has to take care of that by himself.

(Of course, one could make this more comfortable later on, by checking for dependencies between groups, and warning when they are not met.)

clean command: also will need some changes because it is executed in reverse order.

That shouldn't be complicated: by concatenating the list of files to clean for all execution groups, one could use this list (or its reverse) for cleaning.
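A minimal sketch of that suggestion, assuming each execution group simply exposes its targets in creation order (`clean_order` and the file names are hypothetical): concatenate across groups, then reverse, since clean runs in reverse order.

```python
def clean_order(groups):
    """Targets to delete across execution groups, latest group first.

    `groups` holds each execution group's targets in creation order; the
    concatenated list is reversed because clean runs in reverse order.
    """
    all_targets = [target for group in groups for target in group]
    return all_targets[::-1]


groups = [
    ['posts/generated.rst'],                         # early group
    ['output/generated.html', 'output/index.html'],  # regular rendering
    ['output/sitemap.xml'],                          # late group
]
print(clean_order(groups))
```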

auto command (original one from doit): I guess it would need a lot of work

One could either start with the smallest execution group affected by the file change, or (more simply) just run everything (doit run) if any dependency changed.

output: as of today doit and nikola output is very simple so no problem. but it will be an issue if there is a summary of execution at the end (It is on my TODO list)

That depends on how this will be implemented in doit. If the execution groups are handled by doit, doit could either do this per execution group, or over everything.

task parameters getargs, setup, teardown will all have issues (it seems nikola doesn't use any of this, but it's not nice to make it permanently unavailable.)

Again, by first restricting everything to one execution group, it should be not too hard to implement this. One could still extend this later.

doit at the moment has very poor tooling but that may change in the future, this approach will probably make nikola not be able to use those.

That's a quite serious point, and the main reason why I'd prefer support from doit for this :) I assume that this is the more general framework into which things like displaying summary at the end fall?

I guess the main question is: how complicated do we/you want to make it? :) When limiting everything to adding execution groups, it should be not too complicated to implement support for them. (Essentially what I did in the test, with some more stuff here and there.)

The approach you want to choose is a bit more complicated, but should be doable. I'll take a closer look at your branch during the next few days...

It is cool you could do that given doit's limitations... I also want to give changing doit a try, but it won't be as fast as you :) Do you want to help me?

I'll try to help!

schettino72 commented 9 years ago

The approach you want to choose is a bit more complicated, but should be doable. I'll take a closer look at your branch during the next few days...

Maybe my approach is more complicated to implement, but it is the Right Thing™. And the most complicated part was already done a long time ago, when doit added support for calc_dep...

Given the current code base, I am sure it will be easier to implement in doit than in nikola. I also expected to add this feature sooner or later, so it will be added to doit regardless of whether nikola uses it.

It also opens the door to some extra benefits, as this is the foundation for avoiding the creation of all tasks when they are not required. E.g., nikola implemented several things (serve, check, deploy...) as commands to avoid the costly scan-posts.

felixfontein commented 9 years ago

Yes, it's the Right Thing :-)

Nikola's check does need to scan the posts to know which files to inspect or which files shouldn't be there (because they're not created by a task) -- even though it does so only by calling nikola list --all internally.

schettino72 commented 9 years ago

doit part to handle this is ready :) already merged to master... https://github.com/pydoit/doit/commit/b06eb8ebe0d5265a6221900f80905aaf0868d217

The change above includes docs, but since nikola doesn't use dodo.py, it has to use some undocumented internal API...

Where is the plugin that requires this change? I guess the best way to integrate it on nikola is:

1) create a task with no actions, like "pre_build"
2) the creation of build tasks should be delayed until after "pre_build" is executed
3) plugins will need to inject themselves as a task_dep of "pre_build".

felixfontein commented 9 years ago

Cool!

As a first experiment, I'd suggest to use this to actually generate LateTask tasks after Task tasks have been executed. When adding EarlyTasks (or whatever else), it can be added exactly the same way.

The main part happens in nikola/__main__.py, in the class NikolaTaskLoader:

```python
import sys

from blinker import signal
from doit.cmd_base import TaskLoader
from doit.loader import generate_tasks

# ExecutedOnlyReporter is defined elsewhere in nikola/__main__.py.


class NikolaTaskLoader(TaskLoader):
    """custom task loader to get tasks from Nikola instead of dodo.py file"""
    def __init__(self, nikola, quiet=False):
        self.nikola = nikola
        self.quiet = quiet

    def load_tasks(self, cmd, opt_values, pos_args):
        if self.quiet:
            DOIT_CONFIG = {
                'verbosity': 0,
                'reporter': 'zero',
            }
        else:
            DOIT_CONFIG = {
                'reporter': ExecutedOnlyReporter,
                'outfile': sys.stderr,
            }
        DOIT_CONFIG['default_tasks'] = ['render_site', 'post_render']
        # All Task plugins generate their tasks here, at load time...
        tasks = generate_tasks(
            'render_site',
            self.nikola.gen_tasks('render_site', "Task", 'Group of tasks to render the site.'))
        # ...and all LateTask plugins right after, before anything has executed.
        latetasks = generate_tasks(
            'post_render',
            self.nikola.gen_tasks('post_render', "LateTask", 'Group of tasks to be executed after site is rendered.'))
        signal('initialized').send(self.nikola)
        return tasks + latetasks, DOIT_CONFIG
```

So I guess that we have to do the following:

So something along these lines:

```python
        ...
        tasks = generate_tasks(
            'render_site',
            self.nikola.gen_tasks('render_site', "Task", 'Group of tasks to render the site.'))
        # Action-less marker task: done once every render_site task has run.
        tasks.extend(
            generate_tasks('late_build',
                {'basename': 'late_build',
                 'actions': None,
                 'task_dep': ['render_site']}))
        # Sketch only: the LateTask tasks should be created after 'late_build'
        # has executed; 'create_after' stands for doit's delayed creation.
        tasks.extend(
            generate_tasks('late_build',
                {'basename': 'late_build_create',
                 'create_after': 'late_build',
                 ...
                 # latetasks = generate_tasks(
                 #     'post_render',
                 #     self.nikola.gen_tasks('post_render', "LateTask", 'Group of tasks to be executed after site is rendered.'))
                 ... }))
        signal('initialized').send(self.nikola)
        return tasks, DOIT_CONFIG
```

ralsina commented 9 years ago

Just a heads up for whoever tries: This may break "nikola check"

felixfontein commented 9 years ago

Actually, nikola check should not be affected, since for all commands except build/run, simply all tasks will be created. Or at least that's what I remember from schettino72's code.

ralsina commented 9 years ago

If that's so, even better, just a thing to check :-)

felixfontein commented 9 years ago

I started implementing here: https://github.com/getnikola/nikola/tree/earlytask_impl (so far I'm mostly preparing stuff). The aim is to use the new doit features similarly to https://github.com/getnikola/nikola/blob/earlytask_experiments/test.py.

In the first step, I also moved post scanning into its own plugin. I've also renamed what I called 'level' in that test to 'stage'; I hope that's less confusing.

felixfontein commented 9 years ago

The basic stuff works, I think: build, clean, list (and thus check), ...

What does not work yet: passing build a specific task/target which isn't created in the first stage. Also, it is not clear when the initialized signal should be emitted, since there is no longer a point when all tasks have been created but none has been executed.

felixfontein commented 9 years ago

For initialized: what about having a signal for every stage? I.e. when the tasks for stage n have been created, emit the signal initialized_n (with nicer names for special n, like -10, 1, 10, 100, corresponding to the phases early, post scanning, site rendering, and late). The classic initialized signal could be emitted for n = 10, i.e. after generation of the usual site rendering tasks.
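A sketch of the proposed naming, assuming the hypothetical `STAGE_NAMES` mapping below for the special stages; each resulting name would then be sent the same way Nikola already sends `initialized` (with blinker's `signal(name).send(...)`).

```python
# Hypothetical naming scheme for the special stages mentioned above.
STAGE_NAMES = {-10: 'early', 1: 'post_scanning', 10: 'site_rendering', 100: 'late'}


def stage_signal_names(stage):
    """Signals to emit once all tasks for `stage` have been created."""
    names = ['initialized_' + STAGE_NAMES.get(stage, str(stage))]
    if stage == 10:
        names.append('initialized')  # the classic signal, after regular rendering tasks
    return names


for stage in (-10, 1, 5, 10, 100):
    print(stage, stage_signal_names(stage))
```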

Is anyone actually using that signal at the moment?

felixfontein commented 9 years ago

For the other problem (specifying tasks/targets on command line): I don't see a good solution here which is not somewhat hacky.

The simplest solution would be to create tasks for all stages until all targets and tasks mentioned in the command line are created, and then let doit decide which tasks to execute. (default_tasks isn't of relevance anymore at this time.)
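A minimal sketch of that fallback, with hypothetical names throughout: stages are created in order until every task name or target requested on the command line is known, and anything never found is reported. Executing earlier stages (needed before later ones can be created correctly) is left out for brevity.

```python
def create_until_found(stages, wanted):
    """Create tasks stage by stage until every requested name is known.

    `stages` is an ordered list of callables, each returning task dicts with a
    'name' and optional 'targets'. `wanted` holds task names or targets given
    on the command line. Returns the created tasks and any names never found.
    """
    created = []
    missing = set(wanted)
    for create_stage in stages:
        for task in create_stage():
            created.append(task)
            missing.discard(task['name'])
            missing.difference_update(task.get('targets', []))
        if not missing:
            break  # everything requested exists; later stages are not created
    return created, missing


stages = [
    lambda: [{'name': 'scan_posts'}],
    lambda: [{'name': 'render_pages', 'targets': ['output/index.html']}],
    lambda: [{'name': 'sitemap', 'targets': ['output/sitemap.xml']}],
]
created, missing = create_until_found(stages, {'output/index.html'})
print([task['name'] for task in created], missing)
# ['scan_posts', 'render_pages'] set()
```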

Has anyone an opinion on this? @schettino72 maybe?

felixfontein commented 9 years ago

Also, another thing: running in parallel doesn't work anymore out of the box, since multiprocessing doesn't work well with delayed task creation. But if you use threads instead of processes (by specifying -P thread), it will run fine in parallel.

schettino72 commented 9 years ago

link to felix's branch: https://github.com/getnikola/nikola/compare/earlytask_impl

Felix, can you squash your commits? That makes it easier to write comments. Right now I need to find which commit made which change, and the comments would be spread out...

For the other problem (specifying tasks/targets on command line): I don't see a good solution here which is not somewhat hacky.

I guess the problem is that Nikola.gen_tasks() generates all tasks from a given plugin stage. It should be modified so that you have a function to create tasks from a given plugin instance (not from a plugin stage). So in NikolaTaskLoader._gen_stage() you would create a task for every plugin instance, not only one task for each stage.

I am not 100% convinced about this usage of "stages"... why not just directly depend on task/plugin name?

Also, another thing: running in parallel doesn't work anymore out of the box, since multiprocessing doesn't work well with delayed task creation.

The problem is the use of closures (or anything else that cannot be pickled) with multiprocessing. There is no real reason to use closures (apart from the small convenience of keeping code in one place). Someone could just refactor the nikola code to avoid closures, and multiprocessing would work fine with DelayedTasks.
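The pickling point can be checked with the standard library alone. In this sketch (the action and path names are made up), the closure cannot be pickled, while binding the same argument to an importable function with `functools.partial` produces the same result and pickles fine, which is what sending actions to worker processes requires.

```python
import functools
import os.path
import pickle


def make_closure_action(path):
    # The nested function captures `path`; pickle cannot serialize it by name.
    def action():
        return os.path.join('output', path)
    return action


def is_picklable(obj):
    try:
        pickle.dumps(obj)
    except (pickle.PicklingError, AttributeError, TypeError):
        return False
    return True


closure_action = make_closure_action('index.html')
# Same behaviour without a closure: bind the arguments to an importable function.
partial_action = functools.partial(os.path.join, 'output', 'index.html')

print(closure_action() == partial_action())                         # True
print(is_picklable(closure_action), is_picklable(partial_action))  # False True
```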

felixfontein commented 9 years ago

It would be nice to create tasks for individual task names, but the problem is that you don't know which basenames a plugin will emit before asking it to create its tasks. (The galleries plugin usually outputs render_galleries tasks, but also _render_galleries_clean tasks; the sitemap plugin outputs sitemap (expected) and _scan_locs tasks; and the gzip plugin outputs names depending on which task it processes. That's somewhat different, though, since it's a TaskMultiplier plugin. And then there might be 3rd-party plugins which rely on this, too.)

@Kwpolska, @ralsina, what do you think? Would it be ok to make the assumption that that plugin.name is the only task name callable via nikola build <taskname>?

A second thing: what happens if you use doit run bla, where bla is a target of a task which is generated on the fly? It doesn't work with doit (I tested with your example: https://gist.github.com/schettino72/9868c27526a6c5ea554c), and so it won't work with Nikola either. This could only be fixed by some kind of hack.

felixfontein commented 9 years ago

Regarding avoiding pickling/enabling multiprocessing: the main culprit here is config_changed, I think. Avoiding that will be hard to impossible without rewriting a lot of code, including 3rd party code, I think.