exerro / amend

A Lua preprocessor
MIT License

Implement Tasks #14

Open viluon opened 7 years ago

viluon commented 7 years ago

Okay maybe we didn't think it through enough. So what?

The idea is simple, rather than doing whatever it is we are doing now (I'm not trying to pretend that I understand any part of the system whatsoever), "tasks" should tell us what to do with a file (are you asking how is it going to work with the existing plugin callbacks? I don't know either).

So a build has different tasks it has to do before successfully ending:

include --> minify --> package --> upload

This is always a linked list, not a dependency tree or a cursed bidirectional graph. Nope. Keeping things simple should be our new motto.

Each task feeds into the next, which makes it look like the whole thing should be called a pipeline rather than a "build". That's not the case, at least not at the moment. A build has tasks it runs on pipelines, one pipeline is spawned for every processed file.
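The linked-list shape could be sketched roughly like this — all names below (`task`, `chain`, `run_pipeline`) are invented for illustration, not actual amend API:

```lua
-- Hypothetical sketch: a build as a linked list of tasks,
-- each task feeding its output into the next.
local function task(name, run)
    return { name = name, run = run, next = nil }
end

-- Link tasks in the given order; returns the head of the list.
local function chain(...)
    local tasks = { ... }
    for i = 1, #tasks - 1 do
        tasks[i].next = tasks[i + 1]
    end
    return tasks[1]
end

-- One pipeline is spawned per processed file: walk the list,
-- threading the data through every task.
local function run_pipeline(first_task, input)
    local current, data = first_task, input
    while current do
        data = current.run(data)
        current = current.next
    end
    return data
end

local head = chain(
    task("include", function(s) return s .. " [included]" end),
    task("minify",  function(s) return s .. " [minified]" end)
)
print(run_pipeline(head, "source")) --> source [included] [minified]
```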

The Terrible Tasks to Tackle

I think we can get inspired by Howl. What should we call the part which processes a stream and all its directives though? Howl calls that task "build". Yeah. Damn it.

exerro commented 7 years ago

Ultimately, for each input file, we need to take the content, do stuff with it, poop out a Lua AST, then merge all the different ASTs together with some extra magic.

I think we've got the file lookup functionality sorted already. Protocols, plugins, that's all neat. After that, you make a pipeline with the URI and mode generated by the file lookup. I think there are now the following stages to the pipeline:

After this I'm not so sure. Previously, the plan was to compile to Lua and poop out the result, but that wasn't considering that there would be many files. The pipeline is pretty exclusive to just one file. What about a plugin that checked for global references? How would it know if it was referring to a definition in another file or something. It gets complicated here, but I'd suggest some way of turning (list of pipelines) into (one pipeline) and having callbacks in the process and after the process. Then compiling the AST to Lua.

viluon commented 7 years ago

Yeah okay, merging files should happen after AST modification. (Speaking of which, plugins should have a means of using the parser that pooped out the AST to add nodes of their own. Best if such a method were also available as an overload on the AST itself, like node:add_child "foo( 'this gets compiled during AST modification' )".)

How to call the pipeline merging process and therefore also the callbacks is just a matter of convention. pipeline.merge_pipeline_before/after_callbacks? I don't really know, just making this up as I go along.

Regarding the global reference plugin, while it is a neat idea I'd argue that its success rate very much depends on where it is imported to, and since that can be multiple locations, I wouldn't worry about it now (the plugin itself can define a directive to specify which variables it should ignore, etc).

exerro commented 7 years ago

Oh yeah definitely, although I'd argue that some of the same AST callbacks are invoked before and after the merge. Some (notably optimisations) would benefit from being done after the merge, as well as before.

I think it'd be more like node:add_child( pipeline.parser:parse_expression( "blahdyblah" ) ), but yeah, that's a good idea. I also think there should be a simple way of remapping specific variables, so you could do something like parse( "f( x )" ):replace( "x", myexpr ).
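A toy version of that :replace idea, with an invented Node structure standing in for the real AST and parser (nothing here is actual amend API):

```lua
-- Minimal sketch of variable remapping on a hand-rolled AST node.
local Node = {}
Node.__index = Node

local function node(type, value, children)
    return setmetatable({ type = type, value = value, children = children or {} }, Node)
end

-- Swap every variable named `name` in the subtree for `replacement`.
function Node:replace(name, replacement)
    for i, child in ipairs(self.children) do
        if child.type == "variable" and child.value == name then
            self.children[i] = replacement
        else
            child:replace(name, replacement)
        end
    end
    return self
end

-- "f( x )" hand-built, standing in for pipeline.parser:parse_expression()
local call = node("call", "f", { node("variable", "x") })
local myexpr = node("number", 42)
call:replace("x", myexpr)
print(call.children[1].value) --> 42
```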

It's more the functionality of the callbacks that I'm concerned about. Also, how will inter-file dependencies work? If I want to include a library, will it use the standard, local lib = {}; ...; return lib? If so, should the include syntax look as follows: --@include X as Y? I really don't know about this one. In v1.0, there was a way to localise certain variables at the top, and then it could just define them, with the code being stuck in the middle of other files. Idk. Suggest things.

The global reference plugin is one of those ~essential ones imo. It's an example showing the ability to use what I think is a powerful and important ability of the build system.

viluon commented 7 years ago

The more metamethods, overloads, and shortcuts there are, the better. Does it hurt if the AST has a reference to its parser? (In the case we're talking about, it doesn't actually need to access the relevant pipeline if it can get the parser reference from elsewhere.)

I guess it could be smart about libraries that do return foo at the end, but error if @include foo as bar doesn't succeed. The as * part should definitely be optional.

I was actually thinking about it as a real-time linter. Never mind. I'd implement this by simply running the plugin only after all pipelines have been merged. That way it'll report on the actual global references.

exerro commented 7 years ago

I tend to go for a minimal AST so that it can be serialised easily. It also makes it easier to do things like return { type = "Blah", value = blah, .... }, but I guess that's bad practice anyway. I think it's just the bloat that I don't like. However, the AST really is just a store of the syntax of the file; it's not really related to the parser. Ultimately, everything will end up being Lua, so would the AST have its parser pointing to the Lua parser, or to the parser that generated that Lua AST (e.g. MoonScript or something)? Plugins would be the things doing the AST modification, and those would have access to the pipeline or some object that referenced the parser, I'm sure. Or it could just directly refer to the parser. See below, regarding inter-plugin communication.

How would that not succeed? Also, "be smart" sounds scary, unplanned, and hacky. This needs to have really clear and defined behaviour. I'd suggest that all files are wrapped in a local __unique_identifier_1 = (function() ... end), with another unique identifier used to store the result. In the place of the @include in whatever file, it ensures the dependency has been loaded already, then copies the latter unique identifier to the local import name. So...

--@include myfile.lua as myfile
local __ID_3 = (function() -- __ID_1/__ID_2 belong to myfile.lua's own wrapper
    if not __ID_2 then
        __ID_2 = __ID_1() or true
    end
    local myfile = __ID_2
end)
__ID_3() -- called as it's an entry point

My example of a global reference lookup thingy was bad. Basically, I dislike how plugins operating on the file pipeline won't be able to know anything about other files/pipelines. It might not be an issue, I guess we'll run into it later on if it is.

Inter-plugin communication

Plugins should be able to talk to each other and share features. I think, given that multiple file modes will be supported (say, for different languages), plugins themselves should register the parser for that mode. This will make it much, much easier to have precompiled packages later on (see #6). As parsers are implemented by a plugin, other plugins should be able to refer to that plugin and 'ask' it to parse things. There are undoubtedly many other situations where exposing public methods will be important. So, a plugin will have a private state and a public set of methods, at any stage in the pipeline.
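A rough sketch of what that could look like — the registry, plugin name, and method below are invented, not actual amend API. Private state lives in the plugin's closure; only the returned table of methods is public:

```lua
-- Hypothetical plugin registry with private state per plugin.
local plugins = {}

local function register_plugin(name, constructor)
    plugins[name] = constructor()
end

-- A plugin registering a parser for its file mode.
register_plugin("moonscript", function()
    local private_cache = {}  -- private state, unreachable from outside
    return {
        -- public method other plugins can 'ask' to parse things
        parse = function(source)
            if not private_cache[source] then
                private_cache[source] = { type = "ast", source = source }
            end
            return private_cache[source]
        end,
    }
end)

-- Another plugin asking the moonscript plugin to parse something:
local ast = plugins["moonscript"].parse("x = 1")
print(ast.type) --> ast
```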

exerro commented 7 years ago

Note, we really need to work on the terminology, I'm getting really confused already.

viluon commented 7 years ago

Good point with the AST scoping issues, I didn't think of that. Let's leave the parser link in the pipeline then.

"Be smart" is certainly none of those. Basically, if the file ends in a return statement with a single child that is not a constant, it uses that child as the library:

-- foolib.lua
local foolib = {}

function foolib.bar()
    print "asdfmovie rulez"
end

return foolib
-- unicorn_land.lua
-- @include foolib.lua as greatness

greatness.bar()
-- unicorn_poop.lua
-- IR identical to the compiled output of unicorn_land.lua
local greatness = {}

function greatness.bar()
    print "asdfmovie rulez"
end

greatness.bar()

In case unicorn_land.lua's as greatness fails (foolib.lua puts functions into the global scope, for example), an error is raised. If the @include clause were only -- @include foolib.lua, the library would be wrapped in a function and included nonetheless. Is that scary, unplanned, or hacky? I think it's pretty intuitive.
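The "ends in a plain return" check could look roughly like this, on an invented toy AST shape (not amend's actual node format):

```lua
-- Sketch: a file can be inlined without a function wrapper only when
-- its last statement is a return whose single expression is not a
-- constant (so there is a named value to use as the library).
local function is_plain_library(ast)
    local last = ast.body[#ast.body]
    if not last or last.type ~= "return" then
        return false
    end
    if #last.values ~= 1 then
        return false
    end
    local v = last.values[1]
    return v.type ~= "number" and v.type ~= "string"
       and v.type ~= "boolean" and v.type ~= "nil"
end

-- Toy AST for foolib.lua above: local foolib = {} ... return foolib
local foolib_ast = {
    body = {
        { type = "local", name = "foolib" },
        { type = "return", values = { { type = "variable", name = "foolib" } } },
    },
}
print(is_plain_library(foolib_ast)) --> true
```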

Inter-plugin communication would be nice, but do we actually need it? I don't know how to structure the plugin API for that either... I think we should implement this on-demand, anyway, that should help us understand the issue from a plugin-developer's perspective.

exerro commented 7 years ago

I made a flowchart illustrating how this should all work. https://drive.google.com/file/d/0B37gxkuD6apmS3lvLXFFUHRQbnM/view?usp=sharing

Anyway, the issue with your example is the fact that many different files may try to include, say util, and potentially as different things, --@include util.lua as _util, --@include util.lua as utils, etc. This means you can't just rename foolib in your case. However, it could be an optimisation applied by a plugin to see that greatness always referred to foolib and thus replace references of greatness with foolib. This also prevents some file doing utils = {} and messing up every other file that included the util file.

I think @include X (no as Y) should default to having X determine the name to import it as. For example, @include X.lua would implicitly function as @include X.lua as X. There's potential for things like @include -g X.lua for including it as a global-defining file or something.

One issue I see with this is that a file's type is dependent on the file, not how it's included. So you'd be able to do @include -g X.lua and @include X.lua, and that sucks. Maybe just a universal @include X.lua and then something like @filemode "library" inside X.lua to determine how it's loaded/inserted into various environments.

Inter-plugin communication should be pretty easy. pipeline.plugins.PluginName.func()? I guess if we have plugin names something like namespace.plugin (e.g. lua.linter, moonscript.linter), that would need more processing, but using _ for namespaces or just doing wizardry to allow pipeline.plugins.namespace.PluginName.func() are both possible.
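The metatable "wizardry" for nested namespaces might look like this (plugin names and the lint method are illustrative):

```lua
-- An __index metamethod that creates namespace tables on demand, so a
-- plugin named lua.linter registers under plugins.lua.linter.
local function namespaced()
    return setmetatable({}, {
        __index = function(t, key)
            local ns = {}
            rawset(t, key, ns)  -- memoise so later lookups hit rawget
            return ns
        end,
    })
end

local plugins = namespaced()
plugins.lua.linter = {
    lint = function(source) return source:find("goto") == nil end,
}

print(plugins.lua.linter.lint("local x = 1")) --> true
```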

Let's seriously review the terminology soon. I'll make an issue in a sec.

viluon commented 7 years ago

the issue with your example

I suspect you mean this would cause issues when multiple files renaming a resource were part of a single compiled output? I wanted to make the example readable, so I didn't apply optimisations and unique identifiers (and instead included the comment saying that while the code differs from the compiled output, its meaning is identical), which would of course be present in the final result. What I wanted to stress was the absence of a function wrapper in these cases.

exerro commented 7 years ago

Ohkay. So similar to what I said but without the wrapper? I'd agree but some files might be awkward, like

if boop then
    return blah
end
return bleh

Although trivial for a person (or even a decent code analyser) to handle, the (hopefully quite simple/basic) file merger wouldn't be able to handle that, and a function wrapper would be necessary. Having a separate way of handling files ending in a single return statement can come later. Having a wrapper handles any situation.
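For illustration, the wrapper approach applied to an early-return file like the one above (all identifiers invented): multiple return paths are harmless once the whole file body is a function.

```lua
-- Stand-ins for whatever the awkward file actually references.
local boop, blah, bleh = true, "blah", "bleh"

-- The merger wraps the entire file body in a function...
local __ID_1 = function()
    if boop then
        return blah
    end
    return bleh
end

-- ...and the include site just calls it, regardless of how many
-- return statements the file contained.
local mylib = __ID_1()
print(mylib) --> blah
```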

What did you think of @filemode and the inter-plugin communication suggestions?

exerro commented 7 years ago

I'm thinking of having something similar to the following:

local t = Task:new( "Parse" ):on( "Stream" ):returns( "AST" ):precedes( "MacroExpand" ):follows( "Preprocess" ):does( lua.parse )
plugin:register( t )

A pipeline would be a composition of tasks. However, there needs to be a looping mechanism where e.g. the AST could be modified and previous tasks (e.g. optimisations) performed on the new AST. I guess the pipeline could be copied from the first point that accepts an AST and run with the new data until the last point it returns an AST? So it's just the changed node that is re-run?
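The re-run idea might be sketched like this, with invented task structures (string-append `run` functions stand in for real transforms): only the span of tasks from the first that accepts an AST to the last that returns one gets re-executed on the new node.

```lua
-- Toy ordered pipeline: each task declares what it consumes and produces.
local pipeline = {
    { name = "Parse",       on = "Stream", returns = "AST", run = function(x) return x end },
    { name = "MacroExpand", on = "AST",    returns = "AST", run = function(x) return x .. "+macros" end },
    { name = "Optimise",    on = "AST",    returns = "AST", run = function(x) return x .. "+opt" end },
    { name = "Compile",     on = "AST",    returns = "Lua", run = function(x) return x end },
}

-- Re-run a freshly built AST node through the AST-to-AST span only.
local function rerun_ast_span(new_ast)
    local first, last
    for i, t in ipairs(pipeline) do
        if t.on == "AST" and not first then first = i end
        if t.returns == "AST" then last = i end
    end
    local data = new_ast
    for i = first, last do
        data = pipeline[i].run(data)
    end
    return data
end

print(rerun_ast_span("node")) --> node+macros+opt
```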

viluon commented 7 years ago

The suggested inter-plugin comms seem fine (which may just be because I don't know how I'd put them to good use, tbh, but I don't see any issues with your proposal).

@filemode feels kinda hacky... Other Lua build systems and preprocessors simply inline the included files (frequently they literally do just that). Also, I'd expect the user to read a tutorial on how to include libraries anyway, so is it really necessary?

You're right, a function wrapper is guaranteed to work in either case.

exerro commented 7 years ago

Okay, I guess we'll roll with that. What namespace naming convention should we go with?

@filemode could default to something sensible, but it does give more control (and in the right place, imo).