erlang / rebar3

Erlang build tool that makes it easy to compile and test Erlang applications and releases.
http://www.rebar3.org
Apache License 2.0
1.7k stars 517 forks source link

Trying out the new custom compiler modules in a plugin - dependencies #2418

Closed seanhinde closed 3 years ago

seanhinde commented 4 years ago

Project

I'm building a plugin for caramel, which relies on the new module dependencies feature of custom compiler modules

Environment

~/Projects/caradeps$ rebar3 report compile
===> Analyzing applications...
===> Compiling rebar3_caramel
Rebar3 report
 version 3.14.1
 generated at 2020-11-03T09:23:57+00:00
=================
Please submit this along with your issue at https://github.com/erlang/rebar3/issues (and feel free to edit out private information, if any)
-----------------
Task: compile
Entered as:
  compile
-----------------
Operating System: x86_64-unknown-linux-gnu
ERTS: Erlang/OTP 23 [erts-11.1] [source] [64-bit] [smp:1:1] [ds:1:1:10] [async-threads:1]
Root Directory: /usr/local/lib/erlang
Library directory: /usr/local/lib/erlang/lib
-----------------
Loaded Applications:
bbmustache: 1.10.0
certifi: 2.5.2
cf: 0.3.1
common_test: 1.19
compiler: 7.6.3
crypto: 4.8
cth_readable: 1.4.8
dialyzer: 4.2.1
edoc: 0.12
erlware_commons: 1.3.1
eunit: 2.6
eunit_formatters: 0.5.0
getopt: 1.0.1
hipe: 4.0.1
inets: 7.3
kernel: 7.1
providers: 1.8.1
public_key: 1.9
relx: 4.0.2
sasl: 4.0.1
snmp: 5.6.1
ssl_verify_fun: 1.1.6
stdlib: 3.13.2
syntax_tools: 2.3.1
tools: 3.4.1

-----------------
Escript path: /usr/local/bin/rebar3
Providers:
  app_discovery as clean clean clean-all clean-build clean-cache compile compile compile cover ct deps dialyzer do edoc escriptize escriptize eunit get-deps help install install_deps list lock new path pkgs release release relup repl report repos run run-escript run-release shell state tar tree unlock update upgrade upgrade upgrade version versions xref

Current behaviour

What I see at the moment is that the dependencies don't change the compilation order. I haven't dug too deeply, but here is my dependencies callback:

dependencies(File, _Dir, SrcDirs) ->
    rebar_api:console("Deps src dirs: ~p", [SrcDirs]),
    SrcFiles = lists:append([src_files(SrcDir) || SrcDir <- SrcDirs]),
    Cmd = "caramelc sort-deps " ++ lists:join(" ", SrcFiles),
    rebar_api:console("Cmd: ~s", [Cmd]),
    {ok, Res} = rebar_utils:sh(Cmd, [abort_on_error]),
    rebar_api:console("Result: ~s", [Res]),
    SortedDeps = string:tokens(Res, " \r\n"),
    rebar_api:console("File: ~p", [File]),
    rebar_api:console("Sorted: ~p", [SortedDeps]),
    Deps = lists:takewhile(fun(D) -> D =/= File end, SortedDeps),
    rebar_api:console("Deps: ~p", [Deps]),
    Deps.

And the debug output:

$ rebar3 compile
===> Analyzing applications...
===> Compiling rebar3_caramel
===> Analyzing applications...
===> Compiling rebar3_caramel
Needed: []
===> Verifying dependencies...
===> Analyzing applications...
Deps src dirs: ["/home/sean/Projects/caradeps/src"]
Deps src dirs: ["/home/sean/Projects/caradeps/src"]
Cmd: caramelc sort-deps /home/sean/Projects/caradeps/src/caradeps_a.ml /home/sean/Projects/caradeps/src/caradeps_b.ml
Cmd: caramelc sort-deps /home/sean/Projects/caradeps/src/caradeps_a.ml /home/sean/Projects/caradeps/src/caradeps_b.ml
Result: /home/sean/Projects/caradeps/src/caradeps_a.ml /home/sean/Projects/caradeps/src/caradeps_b.ml 

File: "/home/sean/Projects/caradeps/src/caradeps_b.ml"
Sorted: ["/home/sean/Projects/caradeps/src/caradeps_a.ml",
         "/home/sean/Projects/caradeps/src/caradeps_b.ml"]
Deps: ["/home/sean/Projects/caradeps/src/caradeps_a.ml"]
Result: /home/sean/Projects/caradeps/src/caradeps_a.ml /home/sean/Projects/caradeps/src/caradeps_b.ml 

File: "/home/sean/Projects/caradeps/src/caradeps_a.ml"
Sorted: ["/home/sean/Projects/caradeps/src/caradeps_a.ml",
         "/home/sean/Projects/caradeps/src/caradeps_b.ml"]
Deps: []
Needed: ["/home/sean/Projects/caradeps/src/caradeps_b.ml",
         "/home/sean/Projects/caradeps/src/caradeps_a.ml"]
===> sh(/home/sean/.opam/default/bin/caramelc compile /home/sean/Projects/caradeps/src/caradeps_b.ml)
failed with return code 1 and the following output:
Unbound module Caradeps_a

So for file caradeps_b.ml I'm returning [caradeps_a.ml] as dependencies (because caradeps_b.ml tries to call a function in caradeps_a.ml), and caradeps_a.ml dependencies are [].

What I see is that rebar3 first tries to compile caradeps_b.ml, which correctly fails on the missing dependency

I also tried swapping the deps return values around to check my understanding - same result, caradeps_b.ml is tried first.

Expected behaviour

Dependencies are compiled first

Would be great to have another pair of eyes on this, thanks.

ferd commented 4 years ago

The dependencies are used to build the DAG, which is in turn used to propagate changes and timestamps across files. It is not used to define what needs to be compiled or when at this point in time, just to update all the non-artifact vertices in the graph and later allow proper file analysis. So if file A depends on B and B changes, we will update the last modified timestamp of A to be the same as B.

The actual scheduling and ordering of files to be built is handled in the needed_files/4 callback (as documented in http://rebar3.org/docs/extending/custom_compiler_modules/), which is where it is expected you do the analysis of the files:

%% do your own analysis aided by the graph to specify what needs re-compiling.
%% You can use this to add more or fewer files (i.e. compiler options changed),
%% and specify how to schedule their compilation. One thing we do here for
%% erlang files is look at the digraph to only rebuild files with newer
%% timestamps than their build artifacts (which are also in the DAG after the
%% first build) or those with compiler options that changed (the
%% compile_and_track callback lets you annotate artifacts)
needed_files(G, ["/path/to/all/files.erl", ...], [{".beam", "/path/to/ebin"}], AppInfo) ->
    %% the returned files to build essentially specify a schedule and priority with special
    %% option sets
     %% Files that _must_ be built first like those in parse transforms, with
     %% different build options
    {{["/top/priority/files.erl"], CompilerOpts},
     %% {Sequential, Parallel} build order for regular files, with shared
     %% compiler options
     {{["/path/to/file.erl", ...], ["other/files/mod.erl", ...]}, CompilerOpts}}.

We do the same thing in rebar3_compiler_erl, and consider a broader set of changes. For example, we compare the timestamps of the build artifact to the source file that created it, but also the compiler options that were used for the build artifact with the current one (a file that hasn't been changed but whose compiler options have been modified needs to be recompiled as well).

I've been toying with a simpler sample app where I rewrite my blog engine to have the template files' changes tracked through the DAG, I'm hoping to have it ready some time this week.

leostera commented 4 years ago

@ferd does this mean that if A depends on B, then the needed_files/4 call for B has to figure out that A depends on it and should be rebuilt after B? 🤔

or is it in the needed_files for A that we state that B is a dependency that needs to be rebuilt before A?

seanhinde commented 4 years ago

I had in mind that the needed_files/4 call was only to do with static configured "Compile these types of files first" files. I hadn't figured that dependencies/3 creates a DAG that then needs to be manually unravelled as a final step in needed_files/4 to create the compilation order.

Maybe easiest for us to just place the output of caramelc sort-deps directly in the return from needed_files/4

ferd commented 4 years ago

Since you have a graph, and that the erlang digraph module supports these functions we only do something like:

needed_files(G, FoundFiles, OutMappings, AppInfo) ->
  NeededErlFiles = find_modules_worth_rebuilding(FoundFiles, ...), % using stamps, artifacts, compiler opts; ignore headers
  SubGraph = digraph_utils:subgraph(G, NeededErlFiles), % get all the files and deps of those needing a rebuild
  DepErlsOrdered = digraph_utils:topsort(SubGraph),  % give us a topological sort of it all

By then DepsErlsOrdered contains all files that need rebuilding because they have changed, in the order they need to be compiled. This gives us a safe sequential compiling order. You can stop there if you want.

To parallelize it, if safe (we can for erl files, but I can't for my blog because it relies on some stateful things):

{Sequential, Parallel} = lists:partition(
        fun(Erl) -> lists:any(
            fun(Edge) ->
                {_E, _V1, _V2, Kind} = digraph:edge(Graph, Edge),
                Kind =/= artifact
            end, digraph:in_edges(Graph, Erl)) end,
        lists:reverse(DepsErlsOrdered)
    ),

This splits whether files are depended on or not. Those that are depended on are sequentially built, those that aren't are kept parallel. In some cases you have other factors that may be at play, so we can't schedule the parallelism ourselves.

seanhinde commented 4 years ago

Super, thanks. I will have a go along those lines

ferd commented 4 years ago

Yeah it sounds like a pain in the butt, until you just decide to use the digraph module's algorithms. It pretty much contains all we need aside from the annoying partitioning stuff.

seanhinde commented 4 years ago

Haha, right. I always put digraph in the same mental bucket as sofs. Too much brain power needed unless really needed :)

seanhinde commented 4 years ago

and BTW updated docs are much clearer. I hadn't refreshed my browser tab for a few days so was still on the old doc

seanhinde commented 4 years ago

Now looking good using the code to pull out the ordered module list directly from the digraph as suggested. Thank you!

To build the correct digraph my dependencies are upside down from where I first had them. So now if B uses code in A I return [B] in the deps for A. The compile order I need is A then B. Do I have this correct?

I think I also need to look at what updates the stored digraph. Adding a new file to my project didn't seem to be enough. I will dig some more on that.

(BTW, off topic - my 15 year old son is learning erlang from learnyousomeerlang with the goal of building a minecraft server. He loves it. Thank you!!)

ferd commented 4 years ago

if B uses A, dependencies(B, ...) -> [A]. Basically, when analyzing the code of B, you should be able to find the imports and things that A provides. So the callback returns "B has dependencies on A". Within the DAG, the edge in the graph is also of the form "has a dependency on" (B -> A).

Glad to hear LYSE is still seeing usage :). I'm in talks to see if we're gonna do a second edition at some point.

seanhinde commented 3 years ago

Coming back to this. If I implement dependencies(B, ...) -> [A]. and dependencies(A, ...) -> []. topsort returns the order [B, A] and rebar3 tries to compile B first.

My dependencies function is:

dependencies(File, _Dir, SrcDirs, State) ->
    SrcFiles = lists:append([src_files(SrcDir) || SrcDir <- SrcDirs]),
    Cmd = "caramelc sort-deps " ++ lists:join(" ", SrcFiles),
    rebar_api:console("Cmd: ~s", [Cmd]),
    {ok, Res} = rebar_utils:sh(Cmd, [abort_on_error]),
    SortedDeps = string:tokens(Res, " \r\n"),
    rebar_api:console("File: ~p", [File]),
    %% Deps = tl(lists:dropwhile(fun(D) -> D =/= File end, SortedDeps)),
    Deps = lists:takewhile(fun(D) -> D =/= File end, SortedDeps),
    rebar_api:console("Deps: ~p", [Deps]),
    Deps.

and needed_files:

needed_files(G, FoundFiles, Mappings, AppInfo) ->
    FirstFiles = [],

    %% Remove first files from found files
    NeededErlFiles = [Source || Source <- FoundFiles,
                           not lists:member(Source, FirstFiles),
                           rebar_compiler:needs_compile(Source, ".erl", Mappings)],

    Opts = rebar_opts:get(rebar_app_info:opts(AppInfo), caramel_opts, []),
    Opts1 = update_opts(Opts, AppInfo),

    SubGraph = digraph_utils:subgraph(G, NeededErlFiles), % get all the files and deps of those needing a rebuild
    DepErlsOrdered = digraph_utils:topsort(SubGraph),

    rebar_api:console("Needed: ~p", [NeededErlFiles]),
    rebar_api:console("DepErlsOrdered: ~p", [DepErlsOrdered]),

    {{FirstFiles, Opts1}, {DepErlsOrdered, Opts1}}.

Producing output:

~/Build/rebar3/rebar3 compile
===> Analyzing applications...
===> Compiling rebar3_caramel
===> Analyzing applications...
Needed: []
DepErlsOrdered: []
===> Compiling rebar3_caramel
===> Verifying dependencies...
===> Analyzing applications...
Cmd: caramelc sort-deps /home/sean/Projects/caradeps/src/caradeps_a.ml /home/sean/Projects/caradeps/src/caradeps_b.ml
Cmd: caramelc sort-deps /home/sean/Projects/caradeps/src/caradeps_a.ml /home/sean/Projects/caradeps/src/caradeps_b.ml
File: "/home/sean/Projects/caradeps/src/caradeps_b.ml"
Deps: ["/home/sean/Projects/caradeps/src/caradeps_a.ml"]
File: "/home/sean/Projects/caradeps/src/caradeps_a.ml"
Deps: []
Needed: ["/home/sean/Projects/caradeps/src/caradeps_b.ml",
         "/home/sean/Projects/caradeps/src/caradeps_a.ml"]
DepErlsOrdered: ["/home/sean/Projects/caradeps/src/caradeps_b.ml",
                 "/home/sean/Projects/caradeps/src/caradeps_a.ml"]
Compiling: "/home/sean/Projects/caradeps/src/caradeps_b.ml"
===> sh(/home/sean/.opam/default/bin/caramelc compile /home/sean/Projects/caradeps/src/caradeps_b.ml)
failed with return code 1 and the following output:
Unbound module Caradeps_a

For the time being I have it working by reversing the logic (with the commented out line in dependencies/4) but I'd like to get to the bottom of this. Thoughts? Suggestions?

ferd commented 3 years ago

I'm guessing this line in how we partition things: lists:reverse(DepsErlsOrdered)

is what makes it work. The topological sort has to be reversed I guess but ought to be safe regardless.

seanhinde commented 3 years ago

Thanks for the confirmation. Will run with lists:reverse/1

seanhinde commented 3 years ago

Another topic, and then I think I have it done - I find I have a use case for access to some of the global state in my compiler module.

The purpose is to find the path to where the plugin is installed. That will allow me to locate the caramelc compiler executable.

One solution could be to hand the rebar_state:t() to the context callback which could put anything needed into some local state accessible from the other callbacks - like your dependencies_opts, but passed to all callbacks.

Or maybe pass the AppInfo of the plugin itself.

I have a workaround for now, but it feels like some more sacrifice of purity in the callback API might help in a few ways.

ferd commented 3 years ago

If your plugin has the data in priv_dir, you can call code:priv_dir(PluginAppName) and you'll get it right away. This would usually be the way I'd track this. That being said yeah, the plugin callbacks are a tad restrictive. I'm not too sure what would be the safe way of extending them without breaking anything.