Variables emitted in wrong order leading to runtime crash

markhamburg commented 7 years ago

I've now seen something like this a few times. Sadly efforts at a minimalist SSCCE haven't produced results displaying the bug and I don't know enough about the structure of the Elm compiler to guess what piece of logic I need to tickle.

This Elm code

operationHandlers : List (Operation.Op Msg -> Maybe Updater)
operationHandlers =
    [ handleAccountOperation
    ]

handleAccountOperation : Operation.Op Msg -> Maybe Updater
handleAccountOperation op =
    case Debug.log "handleAccountOperation" op of
        Operation.Account accountID accountOp ->
            applyAccountOperation accountOp
                |> Maybe.map (updateAccount accountID)
        _ ->
            Nothing

produces this JavaScript

var _adobe$adlproj$App_SessionData$operationHandlers = {
    ctor: '::',
    _0: _adobe$adlproj$App_SessionData$handleAccountOperation,
    _1: {ctor: '[]'}
};
var _adobe$adlproj$App_SessionData$handleAccountOperation = function (op) {
    var _p5 = A2(_elm_lang$core$Debug$log, 'handleAccountOperation', op);
    if (_p5.ctor === 'Account') {
        return A2(
            _elm_lang$core$Maybe$map,
            _adobe$adlproj$App_SessionData$updateAccount(_p5._0),
            _adobe$adlproj$App_SessionData$applyAccountOperation(_p5._1));
    } else {
        return _elm_lang$core$Maybe$Nothing;
    }
};

Observe that when we construct the first variable, we reference the second variable before it has been initialized. This loads the list up with an undefined which goes boom when we try to call it. There is no other declaration for the second variable so this is not a shadowing problem. Changing the declaration order does not fix this.

I've seen similar problems at times with the values of uninitialized variables getting captured in module level function defined by currying.

process-bot commented 7 years ago

Thanks for the issue! Make sure it satisfies this checklist. My human colleagues will appreciate it!

Here is what to expect next, and if anyone wants to comment, keep these things in mind.

pdamoc commented 7 years ago

SSCCE:

module Main exposing (..)

import Html exposing (..)

funcs =
    [ first ]

first x =
    if x == "a" then
        x
    else
        List.head funcs
            |> Maybe.map (\f -> f "a")
            |> Maybe.withDefault "b"

main =
    text <| first "b"

for this particular SSCCE, placing the emitted JS definition of funcs after the emitted JS for first solves the problem.

jvoigtlaender commented 7 years ago

@pdamoc, your SSCCE is an instance of several issues already listed in https://github.com/elm-lang/elm-compiler/issues/1377, under the heading "Mutually Recursive Definitions".

@markhamburg, is there any chance that in your case some mutual recursion is involved as well (e.g., handleAccountOperation depending on operationHandlers via applyAccountOperation or updateAccount)? In that case, yours would be another instances of what is already recorded in https://github.com/elm-lang/elm-compiler/issues/1377. Otherwise, you might have found a genuinely new thing, that happens even when no mutual recursion is involved.

markhamburg commented 7 years ago

Thanks @pdamoc for adding a potential SSCCE. It may be related or it may just be a case of the mutually recursive definitions bugs. (I thought those were addressed in Elm 0.18?)

In my case, I don't think there is any actual recursion though I will check it over once again.

Without knowing anything about the actual structure of the compiler, if I were to hazard a guess, I would say that it is omitting some of the edges needed in the topological sort of definitions. This would explain why the bug seems sporadic since sometimes the sort will work out correctly and sometimes the code complexity will be just right to make it work out wrong.

A naive fix for this particular case would be to emit all function definitions — not partial function applications but actual javaScriptVariable = function( ... definitions before anything else. Since those will just capture variable references rather than variable values they ought to be fine — assuming scoping is all handled correctly as well. That fix would probably be insufficient for some other cases.

jvoigtlaender commented 7 years ago

I thought those were addressed in Elm 0.18?

No, the fix you have in mind (addressing https://github.com/elm-lang/elm-compiler/issues/873) was about something else. That fix made it so that if there is recursion without an intervening lambda (as in let { x = y; y = x + 1 } in ...) then the compiler now rejects the program. In cases where there is an intervening lambda, as in your case handleAccountOperation = \op -> ... and in all other cases that have been brought up in this context after "the #873 fix" (including @pdamoc's SSCCE above), the Elm 0.18 compiler does not reject, and does not always do the semantically correct thing.

(Another way to put it, I think it is correct to state: The #873 fix only made the compiler accept fewer programs. It did not change the behavior of programs that are still accepted. As is the case of your program.)

markhamburg commented 7 years ago

A deeper inspection of my module does make this look to be another case of mutually recursive definitions. It's a long cycle that passes through a lot of functions but the resulting Updater function (Model -> ( Model, List OutMsg )) can eventually get back around to a case where we have more operations to translate and hence come back in through the code that resulted in the original call to handleAccountOperation. If the changes in Elm 0.18 merely rejected more case but did not fix the cases it did accept then that would explain the problem.

markhamburg commented 7 years ago

While I would like to see a fix for the general problem (which just thinking about has me realizing is non-trivial), it would be really great to get a naive "fix" sooner rather than later of simply sorting all JavaScript function definitions ahead of any other definitions in the resulting code.

pdamoc commented 7 years ago

@markhamburg If you just want a hack that can solve your issue, turning operationHandlers into a function (i.e. a function that ignores its arguments), you will have the issue fixed without any need to edit the outputted code.

markhamburg commented 7 years ago

@pdamoc — There are a variety of ways to hack around the issue. The pain point is that to discover that one has an issue, one has to experience and then diagnose the runtime error (thereby failing Elm selling point #2 on elm-lang.org).

Anyway, I did some more thinking about the general problem. I haven't yet dug into the Haskell code for the compiler to try to track it down and it isn't clear to me whether such digging is welcomed. But looking at this theoretically, we can discern both that the rules in Elm 0.18 are probably insufficient and that the implementation of sufficient rules is actually not that hard (depending on what the compiler structures look like) and might even come with some benefits in identifying dead code.

The analysis. We have a set of definitions in a module or a let statement to process. Some of those definitions refer to other definitions. This forms a directed graph with definitions as nodes and references as edges from the referring node to the referred node. We can condense the strong components in this graph (Harary, Graph Theory, Chapter 16 Digraphs) leaving us with a DAG of strong components. DAG's are straightforward to traverse in an order that ensures we emit code for referenced components before the referring components. So, we only need to concern ourselves with what happens when processing a strong component. If the component consists of a single definition, we emit the code for that definition and we're done. We can even handle self-referential functions because we will actually emit the variable to hold the function and then fill in that variable with code that is free to reference that variable. If the component consists of multiple definitions, then because this is a strong component, all of those definitions are dependent on one another. If all of those definitions are functions, we're fine because the functions won't actually be executed during the process — again, we essentially emit the variables and then fill them in with functions that reference the variables. If, on the other hand, there is an expression that is not just a function definition then because the component is strongly connected, there is a dependency path that leads back to the definition and hence to evaluate the expression we would have had to already evaluated the expression. So, the conservative answer is to say not just that cycles need to contain a function somewhere as Elm 0.18 does but that cycles can only contain functions.

So, find the strong components — linear time algorithms exist for this — and then do a basic demand-based recursive pull on the condensed graph of strong components to emit the code for all values of interest. (A side effect of this latter process is that one ends up with a set of definitions that were never emitted because their values weren't needed. Dead code detection!)

This approach is, however, harsh in what it allows. As noted, it is more restrictive than Elm 0.18. There are, however, things we could do to mitigate this and some things that don't actually help all that much.

An example of the latter is that one might think that going to a finer grained graph with nodes for every function definition and function application within expressions would allow for better segmentation of dependencies. But if we have a cycle involving a value, coming up with a bigger underlying graph that still leads to the same reference points will still leave that definition in a cycle.

Where we really can make progress is in the example that started this report. Since as it turned out, I had a cycle that led from handleAccountOperation back to operationHandlers the code above would not pass the conservative rule I presented above. But this seems frustrating because the definition of operationHandlers does not need to execute the functions. It merely needs them to exist. That's more than we need to emit the code for the functions but less than we need for general expression evaluation. Essentially, we have a category of expressions that do not evaluate their arguments: constructors, record constructions (but not updates), function composition, currying when partial evaluation just means "save the arguments for use when we receive more". If a strong component consists only of function definitions and values defined via "construction expressions" and if the reference graph amongst those constructor expressions forms a DAG, then we can emit the code for all of the function definitions and then use recursive DAG traversal to emit the code for all of the constructor expressions (or detect that they don't form a DAG). The effectiveness of this technique in increasing the number of legal programs depends on how generous we are able to be in declaring things to be constructor expressions. But if we just recognized the expressions that could be built with the standard library plus perhaps constructors from within the module being compiled, this would probably cover the vast majority of cases.

We can actually go further and handle cyclic constructor expression references if we separate allocation from initialization since we can allocate everything and then fill it in. Whether this is useful in practice is a different matter and it makes the initialization code potentially rather different from the evaluation code. (That said, I could imagine other implementations where being able to do a big bump allocation for a bunch of mutually referring values would be useful.)

So, in summary:

Classify definitions as function definitions, constructor expressions, and general expressions
Build a directed graph of references between definitions
Partition this graph into a DAG of strong components
Any general expression node in a strong component with other nodes or which refers to itself is a fatal cycle
Use the exported values (module) or final value (let) to drive a recursive process emitting code in post-order for the strong components — and detecting unused strong components as a side effect
When emitting code for a strong component consisting of a single general expression, we can just emit the code
When emitting code for a strong component consisting of function definitions and constructor expressions, emit the code for the function definitions and then use a recursive DAG process to emit the code for each of the constructor expressions treating a cycle detected during this process as a fatal error

markhamburg commented 7 years ago

Furthermore, if one uses Tarjan's algorithm (https://en.wikipedia.org/wiki/Tarjan%27s_strongly_connected_components_algorithm) one can avoid finding strong components for dead code and do the strong component output as components are identified since the algorithm identifies the components in reverse topological order.

markhamburg commented 7 years ago

One other thing that can be done to increase the amount of code that doesn't get rejected for cyclic dependencies: If a definition has function type but it is defined using a general expression, then it can be converted into a function definition during compilation at the expense of potentially doing more evaluation at runtime. As a trivial example without cycles:

addOne : Int -> Int
addOne = (+) 1

is equivalent to:

addOne : Int -> Int
addOne x = 1 + x

The above is just an example of currying which I called out before but one could imagine cases which would require more code that wouldn't just be matters of currying. This would introduce the notion of definitions that "could be emitted as functions".

markhamburg commented 7 years ago

I tried digging into the actual Elm compiler and without a roadmap wasn't sure what to touch, but here is code that does the sequencing and cycle detection using Tarjan's algorithm written in Elm:

https://ellie-app.com/Rq6CsBywz8a1/0

You can play with the test data at the end to do things like get it to report cycles. The nodes dictionary contains a mapping from node index to a pair containing the node kind (see above) and the sources for the node. The output consists of a list of nodes in the appropriate order with flags indicating whether or not the nodes should be emitted as functions (for the cases where that would be an option).

Anything not emitted in the sequence list but in the input nodes dictionary is dead code.

evancz commented 7 years ago

I tried to write up the general problem in https://github.com/elm-lang/elm-compiler/issues/1591

It appears @markhamburg has an idea for detecting a larger subset bad cases, but it is very difficult to assess when it is spread out across so many long comments. Can it be distilled down into simple bullet points that outline the actual algorithm? Can that summary then be shared here? I can't work with tons of prose. With minimal bullet points, I'll either be able to figure out the rest or I'll know what question to ask.

In the meantime, I will add a link from the meta issue to this one.

elm / compiler

Variables emitted in wrong order leading to runtime crash #1580