felix-lang / felix

The Felix Programming Language
Other
806 stars 44 forks source link

Compiler infinite loop bug #88

Closed gbluma closed 4 years ago

gbluma commented 8 years ago

I noticed this issue when trying to write an implementation of the negative binomial function. Thought I'd post it for public awareness.

I can narrow down the example and look into fixing it, just trying to not lose my train of thought.

fun fact : int -> int =
  | 1 => 1
  | n => n * (n - 1).fact
;

// probability mass function for negative binomial dist.
fun nbinom (k:int, r:int, p:int) : int =>
  let coeff_numer = (k + r - 1).fact in
  let coeff_denom = k.fact * (r - 1).fact in
  let coeff = coeff_numer / coeff_denom in
  let coeff = p ^ k in
  coeff * (1 - p) ^ r
;

produces the following compiler output (and it never finishes)

[flx_bind/flx_lookup.ml:2973: E149] [lookup_name_with_sig] Can't find pow[] of int^2
ClientError Whilst calculating return type:
[flx_bind/flx_lookup.ml:2973: E149] [lookup_name_with_sig] Can't find pow[] of int^2
ClientError Whilst calculating return type:
[flx_bind/flx_lookup.ml:2973: E149] [lookup_name_with_sig] Can't find pow[] of int^2
ClientError Whilst calculating return type:
[flx_bind/flx_lookup.ml:2973: E149] [lookup_name_with_sig] Can't find pow[] of int^2
ClientError Whilst calculating return type:
[flx_bind/flx_lookup.ml:2973: E149] [lookup_name_with_sig] Can't find pow[] of int^2
ClientError2 Whilst calculating return type:
Whilst calculating return type:
[flx_bind/flx_lookup.ml:2973: E149] [lookup_name_with_sig] Can't find pow[] of int^2
ClientError Whilst calculating return type:
[flx_bind/flx_lookup.ml:2973: E149] [lookup_name_with_sig] Can't find pow[] of int^2
ClientError Whilst calculating return type:
[flx_bind/flx_lookup.ml:2973: E149] [lookup_name_with_sig] Can't find pow[] of int^2
ClientError Whilst calculating return type:
[flx_bind/flx_lookup.ml:2973: E149] [lookup_name_with_sig] Can't find pow[] of int^2
ClientError Whilst calculating return type:
[flx_bind/flx_lookup.ml:2973: E149] [lookup_name_with_sig] Can't find pow[] of int^2
ClientError2 Whilst calculating return type:
Whilst calculating return type:
[flx_bind/flx_lookup.ml:2973: E149] [lookup_name_with_sig] Can't find pow[] of int^2
ClientError Whilst calculating return type:
[flx_bind/flx_lookup.ml:2973: E149] [lookup_name_with_sig] Can't find pow[] of int^2
ClientError Whilst calculating return type:
[flx_bind/flx_lookup.ml:2973: E149] [lookup_name_with_sig] Can't find pow[] of int^2
ClientError Whilst calculating return type:
[flx_bind/flx_lookup.ml:2973: E149] [lookup_name_with_sig] Can't find pow[] of int^2
ClientError Whilst calculating return type:
[flx_bind/flx_lookup.ml:2973: E149] [lookup_name_with_sig] Can't find pow[] of int^2
ClientError2 Whilst calculating return type:
Whilst calculating return type:

The problem goes away if the code is refactored to not use let expressions.

gbluma commented 8 years ago

It's easy to bypass this issue by implementing pow

fun pow : int * int -> int = "(int) $1^$2";

but the bug would probably show its face somewhere else at some point.

skaller commented 8 years ago

I can guess what is going on. let x = X in Y is translated to match X with | x => Y, and Y will be translated to y () where fun y () { return Y[x=>X]; }, that is, Y with x replaced by X. The reason for y is to ensure local variables of a match branch don't escape the branch. The point is there is a function y now, without a declared return type, so the return type has to be deduced.

Now, the return type is deduced by a nasty trick. We find all the return statements in the function, and bind them if possible, then unify the results with the declared return type if any. The code is set up to throw an exception if there is a circular dependence created by a recursive function call. In that case, there should always be a branch that is not recursive, and the return type can be found from that one.

After the return type is found it is cached. At some stage the whole function is then bound. This generally involves repeating all the previously performed calculations, except that now we know the return type. However the SAME piece of code is used in both the tentative binding process and the final binding process. The point is that the tentative binding of an expression is allowed to fail without triggering a compiler error, whereas the final binding is not allowed to fail.

Implementing this is very tricky so take care trying to repair the bug. At some stage I tried to fix it by throwing and catching particular exceptions like OverloadResolutionError which are not errors during tentative binding. The problem is that the tentative binding itself can be recursive, and that recursion has to occur inside the exception handling. If it goes outside the exception handling you may get an infinite loop. However the final binding has to catch the exception and report an error.

Unfortunately, the code also uses try/catch sequences chained together to try out various extensions. If it fails, we just try the next one. The reason is purely a coding trick: try/catch in Ocaml can be composed linearly, matches are nested, which means local insertion of new extensions is almost impossible with matches, since a non-local change right at the bottom of the top level match is needed to terminate exactly the right number of match expressions. This is intractable, so I used try/catch instead.

There is a thing called rs in the code which is initially rsground. This is the recursion tracker, it keeps a list of previous arguments to expression/type evaluators. These evaluators are separate entry points into a huge recursive let rec swamp. The mess is caused by things like typeof(expr) which means the type calculator also has to be able to evaluate expressions. When a loop is detected it is either an error OR it may just mean a properly recursive structure, such as a recursive type, has been found. So some otherwise infinite recursions are valid (eg for types) and others (eg for expressions) are not.

skaller commented 4 years ago

This no longer infinite loops. I have no idea what changed but it isn't a bug now. Here is the output:

Flx_lookup:inner_bind_expression: Client Error binding expression (p ^ k)
Error at:
/Users/skaller/felix/bin.flx: line 7 col 1 to  line 13 col 2
 6: // probability mass function for negative binomial dist.
 7: fun nbinom (k:int, r:int, p:int) : int =>
 8:   let coeff_numer = (k + r - 1).fact in
 9:   let coeff_denom = k.fact * (r - 1).fact in
10:   let coeff = coeff_numer / coeff_denom in
11:   let coeff = p ^ k in
12:   coeff * (1 - p) ^ r
13: ;
14:

[Flx_lookup.bind_expression] Inner bind expression failed binding (p ^ k)
CLIENT ERROR
[flx_bind/flx_lookup.ml:2973: E149] [lookup_name_with_sig] Can't find pow[] of BTYP_array(BTYP_inst(102[]:TYPE),2)=
pow[] of int^2
In /Users/skaller/felix/bin.flx: line 7 col 1 to  line 13 col 2
 6: // probability mass function for negative binomial dist.
 7: fun nbinom (k:int, r:int, p:int) : int =>
 8:   let coeff_numer = (k + r - 1).fact in
 9:   let coeff_denom = k.fact * (r - 1).fact in
10:   let coeff = coeff_numer / coeff_denom in
11:   let coeff = p ^ k in
12:   coeff * (1 - p) ^ r
13: ;
14:

Felix compilation "/Users/skaller/felix/build/release/host/bin/flxg" "-q" "--inline=25" "--output_dir=/Users/skaller/.felix/cache/text" "--cache_dir=/Users/skaller/.felix/cache/binary" "-I/Users/skaller/felix/build/release/share/lib" "-I/Users/skaller/felix/build/release/host/lib" "--syntax=@/Users/skaller/felix/build/release/share/lib/grammar/grammar.files" "--automaton=/Users/skaller/.felix/cache/binary/Users/skaller/felix/build/release/share/lib/grammar/grammar.files/syntax.automaton" "--import=plat/flx.flxh" "--import=concordance/concordance.flxh" "std" "/Users/skaller/felix/bin.flx" failed
Error 1 in flx: Operation not permitted