CakeML / cakeml

CakeML: A Verified Implementation of ML
https://cakeml.org
Other
950 stars 84 forks source link

Exponential processing time increase with case statement nested parentheses #1075

Open Durbatuluk1701 opened 4 days ago

Durbatuluk1701 commented 4 days ago

Throughout I am utilizing CakeML v2274.

Given two programs, one with and one without nested parentheses in case statements:

fun fib n = 
case (n = 1) of
    True => 1
| False =>
case (n = 2) of
    True => 2
| False =>
case (n = 3) of
    True => 3
| False =>
case (n = 4) of
    True => 5
| False =>
case (n = 5) of
    True => 8
| False =>
case (n = 6) of
    True => 13
| False =>
case (n = 7) of
    True => 21
| False =>
case (n = 8) of
    True => 34
| False =>
case (n = 9) of
    True => 55
| False =>fib (n - 1) + fib (n - 2)

vs.

fun fib n = 
case (n = 1) of
    True => 1
| False =>(
case (n = 2) of
    True => 2
| False =>(
case (n = 3) of
    True => 3
| False =>(
case (n = 4) of
    True => 5
| False =>(
case (n = 5) of
    True => 8
| False =>(
case (n = 6) of
    True => 13
| False =>(
case (n = 7) of
    True => 21
| False =>(
case (n = 8) of
    True => 34
| False =>(
case (n = 9) of
    True => 55
| False =>fib (n - 1) + fib (n - 2)
))))))))

The time it takes to run cake < fib.cml > /dev/null for the one WITH parentheses will take much longer than the one without.

In particular, I generated the following benchmark showing the exponential increase given the nesting depth:

Code used to generate benchmark ```python import math import time import subprocess import matplotlib.pyplot as plt def fib_spec(i): if i <= 1: return 1 else: return fib_spec(i - 1) + fib_spec(i - 2) def file_name(n, parens): return f"./test_dir/fib_{str('no_') if not parens else '_'}parens_{n}.cml" def create_file_with_n(n, parens): with open(file_name(n, parens), "w+") as fd: fd.write(f"fun fib n = \n") for i in range(1, n): fd.write(f"case (n = {i}) of\n\tTrue => {fib_spec(i)}\n| False =>") if i == n - 1: fd.write("fib (n - 1) + fib (n - 2)\n") close_parens = ")" * (n - 2) if parens: fd.write(close_parens) else: if parens: fd.write("(\n") else: fd.write("\n") def time_file_exec(n, parens): create_file_with_n(n, parens) print(f"Processing fib({n}) with parens={parens}") fd = open(file_name(n, parens), "r") start = time.time() subprocess.run(["cake"], stdin=fd, stdout=subprocess.DEVNULL) end = time.time() return math.floor((end - start) * 1000) min_iters = 5 max_iters = 20 x = [i for i in range(min_iters,max_iters)] parens = [time_file_exec(i, True) for i in range(min_iters,max_iters)] print(parens) no_parens = [time_file_exec(i, False) for i in range(min_iters,max_iters)] # Make an array for x vs. parens and x vs. no_parens plt.plot(x, parens, label="With Parens") plt.plot(x, no_parens, label="Without Parens") plt.xlabel('depth') plt.ylabel('Time (ms)') # make the scale logarithmic base 2 plt.yscale('log', base=2) # make y scale tick marks appear plt.yticks([math.pow(2, i) for i in range(8, 15)]) # save the plot to a file plt.savefig("fib_plot.png") ```

fib_plot

Another thing I will mention, I tried testing this with if statements and did not see the same exponential increase

Durbatuluk1701 commented 2 days ago

FYI: I have tested this with CakeML v2648 and it still seems to be an issue.

tanyongkiam commented 2 days ago

Thanks for the report and the code.

I looked into it and this looks like exponential behavior in the parser, so perhaps something @mn200 could comment on.

For example:

finished: start up            
finished: lexing and parsing   --- 24654 milliseconds
...
finished: start up            
finished: lexing and parsing   --- 47164 milliseconds
finished: type inference       --- 15 milliseconds
...

(the rest of the compiler seems to run normally after parsing)

Durbatuluk1701 commented 2 days ago

For example:

finished: start up            
finished: lexing and parsing   --- 24654 milliseconds
...

Would you mind mentioning how you got that debug output? Would likely prove helpful for me in other work using CakeML.

tanyongkiam commented 2 days ago

Sure, you will need to bootstrap the CakeML compiler with its debugging output turned on.

If you have the latest release files and cake built, you can run this:

./cake --sexp=true --emit_empty_ffi=true --skip_type_inference=true < cake-sexpr-64 > cake_debug.S

This will produce cake_debug.S which you can then turn into an executable with:

gcc -O2 cake_debug.S basis_ffi.c  -DEVAL -DDEBUG_FFI -o cake_debug -lm

This version of the compiler will print the debugging output when compiling files but note that --repl and related flags will NOT work.

mn200 commented 2 days ago

I will try to look into this!