Tail call optimization for partially applied functions and functions coming from a pattern match

pema99 commented 1 year ago

Description

Tail calls are not optimized in some cases where I hoped they would be, which can lead to stack overflows given large enough input.

Repro code

Consider this simple function. It get's optimized to a while-loop:

let rec foo x a =
    if x <= 0 then a
    else
        foo (x-1) (a+1)

However, this does not, as the function is partially applied:

let rec foo x a =
    if x <= 0 then a
    else
        let bb = foo (x-1)
        bb (a+1)

And this also doesn't, because we had to pattern match to get the function:

let rec foo x a =
    if x <= 0 then a
    else
        let (bb, _) = (foo, 1)
        bb (x-1) (a+1)

In general, it seems any kind of delayed tail call isn't optimized. I assume this is just something nobody has implemented yet, but I am curious if it is tracked, and if there are any plans to do so? FWIW, Microsoft's main F# compiler currently optimizes all 3 of these.

Expected and actual results

All 3 functions are tail recursive, and should be optimized into while-loops to not overflow the stack. One might also consider using a trampoline in these cases, if this is deemed too difficult. That tends to be easier to implement, but slower.

ncave commented 1 year ago

@pema99 It's possible that these cases are tail-call optimized by the .NET JIT, not by the F# compiler. Unfortunately there are no such optimizations in most JavaScript engines at the moment. There are JavaScript workarounds with varying performance implications (trampolines, etc.), but none are implemented out of the box in Fable (besides what the F# compiler already does). See also #2392.

It is, of course, also possible that the tail-call elimination analysis that Fable does is just not taking into account delayed functions at tail position. If you have a proposal or an idea for improving it, by all means share it, contributions are always welcome.

ncave commented 1 year ago

@pema99 Looks like you're correct, here is what the first one compiles to in .NET:

let rec foo x a =
    if x <= 0 then a
    else
        let bb = foo (x-1)
        bb (a+1)

Compiles to (C#):

public static int foo(int x, int a)
{
    while (x > 0)
    {
        int num = x - 1;
        a++;
        x = num;
    }
    return a;
}

So we have to figure out how the F# compiler codegen does that for delayed functions.

pema99 commented 1 year ago

I think one approach is to do a kind of abstract interpretation of the syntax tree for whatever intermediate representation is most relevant at this point (not familiar with Fable code base).

Just as type inference can be viewed as abstract interpretation where values are types, I think you can treat this problem as abstract interpretation where values are objects carrying information about whether a recursive call was found, and how many arguments have been applied to it so far. You'd keep adding arguments to the object as you find applications, and once these reach target arity, you eliminate the call if in tail position.

Another option is to let the user manually annotate functions they want trampolined, or something like that. Trampolines are very easy to implement in JS with downside being bad performance for small inputs.

I admit that first description is pretty vague, but it's probably how I would start trying to solve this.

pema99 commented 1 year ago

It just seems a little spoopy to me that you can take perfectly fine dotnet F# code and compile with fable, and then start seeing stack overflows from doing so :9

If I wanted to look at how fables current tail call analysis is done, do you know where I should start looking?

pema99 commented 1 year ago

Ah, just saw the issue you linked. Didn't realize Fable also doesn't optimize mutual tail recursion or CPS. I don't have much experience using Fable yet, so perhaps my expectations were just set a bit too high. Those seem more pressing, I suppose :9. Was primarily curious if the lack of handling for delayed tail recursion was something you were aware of at all.

ncave commented 1 year ago

@pema99 Yes, I believe spoopy is the correct technical term. Looks like the problem actually happens before the tail call position check, in that the beta reduction of bb is not happening, so the call is not in a tail position. I'll take a look.

pema99 commented 1 year ago

@ncave Oh, interesting. Does this fix both the failing cases I mentioned? If so, fix seems much simpler than what I was proposing - but then again, I really don't know what infrastructure is already present in the compiler :P

ncave commented 1 year ago

@pema99 Right now it only fixes the first, I'm looking at the second (tuple binding).

ncave commented 1 year ago

@pema99 I think I fixed both, looks like both were regressions that used to work before. Thanks for contributing this issue!

pema99 commented 1 year ago

Great success! :D

MangelMaxime commented 1 year ago

@ncave I opened an issue where the fixes for this issue is only applied when Fable is run in release mode.

See #3522

Is it possible to support it when in watch mode too? Otherwise, it means we have disparity in terms of feature depending which mode Fable is running in.

ncave commented 1 year ago

@MangelMaxime The default Fable CLI -c or --configuration option says that "default is 'Debug' in watch mode, or 'Release' otherwise", which means that yes, it will be different (as Debug builds are quite different than Release builds, even on .NET).

So perhaps the -c or --configuration option can be used together with --watch to force Release mode.

MangelMaxime commented 1 year ago

So perhaps the -c or --configuration option can be used together with --watch to force Release mode.

Using --watch --configuration Release does fix the problem with the tests.

But this means that some features are supported in Release and not in Debug mode. When in an ideal world it should work in both case.

Because, if I am a normal user, and have my code falling I will think of a bug and open an issue.

ncave commented 1 year ago

@MangelMaxime

some features are supported in Release and not in Debug mode

This is a F# compiler behavior, it produces different code in Debug vs Release mode. IMO it's reasonable to expect different code being generated when optimizations are turned off (i.e. in DEBUG mode). That's just how it works for most compilers and most languages.

I don't know why the default mode for --watch was set to DEBUG, there was probably a reason, but perhaps it can be set to Release.

MangelMaxime commented 1 year ago

IMO it's reasonable to expect different code being generated when optimizations are turned off (i.e. in DEBUG mode). That's just how it works for most compilers and most languages.

I understand that part :) But in most compilers, I think both compile valid code.

I don't know why the default mode for --watch was set to DEBUG, there was probably a reason, but perhaps it can be set to Release.

Historically, it has been set to DEBUG so library like Elmish.HMR which needs a different code generated between dev and release mode could rely on the DEBUG flag. Today, I would preferred to have a FABLE_WATCH flag to avoid situation where we rely on an external flag but I think the breaking changes is too risky as it could break a lot of libraries / programs that rely on the current behaviour.

fable-compiler / Fable