fable-compiler / Fable

F# to JavaScript, TypeScript, Python, Rust and Dart Compiler
http://fable.io/
MIT License
2.93k stars 301 forks source link

Some functions not being tail-call optimized #1368

Closed stkb closed 6 years ago

stkb commented 6 years ago

Hi, I'm not sure what the current status of TCO is in Fable and if there are known limitations, but I noticed one of my functions not being optimized, so I did some experimenting in the REPL.

My actual function had the form of fun2 here. I can change it to fun1 instead as a workaround but maybe the case is fixable.

// Is optimized
let fun1 a b =
    let rec loop n =
        if n = 0 then "done"
        else loop (n - 1)

    loop b

// Not optimized
let fun2 a =
    let rec loop n =
        if n = 0 then "done"
        else loop (n - 1)

    fun b -> loop b

// Is optimized
let fun3 =
    let rec loop n =
        if n = 0 then "done"
        else loop (n - 1)

    fun a -> loop a

// Not optimized
let fun4 =
    let rec loop n =
        if n = 0 then "done"
        else loop (n - 1)

    fun a -> loop 1

REPL link

voronoipotato commented 6 years ago

Are these cases optimized in standard?

stkb commented 6 years ago

Sorry, what do you mean by "standard"?

alfonsogarciacaro commented 6 years ago

Thanks for checking this and providing the samples @stkb! Unfortunately this is due to the fact that Fable 1 is not handling very well module functions with less arguments than the signature as they may interfere with the uncurrying optimization, so Fable is adding runtime checks (the CurriedLambda you see in the generated code) which prevent the TCO.

It's difficult to solve this for Fable 1, but Fable 2 (dev2.0 branch) will handle uncurrying much better (hopefully). I just tested the code you posted and the inner loop function was TCOed in the four cases. So if you don't mind, we can wait until Fable 2 alpha is released and try again.

Maybe @voronoipotato meant F# on .NET? I guess these cases are indeed optimized, but the only way to be sure (that I'm aware of) is to compile the code in Release mode and then use a disassembler to check the generated IL.

stkb commented 6 years ago

@alfonsogarciacaro thanks! great that it's already working in v2 👍 I can work around it until then.

Maybe voronoipotato meant F# on .NET?

I did wonder that, and while I didn't go as far as looking at IL, I tried them in the F# repl with very large numbers and they all ran fine without blowing the stack.

Not sure if this should be closed or left open so I'll let you close it if you want.

alfonsogarciacaro commented 6 years ago

Perfect, thanks! Let's keep it open so I remember I have to make a Fable 2 playground available as soon as possible :)

voronoipotato commented 6 years ago

Yes sorry that's what I meant. I was tired lol.

alfonsogarciacaro commented 6 years ago

This is working with Fable 2 REPL so let's close it. Please feel free to reopen if you find a similar issue :+1:

GordonBGood commented 6 years ago

@alfonsogarciacaro, I don't think you can close this yet, as it looks like TCO is still a bit of a mess...

Consider this Fable 2 REPL, which includes two problems, as follows:

  1. The state of the internally mutated tail called function parameters aren't being preserved properly, which is made plain when one defers execution in passing these to a closure as in test0(). The results of the list are what is found before pressing the Start... button, and should be ["1";"2";"3"]but are["0";"0";"0"]` instead.
  2. TCO is not working for the complex "loops" used for the Sieve of Eratosthenes culling even though these are completely candidates for TCO, and the test works over ten times faster when the TCO functions are converted to loops with while...do.

The first should be easy to correct, as in converting the following generated JS code:

export function test0() {
  const loop = function loop(i, lst) {
    loop: while (true) {
      if (i <= 0) {
        return lst;
      } else {
        const $var$$3 = i - 1;
        lst = L(function () {
          return i;
        }, lst);
        i = $var$$3;
        continue loop;
      }
    }
  };

to the following, changing the sequence of preserved immutable values:

export function test0() {
  const loop = function loop(i, lst) {
    loop: while (true) {
      if (i <= 0) {
        return lst;
      } else {
        const i$$1 = i | 0;
        lst = L(function () {
          return i$$1;
        }, lst);
        i = i$$1 - 1;
        continue loop;
      }
    }
  };

Elm currently has this same problem, and although it can be fixed just as easily there, it will cost some execution time if Evan continues to insist on only generating ES3 JS code (no use of let or const) and uses a function invocation to produce a new (function scoped) var binding. That's partly why I've come back to further investigations of Fable even though I like the simplicity of Elm - stubbornness of the BDFL!

The second problem's origin is unknown to me, as I haven't dug into how the compiler determines when TCO can be applied, but something is wrong as all of the internal functions can use TCO and none of them do.

alfonsogarciacaro commented 6 years ago

Thanks for the detailed report @GordonBGood, this helps a lot! I have fixed the first issue following your kind instructions. I'll try to investigate the second one, I guess we're not supporting nested tail calls atm. Tail-call was the only optimization that wasn't moved to the FableTransforms pass during the Fable 2 refactoring (partly because we don't have the return info in the Fable AST and also there's no labeled statement). It remains in the Fable2Babel pass and it's still a bit messy, but hopefully this is fixable :)

GordonBGood commented 6 years ago

@GordonBGood, I have [fixed the first issue]. I'll try to investigate the second one, I guess we're not supporting nested tail calls atm. Tail-call was the only optimization that wasn't moved to the FableTransforms pass during the Fable 2 refactoring (partly because we don't have the return info in the Fable AST and also there's no labeled statement). It remains in the Fable2Babel pass and it's still a bit messy, but hopefully this is fixable :)

@alfonsogarciacaro, you're one of the best BDFL people for support I have ever seen - this is the second issue I have reported that you have taken effective action within a day of my report I hope you are able to solve the problem of producing reliable TCO's for the nested functions, as what use is a functional language without them? At least in Fable we have the alternative of writing the loops ourselves using while..do statements, but while very performant that is ugly code.

Your reaction to important issues is a refreshing difference from over at Elm where it may take a year for things to get fixed, and many issues are "fixed" by taking features away or by giving away performance or the language's capability as a general purpose language. Of course, here you have less choice in the matter as you are trying to maintain the standard of the F# language even though you have eased up on some safety issues such as array bounds checking in the interests of performance - I can live with that, but Elm's Evan never would, as he strives for "no runtime failures" even though that is impossible - there are at least ten ways of crashing an Elm application that are either not detectable in advance, or if detectable in the runtime, still not preventable by the compiler.

I had a quick scan of your "Fable2Babel.fs" file to try to get an idea of the magnitude of the problem, but am not enough invested in fully analyzing it to be able to see your problem: No function return types in the Fable AST you say - that could be a serious problem, is it possible to fix it? It doesn't seem to me that labelled statements would be an absolute necessity for this? What's different about nested functions that TCO doesn't work when it does for outer ones?

This work isn't lost when you eventually move to producing WebAssembly code, as TCO will be just as necessary there. I have some thoughts on how WebAssembly development might progress and will put some of that in the Issue discussing that.

The short version of that is that a full Sieve of Eratosthenes implementation in Fable (the short test version of it used here to find the lack of TCO) will beat the Sieve of Atkin "primegen" to any range such as a billion or more running on JavaScript with Chrome V8 version 70 on my machine, and when the same code is taken and converted to WebAssembly using "WebAssembly Studio/beta",, it runs up to about two and a half times faster on Microsoft Edge version 42, just a little bit faster on the latest Firefox, and slower using WebAssembly on Chrome than it runs with JavaScript. The point is that WebAssembly support exists on all the major browsers but is in a state of flux right now as they work at trying to get it to load and instantiate faster, while having given up various degrees of performance in order to do so - that will change over time and likely greatly improve over the next year or two for all major browsers as well as having a lot of planned features added, like GC or other memory management, threading, etc.

alfonsogarciacaro commented 6 years ago

Thanks a lot for the kind words @GordonBGood. I prefer not to compare with other projects, I just think I have it easier because I don't need to worry about language design which is where most of the controversies arise, and even most of the JS generation can be delegated to Babel :) And when a user submits a report with code to reproduce and potential solutions, etc, this makes it much easier to handle the issue and I also try to answer quickly to show my appreciation for the time spent investigating the problem ...and so I don't forget 😅

About the nested tail calls, sorry I meant in F# and Fable ASTs it's not easy to know where the return statement should go, as we're not keeping the return strategy (returning, assigning to a variable) in these steps at the moment. I just had another look at this and found out my intuition was wrong, the problem was I wasn't tail-call optimizing functions returning unit. I did it here and it seems to work for your sample, but I'd like to add more tests because in these case it's necessary to be careful as so far we've been omitting empty return statements but this can make you fall in an infinite loop if the function is tail-call optimized.

ncave commented 6 years ago

@alfonsogarciacaro Sounds awesome, hopefully you can make it work. But you're right about the infinite loop, some tests are broken, e.g. "Map.iter works".

Here is what Map.iter looks like:

    let rec iter f m =
        match m with
        | MapEmpty -> ()
        | MapOne(k2,v2) -> f k2 v2
        | MapNode(k2,v2,l,r,_) -> iter f l; f k2 v2; iter f r

which is compiled as:

function MapTreeModule$$$iter($arg$$43, $arg$$44) {
  MapTreeModule$$$iter: while (true) {
    const f$$6 = $arg$$43,
          m$$9 = $arg$$44;

    switch (m$$9.tag) {
      case 1:
        {
          const v2$$8 = m$$9.fields[1];
          const k2$$12 = m$$9.fields[0];
          f$$6(k2$$12, v2$$8);
          break;
        }

      case 2:
        {
          const v2$$9 = m$$9.fields[1];
          const r$$10 = m$$9.fields[3];
          const l$$10 = m$$9.fields[2];
          const k2$$13 = m$$9.fields[0];
          MapTreeModule$$$iter(f$$6, l$$10);
          f$$6(k2$$13, v2$$9);
          $arg$$43 = f$$6;
          $arg$$44 = r$$10;
          continue MapTreeModule$$$iter;
          break;
        }

      default:
        {}
    }
  }
}

There is no break out of the loop.

GordonBGood commented 6 years ago

I prefer not to compare with other projects...

@alfonsogarciacaro, I only brought up Elm because, in contrast to contributing to here on Fable, it is hard to make much progress there even though I have contributed several similar issues as here. Also, as you know, it is often good to compare language implementations as there is often something to learn that can be applied. BTW, your adaptation of "The Elm Architecture" in your "Elmish" package is brilliant and is about as easy to use as in Elm, which is partly why I am back to Fable. Also, with Fable2 it's good to see you and your contributors are on the drive to performance, which is another reason I am back.

I just had another look at this and found out my intuition was wrong, the problem was I wasn't tail-call optimizing functions returning unit. I did it here and it seems to work for your sample, but I'd like to add more tests because in these case it's necessary to be careful as so far we've been omitting empty return statements but this can make you fall in an infinite loop if the function is tail-call optimized.

I see that @ncave has found that you were correct and you aren't terminating some TCO loops returning Unit correctly; it seems quite simple to fix: just make sure all switch cases's including the default case (which currently isn't terminated at all) are terminated with a return instead of a break when inside a TCO'd loop where some case's are looped (continue'd) back to the tail called function, where any return; calls after a continue would just be dead code and likely eliminated by the packing process (just as the dead code break statements presumably were previously)? The generated code for the Map.iter would then be as follows:

function MapTreeModule$$$iter($arg$$43, $arg$$44) {
  MapTreeModule$$$iter: while (true) {
    const f$$6 = $arg$$43,
          m$$9 = $arg$$44;

    switch (m$$9.tag) {
      case 1:
        {
          const v2$$8 = m$$9.fields[1];
          const k2$$12 = m$$9.fields[0];
          f$$6(k2$$12, v2$$8);
          return;
        }

      case 2:
        {
          const v2$$9 = m$$9.fields[1];
          const r$$10 = m$$9.fields[3];
          const l$$10 = m$$9.fields[2];
          const k2$$13 = m$$9.fields[0];
          MapTreeModule$$$iter(f$$6, l$$10);
          f$$6(k2$$13, v2$$9);
          $arg$$43 = f$$6;
          $arg$$44 = r$$10;
          continue MapTreeModule$$$iter;
          return;
        }

      default:
        {
           return;
        }
    }
  }
}

If my nested looping example code return Unit works, presumably you already do something like that for other conditional code such as if..then..else statements, but you need to check as IIRC, I didn't actually use the else part since the loop returned a Unit; this should also be checked, as should any form of conditional code including short-circuit evaluations.

I would like to contribute to your project more, but don't have much experience with writing compilers as to parsing, or processing AST's, and my experience with AST's is limited to their use for generating macros/templates in other languages such as Haskell (my other favourite language other than F# and Elm). Perhaps I could write some tests including to cover cases for this issue, but I guess I would have to be able to compile development builds such as the latest including the proposed fix for this issue to do it?
My test machine ATM is quite limited as to memory - Windows 10 64-bit with 4 Gigabytes, so would that be a problem in being able to build Fable?

I suppose I could also contribute to some libraries, at least as to improving tests...