JuliaLang / julia

The Julia Programming Language
https://julialang.org/
MIT License

tail call elimination #4964

Closed gitfoxi closed 8 years ago

gitfoxi commented 10 years ago

It's interesting that this is valid syntax in Julia and Lua:

function rec()
    return rec()
end

The difference is that in Lua, when you call rec() it will stare at you and engage your motherboard's built-in space heater until you ctrl-C out. In Julia:

julia> rec()
ERROR: stack overflow
 in rec at none:2 (repeats 80000 times)

So the stack is 80,000 calls deep. That's interesting.

Why does this matter? I'm not sure. But some people care a lot:

http://www.lua.org/pil/6.3.html

ihnorton commented 10 years ago

See: https://groups.google.com/d/msg/julia-dev/POP6YXCnP-k/_ibmsU0hTdIJ

JeffBezanson commented 10 years ago

I'm a schemehead and appreciate TCO as much as anybody, but we're not actually planning to implement this so I'm just going to close this. Apologies.

gitfoxi commented 10 years ago

There's a volcano of conversation coming from julia-dev. I'll definitely search it, as well as the Issues, in the future before bringing something up. Clearly this topic has been discussed. I think the final word is:

> Jeff Bezanson (Nov 24): No, no update. It's not a high priority, but eventually we will probably do some of the optimizations, but not the semantic guarantee.


gitfoxi commented 10 years ago

Oh hey Jeff. I just quoted you from julia-dev but you had already responded to me. How are you doing that?!? You actually showed up in person before I finished reading the thread where you rejected the idea last year.

I want to figure out how this community is so engaged. A few hours before thanksgiving and people are online. This is like the Reddit of ... I'm out of metaphors. I need to go soak my head. What? Wow.


JeffBezanson commented 10 years ago

I refuse to believe you're out of metaphors. We're all enjoying them :)

ViralBShah commented 10 years ago

Yes, we are enjoying all the metaphors. Please keep them coming. :-)

staticfloat commented 10 years ago

I must admit, I'm not sure why or even how I've gotten so involved with Julia. I think it has something to do with the other contributors, but the experiments proving this are still running.

timholy commented 10 years ago

> I refuse to believe you're out of metaphors. We're all enjoying them :)

+1. I'm confident and hopeful that you can come up with something. I haven't yet seen one that involves both bicycle grease and wildebeests :-).

> I must admit, I'm not sure why or even how I've gotten so involved with Julia. I think it has something to do with the other contributors, but the experiments proving this are still running.

I can't figure it out either, but it's extraordinarily addictive. Perhaps because it's so rewarding.

StefanKarpinski commented 10 years ago

> I can't figure it out either, but it's extraordinarily addictive. Perhaps because it's so rewarding.

Julia specializes in having a very high result-to-effort ratio :-)

StefanKarpinski commented 10 years ago

Mentioning TCO and "tail call optimization" here so that we can avoid future duplicates of this issue.

StefanKarpinski commented 10 years ago

If someone wants to go ahead and implement tail call elimination, then feel free to go for it. I'll leave this open for that eventuality.

gitfoxi commented 10 years ago

+1

Also, language support for lazy evaluation is the other side of that coin. LazySequences.jl demonstrates the power but not the performance that you could have.


JeffBezanson commented 10 years ago

Look, we don't really need this and it is just a huge pain. It also conflicts with exposing C-callable entry points.

pygy commented 10 years ago

How about supporting LLVM's sibling calls? That doesn't mess with calling conventions, and it would already go a long way, assuming the last criterion, which I don't understand, isn't too bothersome.

This might also be relevant (and it doesn't commit you to anything if you do use fastcc somewhere):

JeffBezanson commented 10 years ago

Yes, we might as well mark some calls as tail calls and turn on this optimization. But that last criterion is pretty strict; the caller and callee have to have some of the same arguments. I don't know how often that happens. It seems like it excludes even tail-recursive counting loops.

MikeInnes commented 9 years ago

For anyone stumbling across this as I did, I implemented a little macro called @rec for Lazy.jl that may help you out.

using Lazy
@rec function fryeggs()
    return fryeggs()
end
fryeggs() # may take a few hours

@rec re-enables Julia's unique cooking facilities (compatible metal-cased laptop sold separately; disable fan for best results).

Less interesting example:

@rec reduce(f::Function, v, xs::List) =
  isempty(xs) ? v : reduce(f, f(v, first(xs)), rest(xs))

Might be nice to have simple optimisations like this as a compile step, even if optimising all tail calls is a pain.

oxinabox commented 9 years ago

Recursion is very much in style for Julia, because multiple dispatch is used to recognize the terminating condition based on a change of argument type. Programmers are thus already thinking recursively.

Though I'm not sure the best use cases are served by tail-call optimization, versus more significant optimizations that flatten a broader (overlapping) class of recursive algorithms, which I suspect LLVM is doing already.

ivandbarron commented 9 years ago

Please consider TCO for implementing finite state machines. In Erlang this is relatively easy because of TCO; it would be great if Julia included it too:


%%%% get_content_service.erl
%% Implement next state machine:
%%
%%   From (A) to (B)  when "begin_tag"
%%   From (B) to (B)  when "content"
%%   From (B) to (A)  when "end_tag"
%%   From (A) to end  when "end_document"
%%
%%                end
%%                 ^                                         ___  content
%%   end_document  |                                       /     \
%%                 |             begin_tag                 |     /
%%             -> (A) ---------------------------------> (B) <-
%%            |                                           |
%%             \ ________________________________________/
%%                                end_tag
%%
%%
-module(get_content_service).
-export([start/0]).
-export([a/1]).
start() ->
  register(get_content_service, spawn(get_content_service, a, [[]])),
  run_test(),
  ok.
a(Values) ->
  receive
    {begin_tag} -> b(Values);
    {end_document, From} -> From ! {result, Values}
  end.
b(Values) ->
  receive
    {content, Value} ->
      NewValues = lists:append(Values, [Value]),
      b(NewValues);
    {end_tag} -> a(Values)
  end.
run_test() ->
  get_content_service ! {begin_tag},
  get_content_service ! {content, 1},
  get_content_service ! {content, 2},
  get_content_service ! {end_tag},
  get_content_service ! {begin_tag},
  get_content_service ! {content, 99},
  get_content_service ! {end_tag},
  get_content_service ! {end_document, self()},
  receive
    {result, Values} -> io:format("~w~n", [Values])
  end.

Then in the Erlang shell:


1> c("get_content_service").
{ok,get_content_service}
2> get_content_service:start().
[1,2,99]
ok
carnaval commented 9 years ago

No one here is against TCO, the problem with saying "we have TCO" is that we then have to guarantee that the optimization is performed for any call in tail position, and LLVM does not at the moment provide a way to enforce this. I would expect it to happen in simple cases though.

yuyichao commented 9 years ago

I would expect the popping of the GC frame to prevent TCO from happening in many cases... (not this one, though)

sighoya commented 7 years ago

Correct me if I'm wrong, but couldn't a tail-recursive call be rewritten to a loop by the Julia compiler? I mean that Julia's recursive code would be transformed into loop-shaped LLVM IR.

yuyichao commented 7 years ago

As an optimization, sure, with very low priority. As a semantic guarantee, very unlikely.

sighoya commented 7 years ago

@yuyichao What do you mean by a semantic guarantee?

yuyichao commented 7 years ago

That functions like

function rec()
    return rec()
end

will be guaranteed to make a tail call and not overflow the stack.

sighoya commented 7 years ago

Thanks for replying. And why couldn't the Julia compiler rewrite this to while true end?

yuyichao commented 7 years ago

There's no reason it can't for this case, but it most likely can't be guaranteed in general. Doing it at the compiler level requires guaranteed inferability (the compiler has to know that you are calling the same function, with the same argument types), which is not part of the language's semantics/guarantees. Doing it without the compiler transformation requires a runtime interface that can do tail calls, which is pretty hard or impossible (you'd need to interleave compiled and "interpreted" frames and somehow tail call both of them).

JeffBezanson commented 7 years ago

Local recursion is the easy case. Full tail-call semantics mean that every call in tail position must use no stack space, no matter how many functions are involved or what the structure of the call graph is.

sighoya commented 7 years ago

Is it an impossible task to guarantee it, or does it only slow down the compilation step? If only the latter is true, then something like a @doTCO tag in source code, or julia -O3 on the command line, would be a solution. Another question: is it possible to extend the Julia compiler via a plugin framework? I've heard that some languages support that.

Keno commented 7 years ago

It is impossible with the current compiler infrastructure if only because LLVM doesn't support it. Whether there is a hypothetical implementation of the language that could provide the guarantee is up in the air. Hooking into the compiler is generally done via generated functions at this point.

JeffBezanson commented 7 years ago

It's not impossible, it's just very difficult for little practical benefit. Also, the point of "guaranteed" TCO is not raw performance --- it's not always faster --- the point is to be able to express constant-space control flow with function calls.

Julia supports macros and @generated functions, which effectively let you extend the language at two stages. Look through the metaprogramming section of the manual. Beyond that I'd recommend asking more specific questions on discourse, explaining what you'd like to accomplish.

pygy commented 7 years ago

Maybe it's been obvious to you all along, but it just occurred to me that you could use more than one LLVM calling convention. @JeffBezanson, maybe that's what you meant by "very difficult" rather than "impossible with the current compiler infrastructure".

Couldn't you use a calling convention that supports TCO for Julia => Julia calls, and use CCC for Julia => C and C => Julia? (the latter with a thin wrapper on the C side of things to bridge the CC mismatch... alternatively, Julia functions that are called from both C and Julia could be compiled twice, once for each CC...)

That would enable TCO for pure Julia code, though of course inserting a C frame would break the optimization.

A quick DDG search yielded this: http://lists.llvm.org/pipermail/llvm-dev/2008-December/018955.html so it is not impossible to achieve.

Edit: noting that LLVM only supports TCO on a couple of CPU architectures, I'm still not sure it is worth it. My use case would have been fast parser combinators, but I understand that isn't the main focus of the language... I've been away for a while; the progress made in the meantime is impressive :-)

yuyichao commented 7 years ago

Using multiple LLVM calling conventions is not the problem:

  1. We already don't use the C calling convention between Julia functions, though that does not help tail calls at all.
  2. The generic case of a tail call basically can't avoid calling C functions, since type inference and call-site devirtualization (both of which are necessary for removing C calls) are optimizations, not guarantees.
  3. A different calling convention can't trivially solve this. Ref my comment right above: https://github.com/JuliaLang/julia/issues/4964#issuecomment-117299850

Basically, to have this as a guarantee rather than an optimization, you'd need to change many different parts of the runtime so that they do not get in the way of tail calls.

pygy commented 7 years ago

Ooh, ok my bad...

Couldn't you let users request TCO explicitly (@TCO return foo()?) and have the compiler error out if type inference and devirtualization can't be achieved?

yuyichao commented 7 years ago

A feature that depends on type inference and compilation succeeding is not something we want to have. We could of course have a very limited version that is defined to work, but at a glance it seems hard to come up with a restriction that's both useful and easily implementable.

csaltos commented 4 years ago

Sorry to bother, but after 3 years I wonder: is there tail call support in Julia now?

PallHaraldsson commented 4 years ago

No. It's easy to check with the function at the top of this issue; on my one-day-old build of the latest master I got:

julia> rec()
ERROR: StackOverflowError:
Stacktrace:
 [1] rec()
   @ Main ./REPL[1]:2

That said, there's still the Easter-egg language (which does have TCO):

$ julia --lisp
StefanKarpinski commented 4 years ago

Why do you need tail call elimination?

bon commented 3 years ago

> Why do you need tail call elimination?

So that (mutually) recursive functions can be compiled efficiently.

LilithHafner commented 2 years ago

> Why do you need tail call elimination?

For this

Starting Julia...
               _
   _       _ _(_)_     |  Documentation: https://docs.julialang.org
  (_)     | (_) (_)    |
   _ _   _| |_  __ _   |  Type "?" for help, "]?" for Pkg help.
  | | | | | | |/ _' |  |
  | | |_| | | | (_| |  |  Version 1.6.1 (2021-04-23)
 _/ |\__'_|_|_|\__'_|  |  Official https://julialang.org/ release
|__/                   |

julia> fib(n, a=zero(n), b=one(n)) = n == 0 ? a : fib(n-1, b, a+b)
fib (generic function with 3 methods)

julia> fib(10)
55

julia> fib(100_000)
2754320626097736315

julia> fib(1_000_000)
ERROR: StackOverflowError:
Stacktrace:
 [1] fib(n::Int64, a::Int64, b::Int64) (repeats 79984 times)
   @ Main ./none:1

julia> fib(big(1_000_000))

signal (4): Illegal instruction: 4
in expression starting at none:1

Also, what's with the signal (4): Illegal instruction: 4? I wasn't expecting that!

chriselrod commented 2 years ago

FWIW, the Int64 arithmetic overflows well before 100_000:

julia> fib(n, a=zero(n), b=one(n)) = n == 0 ? a : fib(n-1, b, a+b)
fib (generic function with 3 methods)

julia> fib(10)
55

julia> fib(100_000)
2754320626097736315

julia> fib(Int128(100_000))
123084731589716971363493501662014098043

(Of course, all else equal, I'd always prefer having an optimization over not having it.) (Pretty sure the Int128 answer above overflowed, too.)

ron-wolf commented 2 years ago

I think that Say What You Mean is a pretty powerful rule in this instance. If you want the language to use an iterative implementation, write an iterative implementation. Julia is really good for code transformations, but baking trivial and not-very-useful ones into the compiler can be a slippery slope.

LilithHafner commented 2 years ago

A whole class of languages (functional programming languages) stylistically prefers recursion over iteration, making claims such as "programs are easier to understand without mutation or side effects" and "there are many constructs that are more naturally represented as recursive than iterative". Some of these languages don't support mutation or loops at all. Designers and users of these functional programming languages Say What They Mean in terms of semantic guarantees and expected output, and expect the compiler to generate efficient compiled instructions from that.

Right now I don't know how to write a non-trivial program (e.g. compute Fibonacci numbers) in Julia without explicitly handling side-effects. This ability might not be very useful to some people, but I think it is incredibly useful to others. It is sad for me to see Julia's syntax and architecture come so close to supporting a functional programming paradigm, but fall short in this critical way.

PallHaraldsson commented 2 years ago

@time fib(big(52_325)) # the highest I can do; one higher and it hangs. On 32-bit Julia I can actually go higher (130_000 works, 140_000 doesn't), so this seems like a bug.

Isn't the/a @tailrec macro good enough to get around the limitation (though it doesn't support mutual recursion)?

(@v1.8) pkg> add https://github.com/TakekazuKATO/TailRec.jl

julia> using TailRec
julia> @tailrec fib(n, a=zero(n), b=one(n)) = n == 0 ? a : fib(n-1, b, a+b)
fib (generic function with 3 methods)

julia> @benchmark fib(46)  # Without the macro is 51x slower, that's the largest Int that doesn't overflow, and maybe an argument to have the macro in Base
BenchmarkTools.Trial: 10000 samples with 1000 evaluations.
 Range (min … max):  1.957 ns … 28.149 ns  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     2.040 ns              ┊ GC (median):    0.00%
 Time  (mean ± σ):   2.088 ns ±  0.871 ns  ┊ GC (mean ± σ):  0.00% ± 0.00%

  █     █▃   ▂                                               ▂
  █▃▄▁█▁██▆▄▇█▄▆▃█▅▄▄▆▅▁▃▅▄▁▁▄▄▃▃▁▃▄▃▄▃▅▅▅▇▇███████▇███▇▇▆▅▅ █
  1.96 ns      Histogram: log(frequency) by time     2.67 ns <

julia> @benchmark fib(big(50_000))  # 14% faster than without the macro. Without big and the macro, 3x slower, but both wrong results, except for modulo math, which is likely never useful for fib:
BenchmarkTools.Trial: 83 samples with 1 evaluation.
 Range (min … max):  48.710 ms … 164.389 ms  ┊ GC (min … max): 13.67% … 70.29%
 Time  (median):     58.329 ms               ┊ GC (median):    18.29%
 Time  (mean ± σ):   60.706 ms ±  14.401 ms  ┊ GC (mean ± σ):  20.66% ±  8.19%

     ▅▂█▂ ▃  ▂▃                                                 
  ▃▃█████▅██▆██▃█▅▁▃▃▃▃▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▃ ▁
  48.7 ms         Histogram: frequency by time          121 ms <

 Memory estimate: 107.83 MiB, allocs estimate: 250006.

julia> @btime fib(big(1000_000));  # @benchmark gives the same number of allocations, except stating "over"
  10.078 s (5000006 allocations: 40.50 GiB)
LilithHafner commented 2 years ago

@PallHaraldsson @tailrec is seriously outperforming the loop I wrote. Do you know how this is happening?

#]add  https://github.com/TakekazuKATO/TailRec.jl
using TailRec

@tailrec fib_tr(n, a=zero(n), b=one(n)) = n == 0 ? a : fib_tr(n-1, b, a+b)
#fib_tr (generic function with 3 methods)

function fib_loop(n::Integer)
           a = zero(n)
           b = one(n)
           for _ in 1:n
               a, b = b, a+b
           end
           a
       end
#fib_loop (generic function with 1 method)

fib_tr(big(10_000)) == fib_loop(big(10_000))
#true

@btime fib_tr(big(10_000));
 # 2.398 ms (50006 allocations: 5.01 MiB)

@btime fib_loop(big(10_000));
#  4.300 ms (100006 allocations: 6.24 MiB)
PallHaraldsson commented 2 years ago

Yes, but not because of TailRec: the two versions were not strictly equivalent. Written more like this, the loop is only 1% slower:

julia> function fib_loop(n::Integer)
                  a = zero(n)
                  b = one(n)
                  for _ in n-1:-1:0  # and then n fewer allocations, and thus yet faster, with n:big(-1):0
                      a, b = b, a+b
                  end
                  a
              end

I admit I had to stare at the code a bit after noticing the difference in the number of allocations. It seems subtracting 1 from a BigInt is optimized for allocations: adding 1 to a positive number can make the result larger by one bit, so they appear to allocate unconditionally in that case, while subtracting 1 from a positive number can't grow it.

And this version runs in 58% of the time, with even fewer allocations:

julia> function fib_loop(n::Integer)
                  a = zero(n)
                  b = one(n)
                  for _ in 1:Int64(n)  # should be safe here?
                      a, b = b, a+b
                  end
                  a
              end
StefanKarpinski commented 2 years ago

It's been mentioned in several discussions of tail call elimination that it's not really an optimization. Optimizations are optional transforms that do not change the behavior of the program but make it faster. Tail call elimination is not optional when you need it; it changes the behavior of the program—stack overflow versus no stack overflow, and in the case of an error, it changes the stack traces substantially; and it's also not necessarily faster to reuse the current stack frame than to start a new one. That's why the title of this issue was changed from "tail call optimization" to "tail call elimination".

All of this suggests that it is probably better to have an explicit tail call annotation—after all, when you need it, the compiler is not really free to not do the "optimization" and if the code changes in such a way that the compiler cannot do the transformation anymore (e.g. you accidentally put some other operation around the tail call), then it should be an error rather than silently working by doing normal recursion. I'm not the only one to observe this: https://news.ycombinator.com/item?id=15698624.

Anyway, I think it would make sense to introduce a @tailcall f(...) macro—or maybe @goto f(...) or @return f(...)—that explicitly replaces the current stack frame with a new function invocation. If you call it with anything besides a function invocation, it should error at macro expansion time. It would also allow invoking any function, not just the one that's currently running, since one of the use cases for tail calls is implementing state machines by transferring between a set of functions without pushing the stack. Another feature of an explicit @tailcall is that since it explicitly replaces the current function on the stack frame, if an error occurs, those other frames will never appear in the stack trace.

StefanKarpinski commented 2 years ago

I guess an aggressive alternative would be that any time someone does return f(...) we consider it a tail call and replace the stack frame rather than recursing normally, but that feels like a bit too much.

ron-wolf commented 2 years ago

@StefanKarpinski Hm. Devil's advocate: maybe implicit TCE would be useful as a compiler/interpreter option?

StefanKarpinski commented 2 years ago

Why? If you're writing code that needs TCE then you need it and being able to run Julia with/without just means your code might be broken by something you don't control. What is the use case for optional TCE? Explicitly required TCE is the only thing that makes sense to me, which then shouldn't be controlled by a flag, it should be a semantic guarantee.

NightMachinery commented 2 years ago

@StefanKarpinski I don't know about the compiler/interpreter flag, but having some kind of global-ish switch (e.g., at the module/package level) is very helpful for brevity. I suggest having the @tailcall macro support modules as well, so that all tail calls in that module get optimized. Perhaps we could add a disabler macro as well, @notailcall.