Function over-specialization on unused parts of types

ChrisRackauckas commented 6 years ago

This is best explained by an example.

mutable struct Tester{F,T}
  f::F
  x::T
  y::T
end
t = Tester((x,y)->2x+2y,1.0,1.0)
function stepper!(t)
  t.x = t.f(t.x,t.y)
end
function other_stuff!(t)
  t.y += 1
end

function looper(t)
  for i in 1:10
    stepper!(t)
    other_stuff!(t)
  end
end
looper(t)
t

In this setup, both other_stuff! and stepper! will specialize on every type parameter change {F,T}. However, let's say that F changes often and T doesn't change often. Then both functions will recompile every time when only stepper! needs to. Thus compilation time is increased.

A user can work around it by pulling apart the pieces of the type they need:

function other_stuff2(y)
  y += 1
end

function looper2(t)
  for i in 1:10
    stepper!(t)
    t.y = other_stuff2(t.y)
  end
end
looper2(t)

here other_stuff2 only takes in a value which has type T, so if T is unchanged then the same compiled code can be re-used (if it doesn't inline and all of that). However, as the operations in other_stuff! get more complicated, the complexity of the function signature and the return that are required to make this work grows.

I see this as a common pattern in scientific computing libraries like DiffEq, Optim, etc. since the code for the general algorithms is much larger in comparison to the code that actually has to touch function calls. The larger algorithm has convergence checks, output generation, etc. that are usually always the same (since there T would just be Float64 which most users would be using). However, the entire function stacks are recompiling each time due to the over-specialization on unused parts of types in the other_stuff! functions. From tests it seems that F specialization can be a big deal for many use cases (as much as 2x-4x in comparison to FunctionWrappers.jl in some real cases!) so that cannot be easily dropped, but fully decomposing every "non-f function" is quite laborious.

The nice option would be for the compiler to recognize this is the case and just not specialize on these functions. Another option would be to allow @nospecialize on type parameters.

JeffBezanson commented 6 years ago

Yes, it would be great to allow

function f(x::T{A,B}) where {A,B}
    @nospecialize A
    ...

Even better to handle it automatically of course, but that's going to be a taller order.

chethega commented 6 years ago

Is it really true that compilation time can be saved?

in cases of very simple other_stuff!, the function gets inlined anyway, and large parts of compilation have to be repeated per call-site
The data layout of Tester{F,T} depends on the size of T. Hence other_stuff! is, at least, not binary compatible between changing F. If, say, F1 and F2 are differently sized isbits closures / callable structs, then other_stuff! compiled for t::Tester{F1,T} should likely segfault / corrupt memory when called on t::Tester{F2,T}, at least if I understand the ABI correctly.

ChrisRackauckas commented 6 years ago

The data layout of Tester{F,T} depends on the size of T. Hence other_stuff! is, at least, not binary compatible between changing F. If, say, F1 and F2 are differently sized isbits closures / callable structs, then other_stuff! compiled for t::Tester{F1,T} should likely segfault / corrupt memory when called on t::Tester{F2,T}, at least if I understand the ABI correctly.

In a mutable struct it's always just a reference here so it'll always be the same size? And this wouldn't work for isbits, sure. So yes there are limitations on when it can be done.

in cases of very simple other_stuff!, the function gets inlined anyway, and large parts of compilation have to be repeated per call-site

There are lots of "quick" functions which aren't get inlined aren't called often. For example, checking convergence after iterations and updating progress bars.

chethega commented 6 years ago

Ah, you got the order of your fields wrong, that's why I was confused. What you are asking for is basically rudimentary inheritance:

mutable struct Tester_head{T}
  x::T
  y::T
end

mutable struct Tester{T,F}
  x::T
  y::T
  f::F
end

Now, anything that takes a Tester_head{T} can also process a Tester{T,F}, regardless of whether we pass by pointer or on the stack (which is afaik the reason for the stack growing the way it does; irrelevant for mutable Tester but important for isbits Tester). In C++ one would have Tester inherit from Tester_head, adding the field f::F (and no virtual functions involved). In C one would reinterpret a Ptr{Tester{T,F}} into a Ptr{Tester_head{T}}. Both give binary compatibility and the same memory layout. Now, I think that the julia-ABI does funny things that break this, at least for isbits Tester, so you probably would want to use function wrappers to enforce C-ABI.

GiggleLiu commented 1 year ago

Run into the exact same issue when trying to use "finite field algebra + Chinese remainder theorem" to achieve arbitrary precision computation of fibonacci numbers.

If the modulus is a field, then the memory usage is doubled and the addition and multiplication functions have to do extra work to ensure the modulus of the operands are the same.
If the modulus is a type parameter, then the compiling time is unbearable for large n.

MWE

using Mods, Primes

# computing fib function by powering the matrix
function fib(::Type{T}, n::Int) where T
    (T[0 1; 1 1] ^ (n-1) * [1, 1])[2]
end

function big_integer_solve(f, ::Type{TI}, max_iter::Int=100) where TI
    N = typemax(TI)
    local res, respre, YS
    for k = 1:max_iter
    N = prevprime(N-TI(1))
        T = Mods.Mod{N,TI}
        rk = f(T)
        if k != 1
            push!(YS, rk)
            res = Mods.CRT(YS...)
            respre == res && return res  # converged
            respre = res
        else  # k=1
            YS = Any[]
            push!(YS, rk)
            respre = BigInt(value(rk))
        end
    end
    @warn "result is potentially inconsistent."
    return res
end

big_integer_solve(T->fib(T, 1000), Int32)

I guess there is no easy solution for this issue yet? I wish the type parameter (or the modulus) of Mod can be left unspecified.

nsajko commented 3 months ago

xref #11339

JuliaLang / julia

Function over-specialization on unused parts of types #26834

MWE