JuliaLang / AllocCheck.jl

AllocCheck
Other
215 stars 8 forks source link

First attempts #1

Closed baggepinnen closed 9 months ago

baggepinnen commented 11 months ago

Hey guys, thanks for working on this package!

I am trying it out on some of my controller implementations (think the controller that swings up the pendulum you saw at Juliacon) and I am noticing a few patterns in my code that I want your opinions on. I am opening this issue to discuss whether those patterns can be accommodated by AllocCheck, or I need to think about best practices for writing code that is more in line with what AllocCheck can analyze

Optional logging

The first pattern I tend to use rather often is optional logging, for example

verbose && @info "Pos = $x"

where verbose is a keyword argument. Scattering these print statements around is useful for debugging (and also to enable Ctrl-C to terminate the control loop), but I usually disable the logging and printing when "running for real". The simple function below gave me 156 allocations when verbose = true and 0 when verbose = false (nice!)

using AllocCheck

function controller(; verbose=true)
    a = 0.0
    for i = 1:100
        a = a + i
        verbose && @info "a = $a"
        Libc.systemsleep(0.01)
    end
end

controller() # Works
check_ir(controller, ())

My question is, was I lucky with the definition

function controller(; verbose=false)
   ...
end

isempty(check_ir(controller, ()))

or this is indeed something I can rely on?

Side question, verbose here is a keyword argument, and it's the value rather than the type that is important. When I tried with verbose = false, I changed the definition of controller to default to verbose = false. Was I saved by constant propagation?

Check loop rather than function

Most of my controller implementations look something like below, where I first allocate some working memory (logvector) and then run an almost infinite loop

function run_almost_forever()
    N = a_large_number
    logvector = zeros(N)
    for i = 1:N
        y = sample_measurement()
        logvector[i] = y
        u = controller(y)
        apply_control(u)
        Libc.systemsleep(0.01)
    end
end

Here, I am only concerned with the loop not allocating, and am fine with the initial allocation of logvector = zeros(N). I could refactor this code to split out the loop body:

function run_almost_forever2()
    N = a_large_number
    logvector = zeros(N)
    for i = 1:N
        loop_body(logvector)
    end
end

function loop_body(logvector)
    y = sample_measurement()
    logvector[i] = y
    u = controller(y)
    apply_control(u)
    Libc.systemsleep(0.01)
end

and analyze loop_body using check_ir. My question is then, how can I be sure that there is no allocation coming from

    for i = 1:N
        loop_body(logvector)
    end

for example, due to inference failure on logvector?

A very nice interface (you don't have to accommodate this, it's just some thinking out loud) would be

function run_almost_forever()
    N = a_large_number
    logvector = zeros(N)
    @no_allocs for i = 1:N
        y = sample_measurement()
        logvector[i] = y
        u = controller(y)
        apply_control(u)
        Libc.systemsleep(0.01)
    end
end

where I use the macro annotation @no_allocs to indicate exactly which block is not allowed to allocate.

Error recovery

The last pattern touches upon what we spoke about in the meeting yesterday, exceptions. I often wrap the control loop in try/catch/finally so that I can make sure exit gracefully (apply emergency brake etc.) and also GC.enable(true) if something goes wrong. The following worked nicely, no allocations

function treading_lightly()
    a = 0.0
    GC.enable(false)
    try
        for i = 10:-1:-1
            a += i
        end
    catch
        exit_gracefully()
    finally
        GC.enable(true)
    end
    a
end
exit_gracefully() = nothing

treading_lightly() # Works

check_ir(treading_lightly, ())

but computing sqrt(i) in the loop does generate some allocations (as expected)

function treading_lightly()
    a = 0.0
    GC.enable(false)
    try
        for i = 10:-1:-1
            a += sqrt(i)
        end
    catch
        exit_gracefully()
    finally
        GC.enable(true)
    end
    a
end
exit_gracefully() = nothing

treading_lightly() # Works

check_ir(treading_lightly, ())

Here, I'd like to indicate that I have thought about the potential problem with sqrt(-1) and that I am fine with this being a possibility. I know that the GC will not trigger during the execution of the emergency recovery exit_gracefully since I have turned the GC off, and I also know that the allocations due to this error will not accumulate since I'm about to terminate the controller and turn the GC on again.

What would be a good strategy to handle this situation?

Unrelated to allocations, but in this situation, I would also like to make sure that exit_gracefully is fully compiled before starting to execute treading_lightly, since if I ever need to call exit_gracefully I probably cannot tolerate any JIT latency. Manually calling exit_gracefully to compile it is not always an option.

topolarity commented 11 months ago

Thank you for the detailed and thoughtful write-up! This is really excellent 👍

To write down some early thoughts:

  1. Optional logging - Would it be acceptable for your use case to lift verbose to the type domain like this:

    function controller(; verbose::Val{Verbose} = Val(true)) where {Verbose}
        a = 0.0
        for i = 1:100
            a = a + i
            Verbose && @info "a = $a"
            Libc.systemsleep(0.01)
        end
    end

    Trying to leave this in the value domain is possible but quite a bit more complicated - I think it'd require running inference and using that to filter which paths of the code remain live after constant propagation.

  2. Check loop rather than function - There's a few options here:

    1. How would you feel about annotating the function call with the types you used for check_allocs? For example
      for i = 1:N
          # XXX: If types change, update the `loop_body` check_allocs test
          loop_body(logvector::Vector{MyType})
      end

      and then in your tests @test length(check_allocs(loop_body, (Vector{MyType},))) == 0. (This requires that you know the concrete types for that function call.)

    2. Would it be acceptable to filter out allocation sites that might happen more than once (i.e. those that are in a loop or in a recursive function cycle? This is probably a bad idea, since the compiler has no way to tell the difference between a "small" loop in your initialization code versus a "hot" loop in your main function body.
    3. If neither (i) or (ii) work out, we could explore an @inbounds-like annotation that would only affect your function when it's run through check_allocs
  3. Error recovery - The current plan is to be able to check for allocations under the assumption that no errors are thrown. That means that we'd ignore any allocations in code that inevitably leads to a throw, and also any code that exclusively comes from a throw (i.e. catch). Would that handle your use case correctly?

topolarity commented 11 months ago

Also @gbaraldi is there a way for us to ensure that check_allocs(foo, (A, B)) also covers a foo(::A, ::B) call-site, so that the reasoning in (2i) is valid in general? Specifically, we'd want to ensure that we won't have to box normally unboxed types just to satisfy the function call ABI.

Might be a @nospecialize corner case, but seems an important contract to provide the user.

topolarity commented 11 months ago

Oh and also you are right about constant propagation making the difference in (1).

What happens is that the kwcall essentially expands to controller_inner_func_(false)

julia> @code_typed optimize=false controller()
CodeInfo(
1 ─ %1 = Main.:(var"#controller#1")(false, #self#)::Core.Const(nothing)
└──      return %1
) => Nothing

which is inlined by the optimizer and then constant propagation removes the problematic code paths from there.

We can't count on that though:

  1. The inlining of controller_inner_func_ is not guaranteed to happen
  2. controller() generates different code in general than controller(; verbose=false) (even if the default of verbose is actually false 😅)

Those caveats are why lifting to the type domain is a more reliable way to force this constant propagation.

baggepinnen commented 11 months ago

Would it be acceptable for your use case to lift verbose to the type domain like this

Yes, I don't mind, it seems like a simple and effective approach!

How would you feel about annotating the function call with the types you used for check_allocs?

Yeah that's a good idea :) Sorry for being overly pedantic, but how do I really know that for i = 1:N doesn't allocate on me? In this case, I guess I could 1:(N::Int), which is good enough for me, but in general?

Would it be acceptable to filter out allocation sites that might happen more than once (i.e. those that are in a loop or in a recursive function cycle? This is probably a bad idea, since the compiler has no way to tell the difference between a "small" loop in your initialization code versus a "hot" loop in your main function body.

Maybe it could be useful to reduce noise in some situations, but I share your assessment about init loops etc.

topolarity commented 11 months ago

Yeah that's a good idea :) Sorry for being overly pedantic, but how do I really know that for i = 1:N doesn't allocate on me? In this case, I guess I could 1:(N::Int), which is good enough for me, but in general?

Not overly pedantic at all 🙂

The "KISS" approach would be to move the for into loop_body itself.

function run_almost_forever()
    # Do one-time setup
    N = a_large_number
    logvector = zeros(N)

    # Run the hot, alloc-free loop
    run_almost_forever_(N::Int, logvector::Vector{Float64})
end
function run_almost_forever_(N, logvector)
    for i = 1:N
        y = sample_measurement()
        logvector[i] = y
        u = controller(y)
        apply_control(u)
        Libc.systemsleep(0.01)
    end
end

I do get the feeling we'll want to be able to annotate regions of your code to check for allocations (it's a reasonable ask), but maybe this can tide you over until then.

baggepinnen commented 11 months ago

The "KISS" approach would be to move the for into loop_body itself.

Of course, I should have realized this one myself :) The pattern now looks very much like a "function barrier" one might use to mitigate type instability. This is already a pattern many are familiar with so that's perhaps a nice way to explain it. I like it, seems easy to understand and to work with :)

topolarity commented 9 months ago

Considered this fully explored now that all this knowledge has been concretized in https://github.com/JuliaComputing/AllocCheck.jl/pull/15 🙂