I just ran into a case where bad codegen, not incomplete inference, resulted in calls to the runtime that prevent us from using Cassette for GPU codegen. Pasting my notes here:
```julia
using Cassette
using InteractiveUtils # for code_llvm when run as a script

Cassette.@context Noop

function main()
    a = [0]

    function kernel(T, ptr)
        unsafe_store!(ptr, 1)
        return
    end
    kernel(Int, pointer(a))
    code_llvm(kernel, Tuple{Type{Int}, Ptr{Int}}; debuginfo=:none)
    # good code; T and ptr end up in different slots, T is marked constant
    # https://github.com/JuliaLang/julia/blob/592878623d376c71e5452dc2775fa2f7a4e097ca/src/codegen.cpp#L6116
    #
    # define void @julia_kernel_12936(%jl_value_t*, i64) #0 {
    # top:
    #   %2 = inttoptr i64 %1 to i64*
    #   store i64 1, i64* %2, align 1
    #   ret void
    # }

    Cassette.overdub(Noop(), kernel, Int, pointer(a))
    code_llvm(Cassette.overdub, Tuple{typeof(Noop()), typeof(kernel), Type{Int}, Ptr{Int}}; debuginfo=:none)
    # bad code; T and ptr live in the vaSlot, which is neither constant nor unused.
    # the varargs slot isn't concretely typed, so we get a call to jl_f_tuple
    # and calls to jl_f_getfield to access the values
    # https://github.com/JuliaLang/julia/blob/592878623d376c71e5452dc2775fa2f7a4e097ca/src/codegen.cpp#L6198-L6201
    #
    # define void @"julia_#4_12937"(%jl_value_t* nonnull, i64) #0 {
    # top:
    #   ...
    #   %19 = call nonnull %jl_value_t* @jl_f_tuple(%jl_value_t* null, %jl_value_t** %2, i32 2)
    #   ...
    #   %23 = call nonnull %jl_value_t* @jl_f_getfield(%jl_value_t* null, %jl_value_t** %2, i32 2)
    #   %24 = bitcast %jl_value_t* %23 to i64**
    #   %25 = load i64*, i64** %24, align 8
    #   store i64 1, i64* %25, align 1
    #   ...
    #   ret void
    # }

    function kernel(ptr)
        unsafe_store!(ptr, 1)
        return
    end
    Cassette.overdub(Noop(), kernel, pointer(a))
    code_llvm(Cassette.overdub, Tuple{typeof(Noop()), typeof(kernel), Ptr{Int}}; debuginfo=:none)
    # good code; we still have a vaSlot, but it's concretely typed.
    # https://github.com/JuliaLang/julia/blob/592878623d376c71e5452dc2775fa2f7a4e097ca/src/codegen.cpp#L6193-L6196
end
isinteractive() || main()
```
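To check the good/bad distinction mechanically rather than by eyeballing the IR, one can capture the `code_llvm` output as a string and look for the runtime intrinsics. This is a sketch of such a check (not part of the original notes; the standalone `kernel` here is illustrative), using the same `jl_f_tuple`/`jl_f_getfield` markers discussed above:

```julia
using InteractiveUtils

# A fully specialized kernel, analogous to the single-argument case above.
kernel(ptr) = (unsafe_store!(ptr, 1); nothing)

# Capture the generated LLVM IR as a string and scan it for runtime calls.
ir = sprint(code_llvm, kernel, Tuple{Ptr{Int}})
@assert !occursin("jl_f_tuple", ir)     # no boxed tuple construction
@assert !occursin("jl_f_getfield", ir)  # no boxed field access
```

The same `occursin` scan applied to the `Cassette.overdub` signature would flag the bad case, since its IR contains the `jl_f_tuple` call.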
The splat comes from how arguments are passed to/by `overdub`. It would be possible to try to change that, or to add the ability to force specialization of varargs (e.g. https://github.com/JuliaLang/julia/issues/34365#issuecomment-573989610).
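For the second option, the existing surface-level trick (documented in the Julia performance tips; shown here as a standalone sketch, not as Cassette's actual signature) is to constrain the vararg count with `Vararg{Any,N}`, which forces the compiler to specialize on the number and concrete types of the arguments:

```julia
# Plain varargs may be left unspecialized when the arguments are only
# passed through; constraining the count with `where N` forces a
# specialization per argument tuple.
unspecialized(args...) = last(args)
specialized(args::Vararg{Any,N}) where {N} = last(args)

# Both behave identically at the call site:
@assert specialized(Int, C_NULL) === C_NULL
@assert unspecialized(Int, C_NULL) === specialized(Int, C_NULL)
```

Whether that is enough here depends on the codegen side actually using the specialized signature for the vaSlot, which is what the linked issue discusses.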