Closed musm closed 7 years ago
@stevengj it may actually be better to make these global constant refs instead of allocating new pointers every time the make_parser function is called, e.g.
const cb = Ref{Ptr{Void}}()
function test()
cb[] = cfunction( ...)
end
but perhaps not worth it?
@musm, it would certainly be possible, e.g. you could initialize them in __init__. I'm not sure it makes any measurable difference, however, since on subsequent calls cfunction just returns a pointer to the previously compiled code. (It doesn't "allocate" a pointer, since pointers themselves aren't heap-allocated.)
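For reference, a minimal sketch of the __init__ pattern in current Julia syntax (@cfunction and Cvoid have since replaced cfunction and Void; the callback name incr is hypothetical, not from LibExpat):

```julia
module CallbackCache

incr(x::Int) = x + 1   # hypothetical stand-in for a real C callback

# Global constant Ref holding the C-callable pointer, as proposed above.
const cb = Ref{Ptr{Cvoid}}(C_NULL)

# __init__ runs at module load time, so the pointer is always valid in the
# current session (raw pointers can't be baked in at precompile time).
function __init__()
    cb[] = @cfunction(incr, Int, (Int,))
end

end # module

# Calling through the cached pointer:
using .CallbackCache
ccall(CallbackCache.cb[], Int, (Int,), 41)  # returns 42
```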
In a quick benchmark, it seems like calling cfunction repeatedly is about the same speed as looking up a Ref value, and both involve no allocations:
julia> using BenchmarkTools, Compat;
julia> foo(x) = x + 1
foo (generic function with 1 method)
julia> f() = cfunction(foo, Int, (Int,))
f (generic function with 1 method)
julia> g() = C_NULL
g (generic function with 1 method)
julia> @btime f();
1.825 ns (0 allocations: 0 bytes)
julia> @btime g();
0.031 ns (0 allocations: 0 bytes)
julia> const cb = Ref{Ptr{Void}}()
Base.RefValue{Ptr{Void}}(Ptr{Void} @0x0000000109733040)
julia> cb[] = f()
Ptr{Void} @0x00000001195fe640
julia> f_cached() = cb[]
f_cached (generic function with 1 method)
julia> @btime f_cached();
1.827 ns (0 allocations: 0 bytes)
Actually, it looks like the cfunction pointer is just inlined in the compiled code nowadays, so there is zero overhead:
julia> @code_llvm f()
define i8* @julia_f_61016() #0 !dbg !5 {
top:
ret i8* bitcast (i64 (i64)* @jlcapi_foo_60882 to i8*)
}
julia> @code_llvm f_cached()
define i8* @julia_f_cached_61063() #0 !dbg !5 {
top:
%0 = load i8*, i8** inttoptr (i64 4426056496 to i8**), align 16
ret i8* %0
}
Yeah, the only difference is that in the case of directly calling the cfunction, the returned value is a bitcast:
ret i8* bitcast (i64 (i64)* @jlcapi_foo_61627 to i8*)
whereas using a const Ref that is initialized in __init__, the function is just a load operation:
%0 = load i8*, i8** inttoptr (i64 169524560 to i8**), align 16
ret i8* %0
which has essentially the same cost, as you mention, so the difference is basically nil.
bitcast is just LLVM keeping track of the type, I think; it doesn't actually translate to any machine instruction. The native code just pushes a literal address into the return register:
julia> @code_native f()
.section __TEXT,__text,regular,pure_instructions
Filename: REPL[3]
pushq %rbp
movq %rsp, %rbp
Source line: 1
movabsq $4815779184, %rax ## imm = 0x11F0AF570
popq %rbp
retq
Whereas the cached version requires an additional instruction to dereference the ref pointer:
julia> @code_native f_cached()
.section __TEXT,__text,regular,pure_instructions
Filename: REPL[8]
pushq %rbp
movq %rsp, %rbp
Source line: 1
movabsq $4528309056, %rax ## imm = 0x10DE88340
movq (%rax), %rax
popq %rbp
retq
nopw %cs:(%rax,%rax)
However, the additional cost doesn't seem to be measurable by BenchmarkTools. (And who knows if it actually costs any extra cycles thanks to pipelining etc... CPUs are complicated.)
In any case, the non-cached version is simpler, so I would just stick with that.
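In current syntax the non-cached version stays just as simple; a minimal sketch (the callback name bump is hypothetical):

```julia
bump(x::Int) = x + 1   # hypothetical callback

function use_callback(x)
    # @cfunction with a literal function name is effectively free:
    # the pointer is inlined as a constant address in the compiled code.
    p = @cfunction(bump, Int, (Int,))
    return ccall(p, Int, (Int,), x)
end

use_callback(41)  # returns 42
```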
close https://github.com/JuliaIO/LibExpat.jl/issues/77