Open · johnmyleswhite opened this issue 10 years ago
The `dot3(a, b)` warm up must be `dot3(vals)`.

On my system `dot3` is as performant as `dot2`; maybe you got that time for `dot3` because it was not correctly warmed up.
You're right. Thanks for catching that!
Superseded by #7311.
@quinnj I'm not sure how staged functions solve this. Can you elaborate?
Yeah, this is not the same issue.
Sorry. I thought that

> if there were a way to make the compiler recognize that a little bit of second-stage type inference could give major performance gains

was exactly what staged functions were getting us.
The work item here is to identify poorly-typed loops, and carve them out into separate functions that can be specialized at run time. It's an extra compiler optimization pass.
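The pass described here can be done by hand today. A minimal sketch of the manual version of this "function barrier" transformation (the function names and the summation loop are my own illustration, not code from this thread):

```julia
# A poorly-typed loop: `a` infers as Any, so every `+` dispatches dynamically.
function sum_first_slow(vals::Vector{Any})
    a = vals[1]          # a::Any as far as the compiler knows
    s = 0.0
    for x in a
        s += x           # dynamic dispatch on each iteration
    end
    return s
end

# The same loop carved out into its own function. When `kernel` is called,
# Julia specializes it on the runtime type of `a` (e.g. Vector{Float64}).
function kernel(a)
    s = 0.0
    for x in a
        s += x
    end
    return s
end

sum_first_fast(vals::Vector{Any}) = kernel(vals[1])
```

The proposed optimization pass would amount to performing this extraction automatically wherever inference detects an untyped loop.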
@quinnj, a staged function will receive that `::Vector{Any}` as its input, so it won't have any more type information than the parent.
While that is true, it can actually throw an error in that case, and it will then be called again at runtime with the correct types. That's part of the magic :).
Not sure I understand. What if I explicitly feed it an `Any[[1,2,3],[1.0,2.0,3.0]]`? At no point will you get good type information about this object.
Now, if you wrote it as `function dot4{T}(vals::T)` and passed it `([1,2,3],[1.0,2.0,3.0])`, then it would be a different story. But you don't need a staged function for that.
I think the issue is a case where the runtime type information is there but the compile-time information isn't, so the loop won't be optimized. If there's never good type information, then we can't do anything anyway.
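To make that distinction concrete, here is a small sketch (my own example, not from the thread) showing that the concrete element types are available at run time even when the container type only tells the compiler `Any`:

```julia
vals = Any[[1, 2, 3], [1.0, 2.0, 3.0]]

# Compile time: all inference can learn from the container type is Any.
@assert eltype(vals) == Any

# Run time: the concrete types are right there, and a function barrier
# can dispatch on them to get specialized code.
@assert typeof(vals[1]) == Vector{Int}
@assert typeof(vals[2]) == Vector{Float64}
```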
Is this still an issue? On 1.7.2, I see this result on my machine:
```julia
julia> using BenchmarkTools

julia> n = 1_000_000
1000000

julia> a, b = randn(n), randn(n);

julia> vals = Any[a, b];

julia> function dot1(vals::Vector)
           a, b = vals[1], vals[2]
           x = 0.0
           for i in 1:length(a)
               x += a[i] * b[i]
           end
           return x
       end
dot1 (generic function with 1 method)

julia> function dot2(a::Vector, b::Vector)
           x = 0.0
           for i in 1:length(a)
               x += a[i] * b[i]
           end
           return x
       end
dot2 (generic function with 1 method)

julia> function dot3(vals::Vector)
           a, b = vals[1], vals[2]
           function inner(a, b)
               x = 0.0
               for i in 1:length(a)
                   x += a[i] * b[i]
               end
               return x
           end
           return inner(a, b)
       end
dot3 (generic function with 1 method)

julia> @btime dot1(vals);
  158.501 ms (6998980 allocations: 122.05 MiB)

julia> @btime dot2(a, b);
  1.528 ms (1 allocation: 16 bytes)

julia> @btime dot3(vals);
  1.529 ms (1 allocation: 16 bytes)
```
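As a diagnosis aid (not part of the original comment), inference itself shows why `dot1` pays for dynamic dispatch while the barrier in `dot3` does not; querying `Base.return_types` is a rough sketch of what `@code_warntype` would display:

```julia
# Indexing a Vector{Any} gives the compiler nothing to work with:
# the inferred return type of vals[i] is Any.
@assert only(Base.return_types(getindex, (Vector{Any}, Int))) == Any

# Indexing a concretely typed vector keeps full type information,
# which is what the inner function in dot3 recovers at run time.
@assert only(Base.return_types(getindex, (Vector{Float64}, Int))) == Float64
```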
EDIT: Updated the measurement according to @Keno's remark. The issue still stands.
The issue is for `vals = Any[a, b];`. The `{}` syntax was removed many years ago, but corresponds to `Any[]`.
While trying to optimize performance of operations on DataFrames, I've found that a recurring pattern underlies a lot of the performance bottlenecks people encounter: functions that take in DataFrames as input don't have enough type information to specialize their implementations for the specific types of the columns stored in a DataFrame. This means that a two-part function that (a) selects a subset of columns to compute on and then (b) performs an intensive computation on those columns can't take advantage of the type information available at the end of part A to improve the performance of part B.
It's possible to articulate this pattern without any reference to DataFrames: you get the exact same effect if you try pulling whole columns out of a `Vector{Any}` and then computing on them. The only workaround I've found is to use an inner function defined inside of the two-part function. This inner function helps to defer type specialization until a time at which it can be carried out effectively.

The three examples below highlight this problem:
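The example code blocks from the original issue did not survive here, but a minimal sketch consistent with the `dot1`/`dot3` definitions benchmarked later in this thread (names reused for illustration) is:

```julia
# Naive: columns pulled from an untyped container stay Any-typed,
# so the hot loop dispatches dynamically on every iteration.
function dot1(vals::Vector)
    a, b = vals[1], vals[2]
    x = 0.0
    for i in 1:length(a)
        x += a[i] * b[i]
    end
    return x
end

# Workaround: an inner function acts as a barrier that defers
# specialization until the concrete column types are known at run time.
function dot3(vals::Vector)
    a, b = vals[1], vals[2]
    function inner(a, b)
        x = 0.0
        for i in 1:length(a)
            x += a[i] * b[i]
        end
        return x
    end
    return inner(a, b)
end
```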
If you run these functions with the following data, you get very substantial performance differences between the initial naive implementation and the other two:
On my machine, I get relative timings of the following orders of magnitude:
It would be a huge gain to future users of DataFrames if there were a way to make the compiler recognize that a little bit of second-stage type inference could give major performance gains.