Closed bsxfan closed 9 years ago
In the case of arrays, on my 64-bit Linux (x86_64) I get slightly better performance from copy! than loopcopy! (0.25s vs 0.31s). But by and large I replicate your findings.
The problem seems to come from abstractarray.jl:copy!'s use of the construct for x in src. If I replace that with for j = 1:length(src)... then they work out the same.
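The two loop shapes being compared can be sketched as follows. This is a minimal illustration written for current Julia (1.x), not the code from abstractarray.jl; the names itercopy! and indexcopy! are hypothetical, and both assume 1-based linear indexing.

```julia
# Hypothetical names; neither function is the Base implementation.

# Iterator-style copy: walk the source with `for x in src`.
function itercopy!(dest::AbstractArray, src::AbstractArray)
    length(dest) >= length(src) || throw(BoundsError(dest, length(src)))
    i = 0
    for x in src
        i += 1
        dest[i] = x
    end
    return dest
end

# Index-style copy: walk a linear index and read src[j] directly.
function indexcopy!(dest::AbstractArray, src::AbstractArray)
    length(dest) >= length(src) || throw(BoundsError(dest, length(src)))
    for j = 1:length(src)
        dest[j] = src[j]
    end
    return dest
end
```

Both produce the same result; which one is faster depends on how iteration and linear indexing are implemented for the source type in the Julia version at hand.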
I'll also note that one can do vastly better with subarrays:
parent(S::SubArray) = S.parent
parent(A::Array) = A
function fastcopy!{T}(pdest::Matrix{T}, sdest::NTuple{2,Int}, psrc::Matrix{T}, ssrc::NTuple{2,Int}, sz::NTuple{2,Int})
    m = sz[1]
    sdest1 = sdest[1]
    ssrc1 = ssrc[1]
    for j = 0:sz[2]-1
        kdest = j*sdest[2]+1
        ksrc = j*ssrc[2]+1
        for i = 0:m-1
            idest = kdest+i*sdest1
            isrc = ksrc+i*ssrc1
            pdest[idest] = psrc[isrc]
        end
    end
end
function fastcopy!{T}(dest::StridedMatrix{T}, src::StridedMatrix{T})
    @assert size(dest) == size(src)
    sdest = strides(dest)
    ssrc = strides(src)
    pdest = parent(dest)
    psrc = parent(src)
    fastcopy!(pdest, sdest, psrc, ssrc, size(dest))
    dest
end
Test:
julia> sum([@elapsed loopcopy!(D,Ssub) for i=1:100])
3.1146607700000004
julia> sum([@elapsed fastcopy!(D,Ssub) for i=1:100])
0.4422959499999998
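For readers on current Julia (1.x), the same stride-walking idea can be sketched with today's syntax (where T instead of fastcopy!{T}). This is an illustrative variant, not the code that was timed above: the names MatOrView, linearoffset and stridedcopy! are hypothetical, and the explicit offset term is an addition here so that views which do not start at the parent's first element are also covered. It assumes views built from ranges or colons over a plain Matrix.

```julia
# Hypothetical helper names; a sketch for Julia 1.x, not Base code.
const MatOrView{T} = Union{Matrix{T}, SubArray{T,2,Matrix{T}}}

# Zero-based linear offset of the first element inside the parent matrix.
linearoffset(A::Matrix) = 0
function linearoffset(S::SubArray{T,2,Matrix{T}}) where T
    P = parent(S)
    I = parentindices(S)   # assumed to be ranges or colons
    return (first(I[1]) - 1) * stride(P, 1) + (first(I[2]) - 1) * stride(P, 2)
end

function stridedcopy!(dest::MatOrView{T}, src::MatOrView{T}) where T
    size(dest) == size(src) || throw(DimensionMismatch("sizes must match"))
    pd, ps = parent(dest), parent(src)      # underlying dense matrices
    sd, ss = strides(dest), strides(src)    # element strides per dimension
    od, os = linearoffset(dest), linearoffset(src)
    m, n = size(dest)
    for j = 0:n-1            # columns (outer loop)
        kd = od + j * sd[2] + 1
        ks = os + j * ss[2] + 1
        for i = 0:m-1        # rows (inner loop)
            pd[kd + i * sd[1]] = ps[ks + i * ss[1]]
        end
    end
    return dest
end
```

The inner loop indexes the parent arrays linearly, so the per-element cost is one multiply-add per array rather than a full multidimensional index computation through the view.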
Images.jl uses some of these tricks, although I haven't yet profiled to see if they are working as expected. https://github.com/timholy/Images.jl/blob/master/src/algorithms.jl#L117
(Basically, subarrays need the kind of love that I gave array.jl long ago. A variant of make_arrayind_loop_nest might need to be written to handle two arrays.)
On a slightly orthogonal but not completely unrelated note, there has been a proposal to make array indexing return views. That would greatly reduce copying and GC, and would also remove the need to use subarrays in many cases. Now that we can do stuff like pointer_to_array(pointer(a, m), dims), it should be pretty easy to get high performance array views.
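The pointer-based view mentioned above can be sketched in current Julia, where pointer_to_array has been replaced by unsafe_wrap(Array, ptr, dims). The function name window_sum! and the 4x2 window in the usage note are illustrative assumptions. The wrapped array shares memory with a and is only valid while a is kept alive, hence the GC.@preserve block.

```julia
# Hypothetical example: view a window of `a`, starting at element `m`,
# as a dense matrix without copying.
function window_sum!(a::Vector{Float64}, m::Int, dims::NTuple{2,Int})
    # The window must lie entirely inside `a`.
    (m >= 1 && m + prod(dims) - 1 <= length(a)) || throw(BoundsError(a, m))
    s = 0.0
    GC.@preserve a begin
        v = unsafe_wrap(Array, pointer(a, m), dims)  # no copy; aliases `a`
        v[1, 1] = -1.0                               # writes through to a[m]
        s = sum(v)
    end
    return s
end
```

For example, with a = collect(1.0:20.0), calling window_sum!(a, 5, (4, 2)) overwrites a[5] and sums the eight elements a[5:12] through the alias. A safe alternative that needs no pointer handling is reshape(view(a, m:m+prod(dims)-1), dims).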
Definitely related. I even started working on something like that within the first few days of the merging of immutable types, but got bit by challenges stemming from the need for better inlining. I haven't looked recently.
@ViralBShah: I thought that those views would be subarrays themselves? In my view, producing array views will increase the need to be able to deal with subarrays/strided arrays.
Also, it should be straightforward to get machinery such as what I propose for fast bsxfun in #3100 to create loops like @timholy's fastcopy above, on demand.
Thanks @timholy for your nice fastcopy!() example. That clears up some mysteries for me. But yes, you are right, SubArray could definitely benefit from some more attention. I've just logged two more issues involving subarray. See #3114 and #3115.
Have there been any updates on the creation of submatrix views? I am experimenting with using Julia in a linear algebra course that I'm teaching and would find it very helpful for providing high-performance code samples for factorizations.
Gimme a couple more days, and a faster version of copy! should land. I've just been swamped with other things.
Meanwhile @lindahua has submitted the latest proposal for a faster ArrayView (#5556), which I also haven't had time to look at yet but which sounds promising.
Can we close this with the new SubArrays, or should we wait for the updates to getindex and such?
We've actually had a fast copy! for months (shortly after Base.Cartesian landed).
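To give a feel for what Base.Cartesian makes possible, here is a minimal sketch of an N-dimensional copy loop generated with @nloops and @nref, written for current Julia (1.x). It is not the copy! implementation in Base; the name cartcopy! is hypothetical, and it assumes both arrays use 1-based indices and have equal sizes.

```julia
using Base.Cartesian

# Hypothetical name; generates one nested loop per dimension at compile time.
@generated function cartcopy!(dest::AbstractArray{T,N}, src::AbstractArray{T,N}) where {T,N}
    quote
        size(dest) == size(src) || throw(DimensionMismatch("sizes must match"))
        # Expands to N nested loops over i_1, ..., i_N covering axes(src).
        @nloops $N i src begin
            (@nref $N dest i) = (@nref $N src i)
        end
        return dest
    end
end
```

Because the loop nest is specialized on N, indexing a SubArray source uses plain multidimensional indices rather than converting a linear index back to subscripts for every element.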
For comparison to copy!, define:
Create regular and strided source and destination matrices:
Time loopcopy! vs copy!. Timings below are for 32-bit windows, but behaviour on 64-bit linux is similar:
OK, very close, but for SubMatrix source:
copy! is much slower. Similarly:
Counterexample: