slow SubMatrix copy - Githubissues

bsxfan commented 11 years ago

For comparison to copy!, define:

loopcopy!(D,S) = (for i=1:length(D);D[i]=S[i];end;D)

Create regular and strided source and destination matrices:

S = randn(1000,1000); 
Ssub = sub(randn(2000,1000),1:2:2000,1:1000);
D = zeros(1000,1000); 
Dsub = sub(zeros(2000,1000),1:2:2000,1:1000);

Time loopcopy! vs copy!. Timings below are for 32-bit windows, but behaviour on 64-bit linux is similar:

julia> sum([@elapsed copy!(D,S) for i=1:100])
0.452816014
julia> sum([@elapsed loopcopy!(D,S) for i=1:100])
0.4669809420000002

OK, very close, but for SubMatrix source:

julia> sum([@elapsed loopcopy!(D,Ssub) for i=1:100])
1.8858308039999998
julia> sum([@elapsed copy!(D,Ssub) for i=1:100])
10.769275100000002

copy! is much slower. Similarly:

julia> sum([@elapsed loopcopy!(Dsub,Ssub) for i=1:100])
3.675649855999999
julia> sum([@elapsed copy!(Dsub,Ssub) for i=1:100])
12.968587926999998

Counterexample:

julia> sum([@elapsed loopcopy!(Dsub,S) for i=1:100])
2.3974653309999994
julia> sum([@elapsed copy!(Dsub,S) for i=1:100])
2.401044135

timholy commented 11 years ago

In the case of arrays, on my 64-bit linux (x86_64) I get slightly better performance from copy! than loopcopy! (0.25s vs 0.31s). But by-and-large I replicate your findings.

The problem seems to come from abstractarray.jl:copy!'s use of the construct for x in src. If I replace that with for j = 1:length(src)... then they work out the same.

I'll also note that one can do vastly better with subarrays:

parent(S::SubArray) = S.parent
parent(A::Array) = A

function fastcopy!{T}(pdest::Matrix{T}, sdest::NTuple{2,Int}, psrc::Matrix{T}, ssrc::NTuple{2,Int}, sz::NTuple{2,Int})
    m = sz[1]
    sdest1 = sdest[1]
    ssrc1 = ssrc[1]
    for j = 0:sz[2]-1
        kdest = j*sdest[2]+1
        ksrc = j*ssrc[2]+1
        for i = 0:m-1
            idest = kdest+i*sdest1
            isrc = ksrc+i*ssrc1
            pdest[idest] = psrc[isrc]
        end
    end
end

function fastcopy!{T}(dest::StridedMatrix{T}, src::StridedMatrix{T})
    @assert size(dest) == size(src)
    sdest = strides(dest)
    ssrc = strides(src)
    pdest = parent(dest)
    psrc = parent(src)
    fastcopy!(pdest, sdest, psrc, ssrc, size(dest))
    dest
end

Test:

julia> sum([@elapsed loopcopy!(D,Ssub) for i=1:100])
3.1146607700000004

julia> sum([@elapsed fastcopy!(D,Ssub) for i=1:100])
0.4422959499999998

Images.jl uses some of these tricks, although I haven't yet profiled to see if they are working as expected. https://github.com/timholy/Images.jl/blob/master/src/algorithms.jl#L117

timholy commented 11 years ago

(Basically, subarrays need the kind of love that I gave array.jl long ago. A variant of make_arrayind_loop_nest might need to be written to handle two arrays.)

ViralBShah commented 11 years ago

On a slightly orthogonal but not completely unrelated note, there has been the proposal of making array indexing return views. That has benefits of greatly reducing copying and GC, and also removes the need to use subarrays in many cases. Now that we can do stuff like pointer_to_array(pointer(a, m), dims), it should be pretty easy to get high performance array views.

timholy commented 11 years ago

Definitely related. I even started working on something like that within the first few days of the merging of immutable types, but got bit by challenges stemming from the need for better inlining. I haven't looked recently.

toivoh commented 11 years ago

@ViralBShah: I thought that those views would be subarrays themselves? In my view, producing array views will increase the need to be able to deal with subarrays/strided arrays.

Also, it should be straightforward to get machinery such as what I propose for fast bsxfun in #3100 to create loops like @timholy's fastcopy above, on demand.

bsxfan commented 11 years ago

Thanks @timholy for your nice fastcopy!() example. That clears up some mysteries for me. But yes, you are right, SubArray could definitely benefit from some more attention. I've just logged two more issues involving subarray. See #3114 and #3115.

poulson commented 10 years ago

Have there been any updates on the creation of submatrix views? I am experimenting with using Julia in a linear algebra course that I'm teaching and would find it very helpful for providing high-performance code samples for factorizations.

timholy commented 10 years ago

Gimmee a couple more days, and a faster version of copy! should land. I've just been swamped with other things.

Meanwhile @lindahua has submitted the latest proposal for a faster ArrayView (#5556), which I also haven't had time to look at yet but which sounds promising.

ViralBShah commented 9 years ago

Can we close this with the new SubArrays, or should we wait for the updates to getindex and such?

timholy commented 9 years ago

We've actually had a fast copy! for months (shortly after Base.Cartesian landed).

JuliaLang / julia

slow SubMatrix copy #3098