JuliaLang / julia

The Julia Programming Language
https://julialang.org/
MIT License
45.72k stars 5.48k forks source link

Remove conv2 from Base? #18384

Closed tlnagy closed 7 years ago

tlnagy commented 8 years ago

Julia's conv2 is much slower than Matlab's (~100x):

Julia v0.5-rc3

julia> img = rand(1024,1024);

julia> @benchmark conv2(img, [-1 0 1; -2 0 2; -1 0 1.])
BenchmarkTools.Trial:
  samples:          8
  evals/sample:     1
  time tolerance:   5.00%
  memory tolerance: 1.00%
  memory estimate:  136.54 mb
  allocs estimate:  235
  minimum time:     613.07 ms (0.99% GC)
  median time:      664.07 ms (8.41% GC)
  mean time:        669.47 ms (8.98% GC)
  maximum time:     749.12 ms (17.76% GC)

julia> versioninfo()
Julia Version 0.5.0-rc3+0
Commit e6f843b (2016-08-22 23:43 UTC)
Platform Info:
  System: Darwin (x86_64-apple-darwin13.4.0)
  CPU: Intel(R) Core(TM) i7-4850HQ CPU @ 2.30GHz
  WORD_SIZE: 64
  BLAS: libopenblas (USE64BITINT DYNAMIC_ARCH NO_AFFINITY Haswell)
  LAPACK: libopenblas64_
  LIBM: libopenlibm
  LLVM: libLLVM-3.7.1 (ORCJIT, haswell)

Matlab 2015b

>> img = rand(1024, 1024);
>> tic(); conv2(img, [-1 0 1; -2 0 2; -1 0 1.]); toc();
Elapsed time is 0.005330 seconds.
tlnagy commented 8 years ago

This might be due to https://github.com/JuliaLang/julia/issues/17000

Even with bumping up the number of threads:

julia> img = rand(1024, 1024);

julia> FFTW.set_num_threads(8)

julia> @benchmark conv2(img, [-1 0 1; -2 0 2; -1 0 1.])
BenchmarkTools.Trial:
  samples:          24
  evals/sample:     1
  time tolerance:   5.00%
  memory tolerance: 1.00%
  memory estimate:  136.54 mb
  allocs estimate:  235
  minimum time:     167.53 ms (5.64% GC)
  median time:      234.68 ms (27.24% GC)
  mean time:        218.57 ms (20.45% GC)
  maximum time:     267.15 ms (33.01% GC)

Still 50x slower than Matlab

Maybe https://github.com/JuliaLang/julia/issues/18245 is related?

tkelman commented 8 years ago

binary or source build?

tlnagy commented 8 years ago

Binary from http://julialang.org

simonster commented 8 years ago

Even if this is just #18245, I would guess this particular case would be faster with naïve convolution vs. FFT. According to the docs, that's what MATLAB conv2 does. The FFT-based conv2 algorithm would probably also be faster if it used overlap-save instead of padding both of the signals to the same size.

tlnagy commented 8 years ago

Interesting. I would argue that convoluting with something like a Sobel operator is a pretty common operation in image analysis so it might be good to optimize for this case.

The FFT-based conv2 algorithm would probably also be faster if it used overlap-save instead of padding both of the signals to the same size.

I didn't realize it was doing this, but that could be very expensive due to the huge size difference between the two matrices here.

simonster commented 8 years ago

The Sobel operator is also separable, so the naïve convolution can be implemented even more efficiently. We have a special form of conv2 for separable filters, but it's also in need of optimization. It seems as though some MATLAB functions automatically test for filter separability when given a filter as a matrix.

tlnagy commented 8 years ago

It looks like the separable version is also padding: https://github.com/JuliaLang/julia/blob/06c0ba6233258dbfd9cbb301e3fa115ee1b8de49/base/dsp.jl#L148-L167

So maybe we could do an automatic check for separability for the potential speedup? Something like this: http://blogs.mathworks.com/steve/2006/11/28/separable-convolution-part-2/

GunnarFarneback commented 8 years ago

You may want to look at imfilter in the Images package.

I'd benchmark before assuming separability is a gain for a 3x3 kernel. Two passes through the memory might be too costly.

timholy commented 8 years ago

Rather than Images per se, the state-of-the-art (which will soon be the default in Images) is https://github.com/JuliaImages/ImagesFiltering.jl/pull/5. This now uses a cache-efficient (tiled) algorithm (somewhat similar to http://halide-lang.org/), tricks to avoid explicit padding, and optionally Julia's native multithreading capabilities. These optimizations turn out to make a pretty large difference in performance.

EDIT: I didn't mention Matlab in that PR, but the version in ImagesFiltering is about 3x faster than Matlab, even though Matlab is using multithreading. Even the single-threaded Julia version is faster than multithreaded Matlab on my 2-core machine.

timholy commented 8 years ago

(The tiled algorithm avoids two passes through memory, so the separable algorithm is a win.)

tlnagy commented 8 years ago

It would be great to port some of that high performance code over to conv2 in Base since having a slow function to do a subset of what imfilter does seems like a bad idea.

tknopp commented 8 years ago

no. better remove conv2 from Base and point to Images as the package to be used when using filtering.

andreasnoack commented 8 years ago

It seems useful to have the mathematical method (2d convolution) separated from the application (image filtering) because somebody might want to run a 2d convolution on something that is an image but I don't know the application well enough to tell if that would hurt.

tknopp commented 8 years ago

@andreasnoack: This is a naming confusion. The scope of the Images package is actually nd-arrays. So the algorithms in that package are pretty application free and work on any multi-dimensional data.

tlnagy commented 8 years ago

Do we have a package for nd-array operations already? This is part of the problem with moving everything out Base, IMO.

tknopp commented 8 years ago

The problem is not moving things out of Base but clever organization of the package landscape.

timholy commented 8 years ago

It's worth pointing out that imfilter isn't a drop-in replacement for conv2; conv2 is more symmetric with respect to its two input arrays, and the returned array is larger than either of the inputs because it returns all nonzero values. In contrast, imfilter makes a strong distinction between the input "image" and the "kernel", and the output has indices that match those of the "image." You can obtain the same result by padding the input "image" with zeros, of course, but it's a noteworthy difference in API.

JeffBezanson commented 8 years ago

I do think it makes sense to have separate APIs for image filtering and general convolution, but ideally they should share as much as possible. If they can't share code directly, then they should at least use the same techniques.

tknopp commented 8 years ago

@timholy: Absolutely. My comment should not mean that imfilter can be used as is but that I question the above proposal to pull in advanced filtering/convolution algorithm into Base. Instead better use a high level package like Images. The issue is as Jeff said that the high level interfaces should share infrastructure and core implementation. And that is exactly how you designed the algorithms in Images, no?

timholy commented 8 years ago

It should not be at all hard to write a convn on top of the existing code. ImagesFiltering isn't an enormous package, but it uses a lot of machinery that's not in Base: for example, it depends on all the packages in JuliaArrays (with the exception of EndpointArrays), plus OffsetArrays and ComputationalResources.

tknopp commented 8 years ago

@timholy: And what would be your suggestion for conv2 in Base? I see three possibilities

  1. leave as is (-> "won't fix" for this issue)
  2. improve it
  3. remove it
timholy commented 8 years ago

Either 2 or 3. When the kernel is small, it definitely makes no sense to use FFT. Fortunately, it's not hard to write a generic nd convolution. This is basically the whole thing:

fill!(out, zero(eltype(out)))
for I in CartesianRange(indices(A))
    a = A[I]
    for J in CartesianRange(indices(B))
        out[I+J] += a*B[J]
    end
end

except really the LHS has to be out[I+J+offset] because for historical reasons we implicitly shift the indices of the output so that indexing starts at 1. (I must say, having arrays with indexing that doesn't start at 1 makes all kinds of filtering operations a whole lot cleaner---all your code looks just like the mathematics, and you no longer have to mentally translate everything for the padding.)

This won't have the fancier features of ImagesFiltering (tiling i.e. support for efficient "cascades" of multiple filters, multithreading, support for IIR filtering, etc), but still it should be pretty decent, and it's a tiny amount of code all of which is supported by Base.

GunnarFarneback commented 8 years ago

Quite frankly, for the purpose of image filtering, the only reason to use conv2 is that you're running Matlab and can't afford an Image Processing Toolbox license. In all other respects imfilter is the superior option and in Julia it's only a Pkg.add away.

So the question is what other applications have use of conv2 and to what extent they benefit from optimizations. Or maybe the question is why the DSP module is still in Base and not a package (and even so, conv2 looks rather out of place there).

simonster commented 8 years ago

Or maybe the question is why the DSP module is still in Base and not a package (and even so, conv2 looks rather out of place there).

Yes, I think it should go. Most of it is already implemented in packages anyway.

tlnagy commented 8 years ago

I've changed my tune, I think we should remove conv2 from Base. imfilter in ImageFiltering is much superior to this. This function was thrown together as a convenience for people migrating from matlab, but is both slow and doesn't belong. I think it's presence is misleading and people might use it instead of searching for a better alternative (like imfilter).

PallHaraldsson commented 8 years ago

Needs conv2 be deprecated (in 0.6 or just removed there)? Can it be in 0.5.x? Even added to 0.5.0 NEWS?

@tknopp "remove conv2 from Base and point to Images as the package to be used when using filtering."

Point from NEWS or from an actual deprecate warning message? Both?

[Seems drastic to just remove slow stuff if slower than MATLAB (or in case of Python), but my understanding from scanning is replacement is fast [enough] as MATALB? E.g. not asked to reimplement or move elsewhere.]

@simonster "ImagesFiltering can own conv2", did you mean Images.jl? I see no ImagesFiltering.jl.

I'm looking conv2 (or related) in Images.jl and now DSP.jl and actually only found (so far):

https://github.com/JuliaDSP/DSP.jl/blob/master/src/util.jl#L216 "unsafedot" is it because of @inbounds? (or @simd for i in 1:cLastIdx ?) I thought this was supposed to be safe, up until new 0-based.. First of, is "unsafe" a recommended prefix; and here now error_ a better one.. as both functions broken for 0-based?

https://groups.google.com/forum/#!topic/julia-dev/MdTBxgVoab0

tlnagy commented 8 years ago

We are talking about https://github.com/JuliaImages/ImageFiltering.jl, which hasn't been officially released yet, but should be soon. It has an extremely fast imfilter routine which can do 2D convolutions.

tknopp commented 8 years ago

Independently of the question if a convolution function belongs to base the function conv2 is absolutely unjulian since it encodes the dimension in the function name.

JeffBezanson commented 8 years ago

Ok, let's plan to remove dsp.jl from Base:

timholy commented 8 years ago

Deconvolution.jl seems to contain exactly one function, which is great, but makes me wonder whether it makes sense to add to it.

We'll also want lucyrichardson, nearestneighbor, blind, etc. It just takes someone to write them :smile:. I don't think it would make sense to have a package per method, so I think a general Deconvolution package makes sense.

Can we add a wrapper in ImageFiltering.jl that behaves like conv2?

Should be doable. Maybe file an issue over at ImageFiltering so I don't forget.

tlnagy commented 8 years ago

Should be doable. Maybe file an issue over at ImageFiltering so I don't forget.

Done.

StefanKarpinski commented 8 years ago

Any progress here?

KristofferC commented 7 years ago

This is done.