JuliaArrays / FFTViews.jl

Julia package for fast fourier transforms and periodic views
Other
21 stars 8 forks source link

performance #17

Open AshtonSBradley opened 4 years ago

AshtonSBradley commented 4 years ago

I wanted to implement a simple circular vector, and ended up finding major slowdown for accessing. Is this expected behavior:

using FFTViews
x = LinRange(-5,5,1000)

@btime x[200:800]
29.597 ns (1 allocation: 48 bytes)

xc = FFTView(x)
@btime xc[-800:-200]
6.299 μs (1 allocation: 4.88 KiB)

is it just due to allocations? It doesn't seem to be the case for a regular @view which is non-allocating

@btime @view x[200:800]
28.780 ns (1 allocation: 80 bytes)

So should @FFTView be regarded more as convenience than performant, or is there a trick to using it safely?

timholy commented 4 years ago

The difference in allocations is mostly a red herring; if you first call collect on x you'll see the same size allocation for both. The issue is that x[200:800], when x::AbstractRange, can return a range. x[200:800] does almost no work when x is a range (type @edit x[200:800] with x = LinRange(-5,5,1000) to see what I mean).

Currently there is no specialized implementation for FFTView(::AbstractRange) that returns a range object; moreover, because of the periodic boundary conditions you might need to return a concatenation of two ranges, so that would require a novel type.

Try calling x = collect(x) and repeat your test and you'll see (1) that the allocations are identical, and (2) the performance gap has narrowed tremendously.

The remaining difference in performance is because getindex calls reindex, and modrange is insansely expensive because integer division is one of the slowest operations on a CPU. But you need it for the periodic boundary conditions.

If you want to fix this, here are three possible strategies (and they can be applied in combination):

I'm hesitant to even guess which of these would be easier to implement or more effective for your purposes. The first will be more effective (by far) for the problem you pose, but probably limited to one-dimensional objects. The second will also be really effective, readily extensible to higher dimensions, but applicable to range-indexing only. Applying mod to ranges is easy; the main challenge there will be not breaking Julia's dispatch rules; you'd want to add ambiguity tests and a few more indexing tests to make sure you haven't broken anything for indexing with many different supported types. The third will help your problem a lot, but it still won't get it to the performance of typical arrays; however, it will generically speed up indexing operations with FFTViews for any supplied index types; this approach will run into fewer dispatch challenges because it's purely an internal change.

The best option, of course, is to do all of them, but that would depend on your level of commitment to this problem.