Open AshtonSBradley opened 4 years ago
The difference in allocations is mostly a red herring; if you first call collect
on x
you'll see the same size allocation for both. The issue is that x[200:800]
, when x::AbstractRange
, can return a range. x[200:800]
does almost no work when x
is a range (type @edit x[200:800]
with x = LinRange(-5,5,1000)
to see what I mean).
Currently there is no specialized implementation for FFTView(::AbstractRange)
that returns a range object; moreover, because of the periodic boundary conditions you might need to return a concatenation of two ranges, so that would require a novel type.
Try calling x = collect(x)
and repeat your test and you'll see (1) that the allocations are identical, and (2) the performance gap has narrowed tremendously.
The remaining difference in performance is because getindex
calls reindex
, and modrange
is insansely expensive because integer division is one of the slowest operations on a CPU. But you need it for the periodic boundary conditions.
If you want to fix this, here are three possible strategies (and they can be applied in combination):
getindex
method for FFTView
s of ranges. You'll need to create a DoubleRange
type or something. This will get hairy beyond one dimension, though.getindex
method for range indices. Currently, every index in 200:800
gets mod
ded. However, for a range you only need to apply the boundary conditions once, breaking it up into at most two ranges. This goes beyond the typical AbstractArray interface but there is nothing wrong about supplying higher-order methods as long as you are prepared to deal with the consequences. This will be far easier than the first option to generalize to multiple dimensions.FFTView
struct so that it stores a fast multiplicative inverse so as to avoid integer division. We already do this for our ReshapedArrays
, no reason not to do it here.I'm hesitant to even guess which of these would be easier to implement or more effective for your purposes. The first will be more effective (by far) for the problem you pose, but probably limited to one-dimensional objects. The second will also be really effective, readily extensible to higher dimensions, but applicable to range-indexing only. Applying mod
to ranges is easy; the main challenge there will be not breaking Julia's dispatch rules; you'd want to add ambiguity tests
and a few more indexing tests to make sure you haven't broken anything for indexing with many different supported types. The third will help your problem a lot, but it still won't get it to the performance of typical arrays; however, it will generically speed up indexing operations with FFTViews for any supplied index types; this approach will run into fewer dispatch challenges because it's purely an internal change.
The best option, of course, is to do all of them, but that would depend on your level of commitment to this problem.
I wanted to implement a simple circular vector, and ended up finding major slowdown for accessing. Is this expected behavior:
is it just due to allocations? It doesn't seem to be the case for a regular
@view
which is non-allocatingSo should
@FFTView
be regarded more as convenience than performant, or is there a trick to using it safely?