Closed GordStephen closed 7 years ago
I haven't followed the combining stuff at all (and I'm not using cat
), but it occurs to me that both might be handled by cat(newaxis, A, B, ...)
?
One problem with the current scheme is that keyword arguments are not inferrable, and in general I don't think either merge
or join
are inferrable (try inserting @inferred
in various places in test/combine.jl
).
The problem I see with merge
right now is the time it takes to merge 2 AxisArray
. Right now, I'm trying to merge 2 AxisArray
of size 366x1068x510
. after 12 minutes, it is still not finished and it takes about 26 GB or RAM (and still rising). I killed the process before it went higher (shared server!).
If I simply use vcat
, it's mostly instantaneous.
I will provide some more info when I get the time.
edit - To answer the question in 1st post: Yes, I'm using vcat
in my package as a way to merge time-dependent AxisArrays (not using TimeAxisArrays yet).
That's a good datapoint, thanks!
Here's a simple exercise that takes way too much RAM and time for merge to be useful (at least for Climate data, where this size of arrays are everyday work).
Again, I killed the process before it ended to avoid sys admin knocking at my door!
data = randn(400, 1000, 500);
julia> dataOut = AxisArray(data, Axis{:time}(1:400), Axis{:lon}(1:1000), Axis{:lat}(1:500));
julia> dataOut2 = AxisArray(data, Axis{:time}(401:800), Axis{:lon}(1:1000), Axis{:lat}(1:500));
julia> test = merge(dataOut, dataOut2)
With a simple array of size 400x10x5
, the merge method is about 380 times slower on our server. I can't see how merge
can replace cat
with such a penalty.
After warmpup
julia> using AxisArrays
julia> data = randn(400, 10, 5);
julia> A1 = AxisArray(data, Axis{:time}(1:400), Axis{:lon}(1:10), Axis{:lat}(1:5));
julia> A2 = AxisArray(data, Axis{:time}(401:800), Axis{:lon}(1:10), Axis{:lat}(1:5));
@time vcat(A1, A2)
0.000499 seconds (133 allocations: 324.578 KB)
@time cat(1, A1, A2)
0.000551 seconds (133 allocations: 324.578 KB)
@time merge(A1, A2)
0.176711 seconds (1.14 M allocations: 49.500 MB, 6.34% gc time)
Ok, while merge
is far from optimized right now (my new project for this weekend!), keeping cat
around as a fast, bare-bones combination operation for well-behaved data seems compelling.
As discussed in #25 and illustrated in #64, it's not clear whether concatenating arbitrary AxisArrays together makes sense. As far as I can tell
merge
andjoin
provide the functionality that you would want fromcat
, provided your axes have meaningful names and values (and if they don't, why are you using an AxisArray?).I'd be interested to know if:
cat
merge
/join
can't do.For some historical context, I implemented
cat
first and realised soon after that I needed something a bit smarter - that's when I addedmerge
/join
. Had I had a better understanding of my needs I likely wouldn't have added it to begin with. Now it just seems like a source of potential confusion and frustration: the same axis constraints that govern the other functions apply tocat
as well, it just isn't smart enough to do the right thing when those constraints would be violated, and throws an error instead.If
cat
is indeed redundant, it might make sense to either:cat
methodmerge
orjoin
, depending on the arguments (e.g. does the user supply a new axis forjoin
ing on)?One issue with the second option is that
cat
implies a lower-level control over dimensional ordering thatmerge
andjoin
abstract away, so users may expect it to do things it won't. Of course, this is true of the currentcat
method as well.