JuliaArrays / AxisArrays.jl

Performant arrays where each dimension can have a named axis with values
http://JuliaArrays.github.io/AxisArrays.jl/latest/

AxisArrays allows repeated values in axis #192

Open jd-lara opened 3 years ago

jd-lara commented 3 years ago

It seems that AxisArrays doesn't check that the values in an axis are unique. The MWE below currently runs without error, and it seems it shouldn't.

MWE

julia> using AxisArrays, Random  # randstring comes from Random
julia> ax1 = fill(randstring(10), 100);
julia> t = AxisArray(rand(100, 48), ax1, 1:48);
julia> t
2-dimensional AxisArray{Float64,2,...} with axes:
    :row, ["YF263D1vZG", "YF263D1vZG", "YF263D1vZG", "YF263D1vZG", "YF263D1vZG", "YF263D1vZG", "YF263D1vZG", "YF263D1vZG", "YF263D1vZG", "YF263D1vZG"  …  "YF263D1vZG", "YF263D1vZG", "YF263D1vZG", "YF263D1vZG", "YF263D1vZG", "YF263D1vZG", "YF263D1vZG", "YF263D1vZG", "YF263D1vZG", "YF263D1vZG"]
    :col, 1:48
And data, a 100×48 Array{Float64,2}:
 0.544045   0.706665  0.794698   …  0.146636   0.173689   0.860602
 0.818773   0.665337  0.598364      0.318264   0.779036   0.49364
 0.506061   0.634103  0.429253      0.727027   0.280061   0.105964
 0.188533   0.438404  0.205067      0.803538   0.469894   0.706383
 0.232693   0.520553  0.731804      0.483673   0.0525813  0.735506
 0.708654   0.495217  0.419302   …  0.187662   0.394126   0.286807
 0.0296371  0.145384  0.67488       0.363728   0.976938   0.679723
 0.736628   0.130968  0.583726      0.295144   0.465268   0.2539
 0.537004   0.585194  0.204467      0.491539   0.428528   0.259942
 0.884711   0.496042  0.0305943     0.0416132  0.279033   0.792419
 ⋮                               ⋱  ⋮                     
 0.161877   0.163169  0.645313      0.978483   0.168679   0.731583
 0.0828598  0.774799  0.987582      0.466019   0.214213   0.757673
 0.333942   0.919552  0.512247      0.420129   0.268359   0.412811
 0.843079   0.416188  0.821125      0.0913968  0.298315   0.681747
 0.137244   0.171126  0.953037   …  0.272229   0.2507     0.822746
 0.0817602  0.75145   0.767151      0.988747   0.458262   0.584586
 0.674519   0.518633  0.91036       0.255896   0.724942   0.637565
 0.360949   0.11814   0.974368      0.24273    0.957803   0.365758
 0.497645   0.396735  0.0457641     0.511115   0.099716   0.0692743

mcabbott commented 3 years ago

it seems it shouldn't

Why should this be disallowed? If e.g. hcat is going to work, then it will inevitably sometimes produce duplicates. Lookup works by findfirst I think:

julia> t = AxisArray(rand(Int8, 3, 4), ['a', 'a', 'b'], 0:3)
2-dimensional AxisArray{Int8,2,...} with axes:
    :row, ['a', 'a', 'b']
    :col, 0:3
And data, a 3×4 Matrix{Int8}:
  60   -4   -80  -17
 -92   -8   -42   93
  19  -41  -106  -72

julia> t['a',4]
-17

ParadaCarleton commented 3 years ago

I think that's the problem -- AxisArrays allows you to look up keys that aren't unique, which (I assume) very often indicates a bug, rather than being the user's actual intention.
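To make the concern concrete, here is a minimal sketch in plain Julia. It assumes, as suggested earlier in the thread but not verified against the AxisArrays source, that lookup behaves like findfirst over the axis values. The names `labels` and `data` are made up for illustration.

```julia
# Hypothetical stand-ins for an axis and its data; these are not AxisArrays internals.
labels = ['a', 'a', 'b']
data = [10, 20, 30]

# A findfirst-style lookup returns only the first matching position ...
i = findfirst(==('a'), labels)
first_hit = data[i]

# ... while findall shows that a second entry with the same label exists.
all_hits = data[findall(==('a'), labels)]
```

Under that assumption, the second 'a' entry is never reachable by label, and nothing warns the user about it.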

mcabbott commented 3 years ago

which (I assume) very often

But just asserting this doesn't answer the question. Why shouldn't the labels attached to axes be categories or dates or some other information, which may have duplicates?

If you want to ensure they are unique, you can call unique (or check with allunique). But if that check were built in, then it would be hard to opt out of.
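A user who wants uniqueness can opt in before constructing the array. The following is a minimal sketch using only Base functions; `check_unique_labels` is a hypothetical helper, not something AxisArrays provides.

```julia
# Hypothetical helper, not part of AxisArrays: throw if any label repeats.
function check_unique_labels(labels)
    allunique(labels) || throw(ArgumentError("axis labels contain duplicates"))
    return labels
end

ok = check_unique_labels(["x", "y", "z"])   # passes and returns the labels

# Deduplicating is a separate, lossy choice: unique keeps the first occurrences.
deduped = unique(['a', 'a', 'b'])
```

The checked labels could then be passed on as usual, for example AxisArray(rand(3, 4), check_unique_labels(['a', 'b', 'c']), 0:3), which leaves duplicates available to anyone who uses labels as categories or dates.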