Open andyferris opened 7 years ago
I should have also mentioned that "logical indexing" dict[inds], where the values of inds are Bool and the indices of inds match the indices of dict, should probably also work, just like it does for AbstractArray. That would basically be adding Dict(:a=>1, :b=>2, :c=>3)[Dict(:a=>true, :b=>false, :c=>true)] == Dict(:a=>1, :c=>3).
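A minimal sketch of what such Bool-valued indexing could do for a Dict (logical_getindex is a hypothetical helper name, not proposed API, and key-set validation is ignored here):

logical_getindex(d, mask) = Dict(k => d[k] for (k, v) in mask if v)

logical_getindex(Dict(:a=>1, :b=>2, :c=>3), Dict(:a=>true, :b=>false, :c=>true))
# Dict(:a => 1, :c => 3)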
This looks really cool! I haven't had the chance to look through it thoroughly yet, but it's very appealing to me that arrays can be thought of as associative containers with a shape.
Would it be a goal of AssociativeArray to support, say, if v = Vec([3,2,1]), something like v[[1,1,1,1]] (which should be [3,3,3,3])? It doesn't look like that works at present, or is there special syntax for that?
@macd Yes, but does v[Vec([1, 1, 1, 1])] do what you expect? I didn't put any effort into making Assoc interoperate with Base types like Vector - it's just for demonstration, but we can add more to the package if there is more design space to explore.
Very cool, it does work as I expect. I hadn't considered the interoperability issue, but it makes perfect sense.
As someone who has worked on this kind of stuff, I found it interesting that I was initially surprised by some of your choices, until you pointed out the guiding principle. I completely agree that out = a[b] implies out[i] = a[b[i]] is the most reliable foundation for these design decisions. :+1:
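For concreteness, ordinary arrays already satisfy this rule:

a = [10, 20, 30]
b = [3, 1]
out = a[b]          # [30, 10]
out[2] == a[b[2]]   # true: out takes its indices from b and its values from a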
Thanks @timholy. I should point out that this is 100% motivated by a few of your comments along the lines of "arrays are really just some special kind of associative container" (in regards to motivating/justifying offset arrays) and I've been trying to make sense of that since :)
OK, so far I'm seeing positive feedback here, at least on the indexing behavior. I'd like to know people's opinions on my list of suggested breaking changes that IMO would make all this come together consistently. Enumerated:

1. Have Associative iterate values, not index-value pairs. When I first saw Jeff mention this some time ago, I was pretty surprised (my initial reaction was scepticism). However, after thinking about this since then, I pretty much consider this a given now - it will help a lot for generic programming to arbitrary indexables (and we aren't going to make Array iterate pairs, now, are we?).
2. Have similar preserve indices and empty create an empty container (with no indices). This is breaking for similar(::Associative), but consistent with similar(::AbstractArray). I can already see that this will enormously help in creating code/patterns which work across Associative and AbstractArray. It's about clear semantics.
3. Rename keys to indices and keytype to indextype. While this might be seen as pure bikeshedding, I think it might be more intuitive to work with getindex, setindex! and indices (or else I would be tempted to propose getindex -> getkey, etc).
4. Make AbstractArray a subtype of Associative. I still don't know how to feel about this one.

Anyway, some design feedback here would help in creating an implementation.
Very interesting indeed. Still pondering this. One more major breaking point: Allowing non-scalar indexing on Associatives means you'll no longer be able to have arrays or associatives as keys. That is,
julia> d = Dict(:a=>1, :b=>2, :c=>3, [:c,:a]=>4)
Dict{Any,Int64} with 4 entries:
:a => 1
:b => 2
:c => 3
Symbol[:c, :a] => 4
julia> d[[:c,:a]]
4
@mbauman Yes, this is an interesting case. Note that for cases where the indices are strongly typed (such as Dict{Vector{Symbol}, Int}) we can catch this correctly at dispatch, so not all is lost. What to do exactly when the index type is not concrete or not known is a bit less clear. (I would tend to try and prioritize scalar indexing so this isn't a breaking change at all, but this might introduce a run-time cost in certain situations. However, if the index type is Any, well, things are probably going to be a bit slow anyway.)
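A sketch of how a concretely-typed key can be caught by dispatch (the lookup name is purely illustrative, not proposed API):

lookup(d::Dict{K}, k::K) where {K} = d[k]                                   # scalar
lookup(d::Dict{K}, ks::AbstractArray{<:K}) where {K} = [d[k] for k in ks]   # non-scalar

d = Dict([:c, :a] => 4)
lookup(d, [:c, :a])   # 4 -- Vector{Symbol} matches the scalar method, not the array one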
Also - it would help if I knew of a use case for such index types. I know we've worried about hashing of vectors and unit ranges and so-on, but I haven't come across a use for this yet...
In any case we need a nicer way to express that an array should be treated as a scalar in an expression where arrays are usually splatted, broadcast or unboxed in other ways; compare the crutch Scalar from StaticArrays to prevent broadcast. If we figure that out at a high enough level, then this Julep benefits as well.
Yes, this is also currently a problem for non-scalar indexed assignments for arrays of arrays. That is, A[:] = 1:2 is always interpreted as A[1] = 1; A[2] = 2 and not fill!(A, 1:2). It can really be a pain for generic programming.
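A small illustration of the two readings (using an untyped vector so both are representable):

A = Vector{Any}(undef, 2)
A[:] = 1:2      # elementwise assignment: A == Any[1, 2]
fill!(A, 1:2)   # the other reading: A == Any[1:2, 1:2], the same range in every slot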
I have a very strong distaste for changing semantics based upon eltype changes.
That does seem like a tricky case... I'm wondering what the solution is. We could always favour scalar indexing unless we can prove it doesn't work, or something... (It seems to me that the scalar case is the more fundamental kind of indexing (at least for getindex; I'm still torn on that setindex! example). @mschauer I was more wondering if we should force users to use Base.Slice or some-such to opt in to non-scalar indexing in ambiguous cases, rather than ask them to opt in to scalar indexing. OTOH - using something like Scalar might be easier to use?)
Make Associatives be containers of values, not of index=>value pairs
Then this might no longer be an Associative data structure. It's dangerous to generalize the behavior of a concretion, Dict, to the properties of all abstract Associative data structures. An associative array is defined as "a collection of (key, value) pairs" (https://en.wikipedia.org/wiki/Associative_array). If you pick that collection to be an unordered Set over the keyspace, then you get a Dict, but if you pick it to be something else, you can still get an Associative subtype. Some other collections that can hold associative pairs include a binary tree (making a SortedDict) and an array (making a MultiDict, or the ImmutableDict implementation in Base).

Wikipedia also notes some examples of common associative data structures which are not uniquely indexable, including HTML queries and headers. They are ordered and can be iterated as pairs, but attempting to index them may return a non-unique result (one name for these is: ordered multimaps).
It's worth noting that if I were developing non-scalar indexing from scratch right now, I'd totally detangle the scalar and non-scalar behaviors with dot-broadcasting. A[I,J,K] actually behaves very much like the indices are broadcast together, except that J is projected into a higher dimension such that it is always completely orthogonal to I, and K even higher, and so on. This is APL indexing in a nutshell.
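That broadcast view of APL indexing can be reproduced today with CartesianIndex (an illustrative equivalence, not how Base actually implements it):

A = reshape(1:12, 3, 4)
I, J = [1, 3], [2, 4]
A[I, J] == A[CartesianIndex.(I, reshape(J, 1, :))]   # true: J broadcasts orthogonally to I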
Actually, this makes me realize that we could deprecate the non-scalar indexed assignment of many values to many indices to A[idxs] .= [values...]. Then we could reclaim A[idxs] = [value] to place the same array into many locations.
@vtjnash It is interesting to consider all the possible data structures here. I guess in some senses I've mostly thought Associative to mean "indexable". To expand, from Wikipedia:
Actually, this makes me realize that we could deprecate the non-scalar indexed assignment of many values to many indices to A[idxs] .= [values...]

This seems like a reasonable idea to me. (It reminds me, I've done something like @view(A[idxs]) .= some dot expression before to take advantage of dot-fusion, so it would be nice to get the fusion for free without the @view.)
That should work just fine without the @view:
julia> expand(:(A[idxs] .= sqrt.(B)))
:((Base.broadcast!)(sqrt, (Base.dotview)(A, idxs), B))
dotview is a view for non-scalar indexing expressions, regular getindex otherwise.
Oh, that's nice to know :)
You've now got me wondering if it should be A[i] .= sqrt.(B) for scalar i and A.[inds] .= sqrt.(B) for non-scalar inds (and A[i] vs. A.[i] more generally... I think this is what you were saying above... hugely breaking I know, but still interesting... and if we get @ for field referencing from #21912 then something like A@[inds] for view(A, inds)... mind blown. Pity it's a bit of an ASCII scramble though).
The major issue with A.[] is logical indexing and other representations of many indices that aren't just arrays of Ints. We're getting a bit far afield from your original idea here, though. We can keep discussing it over at #22858.
Only because out[i] = a[b[i]], or out[I] = [out[i] for i in I], is the design principle in the "Bracket calculus" document I wrote last year, allow me to link it here: https://gist.github.com/mschauer/b04e000e9d0963e40058, see e.g. https://gist.github.com/mschauer/b04e000e9d0963e40058#indexing-of-arrays
map(indexable) and map(pairs(indexable)) and so-on work in a sensible (index-preserving) and consistent fashion with the same semantic for AbstractArray and Associative seem great
I think there are two operations here being described as map: I'll call them enumerable-map and associative-map. Usually in Julia, map refers to enumerable-map (e.g. map(identity, (i for i in 1:10)) is perfectly valid, and is not an indexable container). However, we can also define an associative map operation that applies the function just to the value, as follows: associative-map(f, a) = map( (k => v) -> (k => f(v)), a ). In other languages, this function seems to often be called mapvalues.

We can extend this function to multiple associative collections by making the definition varargs: associative-map(f, a1, as...) = map( (k => v1) -> (k => f(v1, map(a -> a[k], as)...)), a1 )
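A sketch of this in runnable Julia, with mapvalues as an assumed (non-Base) name and the pair-destructuring written as a comprehension:

mapvalues(f, d) = Dict(k => f(v) for (k, v) in pairs(d))

# varargs extension: look up the same key in each trailing collection
mapvalues(f, d, rest...) = Dict(k => f(v, (r[k] for r in rest)...) for (k, v) in pairs(d))

mapvalues(+, Dict(:a=>1, :b=>2), Dict(:a=>10, :b=>20))   # Dict(:a=>11, :b=>22)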
I like this a lot. The disconnect between associatives and arrays has always bothered me. I like the notion of array as a specialization of associative collections. The most visible cost of changing this would be having to write for (k, v) in pairs(d) instead of for (k, v) in d, but I think that's a pretty negligible cost for all the benefits it brings. In particular, I think that if you want to address the question of "how should dicts iterate?" you fundamentally have to think about "how should map work on dicts?" since those two are so deeply connected. The current map has several issues:

1. It's not clear that map(f, dict) should return another dict at all. What if f doesn't return pairs?
2. What if the pairs produced by f are non-unique? There are several different possible behaviors in that case, none of which is obviously right.
3. The 0.6 behavior where map(f, dict) passes key-value pairs to f as a pair of arguments is convenient but fundamentally wrong if we consider dicts to be collections of key-value pairs. The 0.7 behavior where map(f, dict) passes key-value pairs as a single pair argument is correct with respect to iteration of key-value pairs but super annoying to use.

If we change the model for associative collections from being a collection of pairs to being an indexed collection of values, and accordingly iterate values instead of pairs, that would clarify the behavior of map(f, dict): it transforms values and that's all. It's obvious that it should return a dict because it leaves the indexing structure of the input untouched, just as it does for arrays. We could introduce mappairs and mapkeys functions, but I'm not sure that we need to since we can just write Dict(f(k) => g(v) for (k, v) in pairs(d)) with appropriate f and g.
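A sketch of the difference (the map behaviour shown is the proposal, of course, while the comprehension already works today):

d = Dict(:a => 1, :b => 2)

# proposed: map transforms values only and keeps the keys, so
#   map(x -> x + 1, d) == Dict(:a => 2, :b => 3)

# key and pair transformations stay expressible via pairs(d):
Dict(string(k) => v + 1 for (k, v) in pairs(d))   # Dict("a" => 2, "b" => 3)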
Another question here: should we continue to treat d[1,2] as d[(1,2)]? Given that we've never seen anyone even consider requesting such a monster, I'm not sure it's worth burning time on a multidimensional dictionary that would have a tuple of array-like indices and values that span the cartesian product thereof.
But we may want to deprecate the implicit tuplization behavior here in order to make the two more similar.
That syntax has always seemed to me like a convenient poor man's sparse matrix.
What if f doesn't return pairs?
That's the smoking gun. The usual structure of a map operation is (A->B) -> Collection[A] -> Collection[B]. For Dict, A is a Pair type, but since B can be anything, this structure does not match. But if we take Dict{K} as the Collection type, it matches the structure of mapping over the values.
Also - it would help if I knew of a use case for such index types. I know we've worried about hashing of vectors and unit ranges and so-on, but I haven't come across a use for this yet...
Memoization is a big one.
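For example, a memoization cache keyed by an argument that is itself an array (hypothetical helper, just to illustrate the key type):

const cache = Dict{Vector{Int}, Int}()
cached_sum(v::Vector{Int}) = get!(() -> sum(v), cache, v)

cached_sum([1, 2, 3])   # computes and stores the result under the key [1, 2, 3]
cached_sum([1, 2, 3])   # hits the cache; the Vector itself is the Dict key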
Index an Associative{K,V} with an AbstractArray{K,N} to get an AbstractArray{V,N}. E.g. Dict(:a=>1, :b=>2, :c=>3)[[:c, :a]] == [3,1].

Mappings and functions are the same thing, so why not replace your proposed indexing with function application: Dict(:a=>1, :b=>2, :c=>3) ∘ [:c, :a] == [3,1]?
It would be error-prone to write generic code where dict[x] is either a look-up or APL-style indexing, depending on x's type.
@cstjean ∘ already has a meaning which we must preserve, where Dict(...) ∘ [...] creates a function x -> Dict(...)([...](x)). I'm not really comfortable with punning function calls for some kind of indexing behavior.
Thanks @StefanKarpinski for that post. I was trying to mostly avoid the pair-iteration topic here (thinking we should possibly have a separate github issue for that), but you hit the nail on the head when you discuss map, and in fact that is the entire motivation for me supporting having Associative iterate values. In my explorations I found that workflows involving higher-order operations like map, reduce, filter and group (my own function) quickly became unworkable whenever an Associative was created. This change alone would make data manipulation with Associative so much easier to handle (especially the synergy with having a similar interface to AbstractArray).
@mbauman @TotalVerb I would be tempted to deprecate the current dict[a,b] behaviour meaning dict[(a,b)] and free it up to allow the package ecosystem to create the "monster" of Cartesian product spaces for keys of associatives if someone wants (at the very least it would be an interesting experiment and I believe not very difficult either). People will still be free to use tuples (and named tuples) as dictionary keys.
∘ already has a meaning which we must preserve, where Dict(...) ∘ [...]

That's a side-effect of the way ∘ is implemented, not useful functionality.

I'm not really comfortable with punning function calls for some kind of indexing behavior.

It's not a pun, it's an isomorphism: every function f(x::A)::B over a finite countable domain A can be represented as a Dict{A, B}. "Composition of associatives" could behave like function composition.

EDIT: Perhaps I should have mentioned that "Index an Associative{K,V} with an AbstractArray{K,N} to get an AbstractArray{V,N}. E.g. Dict(:a=>1, :b=>2, :c=>3)[[:c, :a]] == [3,1]." is a composition of the two associative containers. In any case, @mbauman's suggestion looks more practical.
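The isomorphism can be sketched with explicit conversions rather than by overloading ∘ (the names here are made up for illustration):

as_function(d::Dict) = k -> d[k]
as_dict(f, domain) = Dict(k => f(k) for k in domain)

g = as_function(Dict(:a => 1, :b => 2))
g(:a)               # 1
as_dict(abs2, 1:3)  # Dict(1 => 1, 2 => 4, 3 => 9)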
We already have a very nice scheme for detangling scalar and nonscalar meanings. And the one we need, d.[vector], is completely available. It'll take some thought, but I totally believe that we can create a workable definition for it.
I'm not seeing how d[a,b] meaning d[(a,b)] is at odds with this. Wouldn't that dovetail nicely with the view of dicts as a map between an index and a value? In a matrix, the index is the tuple (i,j) which is mapped to the matrix entry M[i,j]. Although I do agree that deprecating it is the safest option in case we discover some inconsistency later on. We can always reintroduce it if it turns out to be useful and consistent.
That's fine, but I thought each index of an Array{T,N} was a CartesianIndex{N}, not an NTuple{N, Int} - I kind-of feel we should pick one or the other and stick with it. :)

At this stage I'm not sure that worrying about Cartesian indexing on Associative is extremely important unless someone is seriously considering using e.g. the type alias AbstractArray{T,N} = Associative{<:AbstractCartesianIndex{N}, T}. But in AssociativeArray.jl I did get the feeling that we can more easily build Cartesian indexing machinery like Slice if the keytype is some Cartesian type, neatly separating everything from cases where the key type is some arbitrary tuple (not restricted to a Cartesian product of sets).
It's worth mentioning that in #24086 we discussed using the syntax a.[b] for non-scalar getindex and a[b] for the scalar case. This helps disambiguate which operation you want, especially for Associative containers where keys might be collections, and relates to an existing ambiguity in setindex! for arrays (as discussed in #24086).
How about the following:
struct MatrixDict{T1, T2, V}
    dict::Dict{Tuple{T1, T2}, V}
    MatrixDict{T1, T2, V}() where {T1, T2, V} = new(Dict{Tuple{T1, T2}, V}())
end
Base.getindex(d::MatrixDict, k1, k2) = d.dict[(k1, k2)]
Base.setindex!(d::MatrixDict, v, k1, k2) = d.dict[(k1, k2)] = v
d = MatrixDict{Int, UnitRange{Int}, Int}()
julia> d[1, 1:3] = 1
1
julia> d[2, 1:3] = 2
2
julia> d.[1:2, 1:3]
????
IIUC, the last expression would give an error since it would broadcast the 1:3. How would I write it to broadcast only over the 1:2 while keeping 1:3 as a "scalar"?
That's effectively the same problem as marking specific arguments in a f.() expression to be treated as scalar. And the solution is the same: just "protect" the scalar inside an array/collection: d.[1:2, [1:3]]. APL indexing rules would mean that adds a dimension, but you could similarly do d.[1:2, fill(1:3)] to drop it.
That's another motivation for a non-dimension adding scalar wrapper type: https://github.com/JuliaLang/julia/issues/18379
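The same protection trick is already used with dot-broadcast today, which is what d.[1:2, [1:3]] would be mirroring (examples for illustration only):

in.([1, 2, 5], fill(1:3))              # [true, true, false]; fill(1:3) is a 0-dim "scalar" wrapper
haskey.(Ref(Dict(:a => 1)), [:a, :b])  # [true, false]; Ref plays the same protecting role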
From triage: for 1.0 we mostly need to consider dict iteration and possible renamings. The rest of the indexing stuff can be handled later.
Also from triage, Stefan proposed that we make the interface returned by pairs/keys/values more complete in order to better support all use cases of array and dict objects (and sets too). In particular, the proposal was that we should change their return type/behavior to define them as transformations between the 3 fundamental containers in Julia, i.e. define them such that:
pairs(array/dict)::Associative # effectively equivalent to `Associative(zip(keys, values))`
values(array/dict)::AbstractArray # (indexed by the original keys)
keys(array/dict/set)::AbstractSet
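For comparison, what the three functions give for a plain vector today (the proposal above is about tightening these return types, e.g. making keys return a genuine AbstractSet):

v = [10, 20, 30]
collect(keys(v))    # [1, 2, 3]
values(v) === v     # true -- for arrays, values is currently the identity
collect(pairs(v))   # [1 => 10, 2 => 20, 3 => 30]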
I think we can live with dicts iterating either pairs or values --- if they iterate pairs, there will be contexts where you need to write values(d), and if they iterate values there will be contexts where you need to write pairs(d).

However, it seems to me there are lots of functions that operate on keys and values in certain useful patterns. For example, indmin gives you the index of the value that minimum returns. It seems obvious to me that indmin on a dict should give you the key of the minimum value (as it now does). Therefore minimum should give the minimum value. That implies that either (1) minimum should call values on its argument, or (2) dicts should iterate values. (1) works, but values is the identity function for everything except Associative. Is anything else going to have a meaningful implementation of values? It seems ugly for a function that can work on arbitrary iterables to have to call values on its argument.
Then again, sorted dictionaries are sorted by key, so perhaps there's no way to have a totally consistent view.
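In code, with the 0.6-era names used above (indmin has since become findmin/argmin):

d = Dict(:a => 3, :b => 1)
indmin(d)            # :b -- the key of the minimum value, as described above
minimum(values(d))   # 1  -- option (1): explicitly reduce over the values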
I've been thinking about sets and keys. Here's a relatively sane property that I'm considering which may be desirable across arrays and associatives:

a[keys(a)] == a

That is, both the keys and values will be preserved by this indexing operation. (Perhaps this will be a.[keys(a)] == a if @mbauman has his way).

To achieve this:

1. keys(a) returns an array or associative which is an "identity mapping" (I made up that term). If b is an identity mapping then b[i] == i for all i in keys(b). Base.OneTo is an example of an identity mapping (and I see no reason why a CartesianRange can't be an array supporting getindex); see the array example below this list. From memory, I think @timholy might have made some custom unit-range objects which have this property for offset arrays.
2. AbstractSets are identity mappings, and we should in theory be able to have AbstractSet{T} <: Associative{T,T}. Set can follow the same internal implementation as currently, but support getindex and so on. We can still talk about element in set if we have 3.
3. AbstractSets and Associatives will iterate values, not key-value pairs. We won't need a values function.

I think if we have 1-3 for v1.0, things may be pretty consistent and tied together nicely and ready for 4 in the future.
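For arrays, the identity-mapping property already holds (in current Julia, where the keys of a matrix are CartesianIndices):

a = rand(3, 4)
k = keys(a)                        # CartesianIndices((3, 4))
k[2, 3] == CartesianIndex(2, 3)    # true: keys(a) is an "identity mapping"
a[k] == a                          # true: a[keys(a)] == a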
The final thing I'd suggest for v1.0 is to allow parsing (not lowering) of a.[i] so that we can experiment with macros in packages on how to implement the OP without doing weird stuff to e.g. indexing of Dict{Any}s.
a[keys(a)] == a
This property is true of arrays, but is not true of all associative containers. One counter-example is the multi-dict, but another more commonly encountered counter-example is the min-heap.
You keep talking about multi-dicts, but they are not, in my view, a thing. If m is a "multidict" then what does m[k] return? Multiset, sure, that's a thing – it's an unordered collection of values, which can have repeats, in which one can test containment/count in O(1) time – also known as a Dict{K,Int}. If by "multi-dict" you mean something that maps keys to sets of values, then that's a Dict{K,Set{V}}.
but they are not, in my view, a thing
You also keep trying to claim this, despite the fact that Julia provides an implementation of one (https://github.com/JuliaLang/julia/blob/master/base/dict.jl#L747), which also fails to conform to your suggestion that it's just a "Dict{K, Set{V}}".
An ImmutableDict is "persistent" (I believe that term is used in Clojure) in that previous mappings aren't destroyed. But I think the fact that this is exposed by iteration is more of a bug.
another more commonly encountered counter-example is the min-heap
What does indexing mean here? Wouldn't the user-facing interface be that of a priority queue (i.e. deque- or bag-like) rather than that of an indexable?

The OP is specifically referring to collections which provide mappings from a unique set of keys to a single value, which is what I thought Associative{K,V} tries to encapsulate. You can of course use that to represent more complex data structures if the values are themselves containers, or you can make your own abstract type which isn't an Associative with an appropriate interface for non-unique keys.
But I think the fact that this is exposed by iteration is more of a bug.
It's entirely reasonable to additionally define and create another persistent dict type without needing to first accuse a different data structure of being a bug. It's a feature that we can also implement an ordered multi-dict with the same interface as other associative mappings.
which is what I thought Associative{K,V} tries to encapsulate
By restricting the classification, it'll be a tradeoff. As with any classification system, making the definition more specific ("a unique mapping from a key to a single value") sometimes makes it more useful (you can assume indexing is possible). But it also then excludes other possibilities (the counter-examples I gave), which will now form some new categories. In these cases, these perhaps might be a Queue{ElementT} (or perhaps more generally an Iterable{ElementT}) – which currently is just represented as an interface (push/pop or iterate) over other structures – and a MultiMap{KeyT, ValueT}. We can change the definition of these classes pretty much however we want – there isn't one right or wrong way to define them – but we should at least acknowledge there are repercussions of the changes: both pros and cons.
The OP is specifically referring to collections which provide mappings from a unique set of keys to a single value
There are alternative ways of writing the OP which don't make this choice of definition. Here's one generic implementation option that instead assumes the existence of an appropriate definition for similar and of pair iteration:
function getindex.(iterable, keys::Associative)
    newmap = similar(first(eltype(keys)) => last(eltype(pairs(iterable))), iterable)
    for (newkey, oldkey) in keys
        for (key, value) in pairs(iterable)
            if key == oldkey
                push!(newmap, newkey => value)
            end
        end
    end
    return newmap
end

function getindex.(iterable, keys::AbstractArray)
    newarray = similar(last(eltype(pairs(iterable))), iterable)
    for oldkey in keys
        for (key, value) in pairs(iterable)
            if key == oldkey
                push!(newarray, value)
            end
        end
    end
    return newarray
end
FYI I've started some prototyping at https://github.com/andyferris/Indexing.jl
I think the Indexing.jl prototype is working pretty well now. It's not too much code (it probably needs a bit more code to make it really fast) so it doesn't seem too scary to port to Base at some point.
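Roughly, the prototype provides a getindices function with the semantics from the OP (a sketch of intended usage; exact return types per the package):

using Indexing   # the prototype package linked above

getindices(Dict(:a=>1, :b=>2, :c=>3), [:c, :a])   # expected: [3, 1]
getindices([11, 12, 13], Dict(:a=>1, :c=>3))      # expected: Dict(:a=>11, :c=>13)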
It's probably a good time to think about future syntax, in case deprecations are necessary. I would suggest ending up with something like:
a[i] --> getindex(a, i) # scalar only
a.[inds] --> getindices(a, inds) # or view(a, inds)?
a[i] = v --> setindex!(a, v, i) # scalar only
a.[inds] = v --> setindices!(a, v, inds)
a[i] .= v --> broadcast!(identity, getindex(a, i), v)
a.[inds] .= values --> broadcast!(identity, view(a, inds), values)
Note the lack of dotview and maybeview. The last two could support dot-fusion on the RHS. Also, the default for a.[inds] could potentially move to view rather than getindices (or we can make it dovetail nicely with lazy broadcasting a la https://github.com/JuliaLang/julia/pull/23692#issuecomment-352487735).
IIUC we'd need to deprecate dotview and perhaps a semantically-ambiguous setindex! method on arrays (this one noted by @mbauman) in v0.7, if we want the above to appear in v1.x. Thoughts?
I've always loved how in Julia (and MATLAB) one can create a new array from an old one, using what is now the APL indexing rules. Basically if you index a collection of values with a collection of indices, you get a new collection of the indexed values. Beautiful, simple. Indexing has also been extended by allowing arrays that don't use 1-based indexing by e.g. the OffsetArrays.jl package.
I'm not sure if this issue exists elsewhere as its own entity (cleaning up distinctions between arrays and associatives was surely mentioned in #20402, and this Julep seems to be a logical extension of #22907), but here I propose specifically that we extend indexing of and by Associative and make related changes so that the semantics are consistent across these two types of container. I prototyped ideas at https://github.com/andyferris/AssociativeArray.jl and basically came up with the ability to (with simple code):

- Index an Associative{K,V} with an Associative{I,K} to get an Associative{I,V}. E.g. Dict(:a=>1, :b=>2, :c=>3)[Dict("a"=>:a, "c"=>:c)] == Dict("a"=>1, "c"=>3).
- Index an Associative{K,V} with an AbstractArray{K,N} to get an AbstractArray{V,N}. E.g. Dict(:a=>1, :b=>2, :c=>3)[[:c, :a]] == [3,1].
- Index an AbstractArray{T,N} with an Associative{K,I} to get an Associative{K,T} (where I might be Int for linear indexing, or a CartesianIndex{N} for Cartesian indexing). E.g. [11,12,13][Dict(:a=>1, :c=>3)] == Dict(:a=>11, :c=>13).

The semantics are consistent across arrays and dictionaries, and provide that for out = a[b]:

- out shares the indices of b (note: these are CartesianRange for arrays)
- out[i] corresponds to a[b[i]].

This is fully consistent with both the Base arrays and the OffsetArrays.jl package (we can do something similar for setindex!).

To make everything consistent, it helps to make the following associated changes:

- Make Associatives be containers of values, not of index=>value pairs, so that arrays and dictionaries are consistent on this fundamental point. Use the existing pairs function when necessary (and ideally make it preserve indexability).
- Have similar always return a container with the same indices, even for dictionaries. Ideally, unify similar across Associatives and Arrays (for example a dictionary which is similar to a distributed array might also be distributed) via use of the indices.
- An empty function that makes empty Dicts and Vectors to which elements should be added. (Done, #24390).
- The keys used by getindex and setindex! will be called indices, rather than keys (and rename the current indices(::AbstractArray) to something else).
- Make view work for the various combinations where getindex works.

The demonstration package also prototypes making AbstractArray{T, N} <: Associative{CartesianIndex{N}, T} - I don't think this is strictly necessary but it helped (me) to highlight which parts of the existing interface were inconsistent. The package does demonstrate that we can put something simple together without excessive amounts of code (some performance tuning is surely required).

Finally, a word on what motivates this: lately I've been playing with what fundamental data operations (such as mapping, grouping, joining or filtering) would be useful for both generic data structures and tables/dataframes (that iterate rows), and I found that whenever I created, say, a grouping (using a dictionary of groups), I immediately felt the loss of the ability to do complex indexing and other operations with the result (as well as having to worry whether the output iterates values or key-value pairs, etc).
- I don't think this is strictly necessary but it helped (me) to highlight which parts of the existing interface were inconsistent. The package does demonstrate that we can put something simple together without excessive amounts of code (some performance tuning is surely required).Finally, a word on what motivates this: lately I've been playing with what fundamental data operations (such as mapping, grouping, joining or filtering) would be useful for both generic data structures and tables/dataframes (that iterate rows), and I found whenever I created say a grouping (using a dictionary of groups), I immediately felt the loss of ability to do complex indexing and other operations with the result (as well have to worry whether the output iterates values or key-value pairs, etc).