andyferris / Dictionaries.jl

An alternative interface for dictionaries in Julia, for improved productivity and performance
Other
282 stars 28 forks source link

Indexing with tuples as keys #15

Open junglegobs opened 4 years ago

junglegobs commented 4 years ago

With dictionaries, I'm used to being able to write stuff like this:

d = Dict((i,j) => j^i for i in 1:10, j in ["a","b"])
d[1,"a"] # returns "a"

This isn't possible with this package:

d = dictionary((i,j) => j^i for i in 1:10, j in ["a","b"])
d[1,"a"]
ERROR: MethodError: no method matching getindex(::HashDictionary{Tuple{Int64,String},String}, ::Int64, ::String)
d[(1,"a")] # This is fine

Maybe this isn't possible or not what this package is designed to do, but I thought I'd bring it up.

andyferris commented 4 years ago

Sorry for the delayed response - apparently I wasn't "watching" this github repository.

Currently multi-dimensional indexing is reserved as we may wish to implement "multi dimensionsal dictionaries" in the future and the API for that hasn't been sorted out yet. It is a very important item on the TODO list but there are some more fundamental concerns with 1D containers to be ironed out first (for example, in #13 we are ensuring the iteration order of all dictionaries is stable and predictable, which will allow other things like sorting and permuting and so-on, and has implications on mapping and broadcasting, etc). I'm imagining something like the keys being some variant of CartesianIndex(1, "a") and multi-key getindex will automatically construct these, like so: Base.getindex(d::AbstractDictionary, a, b, c...) = d[CartesianIndex(a,b,c)]. But we'd still like a convenient way of having multiple non-Cartesian indices like in NDSparse... so it needs to be mulled over some more.

Mirage10 commented 3 years ago

some interesting use cases of multitimensional dictionaries:

  1. Graphs (directed) G=Dictionary([3 4;5 1; 3 1], [3.0,5.4,6.1]) # Graph with edges 3->4 , 5->1, 3->1 with values 3.0, 5.4 and 6.1 respectively

I=Index([3 4;5 1; 3 1]) I[:,1] should iterate over all incomming nodes for node 1: 3, 5 I[3,:] should iterate over all outgoing nodes of node 3: 4,1]

  1. Hypergraphs (undirected) H=Dictionary([Set(4,3,2),Set(5,1,2,6,7),Set(5,9)],["a","b","c"]) # Hypergraph with 3 hyperedges. each edge is a set of nodes. each hyperedge has a label assigned to

I=Index([Set(4,3,2),Set(5,1,2,6,7),Set(5,9)]) I[5] iterates over all hyperedges, where 5 is contained in: Set(5,1,2,6,7) , Set(5,9) I[Set(5,9)] iterates ofver the single hyperedge Set(5,9)

undirected Hypergraphs are generalization of undirected graphs: graphs have edges with exactly 2 nodes, hyperedges can have arbitrarily nodes

of course other methods for the index could be contains, not contains, etc ...

Dictionaries.jl is a very nice approach. Also with respect to genericity. In my opinion the way to go for the standard. Many thanks to the contributor(s)!!

andyferris commented 3 years ago

Yes, in a lot of ways a dictionary with 2-tuple keys is a much more natural way to think about a graph than a sparse matrix, since the keys values can be meaningful. Hopefully the token approach can make it just as fast yet easier to code.

Still, the internal representation matters - being able to find I[a, :] fast relies on certain nesting structures being available, and I[:, b] being fast relies on yet more.

I think there's a lot of interesting work to do with both Cartesian products of keys as well as subsets of such products (the "sparse" approach).