Adding `IdDictionary` - Githubissues

andyferris / Dictionaries.jl

An alternative interface for dictionaries in Julia, for improved productivity and performance

Other

278 stars 28 forks source link

Adding `IdDictionary` #114

Closed theogf closed 9 months ago

theogf commented 1 year ago

It would be great to have the Dictionaries equivalent of IdDict from base. We would have the token based on objectid instead of hash. It looks pretty straightforward to implement but maybe I am missing some issues.... Any thoughts @andyferris ?

andyferris commented 1 year ago

Yes. So IdDict is kinda like Dict with a different hash (objectid) and isequal (===). To me the appropriate way to represent this is (and always was):

struct Id{T}
    value::T
end

isequal(x::Id, y::Id) = x.value === y.value
hash(x::Id) = objectid(x.value)

dict = Dictionary{Id{K}, T}()

for whatever K and T you want.

Perhaps we could have some kind of dictionary where the keys are lazilly mapped (generally useful, e.g. JSON3.jl has an interesting correspondence between string keys and symbol keys), and then lazily map them by Id.

I'm not sure what everyone elses opinions are? @theogf I agree for completeness we should cover this use case here at some point.

mtfishman commented 1 year ago

I like the idea of having dictionaries with custom hashing behavior or mappings between keys, for example I am using Dictionaries to store the metadata of graphs in https://github.com/mtfishman/DataGraphs.jl and for edge data it would be nice to have a canonical way of defining a dictionary such that inputs of Pair(src, dst) and Edge(src, dst) get mapped to the same key/hash value, or for undirected graphs we would want Edge(src, dst) and Edge(dst, src) to be treated as the same key/hash value. Lazily mapped keys sounds like a good way to do that.

Another example is if you are using a Dictionary to define a dictionary-of-keys sparse array type (such as https://github.com/Jutho/SparseArrayKit.jl) where you might want CartesianIndex and Tuple inputs to map to the same key.

For now I am handling that in the way I define getindex/setindex! to access graph metadata but it would be nice for that to be handled at the level of the dictionary.

andyferris commented 1 year ago

@mtfishman Have you considered defining convert(::Type{Edge{A, B}}, x::Pair) and vice-versa? Just make sure hash and isequal and isless is defined the same for both and it'll work out-of-the box. (For CartesianIndex maybe the same is possible, but it's really up to the people in Base).

But yeah "lazily mapped keys" rather than "convert keys" (default behavior) would be cool.

mtfishman commented 1 year ago

@andyferris I don't think that would be ideal since I wouldn't always want that behavior, it would depend on the context of where the dictionary is being used if I want Pair and Edge to be treated as the same keys (i.e. when the dictionary is the edge data storage of a metagraph).

It seems like a way to do this could be to have an object-local convert function, i.e.:

struct MappedKeyDictionary{I, T, F <: Function, D <: AbstractDictionary{I, T}} <: AbstractDictionary{I, T}
  dictionary::D
  key_map::F
end
function Base.getindex(d::MappedKeyDictionary{I}, i) where {I}
  return d[d.key_map(I, i)]
end

so then a custom key map/conversion could be defined for a particular instance of the dictionary (though maybe something else needs to be done with the hashing?).

andyferris commented 1 year ago

Yes, OK, I think that could make sense.

A few thoughts:

How do we iterate the pairs or keys of the dictionary? Do we require the invese function to key_map?
Does it make sense if key_map is not an isomorphism? Would a canonical pseudo-inverse be sufficient?
We should probably have a set-like MappedIndices, and perhaps build a dictionary around that. (The above questions apply to the set).

theogf commented 9 months ago

Closing as alternative solutions were proposed