andyferris / Dictionaries.jl

An alternative interface for dictionaries in Julia, for improved productivity and performance
Other
278 stars 28 forks source link

[FR] multidict implementation #45

Closed mkschulze closed 3 years ago

mkschulze commented 3 years ago

Just leaving this request here to raise some awareness of multidics and for a possible consideration. Thx

andyferris commented 3 years ago

The idea of distict indices is pretty ingrained into the AbstractDictionary interface here, so this would require some thought.

I'm interested in what case people use a multidict for? I do frequently use a dictionary of vectors (or other collection), for example like this:

dict = Dictionary{I, Vector{T}}()
for x in input
    index = f(x)
    value = g(x)
    push!(get!(Vector{T}, dict, index), value)
end

(This is essentially the SplitApplyCombine.group(f, g, dict) method).

What do you use a multidict for? How are they created? How are they consumed? (I'm honestly asking because I've never used one in anger).

mkschulze commented 3 years ago

What I have in mind is something like Werkzeugs MultiDict in Python: https://werkzeug.palletsprojects.com/en/1.0.x/datastructures/#werkzeug.datastructures.MultiDict

>>> d = MultiDict([('a', 'b'), ('a', 'c')])
>>> d
MultiDict([('a', 'b'), ('a', 'c')])
>>> d['a']
'b'
>>> d.getlist('a')
['b', 'c']
>>> 'a' in d
True

Maybe it could be done with value collection: + get!(dict, key, empty_collection)

There is also a MultiDict in DataStructures: https://github.com/JuliaCollections/DataStructures.jl/blob/master/src/multi_dict.jl

andyferris commented 3 years ago

From https://pypi.org/project/multidict/

HTTP Headers and URL query string require specific data structure: multidict. It behaves mostly like a regular dict but it may have several values for the same key and preserves insertion ordering.

Right. This structure is basically a vector of Pairs, since the keys aren't unique. I suspect Vector{Pair{String, String}} would be a reasonable choice for headers and URL query strings, given their typical size limits.

In other cases these kinds of vectors get long and you want to find all the instances of a certain key. You could just run the vector of Pairs through through group(first, last, vector).

In other cases I've had to do this kind of lookup efficiently before (via a hash map, not a scan), I'm usually thinking of tabular data, with a (non-unique) key column and a value column. In a SQL database you'd add a secondary index. In Julia I use HashIndex from AcceleratedArrays.

andyferris commented 3 years ago

Haha, simultaneous posting.

Maybe we just need a shorthand for push!(get!(Vector{T}, dict, index), value)?

mkschulze commented 3 years ago

I'm a beginner, so my technical creativity is limited yet. But I try to rebuild a small CMS like project I know from Python into Julia. This is the code I try to translate:

https://gist.github.com/mkschulze/9c623d4e885ace6f71452dcbff8b8ab0

andyferris commented 3 years ago

Maybe we just need a shorthand for push!(get!(Vector{T}, dict, index), value)?

Like so, maybe:

function pushgroup!(d::AbstractDictionary, key, value)
    T = eltype(d)
    if T = Any
        T = Vector{Any}
    end
    push!(get!(T, d, key), value)
    return d
end

This could potentially even be exported by SplitApplyCombine...

andyferris commented 3 years ago

This is the code I try to translate:

OK interesting. In that case, the code is just flattening the multidict into a normal dict, ignoring all but the first value.

For this, you could use dictionary. Like this:

julia> dictionary([1 => 1, 1 => 2, 2 => 3])
2-element Dictionary{Int64,Int64}
 1 │ 1
 2 │ 3

The dictionary function should work on any iterable of pairs (or two-tuples, two-vectors, etc).

Is that helpful?

mkschulze commented 3 years ago

I think it is yes! Thank you.