JuliaCollections / DataStructures.jl

Julia implementation of Data structures
https://juliacollections.github.io/DataStructures.jl/latest/
MIT License

Access underlying `Dict` of `DefaultDict` #705

Open goretkin opened 3 years ago

goretkin commented 3 years ago

I think it would be useful to have a documented way to access the underlying Dict of a DefaultDict. The following seems to rely on internals:

julia> dd
DefaultDict{Any,Array{Int64,1},DataType} with 3 entries:
  0 => [3, 6, 9]
  2 => [2, 5, 8]
  1 => [1, 4, 7, 10]

julia> dd.d.d
Dict{Any,Array{Int64,1}} with 3 entries:
  0 => [3, 6, 9]
  2 => [2, 5, 8]
  1 => [1, 4, 7, 10]

oxinabox commented 3 years ago

This is probably a valid thing for convert(Dict, ::DefaultDict) to do following the logic of https://docs.julialang.org/en/v1/manual/conversion-and-promotion/#Mutable-collections

Though I am still not entirely sure of what this is useful for.
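
For comparison, a copy of the entries (as opposed to the underlying object itself) can already be obtained without touching any fields. The following is only a sketch, assuming DataStructures.jl is installed and that a `DefaultDict` iterates key/value pairs like any other `AbstractDict`; the variable names `dd` and `plain` are illustrative.

```julia
using DataStructures: DefaultDict

# Build a small DefaultDict the same way the issue does.
dd = DefaultDict{Any, Vector{Int}}(Vector{Int})
for x in 1:10
    push!(dd[x % 3], x)
end

# Dict(...) accepts any iterable of pairs, so this copies the entries
# into a plain Dict without reaching into the fields of `dd`.
plain = Dict(dd)

# The copy is shallow: the value vectors are shared with `dd`,
# but missing keys are no longer filled in on lookup.
plain[0] === dd[0]
haskey(plain, 5)
```

Note that this allocates a new table, so it is not a substitute for a documented accessor when the goal is to avoid the copy.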

goretkin commented 3 years ago

Let me know what you think about this. I often use DefaultDict because the behavior is convenient in a very localized place, and I do not want that behavior elsewhere. For example:

using DataStructures: DefaultDict
"""
    Generalize `filter`, like `DataFrames.groupby`

# Examples
```jldoctest
julia> filter_key(k -> k % 3, 1:10)
Dict{Any,Array{Int64,1}} with 3 entries:
  0 => [3, 6, 9]
  2 => [2, 5, 8]
  1 => [1, 4, 7, 10]

julia> filter_key(iseven, 1:10)
Dict{Any,Array{Int64,1}} with 2 entries:
  false => [1, 3, 5, 7, 9]
  true  => [2, 4, 6, 8, 10]
```
"""
function filter_key(key, itr)
    T = eltype(itr)
    out = DefaultDict{Any, Vector{T}}(Vector{T})
    for x in itr
        push!(out[key(x)], x)
    end
    return out.d.d # TODO https://github.com/JuliaCollections/DataStructures.jl/issues/705
end

It would probably be more correct (more defensive) to return something immutable in that case (if not the keys and values, then at least the Dict itself), but barring that, I can at least be defensive by returning a plain Dict. Put differently, the fact that I use a DefaultDict for convenience is merely an implementation detail of filter_key. Even though a DefaultDict otherwise follows the AbstractDict interface, it never throws a KeyError, and that is the behavior I do not want to leak to callers.

oxinabox commented 3 years ago

> I often use DefaultDict because the behavior is convenient in a very localized place, and I do not want that behavior elsewhere

In those circumstances I tend to use a plain Dict with get/get! (as in get!(dict, key, default) and get(()->default, dict, key)):

function filter_key(key, itr)
    T = eltype(itr)
    out = Dict{Any, Vector{T}}()
    for x in itr
        col = get!(()->Vector{T}(), out, key(x))  # group by key(x), not by x itself
        push!(col, x)
    end
    return out
end
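
To make the difference between the two calls concrete, here is a small Base-only sketch (the Dict `d` and its keys are made up for illustration): `get!` stores the default on a miss, while `get` only returns a fallback and leaves the Dict alone.

```julia
d = Dict{Symbol, Vector{Int}}()

# get! inserts the default when the key is missing and returns the stored value,
# so the push! below mutates the vector that now lives in `d`.
push!(get!(() -> Int[], d, :a), 1)

# get only returns a fallback; it never modifies the Dict.
fallback = get(() -> Int[], d, :b)

haskey(d, :a)   # :a was inserted by get!
haskey(d, :b)   # :b was not, so d[:b] still throws a KeyError
```

The caller therefore gets ordinary Dict behavior everywhere except at the one call site that opts in with `get!`.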

goretkin commented 3 years ago

If I'm understanding correctly, that idea obviates the need for DefaultDict altogether, provided you're willing to use get! in place of getindex. Said differently, I see the entire point of DefaultDict as delegating all methods transparently, except for the one transformation you just described.

You bring up an excellent alternative, in any case.
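
As a rough illustration of that view, a toy wrapper might look like the following. `MiniDefaultDict` is a hypothetical type invented for this sketch; it is not how DataStructures.jl implements `DefaultDict`, and it forwards only a handful of methods.

```julia
# Hypothetical toy type for illustration only.
struct MiniDefaultDict{K,V,F} <: AbstractDict{K,V}
    default::F      # zero-argument callable producing the default value
    d::Dict{K,V}    # the wrapped plain Dict
end

MiniDefaultDict{K,V}(default::F) where {K,V,F} =
    MiniDefaultDict{K,V,F}(default, Dict{K,V}())

# The one non-delegating method: a missing key is filled in via get!.
Base.getindex(m::MiniDefaultDict, key) = get!(m.default, m.d, key)

# Everything else simply forwards to the wrapped Dict.
Base.length(m::MiniDefaultDict) = length(m.d)
Base.iterate(m::MiniDefaultDict, state...) = iterate(m.d, state...)
Base.get(m::MiniDefaultDict, key, default) = get(m.d, key, default)
Base.setindex!(m::MiniDefaultDict, v, key) = (m.d[key] = v; m)

m = MiniDefaultDict{Symbol, Vector{Int}}(() -> Int[])
push!(m[:a], 1)   # getindex creates the entry for :a on first access
```

In this toy version the wrapped Dict is just the field `m.d`, which is the kind of access the issue asks to have documented for the real type.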