andyferris / Dictionaries.jl

An alternative interface for dictionaries in Julia, for improved productivity and performance
Other
278 stars 28 forks source link

Indices with type-parameterized values #55

Open cscherrer opened 2 years ago

cscherrer commented 2 years ago

This PR makes a small change to generalize Indices, replacing

mutable struct Indices{I} <: AbstractIndices{I}
    # The hash table
    slots::Vector{Int}

    # Hashes and values
    hashes::Vector{UInt} # Deletion marker stored in high bit
    values::Vector{I}

    holes::Int # Number of "vacant" slots in hashes and values
end

with

mutable struct Indices{I,V} <: AbstractIndices{I}
    # The hash table
    slots::Vector{Int}

    # Hashes and values
    hashes::Vector{UInt} # Deletion marker stored in high bit
    values::V

    holes::Int # Number of "vacant" slots in hashes and values
end

The idea is to allow values stored in other formats, for example a StructArray or TupleVector. I think this will be a step toward being able to implement something like

struct LogWeighted{I, T, V} <: AbstractDictionary{I, T}
    inds::Indices{I,V}

    # keys.data == inds.values
    keys::PermutedVector{V}

    logweights::Vector{T}
    logtotal::LogSumExp{T}

    function LogWeighted{I, T, V}(inds::Indices{I}, values::V, ::Nothing) where {I, T, V<:AbstractVector{T}}
       @assert length(values) == length(_values(inds))
       return new{I,T,V}(inds, values)
    end
end
andyferris commented 2 years ago

Thanks @cscherrer.

This also seems natural with Dictionary and its values. It's worth stating that a downside to this is that there is annoying increase in verbosity to the types, compared to Base.Set and Base.Dict. This could possibly be severe enough that it is worth having this functionality as an entirely new type, even. (For context, I still consider much of this work as prototyping behavior that potentially could be ported to Julia 2.0, so we need a similar level of polish and simplicity as Base).

The one thing we have at the moment that I'd really like to preserve is that we can state whether a dictionary is issettable or isinsertable. Now this will depend on the underlying arrays. Before moving forward I think we need to make a plan for this. To handle arrays and dictionaries gracefully, I have been considering the following body of work:

Any thoughts around this would be appreciated. (CC @c42f).

andyferris commented 2 years ago

Also - we currently have a contrib directory where stuff people want to play with could live.

I think we could rename this to src/experimental, with the idea that we'd accept all sorts of containers-in-progress and experimental APIs there, and have them as unexported types and docstrings saying they are experimental. The unordered hash dictionary for example is still quite useful. We'll need to create a WeakKeyDictionary. I want to make sure the token interface works with sorted dictionaries (trees and such).

cscherrer commented 2 years ago

Thanks @andyferris, I appreciate the care you're taking in thinking through this. I wasn't aware of the Julia 2.0 plan, but that sounds like it could work out great :)

Putting it in contrib or experimental would be fine for me. Any ideas for a name for this? It more abstract than Indices, but still a concrete type, so I'm not sure what to call it.

andyferris commented 2 years ago

I wasn't aware of the Julia 2.0 plan

To be clear this is more of a personal wish than an agreed-upon plan :)

I haven't thought of any good name. CustomIndices? GenericIndices? Also, feel free to use a single-character prefix.

andyferris commented 2 years ago

Another approach is to add CustomIndices (or whatever) and maintain an exported const Indices{I} = CustomIndices{I, Vector{I}}?