Open Moelf opened 2 years ago
I guess this could work:
Arrow.write("./blah.arrow", Dict(data.indices.values .=> data.values))
Hi @Moelf,
The Dict constructor expects to get an iterable of Pairs - or other iterable things where the first element is the key and the second is the value (which explains your strange result).
To go from a Dictionary to a Dict use the pairs function, like Dict(pairs(dictionary)).
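A minimal sketch of that conversion, assuming Dictionaries.jl is installed (the names d, vals and plain are only illustrative):

```julia
using Dictionaries

d = Dictionary([1, 2, 3], [4, 5, 6])

# Iterating a Dictionary visits its values only, so the keys are never seen.
vals = Int[]
for v in d
    push!(vals, v)
end

# pairs(d) iterates key => value, which is what the Dict constructor wants.
plain = Dict(pairs(d))
```

Here vals holds only 4, 5, 6, while plain keeps the keys 1, 2, 3 alongside them.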
Does that help? Perhaps this should be prominently documented...
Also, we should probably think about the Tables.jl interface at some point...
Thanks, pairs makes sense, and it probably should have been specialized by Dictionaries.jl, since I think that's the only sensible outcome.
Unfortunately a specialisation to insert pairs would break Dict(copy(pairs(dictionary))) where you’d expect the copy to have no effect on the output.
It’s also hard to add methods for all AbstractDict, for example.
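To make the invariant concrete, here is a small sketch (assuming Dictionaries.jl; the variable names are illustrative) checking that copying the pairs does not change what Dict produces today:

```julia
using Dictionaries

d = Dictionary([1, 2, 3], [4, 5, 6])

lazy = pairs(d)                # view of d that iterates key => value
materialised = copy(pairs(d))  # a Dictionary whose values are Pairs

# Both iterate Pairs, so both build the same Dict.
same = Dict(lazy) == Dict(materialised)
```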
Not sure I understand:
julia> d = Dictionary([1,2,3], [4,5,6])
3-element Dictionary{Int64, Int64}
1 │ 4
2 │ 5
3 │ 6
julia> copy(pairs(d))
3-element Dictionary{Int64, Pair{Int64, Int64}}
1 │ 1 => 4
2 │ 2 => 5
3 │ 3 => 6
this is the current behavior, I propose adding:
julia> Base.Dict(D::Dictionary) = Dict(pairs(D))
julia> Dict(d)
Dict{Int64, Int64} with 3 entries:
2 => 5
3 => 6
1 => 4
julia> Dict(pairs(d))
Dict{Int64, Int64} with 3 entries:
2 => 5
3 => 6
1 => 4
julia> copy(pairs(d))
3-element Dictionary{Int64, Pair{Int64, Int64}}
1 │ 1 => 4
2 │ 2 => 5
3 │ 3 => 6
I don't see why adding Dict() would break anything.
Edit:
Oh, in the case of Dict(copy(pairs(d))), it means we should have specialized copy too then.
> Oh, in the case of Dict(copy(pairs(d))), it means we should have specialized copy too then.
Yes. But we can't specialize a Dict constructor on this - all it sees is a Dictionary. Just as you can do Dict(zip(keys, values)), you can also do things like Dict(Dictionary(keys, zip(keys, values))) and expect it to work the same. If we had Base.Dict(D::Dictionary) = Dict(pairs(D)) this would be broken :(.
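A rough sketch of that breakage, assuming Dictionaries.jl. The function specialised_dict is a hypothetical stand-in for the proposed Base.Dict(D::Dictionary) method, defined under a different name so that nothing in Base is overridden:

```julia
using Dictionaries

ks = [1, 2, 3]
vs = [4, 5, 6]

# A Dictionary whose values are (key, value) tuples.
t = Dictionary(ks, collect(zip(ks, vs)))

# Current behaviour: Dict iterates the values, i.e. the tuples themselves.
current = Dict(t)

# Hypothetical stand-in for the proposed specialisation.
specialised_dict(D::Dictionary) = Dict(pairs(D))

# With the specialisation, each tuple becomes a value instead of an entry.
proposed = specialised_dict(t)
```

Under these assumptions current maps 1 to 4, while proposed maps 1 to the tuple (1, 4), so existing code relying on the first form would change meaning.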
There's also the fact that while we might theoretically try to specialize (::Type{<:AbstractDict})(::AbstractDictionary), in practice this will lead to ambiguity errors. Even if we patch those up for Base, they will reappear on using OrderedCollections or using DataStructures.
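The ambiguity can be reproduced with a toy example in plain Julia. All type names here are hypothetical stand-ins: AbstractBox plays the role of AbstractDict, PlainBox of a concrete type such as Dict, and Fancy of AbstractDictionary.

```julia
abstract type AbstractBox end

struct PlainBox <: AbstractBox
    items   # untyped field, so the default constructor is PlainBox(::Any)
end

struct Fancy
    items::Vector{Int}
end

# Generic constructor for every AbstractBox subtype, specialised on Fancy.
(::Type{T})(f::Fancy) where {T<:AbstractBox} = T(f.items)

# PlainBox(::Any) is more specific in the first argument, the method above
# is more specific in the second, so dispatch cannot choose between them.
ambiguous = try
    PlainBox(Fancy([1, 2, 3]))
    false
catch err
    err isa MethodError
end
```

Resolving this needs an extra method for each concrete type, which is why the problem comes back whenever another package adds its own AbstractDict subtype.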
At the end of the day the only clean choice is to let users write pairs as necessary.
I'm a happy user of your package. In our line of work we process many, many independent files to make summary histograms or extract parts of the data. In the end I simply do a
reduce((x,y) -> append!.(x,y), results)
to collect the results together without manually tracking the order of things.
However, it's rather difficult if I want to put them into a table or anything, because Dictionary doesn't conform with the Table interface (expected), but I also can't go back to Dict:
What's the recommended workflow?