Closed Tokazama closed 2 years ago
This (and your earlier comment in the PR) raise the following issue (which I initially wanted to avoid to ensure simplicity).
We could add a method metadatadict(obj; style::Union{Symbol, Nothing}=nothing)
that would be required to return an AbstractDictionary
supporting its interface - and ensuring that all functions from this interface are fast. If style
is not passed then all key-value pairs are returned. If style
is passed then key-value pairs are filtered to only expose those whose style matches the style
kwarg.
The reason why I have not added this requirement initially was that I was afraid that package maintainers would find it difficult to correctly implement this custom type and metadatadict
function (AbstractDict
interface is more complex than it seems on a surface and it is not properly documented). But maybe we should go for this and have metadatadict
return nothing
by default so that packages can easily detect that some obj
does not have this method implemented.
The alternative, as you propose, would be to add metadatapairs
function (on which we would need to think in detail how to specify its API, but I think it is doable).
(the same for column-level metadata, but let us leave this discussion for later)
CC @nalimilan
The reason why I have not added this requirement initially was that I was afraid that package maintainers would find it difficult to correctly implement this custom type and
metadatadict
function (AbstractDict
interface is more complex than it seems on a surface and it is not properly documented). But maybe we should go for this and havemetadatadict
returnnothing
by default so that packages can easily detect that someobj
does not have this method implemented.
It's definitely tempting to have a way of directly interacting with an AbstractDict
type so we don't have to ever implement something that could just fall back to a dictionary. It's also nice to have a fairly clear idea of how an object behaves when trying to write quick code.
On the other hand, I have similar apprehensions about the cost of an increasingly strict interface at this point. I recall some trial and error to get interfaces like AbstractDict
to work when first learning Julia. Perhaps we could have an extra package that provides a handful of useful types that support the dictionary interface that also support features we require for the metadata interface.
Let us wait for what @nalimilan thinks
Given that we have metadatakeys
, having metadatapairs
would be kind of logical.
metadatadict
is a more tricky issue. If not all packages implement it, how useful would it be in practice? Would we need a fallback that allocates a new Dict
? But in that case mutating it wouldn't affect the original object.
Perhaps we could have an extra package that provides a handful of useful types that support the dictionary interface that also support features we require for the metadata interface.
What would this provide compared with simply using Dict
(which is what DataFrames.jl does for example)?
Would we need a fallback that allocates a new Dict?
It could be a dynamic dict being a view, but I agree - as commented above - that it is tricky.
In summary - do we think adding metadatapairs
makes sense now, or we add it at some later point when someone needs it?
As usual, I'd rather wait until somebody really needs it. ;-)
I couldn't follow all the discussion in #53, but it would make sense to me to just have metadata(df)
dump all the pairs (rather than a new function). I am currently doing this which is a pain:
julia> Dict(key => metadata(df, key) for key in metadatakeys(df))
Dict{String, Float64} with 2 entries:
"RMSE" => 0.10915
"t₀" => 3.974
and the same for colmetadata(df, col)
and colmetadata(df)
?
CC @nalimilan
Any type that supports metadata will likely need a way to iterate over metadata for propagating metadata to new instances or merging metadata. The generic way of iterating through metadata right now is using accessing each key form the iterable returned by
metadatakeys
. Depending on how metadata is stored this may not be the most efficient way to accomplish this. If the method that is propagating metadata internally is managing a single collection of metadata then a unique implementation can be written for whatever method of iteration is best. However, when combining multiple sets of metadata it's not practical to have a unique method for every combination of metadata storage patterns.A couple things to consider in supporting generic iteration over metadata:
metapairs
andstyledmetapairs
.metadatakeys
?