JuliaData / DataAPI.jl

A data-focused namespace for packages to share functions
Other
33 stars 13 forks source link

Suppor dimensional metadata #62

Open Tokazama opened 1 year ago

Tokazama commented 1 year ago

The core metadata methods (metadata, metadata!, etc.) provide an intuitive method of communicating that a key-value pair provides some information about the instance of the object it is attached to. The column metadata methods similarly provide an intuitive method of communicating the a key-value pair provides some information about the column of a particular object. However, data with multiple dimensions (array, mesh, coordinates, etc.) often have metadata stored per dimension.

This PR provides support for metadata corresponding to specific dimensions. New method implemented here follow the name pattern of column metadata methods (colmetadata to dimmetadata).

quinnj commented 1 year ago

This seems fine to me? @bkamins any concerns?

bkamins commented 1 year ago

I think it is OK to add. I assume @Tokazama has some concrete use case (i.e. I would avoid adding functionalities that will not be used in practice). I have went through the code and left some small comments.

Tokazama commented 1 year ago

I think it is OK to add. I assume @Tokazama has some concrete use case (i.e. I would avoid adding functionalities that will not be used in practice). I have went through the code and left some small comments.

Besides the use cases in the original post I have images that have metadata per dimension in the form of name, per dim machine acquisition information, and additional metadata along axes (similar to colmetadata).

codecov[bot] commented 1 year ago

Codecov Report

All modified and coverable lines are covered by tests :white_check_mark:

Comparison is base (b131356) 96.36% compared to head (e1b08c5) 96.77%. Report is 1 commits behind head on main.

:exclamation: Current head e1b08c5 differs from pull request most recent head 6a0af43. Consider uploading reports for the commit 6a0af43 to get more accurate results

Additional details and impacted files ```diff @@ Coverage Diff @@ ## main #62 +/- ## ========================================== + Coverage 96.36% 96.77% +0.41% ========================================== Files 1 1 Lines 55 62 +7 ========================================== + Hits 53 60 +7 Misses 2 2 ```

:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Have feedback on the report? Share it here.

bkamins commented 1 year ago

@quinnj - all looks good. OK to merge? (I would make a release next)

Tokazama commented 1 year ago

Anything needed on my end?

bkamins commented 1 year ago

Needs a second approval (which might require some changes, but most likely not) before being merged and released.

bkamins commented 1 year ago

@nalimilan - do you have an opinion on this PR?

Tokazama commented 1 year ago

Why not write 1 instead of using a variable? That sounds clearer to me.

The column metadata tests use :col as the argument for which column is being accessed. I thought using the variable dim instead of just 1 would make it easier to quickly read and understand what exactly was being tested.

nalimilan commented 1 year ago

Question on the intended use: is this PR is about storing information about a dimension itself (e.g. its name or unit), or about its elements (e.g. row names for dimension 1), or both? If the second option, I wonder how this interacts with the existence of column metadata, as columns are usually considered as dimension 2.

Also, how does this relate to AxisArrays, NamedArrays, NamedDims, etc.?

bkamins commented 1 year ago

I wonder how this interacts with the existence of column metadata, as columns are usually considered as dimension 2.

I assumed this is completely detached. I.e. colmetadata is for objects that implement Tables.jl interface. And this PR is for array-like objects. Note that array-like objects can be re-interpreted as tables in many different ways.

Tokazama commented 1 year ago

The way to support metadata per element along a dimension with this would be a specific "indices" or "axis" key whose value can be used somewhat like a tables column metadata.

nalimilan commented 1 year ago

I assumed this is completely detached. I.e. colmetadata is for objects that implement Tables.jl interface. And this PR is for array-like objects. Note that array-like objects can be re-interpreted as tables in many different ways.

@bkamins Sure. But e.g. for a DataFrame we could imagine implementing dimmetadata at some point given that we define ndims, axes, getindex... So I wanted to make sure we consider whether that would make sense.

The way to support metadata per element along a dimension with this would be a specific "indices" or "axis" key whose value can be used somewhat like a tables column metadata.

@Tokazama I guess that would work. Then you mean that the intended use case for this PR is both metadata about a dimension itself and metadata about its elements (rows, columns...), right?

Tokazama commented 1 year ago

I assumed this is completely detached. I.e. colmetadata is for objects that implement Tables.jl interface. And this PR is for array-like objects. Note that array-like objects can be re-interpreted as tables in many different ways.

@bkamins Sure. But e.g. for a DataFrame we could imagine implementing dimmetadata at some point given that we define ndims, axes, getindex... So I wanted to make sure we consider whether that would make sense.

The way to support metadata per element along a dimension with this would be a specific "indices" or "axis" key whose value can be used somewhat like a tables column metadata.

@Tokazama I guess that would work. Then you mean that the intended use case for this PR is both metadata about a dimension itself and metadata about its elements (rows, columns...), right?

Exactly! I figured the alternative was having an additional metadata set of methods (e.g., axesmetadata) which seemed a bit over the top since one can still accomplish similar functionality with the dim methods here. The overlap with column metadata could also be seen as redundant but tables have consistently proven to be very unique since each column can be conceptualized as its own entity.

Tokazama commented 11 months ago

@nalimilan, is this okay now?