BIDS / datarray

Prototyping numpy arrays with named axes for data management.
http://bids.github.com/datarray

Support units/quantities #53

Closed unidesigner closed 13 years ago

unidesigner commented 13 years ago

Has there been any discussion of properly supporting (physical) units per axis?

As I understand it, packages such as "quantities" support units for array elements, but not for the distance between elements along one dimension (e.g. temporal sampling frequency or spatial spacing).

It might be very beneficial to have an additional field on the axis to define the unit (in a well-specified form, e.g. as a string). A special convention could be introduced for pure "index dimensions" or for dimensions with a structured (nested) datatype.

PS: If there is any other place to post this comment for discussion, please let me know.

fperez commented 13 years ago

There has been a small amount of discussion on units, but nothing conclusive yet. Basically there was only so much available bandwidth from the people working on this, and the rest of the API consumed it all. There is a provision for generic metadata on the axes, so that field could be used to store units information (it's a dict).
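To illustrate the idea of keeping units in a per-axis metadata dict, here is a minimal sketch. The `Axis` class, the `"units"` and `"spacing"` keys, and the `coordinates` helper are all hypothetical stand-ins made up for illustration; they are not the datarray API.

```python
from dataclasses import dataclass, field


@dataclass
class Axis:
    """Hypothetical stand-in for a named axis (not the datarray class)."""
    name: str
    metadata: dict = field(default_factory=dict)


# Assumed convention: "units" is a plain string, "spacing" is the
# distance between consecutive elements along the axis.
axes = [
    Axis("t", {"units": "s", "spacing": 0.5}),
    Axis("x", {"units": "mm", "spacing": 2.0}),
]


def coordinates(axis, n):
    """Return the physical coordinate of each of the n elements on an axis."""
    step = axis.metadata.get("spacing", 1.0)
    return [i * step for i in range(n)]


t = coordinates(axes[0], 4)
label = f"{axes[0].name} [{axes[0].metadata['units']}]"
```

Since the dict is free-form, nothing enforces the key names or checks that units are consistent; that is exactly the gap an explicit units API would have to close.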

If we want to think about units in the API explicitly, that discussion needs to happen in concert with the units of the data in the array itself. We don't want to have two ways of handling units, one for the contents of the array and one for the axes. So basically what I'm saying is that this will need to be tackled on the numpy list itself, so that any solution we use is consistent across all parts of the dataset.

An array may for example have axes that represent space/time (t, x) but data fields with different units (imagine a structured dtype with Temperature, Pressure and velocity). This means that the dtype itself is probably the place where units need to be represented, since that is the entity in numpy that describes the data in the array.
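As a rough sketch of that situation: the structured dtype below has three fields with different physical units on a (t, x) grid. NumPy does not attach units to fields here, so the `field_units` dict and the `describe` helper are assumptions made purely for illustration, as are the field names and the chosen units.

```python
import numpy as np

# One record per (t, x) grid point, with three differently-scaled fields.
sample = np.dtype([
    ("temperature", "f8"),
    ("pressure", "f8"),
    ("velocity", "f8"),
])

# Hypothetical side table: units live next to the dtype, not inside it.
field_units = {"temperature": "K", "pressure": "Pa", "velocity": "m/s"}

# 2 time steps by 3 spatial positions.
data = np.zeros((2, 3), dtype=sample)
data["temperature"][0, 0] = 293.15


def describe(arr, units):
    """Map each field name of a structured array to its unit string."""
    return {name: units.get(name, "dimensionless") for name in arr.dtype.names}


summary = describe(data, field_units)
```

The side table can drift out of sync with the dtype (for example after a field is renamed), which is one reason for wanting the dtype itself to carry the units.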

Units are hugely important, but also a really complex topic full of tricky special cases. They deserve a thorough discussion in numpy so we get them right, so by all means I'd encourage having a good discussion on the list on this topic. I have the impression that with the work that's going on right now on datetime there may not be enough available bandwidth to also tackle units, but once the dust settles on that, it would be the right next topic to tackle.

unidesigner commented 13 years ago

Thanks for the reply. Let's wait then until the time is ripe to start the discussion in numpy.

fperez commented 13 years ago

No problem. I'll close this issue here, and hopefully before long we can work on this problem in numpy itself.