Should NdArray be allowed to omit non-varying domain axes?

covjson / specification

CoverageJSON specification

https://covjson.org/spec/

Should NdArray be allowed to omit non-varying domain axes? #64

Closed letmaik closed 8 years ago

letmaik commented 8 years ago

I was thinking: since we now have a standalone range in the form of an NdArray object, would it make sense to allow non-varying axes to be left out? This is how typical labelled-array libraries such as xarray handle it. In xarray, the DataArray is a bit like a Coverage with one parameter: the array data has varying dimensions and associated coordinates, but the DataArray/Coverage object itself can also have other coordinates, like a fixed time coordinate.

I think to make NdArrays less awkward and promote the use of domain axes for additional non-varying things (like country ID, or fixed time) it would be good to not force the NdArray to have all axes of the domain. This would also reduce the confusion of which order to choose for those additional axes.
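To make the proposal concrete, here is a rough sketch of the same range written with and without its non-varying axes. Only `axisNames` and `shape` are named in this thread; the `type`, `dataType` and `values` fields, the axis names `t`, `y`, `x`, the numbers, and the helper `element_count` are assumptions for illustration.

```python
import math

# Hypothetical range: 3 time steps at a single fixed x/y position.
# Every domain axis is listed, including the non-varying ones.
full = {
    "type": "NdArray",
    "dataType": "float",
    "axisNames": ["t", "y", "x"],
    "shape": [3, 1, 1],
    "values": [12.4, 12.9, 13.1],
}

# Proposed form: only the varying axis is listed.
reduced = {
    "type": "NdArray",
    "dataType": "float",
    "axisNames": ["t"],
    "shape": [3],
    "values": [12.4, 12.9, 13.1],
}


def element_count(nd):
    """Number of values implied by the shape (hypothetical helper)."""
    return math.prod(nd["shape"])


# Both forms describe the same three values.
same_size = element_count(full) == element_count(reduced) == len(reduced["values"])
```

In the reduced form there is no longer any question about where to place `y` and `x` in the axis order.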

Client-side I don't see any complications.

jonblower commented 8 years ago

Yes, I think your approach is sound. My only worry would be that it’s quite easy for a data producer just to stick the axis names in the array, irrespective of whether they are single- or multiple-valued. It takes a small amount of extra thought to remember to omit the single-valued axes. But I don’t think it’s a big deal. Maybe if a data producer accidentally inserts the name of a single-valued axis the client could simply ignore it.
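As a sketch of how a client might ignore single-valued axes that a producer listed anyway: the function below drops every axis of length 1 and keeps the rest in order. The name `drop_single_valued_axes` and its argument layout are assumptions, not part of any existing CoverageJSON client.

```python
import numpy as np


def drop_single_valued_axes(axis_names, shape, values):
    """Return (axis_names, array) with all length-1 axes removed.

    Hypothetical helper: treats any axis whose shape entry is 1 as
    single-valued and ignores it.
    """
    arr = np.asarray(values).reshape(tuple(shape))
    keep = [i for i, n in enumerate(shape) if n != 1]
    kept_names = [axis_names[i] for i in keep]
    kept_shape = tuple(shape[i] for i in keep)
    # Removing length-1 axes does not change the number of elements,
    # so a plain reshape is enough.
    return kept_names, arr.reshape(kept_shape)


names, arr = drop_single_valued_axes(["t", "y", "x"], [3, 1, 1], [1.0, 2.0, 3.0])
```

If every axis has length 1, the kept shape is empty and the result is a 0D array holding the single value.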

letmaik commented 8 years ago

It would still be allowed to include single-value axis names. It's just that they are not strictly required.

letmaik commented 8 years ago

Hmm, maybe I have to rethink that. I didn't consider the case of all-fixed axes, like for Point. Then at least one axis would suddenly be required again, and which one to pick would again be arbitrary. Maybe let's leave it as it is then.

jonblower commented 8 years ago

If all axes are fixed, then each Range only has one value. So the axisNames field is redundant and could even be omitted (or empty?). In any case, clients wouldn't need to read it at all, right?

letmaik commented 8 years ago

Sure, but then it's not a proper nd-array anymore. There has to be at least one dimension, otherwise it's something different. I'd rather have a consistent approach with a 1-element nd array instead of inventing another range type like "SingleValue".

jonblower commented 8 years ago

What does xarray do in this situation?

letmaik commented 8 years ago

Hmm, xarray wraps numpy, and numpy actually supports 0D arrays:

>>> import numpy as np
>>> import xarray as xr
>>> foo = xr.DataArray(np.array(12.4), dims=[], coords={'time': '2002'})
>>> foo
<xarray.DataArray ()>
array(12.4)
Coordinates:
    time     <U4 '2002'

time is an additional coordinate here, not related to the array dimensions.

You would access that single element with foo[()], while foo[0] throws an error, which makes sense. So it's kind of a degenerate nd array.
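The indexing behaviour described above can be checked with plain numpy, without xarray. This is only a small sketch; the variable names are arbitrary.

```python
import numpy as np

# A 0D array: no dimensions, exactly one element.
zero_d = np.array(12.4)
ndim = zero_d.ndim      # number of dimensions
shape = zero_d.shape    # empty tuple for a 0D array

# The empty-tuple index selects the single element.
value = zero_d[()]

# An integer index asks for an axis that does not exist.
try:
    zero_d[0]
    raised = False
except IndexError:
    raised = True
```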

Interestingly, the ndarray library in JavaScript automatically wraps a single element into a 1D array, but if you then pick that axis element, it reduces to a 0D array.

So, ok, too much worrying from my side again. Let's just do it. 0D arrays, without shape and axisNames, or optionally shape=axisNames=[].
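A minimal sketch of how a reader could treat both spellings of a 0D range the same way, by defaulting a missing `shape` and `axisNames` to empty lists. The `values` field, the sample number, and the function name `ndarray_to_numpy` are assumptions for illustration; this is not how any particular client is implemented.

```python
import numpy as np


def ndarray_to_numpy(nd):
    """Convert a dict-shaped NdArray into (axis_names, numpy array).

    Hypothetical reader: missing "axisNames" and "shape" are treated
    as empty lists, which yields a 0D array.
    """
    axis_names = nd.get("axisNames", [])
    shape = nd.get("shape", [])
    if len(axis_names) != len(shape):
        raise ValueError("axisNames and shape must have the same length")
    arr = np.asarray(nd["values"]).reshape(tuple(shape))
    return axis_names, arr


# 0D range with shape and axisNames left out entirely.
implicit = {"type": "NdArray", "values": [12.4]}
# 0D range with shape = axisNames = [].
explicit = {"type": "NdArray", "axisNames": [], "shape": [], "values": [12.4]}

names_a, arr_a = ndarray_to_numpy(implicit)
names_b, arr_b = ndarray_to_numpy(explicit)
```

Reshaping to an empty shape raises an error if `values` holds more than one element, which gives a basic consistency check for free.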

jonblower commented 8 years ago

ok sounds good

letmaik commented 8 years ago

Funny, coincidentally my library covjson-reader already supports leaving out axisNames and shape, due to its backwards-compatibility with the old range structure (where rangeAxisOrder was optional in the Domain object). I'll change the spec accordingly now.

letmaik commented 8 years ago

Changed it, the coverage examples of the common domain types spec look much nicer now. Less clumsy.