Open rafaqz opened 3 years ago
That would be great! Probably the easiest first step is getting ArrayInterface.dimnames
to return a Tuple{Vararg{StaticSymbol}}
.
That should be a one-liner ;)
That should be a one-liner ;)
That's largely the goal with interoperability here. There will always be more that package authors can do to take advantage of ArrayInterface.jl
, but there's no reason that use of the basic traits here should be any more complicated that getting dimension names.
Ok, getting my head around the traits here. Most of these seem useful and possible to integrate in DimensionalData.jl. But I have some thoughts and questions!
pixel_spacing
but the contents of a DimArray
and other array types is only sometimes pixels. Should we generalise the name, something like axis_step
maybe?Intervals
as well as Points
, which may complicate methods like spatial_offset
, duration
and time_end
as e.g time_end
might be interpreted as the last time in the axis for points, but the end of the interval that starts there for intervals.onset
be time_start
for symmetry with time_end
? it took me ages to find onset
and realise it was time_start
!time_axis
and spatial_axes
returning?times
and spatial_keys
returning? if time_axis
is a range they will be just the indices for the range?It seems the interface is designed to work on the array type (as opposed to the object)? Is that correct? This should be fine, but will need some extra methods in DD.
That's that basic idea, but the focus is really on the traits here. I've definitely considered other traits that are less focused on cartesian grids, but I figured we should start with the obvious. What other objects did you have in mind?
There are methods like
pixel_spacing
but the contents of a DimArray and other array types is only sometimes pixels. Should we generalise the name, something likeaxis_step
maybe?
That's a good point. What pixel_spacing
does right now could also be done with something like map(step, map(keys, axes(x, is_spatial)))
. So what we really want is the step size in the keys of spatial indices. I think it makes sense to have a specific name for the spatial and temporal version of these functions so that they could be used outside of named dimensions (e.g., geometric objects). I'm not sure what the best name is for that. Maybe spatial_units
?
The spacing/time-step in DD may be fixed or variable along the axis. Should we just error if there is no fixed spacing? But generally it can't be assumed that a spatial or temporal dimension has a fixed step size. This may need a trait.
I've gone back and forth on this one a lot. Even if you have a fixed step size, subsets of that array can easily disrupt this. I think the default is to just return 1
, but that has problems too. I'd be interested in what @timholy thinks about this because it can affect JuliaImages.
DD has the concept of Intervals as well as Points, which may complicate methods like spatial_offset, duration and time_end as e.g time_end might be interpreted as the last time in the axis for points, but the end of the interval that starts there for intervals.
Could you give a more concrete example of this. I think I understand but I want to make sure that I actually address this before I start spouting off ideas.
What happens to non-spatial/temporal dims? what if one dim is categorical?
The design is very similar to ImageCore.jl and ImageAxes.jl in its current form. Basically everything is considered "spatial" except for time dimensions. The difference here is that we have this line and you can explicitly say a dimension name is not spatial. I had a macro where you could do @not_spatial name
at one point, but I feel like there needs to be a discussion before implementing something like that that could potentially create invalidations.
should onset be time_start for symmetry with time_end? it took me ages to find onset and realise it was time_start!
I'd be open to that. I've also considered first_time
and last_time
.
- what kind of object is time_axis and spatial_axes returning?
- what are times and spatial_keys returning? if time_axis is a range they will be just the indices for the range?
time_axis
and times
where supposed to be similar to axes(x, :time)
and keys(axes(x, :time))
, respectively. I think it might make more sense to just get rid of these and just call these directly under the hood, because you should be able to get a type stable return of these using axes(x, is_time)
.
What other objects did you have in mind?
I just meant dispatch on the array as constructed object ::SomeArray
rather than ::Type{SomeArray}
. But it's not important, type based traits are ok.
Maybe
spatial_units
I think the spatial/temporal difference is fine, but I would hesitate from using _units
in the method for the step size. Units and the step size are often separate fields, and units may not be applied in practice (e.g. Unitful.jl conversion does not yet exist for a lot of string format units). Also falling back to 1 will lead to mistakes with unevenly spaced axis index, I think an error is better?
Could you give a more concrete example of this. (Intervals/Points difference)
Spatial data such as geotiff can distinguish between the pixels representing the mean over an interval or the value at a single point. One consequence of this is a DimensionalData.jl Selector
like Between
can give different answers for points or intervals. Points are seen to be between two values if they simply fall between them. Intervals are between two values when they also start and end between the values. If this distinction isn't made, the answer for intervals may be different depending on whether the start or center of the interval is stored, even though the interval they represent is meant to be the same. Both options are used in various formats.
Another example for both the last points is that netcdf even has a format where the start and end of every interval is specified explicitly in a 2 row matrix, that may or may not be regularly spaced and contiguous.
It's pretty common to have the same issue in brain imaging where an index could refer to any of the 8 corners, the center, or the entirety of a voxel. This isn't an issue if you have a common coordinate system though. Maybe coordinate space traits would be helpful? I'm not really sure if there's much you can do for irregularly spacing, except through an error when trying to get something like the step size.
Interesting you have the same issue. DD has traits to handle that - Locus of Start, Center and End for each axis takes care of any combination.
This isn't an issue if you have a common coordinate system though
Can you explain this a little more? How would this solve the Between
problem? You mean converting the index first?
I've tried not to convert to common formats eagerly in DD as users may want to save exact copies of e.g tiff or netcdf without conversion error of the coordinates, and also so that everything is lazy as it may not even be used.
I'm going to use ( | | )
to represent a single key to an index where the real world reference are positions 1-9
start: ( 1 | | ), ( 4 | | ), ( 7 | | )
center: ( | 2 | ), ( | 5 | ), ( | 8 | )
end: ( | | 3 ), ( | | 6 ), ( | | 9 )
The step size and length are the same for each of these but the initial offsets are 0, 1, and 2 (respectively).
If you want the data at point 2 you can only really get that for the centered position.
If you want the data where the key is <=(5)
then you would get the indices 1:2, 1:1, and 1:1.
And if the had the same keys but different pixel/voxel locations such as ...
start: ( 1 | | ), ( 2 | | ), ( 3 | | )
center: ( | 1 | ), ( | 2 | ), ( | 3 | )
end: ( | | 1 ), ( | | 2 ), ( | | 3 )
The locations represented would actually be 1:5, 0:4, and -1:3.
If I understand the problem you're describing correctly, then the issue is that if you wanted to average all positions at the real world location 1
then you would have to account for these different reference systems.
In neuroscience we're at the mercy of whoever has created the machine that collects the data and the technician that operates it. Before we do anything where spatiotemporal data from different modalities, hardware, or acquisition protocols interacts in any way we have a template registration step. Encoding whether each position represents any of the eight corners or center of a voxel doesn't actually help us at all because everybody's anatomy is different. Every brain image or electrode array needs to be registered to a common template space before they interact in any way. If we want to look at population averaged information in an individuals brain space, we have to register that population average to the individual's brain. Once that's done most issues are taken care of. The only situation registration doesn't resolve this problem is when constructing a mesh with values (e.g., doing marching cubes), but I'm pretty sure the actual values would have to be interpolated no matter the key position in the voxel.
Yep, pretty much. In spatial data those points are fixed e.g. lat/lon on the earths surface, as defined in some coordinate system. As it's always the same earth, not different brains, they need to be accurately comparable to a file in a different format that may use a different start/center/end position. This might amount to e.g. many kilometers of error, and may lead to selecting array subsets of different sizes/spatial extents with the same command, depending on the file format.
Another thing I hadn't got my head around is your examples imply that axes
returns the objects that hold the axis keys? DD doesn't do that, it forwards base methods like axes
to the parent array as it often makes sense to work with them as regular arrays. dims
gets the tuple of Dimension
objects that hold the axis keys. The interface probably needs another method that could fall back to axes
.
Sorry for the severe delay in response.
This might amount to e.g. many kilometers of error, and may lead to selecting array subsets of different sizes/spatial extents with the same command, depending on the file format.
To be clear I don't think this isn't an issue. I'm just explaining why my intuition is telling me this sort of thing isn't a dimension-wise trait and is more of a global coordinate space specification. I still need to think about that sort of thing though. I'm open to different solutions if there is a clear way to integrate it here though.
Another thing I hadn't got my head around is your examples imply that axes returns the objects that hold the axis keys?
This actually originated from a conversation with @mcabbott. I don't have access to that conversation anymore, but keys
seemed like the best name for what we were referring to. I can go into more detail about this if you'd like. If you have a very specific set of types that are optimized for your workflow then it might make more sense to overload some of the methods for those types. If what we have here isn't generic enough to easily facilitate it then we do need to change the code here.
Ok no problem, we can keep this conversation going as points/intervals questions come up.
And I agree keys
is a good method. Maybe I'll move to using it in DD. I was mostly referring to the fact that axes
in DD just returns the regular array axes and wont work in your examples, while dims
actually gets the dimension objects that keys
can be applied to.
AxisArrays.jl has its own axes
method, but I find that pretty confusing given what Base.axes
means.
I haven't read this closely, but on terminology: I think AxisArrays.axes(A)
predates Base's use of the term (IIRC indices(A)
used to be this tuple of OneTo
s) but it's now pretty confusing. I think I'm to blame for calling the other things (distances, or whatever) "keys", forgetting that Base also uses that as a synonym for indices, more or less. Overloading Base.keys
(on an AbstractArray
) leads to bad things happening. A less confusing term might be "axis labels".
Ok interesting. I've avoided overloading base methods in DimensionalData for roughly these reasons. Its good if spatial/named arrays behave for most general purposes as generic AbstractArray
s. And our additional behaviours just use new functions. (Edit: But I'm doing a lot more with getindex
than it should, although base behavious dont change. I just really like those square brackets ;)
axis_labels
is good, although it could be confused with e.g :X
and :Y
. Unless we had axis_names
for that? Other potential names for keys are axis_values/axisvalues
, axis_lookup/axislookup
and similar.
axis_labels
is good, although it could be confused with e.g:X
and:Y
.
I would argue here that NamedDims has taken the lead, and terms those "dimension names". Base uses dims = 2
to indicate which of 1:ndims(A)
, and NamedTuple
for a thing with Symbols in its type as names, so that seems pretty consistent.
axis_values/axisvalues
My opinion is that value wants to be what's really in the array, like setindex!(array, value, index...)
. Although I'm not certain Base really has this in code anywhere.
But I'm doing a lot more with
getindex
than it should, although base behaviours don't change
It seems less confusing to me to abuse syntax than words. Nobody actually calls getindex(A, X="cat")
, they write A[X="cat"]
so the fact that "cat" is not an index (meaning Int
or CartesianIndex
) doesn't jar, IMO.
Thanks for chiming in @mcabbott. keys
is not used on the array but on the individual axes, so it's not really an issue. The issue with naming/syntax here is specifically related to what the "keys" of the spatial axes should be called. If we end up needing to use something that is very general and applies to all arrays, then it should probably go into ArrayInterface.jl instead of this package.
If you want another test case I'd like to implement this interface in DimensionalData.jl, and eventually use it in DynamicGrids.jl to support more kinds of spatial/temporal arrays.
Is it ready for that? I could start with a branch in DD.