Open johnomotani opened 4 years ago
If the regions are specified along the dims
'x' and 'y', can we not just promote 'x' and 'y' to be 1D coordinates? Then their values would be preserved upon slicing.
The only disadvantage I can think of that approach would be if you wanted x=0
to be within the domain, and x=-1
to be in the left-hand boundary cells, then you would end up with an x
coord which had a value -1 where it's corresponding dim was index 0...
Yes, naming the coordinates x
, y
and z
would make sense. I think I was worried that that would get confusing when the dimensions are sometimes renamed and sometimes not, e.g. if you rename dimension y
to theta
, what happens to the coordinate y
? Maybe it just needs to be created after the dimensions are renamed...
It would be nice to have the indices count just 'real' grid cells, but in practice I would start the coordinates from 0 in the boundary cells. Otherwise it's implied that the coordinate y
should not count any boundary cells, which would make it double-valued in the upper target boundary cells (if there are any).
I think I was worried that that would get confusing when the dimensions are sometimes renamed
I think this is an argument for not renaming dimensions at all. There isn't a penalty to carrying around multiple coordinates, so I think theta
should be added as a new coordinate, without replacing y
. (I realise that might be different from what I've advised before but if there's a clear need for x
and y
to be coordinates then I think we should always keep them).
There might be one or two functions in xarray which should accept 1D non-dimension coordinates but currently would refuse to, but the neatest solution to that would just be to generalise upstream (I've submitted at least one PR doing that before).
I guess the biggest one would be: can you isel(theta=42)
when theta
is a coordinate but not a dimension?
I guess the biggest one would be: can you isel(theta=42) when theta is a coordinate but not a dimension?
No you can't. That makes some sense though - theta
doesn't have a value of 42
anywhere, it only takes values between 0 and 2pi in any tokamak. To select the 42nd index you conceptually should be selecting along a dimension index, not a coordinate, with isel(y=42)
. (You could imagine changing .sel()
so that it dispatched to .isel()
if passed a dimension, but I expect that would lead to lots of hard-to-debug mistakes in user code).
You should be able to do sel(theta=42)
, but unfortunately you currently can't. This was something I've been meaning to add because it seems pretty crucial!
dimensions with the same names as coordinates seem a bit confusing in general... especially since BOUT++ uses x/y/z both as coordinates (with spacing dx/dy/dz) and often (in the code) as indices. dx, dy and dz are available in the Dataset, so setting y to be a grid-cell-number coordinate is arguably just as confusing as isel(theta=42)
is at the moment... Not sure what a good naming convention is though... (i,j,k) is a bit likely to have name clashes. (ix, iy, iz), (jx, jy, jz), (ix, jy, kz) are possible but a bit obscure. If we put some naming suggestions as individual comments, we can vote on them with reactions. Feel free to add more if you think of them! ...
psi
, theta
, zeta
x
, y
, z
psi_ind
, theta_ind
, zeta_ind
x_ind
, y_ind
, z_ind
i
, j
, k
ix
, iy
, iz
jx
, jy
, jz
ix
, jy
, kz
I'd be happy with any variation on x/y/z_ind
, i/j/k
, ix/iy/iz
or jz/jy/jz
as long as it's documented up front :) I prefer those options as they're a little bit clearer that they're index-space, or at least likely not real space.
Can I veto any variation that uses psi/theta/zeta
though? That heavily implies a particular coordinate system.
kz
makes me think "wavenumber", so -1 for that option :smile:
@ZedThree sorry, was writing the above a bit too late... The 'psi/theta/zeta' options would depend on the geometry chosen by the user. At the moment if you pass geometry='toroidal'
to open_boutdataset()
, when it calls apply_geometry()
it goes through to add_toroidal_geometry_coords()
, which renames the dimensions as x/theta/zeta (x because psi_poloidal is not necessarily a 1d coordinate due to possibly different dx in PFR core). Other geometry implementations can do the same (e.g. s-alpha uses 'r' for the radial coordinate). So psi/theta/zeta should really be 'geometry-choice-dependent, and possibly user-defined, for example psi/theta/zeta', and similarly 'psi_ind/theta_ind/zeta_ind'.
Ah! In that case, definitely the psi_ind
version!
Just came back to look at this issue again. It's what prevents plotting selections of the poloidal plane from working.
The problems discussed above I think are now mostly resolved, since we added bout_tdim
, bout_xdim
, bout_ydim
and bout_zdim
to the metadata to keep track of the dimension names.
Fixing this now needs a bit of extra work to:
ds.metadata["bout_ydim"] == "theta"
we could add a coordinate called theta_global_ind
.xcoord = ds.metadata["bout_xdim"]
result = ds.isel({xcoord: slice(lower, upper)})
to something like
xcoord = f"{ds.metadata['bout_xdim']}_global_ind"
result = ds.sel({xcoord=slice(lower, upper)})
[and we'll have to be careful because isel
excludes the end-point in the normal Python way, but sel
includes the end-point (I guess because it works with arbitrary labels, not just integers and if you had a coordinate that was ("yellow", "green", "blue")
it's weird to need to do .sel(color=slice("yellow", "blue")
to get just "yellow"
and "green"
.]
The regions introduced in #107 are based on slicing the Dataset with global indices. The indices stored in
Region
objects will not be correct if the Dataset is sliced for some reason, e.g.ds = ds.isel(x=slice(20, None)
.One way to resolve this might be to add global index coordinates, e.g. for a dataset with dimensions
{'x', 'theta', 'zeta'}
have some coordinates{'x_ind', 'theta_ind', 'zeta_ind'}
that give global integer indices. Then we could use this inRegion.getSlices()
(which might need an extraDataset or DataArray
argument to get the coordinates from) to get offset. E.g. if the passed-in DataArray hasx_ind
that does not start from0
, can subtract theda[x_ind][0]
fromRegion.xinner_ind
, and check thatRegion.xinner_ind
is within the bounds ofx_ind
.