Hi ECCOv4-py folks!
I just stumbled upon this repository again. I see from your commit history you are hard at work!
It looks like many of the things you are doing, like reading mds files, converting llc flat binary files to tiles, and extracting data from mitgrid files, are essentially identical to things we are doing in xmitgcm (e.g. https://github.com/xgcm/xmitgcm/pull/114).
Are there ways we could be working together, to save everyone precious time and effort?
There is also the xecco project, https://github.com/xecco/gcmplots, which is in a very similar space.
Ryan - the original motivation of eccov4-py is to make reading and working with the netcdf output files of the ECCO 'production' releases easier. That was challenging because of the way the netcdf output files were designed. I think we now have all the code we need to do i/o from mds files in compact format and then convert those arrays between representations: compact format, the 5 'facets', and the '13 tiles'.
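For concreteness, here's a rough numpy sketch of the decomposition I mean (assuming LLC90 and a hypothetical single-level, big-endian float32 file; our packaged routines handle the general case):

import numpy as np

# sketch of the llc compact -> 5 facets -> 13 tiles decomposition
# ('field.data' is a hypothetical single-level llc90 binary)
N = 90
compact = np.fromfile('field.data', dtype='>f4').reshape(13 * N, N)

# facets 1-3 are stored in natural row order
facet1 = compact[0:3*N, :]                      # (3N, N)
facet2 = compact[3*N:6*N, :]                    # (3N, N)
facet3 = compact[6*N:7*N, :]                    # (N, N), the Arctic cap
# facets 4-5 hold the same elements but wrap into N rows of length 3N
facet4 = compact[7*N:10*N, :].reshape(N, 3*N)
facet5 = compact[10*N:13*N, :].reshape(N, 3*N)

# the '13 tiles' view: split each facet into N x N squares
tiles = ([facet1[m*N:(m+1)*N, :] for m in range(3)]
         + [facet2[m*N:(m+1)*N, :] for m in range(3)]
         + [facet3]
         + [facet4[:, m*N:(m+1)*N] for m in range(3)]
         + [facet5[:, m*N:(m+1)*N] for m in range(3)])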
What I'd like to do next is add your connectivity layer from the xgcm package to make global calculations easier.
Maybe one way forward is to make sure that our different representations of the arrays can be recognized by plotting routines, say whatever you have cooking in gcmplots.
I think we now have all the code we need to do i/o from mds files in compact format and then convert those arrays between representations: compact format, the 5 'facets', and the '13 tiles'.
I'm just pointing out that we also have code that does the same thing in xmitgcm, and have been working on it for several years. The code is here: https://github.com/xgcm/xmitgcm/blob/master/xmitgcm/utils.py and the documentation is here: https://xmitgcm.readthedocs.io/en/latest/usage.html#grid-geometries
I, @raphaeldussin, and others have been working hard to make sure this code is well documented, well tested, and broadly useful for the mitgcm community. I am wondering why you saw a need to rewrite those functions. Two possibilities:
1. you weren't aware that xmitgcm could do this, due to poor documentation and outreach on our part
2. you were aware, but decided that ECCOv4-py had unique needs that could not be handled by xmitgcm (if so, it would be great to know what these are)
Btw, sorry if this is coming off the wrong way... I'm really happy you are working on python stuff for ecco and would genuinely really like to collaborate.
The frustration comes from the feeling that we in my group are working hard to develop this stuff, which we hope is broadly useful, but somehow we are failing to connect with others in the community. That makes me a bit sad and makes me feel like we've wasted our time.
May I add that we also have xesmf-based regridding for llc configurations, which, combined with the newly implemented routines in xmitgcm to convert from faces to compact format and write binaries, allows us to easily create initial conditions for llc configurations, regrid observational datasets,...
the regridding code (and examples notebooks) live here, until they find their forever home:
https://github.com/raphaeldussin/MITgcm-recipes
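For anyone who hasn't used xesmf, the core pattern is just a couple of lines. Here is a minimal sketch with made-up lat/lon grids (the real llc workflow in the recipes repo builds the target grid from the model geometry):

import numpy as np
import xarray as xr
import xesmf as xe

# hypothetical source data on a 1-degree grid
ds_in = xr.Dataset(coords={'lat': ('lat', np.arange(-89.5, 90, 1.0)),
                           'lon': ('lon', np.arange(0.5, 360, 1.0))})
ds_in['sst'] = (('lat', 'lon'), np.random.rand(180, 360))

# hypothetical half-degree target grid
ds_out = xr.Dataset(coords={'lat': ('lat', np.arange(-89.75, 90, 0.5)),
                            'lon': ('lon', np.arange(0.25, 360, 0.5))})

# build the regridding weights once, then apply to any field
regridder = xe.Regridder(ds_in, ds_out, 'bilinear', periodic=True)
sst_half_degree = regridder(ds_in['sst'])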
also completely open for collaboration. The more the merrier!
Hey all, I think there is no reason not to collaborate. I have become pretty familiar with xmitgcm, and I have recently found that this can work pretty much seamlessly with ECCOv4-py at least for plotting. I think streamlining and reducing redundancy is optimal especially considering all the work that has gone into these repos.
I'll just jump in and say I have started to prepare scripts for the upcoming ECCO Summer School which use (so far) xmitgcm routines for I/O, ECCOv4-py routines for plotting and general tutorials (this is essentially how I picked up a lot of python tools), and (tbd) xgcm routines for computing standard quantities (e.g. barotropic streamfunction). The main idea is to recreate the ECCO standard analysis, except in a Python-based Jupyter notebook environment which the students can access remotely using servers running at TACC - pangeo style... I started making these in a private repo while I'm just working out kinks, but I'm curious if you all think these types of scripts should go in ECCOv4-py or in a separate repo dedicated to performing the standard analysis. The latter was my thought, but perhaps that's too scattered.
While I've been working on these scripts, I've noticed a couple of redundancies, and I can open up a couple of PRs which follow this idea of ECCOv4-py as a sort of plotting utility and wrapper, with xmitgcm and xgcm doing the I/O and computation.
Thanks @rabernat for getting this conversation going again! PS https://github.com/xecco/gcmplots is an old repo that a few of us started before we realized how much work had already been done, and it should be deleted.
Hi there,
Yes, thanks Ryan for bringing this up again. I think we have several excellent developments, Ian’s tutorials in python, along with Ryan’s and Raphael’s (+others') xmitgcm contributions. It’d be nice to merge some of these efforts where feasible and useful, especially at this still relatively early stage.
The ECCO summer school provides a great opportunity to make students familiar with these tools, and hence a good timeline/deadline to attempt streamlining some of this. Thanks to Tim who is happy & keen to take some of this on! And of course in consultation with others.
Perhaps worth a telecon with those interested (Tim, Ian, Ryan, Raphael, others…?) to discuss some of this - if needed?
-Patrick
@heimbach +1 I'd be happy to participate in a telecon or any other coordination effort.
@rabernat
1. you weren't aware that xmitgcm could do this, due to poor documentation and outreach on our part
2. you were aware, but decided that ECCOv4-py had unique needs that could not be handled by xmitgcm (if so, it would be great to know what these are)
@rabernat I am not sure, even now after looking at xmitgcm and other packages, that I'm fully aware of their capabilities! It is hard to keep track of the different efforts (so many different names and activities!). It's true that the xmitgcm page and example notebooks could use some additional demos (the single example notebook in github (MITgcm_global_budgets.ipynb) doesn't showcase its mds i/o capabilities at all). The html documentation of xmitgcm is certainly more complete, but even there many of the flags seem to be undocumented (swap_dims? shape?). I could not even figure out how to read a single .data file -- for example, the code block below did not work.
ds_llc = xg.open_mdsdataset('/Users/ifenty/incoming/tmp/global_oce_llc90/S.0000000008.data', nx=90, ny=90, nz=50, geometry='llc')
That aside, the last few ECCO releases consisted of directories of netcdf files, with each variable broken down into one file per tile (see, for example, ftp://ecco.jpl.nasa.gov/Version4/Release3/nctiles_monthly/THETA/ ). The original purpose of the eccov4-py package was to have routines to read those netcdf files along with the llc grid information (which was also provided as 13 netcdf tile files) and to reconstruct those files into data structures consistent with the llc geometry. (The next release will also be provided as netcdf files, this time CF-compliant.)
xmitgcm is excellent for reading mds files and the llc grid info and packaging that output in a useful data structure. One idea is to make sure that the data structures created by xmitgcm when pointed at output of the mitgcm configured for ECCO (say v4r3) end up looking the same as the data structures that are created by reading in the ECCO 'release' netcdf files.
If we want to try to do that, I think we have to confront two small issues: the names of the variable indices and the name of the 13 arrays into which we reorganize the llc compact file. xmitgcm uses the i_g and j_g indices in ways that don't make too much sense to me. Specifically, in xmitgcm, u variables are given (i_g, j) indices, v variables (i, j_g) indices, and zeta variables (i_g, j_g) indices. I think your idea was that by using (i_g, j) for u variables you were indicating that the 'column location' of the u variable is distinct from the 'column location' of the tracer variable with index (i, j), and that it has the same 'row location'. I think we should be more explicit and instead use indices like (i_u, j_u) for u variables, (i_v, j_v) for v variables, and (i_z, j_z) for zeta variables. Mixing i_g with j and i with j_g is confusing because the u variables are not 'at' (i_g, j). My vote is to be super explicit about what kind of variable we have (tracer, u, v, or zeta).
The second small issue has to do with the name 'face' or 'tile' or 'facet'. I tend to think 'tile' makes more sense than 'face' for the 13 NX x NY arrays into which we put the compact format files. We more often say that the llc grid has 5 'faces' or 'facets', which is itself just a frenchified version of face. The word 'tile' appears in the caption of figure 8.3 of the mitgcm documentation to describe the 16 NX x NY pieces of an llc-like domain. On the other hand, the word 'tile' also appears in the tilexxx.mitgrid file names, even though those files describe 'faces'. Bah, anyway, face/tile isn't a hill I want to die on, but my vote is tile.
I really appreciate you taking the time to write such a long and thoughtful response.
It's true that the xmitgcm page and example notebooks could use some additional demos (the single example notebook in github (MITgcm_global_budgets.ipynb) doesn't showcase its mds i/o capabilities at all).
Thanks for this valuable feedback. We will really make an effort to improve the documentation and add more examples in response to the points you raise.
The html documentation of xmitgcm is certainly more complete, but even there many of the flags seem to be undocumented (swap_dims? shape?).
swap_dims is documented in the API documentation for open_mdsdataset (https://xmitgcm.readthedocs.io/en/latest/usage.html#reading-mds-data). shape is not an argument for that function, so I'm not sure what you're referring to there.
(the single example notebook in github (MITgcm_global_budgets.ipynb) doesn't showcase its mds i/o capabilities at all). I could not even figure out how to read a single .data file -- for example, the code block below did not work.
I think a big source of confusion is that the user-facing part of xmitgcm is designed not to read just the individual MDS .data files but the whole directory of MITgcm output files. As stated in the documentation, the first argument data_dir is a "path to the directory where the mds .data and .meta files are stored."
Did you try running through the quick start, actually downloading the example data and running the commands? If so, you will see that the ./global_oce_latlon directory is full of MDS files. The whole point of the package is to read the MDS files all in one go and present a single xarray Dataset for the whole model run, just as if we had proper netCDF output.
Similarly, you can see it in action reading MDS files from an ecco LLC90 run here: https://xmitgcm.readthedocs.io/en/latest/usage.html#grid-geometries These examples should be easy to adapt to whatever LLC data you have.
There are also many low level utilities that are used internally by xmitgcm. But I would not recommend using them unless you are trying to do something really customized. It should be easy to read all possible MDS output via the main open_mdsdataset function, including regular grids, cubed sphere, LLC, and ASTE-type setups.
For your specific example, I would try
import xmitgcm

data_dir = '/Users/ifenty/incoming/tmp/global_oce_llc90'
# open all variables xmitgcm can find - might not work depending on what's there
ds_llc = xmitgcm.open_mdsdataset(data_dir, geometry='llc')
# or be explicit about just opening a specific prefix and iteration number
ds_S = xmitgcm.open_mdsdataset(data_dir, geometry='llc', iters=[8], prefix=['S'])
xmitgcm uses the i_g and j_g indices in ways that don't make too much sense to me.
I see the point you are making here, and it is definitely worthy of discussion. (A very similar one was raised in https://github.com/xgcm/xgcm/issues/108 in reference to ROMS outputs.) The choice made in xmitgcm about how to index the different c-grid positions was guided by the goal of allowing xarray to correctly apply broadcasting by dimension name (I strongly encourage you to review that documentation if the concept is not familiar). Xarray will let me directly multiply or add together two variables that share a dimension (e.g. 'i_g'). The MITgcm manual (e.g. the section on the second order centered scheme) and the code itself make it clear that U and THETA are always considered to be at the same j location. When constructing the second order centered flux in the i-direction, we have to interpolate THETA from cell center to cell face along the i dimension (xgcm provides this functionality), but not along the j direction. So, according to xarray's broadcasting rules, U and THETA have the same j dimension. That's why I decided that U should have the dimensions i_g, j.
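To make that concrete, here is a toy illustration of the broadcasting behavior (names and sizes made up):

import numpy as np
import xarray as xr

# toy fields following the xmitgcm convention
theta = xr.DataArray(np.random.rand(4, 4), dims=['j', 'i'])    # tracer point
u = xr.DataArray(np.random.rand(4, 4), dims=['j', 'i_g'])      # u point

# interpolating theta to the u point relabels its x-dimension i -> i_g
# (hand-rolled with periodic wrap here; xgcm's grid.interp does this for you)
theta_u = xr.DataArray(0.5 * (theta.values + np.roll(theta.values, 1, axis=-1)),
                       dims=['j', 'i_g'])

# now u and theta_u share both dims, so xarray multiplies elementwise
flux = u * theta_u
print(flux.dims)         # ('j', 'i_g')

# whereas u * theta broadcasts i and i_g to an outer product,
# exposing (rather than hiding) the mismatched grid positions
print((u * theta).dims)  # ('j', 'i_g', 'i')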
MITgcm's own netcdf files from the MNC package follow this convention (although they use different names for the coordinates). So do most other ocean GCMs (e.g. MOM, NEMO). POP makes the strange choice of using the same dimension names for all variables, regardless of their grid cell location, a horrible choice IMHO. (Some code to fix POP netCDF files.)
I think we should be more explicit and instead use indices like (i_u, j_u) for u variables, (i_v, j_v) for v variables, and (i_z, j_z) for zeta variables. Mixing i_g with j and i with j_g is confusing because the u variables are not 'at' (i_g, j). My vote is to be super explicit about what kind of variable we have (tracer, u, v, or zeta).
This is what some versions of ROMS do (see https://github.com/xgcm/xgcm/issues/108). It causes unnecessary complications in xarray-based analysis code, and it is currently impossible for xgcm to work with data labeled in this way. In my view, the position is specified uniquely and precisely by the combination of dimension names, not by the individual dimension names themselves. And all of this is distinct from geographical position, which should instead be encoded in the CF coordinates attribute.
I respect your view here, and I recognize that this is not an obvious or trivial issue. My opinions on this are based on about four years experience of doing tons of xarray-based analysis of finite-volume GCM datasets from many different models. If the MITgcm community decides your convention is preferable, we can adopt it here. Effort will be required to adapt xgcm to work with such data. Laying out the pros and cons of different choices is an important discussion.
The second small issue has to do with the name 'face' or 'tile' or 'facet'.
Fortunately this is a lot easier to change. 😄 If you want a different name, you can just use xarray to do
ds = ds.rename({'face': 'tile'})
My view is that no analysis software should ever assume any specific names for variables. What matters are the metadata (attributes), such as short_name, units, etc., which describe the actual contents. This is a principle followed by xgcm.
@rabernat
The MITgcm manual (e.g. the section of second order centered scheme) and code itself make it clear that U and THETA are always considered to be at the same j location. When constructing the second order centered flux in the i-direction, we have to interpolate THETA from cell center to cell face along the i dimension (xgcm provides this functionality), but not along the j direction.
I am not sure I agree with your assessment of how the MITgcm manual and the code deal with indices. As far as I can tell, the model uses integer i, j indices exclusively, while the documentation uses integer and half-integer i, j values, such as when defining the delta notation (δ_i) [doc section 2.1 https://mitgcm.readthedocs.io/en/latest/algorithm/algorithm.html#notation ]. U and THETA are not "at the same j location"; obviously, the jth U is to the west of the jth THETA. I think the use of extra labels on i and j to differentiate u, v, and zeta points from tracer points makes very good sense. It seems to me an open question what the cleanest way of doing that is.
So, according to xarray's broadcasting rules, U and THETA have the same j dimension. That's why I decided that U should have the dimensions i_g, j.
They have the same dimension, of course, because there is, for example, one U and one V for every THETA. I think the real issue, and I appreciate the discussion, is how to label properly for broadcasting.
When constructing the second order centered flux in the i-direction, we have to interpolate THETA from cell center to cell face along the i dimension (xgcm provides this functionality), but not along the j direction.
Agreed, when calculating centered second order fluxes we interpolate THETA in the i-direction and not the j-direction, but don't we still have to label the i index of the interpolated THETA before broadcasting with U? Here's what I mean, with your example dataset:
import xmitgcm as xg

ds_llc = xg.open_mdsdataset('/Users/ifenty/incoming/tmp/global_oce_llc90/', iters=[8], geometry='llc')
# neighboring salinity values along the i direction on one face, one level
a = ds_llc.S.isel(i=range(0, 89), face=1, k=1)
b = ds_llc.S.isel(i=range(1, 90), face=1, k=1)
# .values strips the labels, so m is a plain ndarray
m = (a.values + b.values) / 2
Of course m now doesn't have any labelled dimensions. Before one can multiply this m ndarray with U via broadcasting, mustn't one assign new labels to the m array (e.g., i_g, j)? If one must relabel the index along the i direction anyway, then why not also reassign the index in the j direction (say, to j_u), thus fully changing the i, j tracer variable into an i_u, j_u average variable for broadcasting? Does xgcm not need to do this kind of label assignment when interpolating tracer points to u and v points for gradient calculations?
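(For reference, xgcm's interp does appear to do exactly this relabeling; a toy sketch on a periodic axis, with illustrative names:)

import numpy as np
import xarray as xr
from xgcm import Grid

# a toy periodic x-axis: 'i' at cell centers, 'i_g' at left cell faces
ds = xr.Dataset(coords={'i': ('i', np.arange(4)),
                        'i_g': ('i_g', np.arange(4) - 0.5)})
ds['THETA'] = ('i', np.random.rand(4))

grid = Grid(ds, coords={'X': {'center': 'i', 'left': 'i_g'}}, periodic=['X'])

# interp averages neighbors AND swaps the dimension label i -> i_g,
# so the result broadcasts elementwise against a u-point variable
theta_u = grid.interp(ds['THETA'], 'X')
print(theta_u.dims)  # ('i_g',)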
@rabernat All that said, I can get behind your idea that it is the combination of (i_x, j_x) that indicates the class of c-grid variable. So long as we are very explicit about it in the documentation and explain our reasoning, the chances for confusion will be much reduced.
Going forward, eccov4-py will continue using the indexing convention that I carried over from you. That way I can construct netcdf files for ECCO releases that, when read using xarray, will look more or less the same as those read using xmitgcm.
I made an attempt at routines to 'reorient' (I hesitate to say rotate) pairs of u/v variables so that u variables are all approximately aligned zonally, and likewise v variables meridionally. Is this capability also present in another package?
@ifenty - This isn't quite the same as the reorienting routines which you've written, because I think then those velocities would live on cell edges. However, I was able to rewrite the gcmfaces routine calc_UEVNfromUXVY using xgcm pretty easily. It takes only a few lines once an xgcm grid object has been defined (@rabernat et al have a nice example in the xgcm documentation for the complicated llc grid):
# first interpolate velocity to cell centers
vel_c = grid.interp_2d_vectors({'X': ds['UVELMASS'], 'Y': ds['VVELMASS']}, boundary='fill')
# compute East and North components using cos() and sin() of the grid angle
u_east = vel_c['X'] * ds['CS'] - vel_c['Y'] * ds['SN']
v_north = vel_c['X'] * ds['SN'] + vel_c['Y'] * ds['CS']
where ds['CS'] and ds['SN'] are the AngleCS and AngleSN fields, defined at cell center. This puts the U/V fields in the "right" orientation, e.g. for plotting, where they live on cell centers - i.e. (i, j).
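As a quick sanity check, one could quiver a single face of the result (a rough sketch; assumes the standard XC/YC grid coordinates and no time dimension on the velocities):

import matplotlib.pyplot as plt

fig, ax = plt.subplots()
ax.quiver(ds['XC'].isel(face=1), ds['YC'].isel(face=1),
          u_east.isel(face=1, k=0), v_north.isel(face=1, k=0))
ax.set_xlabel('longitude')
ax.set_ylabel('latitude')
plt.show()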
The latitude of the u point is not, in general, the same as the latitude of the tracer point. (Although it can be in many setups.)
But MITgcm does not solve the equations in lat/lon coordinates. It uses orthogonal curvilinear coordinates. In the local orthogonal basis, the u point is at the same j-axis location as the tracer point. That is, in fact, the whole point of the orthogonal coordinate system. So our choice about dimension naming is fundamentally consistent with the model numerics.
U and THETA are not "at the same j location"; obviously, the jth U is to the west of the jth THETA.
In local orthogonal coordinates, "east/west" corresponds with the i-axis and "north/south" with the j-axis. What I mean is that U and THETA are at the same position with respect to the orthogonal j-axis. That's why they both share j as their dimension along that axis.
Hi All,
Did any consensus emerge on i, j, k axis labeling/conventions? It'd be nice to converge on a common practice...
Thanks,
Chris
P.S. On (1): FWIW my current bias (today) is for two separate indices. That allows N, N+1 sizing for staggered points, which seems to be easier for human comprehension/reasoning. It creates some headaches for dask, so it depends a bit on how much the goal is dask performance vs. more digestible explanations. In the MITgcm core we have favored performance, but it's not clear that the same balance makes sense for diagnostic work, where it may be more important to have abstractions that are more intuitive to more people. Don't want to put my thumb on any scales too much though!!!
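To make the N vs. N+1 point concrete, a toy sketch (purely illustrative names):

import numpy as np
import xarray as xr

N = 4
ds = xr.Dataset(coords={'i': ('i', np.arange(N)),           # N tracer points
                        'i_g': ('i_g', np.arange(N + 1))})  # N+1 u points, both walls
ds['THETA'] = ('i', np.random.rand(N))
ds['U'] = ('i_g', np.random.rand(N + 1))
# the explicit size difference makes the staggering visible at a glance,
# at the cost of unequal chunk shapes when the arrays are dask-backed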
Did any consensus emerge on i, j, k axis labeling/conventions? It'd be nice to converge on a common practice...
My impression is that we all agree that any convention could work in principle, as long as we document it clearly. There is no right or wrong, and this is not a value judgement.
Given that situation, I vote for the practical choice of using the xmitgcm convention.
I'm just a student here, but I agree with @rabernat. Being able to leverage the tools in xgcm is nice because a lot of the heavy lifting has been done in a general way, making a lot of standard routines (e.g. exchanging U/V fields) easy. Also I think this is good for ECCO because users familiar with xgcm from working with other ocean models would have an easier time analyzing and using ECCO output.
@ifenty any thoughts/comments? It could be kind of nice to be able to mix and match tools from various places without having to jump through hoops (or having to explain to people that this tool does X and that tool does Y).
@christophernhill I agree with what you say and we will be re-generating the official 'ecco' output using the indexing convention used by xmitgcm.
Perfect - I will follow the same for my bits.
Closing this issue to reduce clutter. We can open new issues on specific details as needed.