Open braaannigan opened 7 years ago
In principle, it is trivial to read netCDF data with xarray (e.g. open_mfdataset). For well-formatted, CF-compliant netCDF files, no special processing is needed. The files are self-describing. I developed xmitgcm to handle the annoying yet widely used (within the MITgcm community) mds binary storage format.
The challenge you are alluding to is that MITgcm's netCDF output is tiled, since mnc can't be used with singlecpuio. (This same challenge arises when using mds output with singlecpuio=False.) In this case, one could imagine adding logic to virtually concatenate the tiles into a single xarray dataset. This would be a valuable contribution. It's a significant amount of work.
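As a rough illustration of what "virtually concatenate the tiles" could look like, here is a minimal sketch using xarray's combine_by_coords on in-memory stand-ins for tile files. The tile layout, the make_tile helper, and the assumption that each tile carries global X/Y coordinate values are all hypothetical; real mnc tiles would be opened with xr.open_dataset and may need extra handling.

```python
import numpy as np
import xarray as xr


def make_tile(x0, y0, nx=4, ny=3):
    """Build a stand-in for one tile file (hypothetical helper).

    Each tile is a small Dataset whose X/Y coordinates hold global positions.
    """
    x = np.arange(x0, x0 + nx)
    y = np.arange(y0, y0 + ny)
    # Encode the global position in the data so the result is easy to check.
    data = (y[:, None] * 100 + x[None, :]).astype("float32")
    return xr.Dataset(
        {"UVEL": (("Y", "X"), data)},
        coords={"X": x, "Y": y},
    )


# A 2 x 2 tiling of an 8 x 6 global domain (assumed layout).
tiles = [make_tile(x0, y0) for y0 in (0, 3) for x0 in (0, 4)]

# combine_by_coords uses the coordinate values to work out where each tile goes.
ds = xr.combine_by_coords(tiles)

print(ds["UVEL"].shape)
```

With lazily opened files the same call would only assemble metadata, which is the sense in which the concatenation is virtual.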
But as long as we are discussing a potentially time consuming project, it is worth noting that the best way forward would be to update MITgcm's netCDF output pathway to support singlecpuio, perhaps using parallel netcdf. Then it could finally produce simple, easy-to-process global netCDF files like every other ocean model in the world. I would be very happy to see xmitgcm become obsolete.
Thanks for your reply @rabernat. I've been playing around with some netcdf output for a few days and have to admit that I didn't get how tightly bound xarray is to netcdf. I agree that having a singlecpuio for netcdf would be handy.
It seems to me that the optimum workflow for people who want netcdf output will be to use gluemncbig.py to put them together, as it cuts the number of netcdf files that then need to be opened by a few orders of magnitude for runs with lots of tiles. In that case xmitgcm wouldn't be needed to open the dataset, but perhaps xmitgcm could be where analysis packages are developed? I'm thinking about things like vector calculus that can be applied to dataArray objects regardless of how they were read in. Perhaps that's worth raising as a separate issue though.
I'm thinking about things like vector calculus that can be applied to dataArray objects regardless of how they were read in
This is a topic of intense discussion by the pangeo group. We would love to have such a package that works not only with mitgcm but all (or most) common ocean models. See the discussion here: https://aospy.hackpad.com/Vector-calculus-operations-pangeo-mosaics-oAPE6Rqvcwt
This was the original goal of my xgcm project (https://github.com/xgcm/xgcm). I branched xmitgcm off from xgcm to focus on the MITgcm-specific issues. Now might be the time to circle back and continue developing xgcm.
In the meantime, @lesommer's oocgcm package has similar goals and is much more mature. https://github.com/lesommer/oocgcm
We would really welcome your contributions on this.
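To make the idea of reader-independent vector calculus concrete, here is a minimal sketch of a horizontal divergence built on xarray's DataArray.differentiate. The function name and the dimension names X and Y are assumptions for illustration, and the sketch treats u and v as collocated, which ignores the C-grid staggering that a real MITgcm-aware package would have to handle.

```python
import numpy as np
import xarray as xr


def horizontal_divergence(u, v, x="X", y="Y"):
    """Return du/dx + dv/dy using centred finite differences.

    Assumes u and v live on the same grid points (a simplification).
    """
    return u.differentiate(x) + v.differentiate(y)


x = np.linspace(0.0, 1.0, 11)
y = np.linspace(0.0, 2.0, 21)
coords = {"Y": y, "X": x}

# Test field u = 2x, v = -3y, whose divergence is 2 - 3 = -1 everywhere.
u = xr.DataArray(2.0 * x[None, :] * np.ones((y.size, 1)), dims=("Y", "X"), coords=coords)
v = xr.DataArray(-3.0 * y[:, None] * np.ones((1, x.size)), dims=("Y", "X"), coords=coords)

div = horizontal_divergence(u, v)
```

Because the function only touches dimension names and coordinates, it does not care whether the arrays came from mds files, netCDF tiles, or another model.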
@braaannigan I've added a "singlecpuio" to MITgcm. See my pull request: https://github.com/altMITgcm/MITgcm66h/pull/15 Comments and testing welcome. Needs netcdf to be compiled with parallel hdf5. The pulled genmake2 should test for this.
@braaannigan or @jklymak: can one of you post a small netCDF file that comes out of MITgcm's mnc package, say for the variable UVEL? I would like to examine one, but I'm too lazy to run a new model myself just for this purpose.
I’ve not run mnc very often. I have a quick example setup on my github page.
I'm looking for an actual file, not a setup.
Or, even better, the xarray repr of a Dataset from one of those files.
I am concerned about the proliferation of many different mutually incompatible formats of MITgcm netCDF files.
OK, I have lots of my incompatible files from the nf90io package. But you don't particularly need xmitgcm to read them.
I.e., the single-file nf90io output.
Does your new package use the same naming conventions as mnc?
Could you post a file here on github? Just drag and drop should work I think.
Give this a shot. I don't think I've changed anything since I did this.
Note that due to the way diagnostics work, you need to output the 2D and the 3D fields in two separate files... for now.
This is a PR though against their experimental repository from the summer: https://github.com/altMITgcm/MITgcm66h/pull/15
Here is some quick and dirty code to read mnc tiled data with xarray: https://gist.github.com/rabernat/66a495173748eca7c025e9d55a846595
I think that it would be useful to have the package be able to deal with netcdf output as well as the mds. I thought it would be helpful to set out an overview of what might need to be done before working on the code.
Overall, I would suggest that we modify the existing mds_store.py so that it can handle either type of file output. This should minimise any duplication of code to handle the different formats.
An option to specify the type of output may be required in the input parameters to open_mdsdataset, though auto-detection could also occur. Netcdf outputs allow multiple variables in a single file, so this may need to be accounted for when specifying prefixes. This would then require modifying _get_all_matching_prefixes for the different cases.
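A minimal sketch of what the auto-detection step might look like, using only the standard library. The helper detect_output_format is hypothetical and not part of xmitgcm; it assumes mds output can be recognised by its .meta files and netCDF output by a .nc extension, and it asks the caller to be explicit when both are present.

```python
import tempfile
from pathlib import Path


def detect_output_format(data_dir):
    """Guess whether a run directory holds mds or netCDF output.

    Hypothetical helper: returns "mds" or "netcdf", and raises if the
    directory contains both kinds of file or neither.
    """
    data_dir = Path(data_dir)
    has_mds = any(data_dir.glob("*.meta"))
    has_nc = any(data_dir.glob("*.nc"))
    if has_mds and has_nc:
        raise ValueError("both mds and netCDF files found; specify the format explicitly")
    if has_mds:
        return "mds"
    if has_nc:
        return "netcdf"
    raise FileNotFoundError(f"no mds or netCDF output found in {data_dir}")


# Usage with throwaway directories and empty placeholder files.
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "U.0000000010.meta").touch()
    (Path(tmp) / "U.0000000010.data").touch()
    detected_mds = detect_output_format(tmp)

with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "state.0000000000.t001.nc").touch()
    detected_nc = detect_output_format(tmp)
```

An explicit keyword argument could still override the guess, which keeps the mixed-directory case from silently picking one format.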
Chunks: I imagine an advantage of using xmitgcm would be keeping large datasets in tiled outputs, with chunking naturally occurring based on the size of the tiles.
_MDSDataStore would be modified to produce a store from either type of output. _guess_model_dimensions would be modified to allow netcdf metadata to be read. This would also allow diag_levels to be read for data.diagnostics output.
load_from_prefix would also have to be modified to read the netcdf files. A netcdf equivalent of read_mds would also have to be added to the utilities.
Any thoughts?