MITgcm / xmitgcm

Read MITgcm mds binary files into xarray
http://xmitgcm.readthedocs.io
MIT License

Netcdf output #27

Open braaannigan opened 7 years ago

braaannigan commented 7 years ago

I think it would be useful for the package to handle netcdf output as well as mds. Before working on the code, I thought it would be helpful to set out an overview of what might need to be done.

Overall, I would suggest that we modify the existing mds_store.py so that it can handle either type of file output. This should minimise any duplication of code to handle the different formats.

An option to specify the type of output may be required in the input parameters to open_mdsdataset, though auto-detection could also occur. Netcdf outputs allow multiple variables in a single file, so this may need to be accounted for when specifying prefixes. This would then require modifying _get_all_matching_prefixes for the different cases.
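A minimal sketch of what the auto-detection could look like, assuming it is enough to look at file extensions in the output directory (`.data`/`.meta` for mds, `.nc` for netcdf). The helper name `guess_output_format` is hypothetical and is not part of xmitgcm:

```python
import os


def guess_output_format(filenames):
    """Guess the output type from file extensions (hypothetical helper).

    Returns "mds" or "netcdf". Raises ValueError when the directory mixes
    both types or contains neither, so the caller can ask the user to
    specify the format explicitly.
    """
    extensions = {os.path.splitext(name)[1] for name in filenames}
    has_mds = ".meta" in extensions or ".data" in extensions
    has_netcdf = ".nc" in extensions
    if has_mds and has_netcdf:
        raise ValueError("found both mds and netcdf files; specify the format")
    if has_mds:
        return "mds"
    if has_netcdf:
        return "netcdf"
    raise ValueError("no mds or netcdf output files found")
```

In practice this would be called on something like `os.listdir(data_dir)`, with an explicit keyword argument taking precedence over the guess.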

Chunks - I imagine an advantage of using xmitgcm would be keeping large datasets in tiled outputs, with chunking naturally occurring based on the size of the tiles.
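As a rough illustration of tile-based chunking, the sketch below builds a chunk specification with one block per tile from the global grid size and the tile size. The function name `tile_chunks` and the dimension names `"X"` and `"Y"` are placeholders chosen for this example, not names taken from xmitgcm:

```python
def tile_chunks(nx, ny, tile_nx, tile_ny):
    """Return per-dimension block sizes with one chunk per tile.

    nx, ny           -- global grid size in x and y
    tile_nx, tile_ny -- size of a single tile in x and y
    """
    if nx % tile_nx or ny % tile_ny:
        raise ValueError("global grid size must be a multiple of the tile size")
    return {
        "Y": (tile_ny,) * (ny // tile_ny),
        "X": (tile_nx,) * (nx // tile_nx),
    }
```

A dictionary of this form (dimension name mapped to a tuple of block sizes) is the kind of argument that `Dataset.chunk` in xarray accepts, so each dask chunk would line up with one tile on disk.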

_MDSDataStore would be modified to produce a store from either type of output. _guess_model_dimensions would be modified to allow netcdf metadata to be read. This would also allow diag_levels to be read for data.diagnostics output.

load_from_prefix would also have to be modified to read the netcdf files. A netcdf equivalent of read_mds would also have to be added to the utilities.

Any thoughts?

rabernat commented 7 years ago

In principle, it is trivial to read netCDF data with xarray (e.g. open_mfdataset). For well-formatted, CF-compliant netCDF files, no special processing is needed. The files are self-describing. I developed xmitgcm to handle the annoying yet widely used (within the MITgcm community) mds binary storage format.
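For reference, a minimal sketch of the open_mfdataset route for global (untiled) netCDF files. The wrapper name `open_global_netcdf` and the example glob pattern are assumptions for illustration; `xarray.open_mfdataset` with `combine="by_coords"` is the real xarray call:

```python
import glob


def open_global_netcdf(pattern):
    """Open all netCDF files matching a glob pattern as one lazy Dataset.

    Assumes the files are self-describing and share coordinate variables,
    so xarray can work out how to combine them.
    """
    files = sorted(glob.glob(pattern))
    if not files:
        raise FileNotFoundError(f"no files match {pattern!r}")
    # Imported here so the file check above works without xarray installed.
    import xarray as xr

    return xr.open_mfdataset(files, combine="by_coords")
```

Usage would be along the lines of `ds = open_global_netcdf("run/output_*.nc")`, where the path is hypothetical.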

The challenge you are alluding to is that MITgcm's netCDF output is tiled, since mnc can't be used with singlecpuio. (This same challenge arises when using mds output with singlecpuio=False). In this case, one could imagine putting the logic to virtually concatenate the tiles into a single xarray dataset. This would be a valuable contribution. It's a significant amount of work.
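A first step toward concatenating tiles would be sorting the output files by tile. The sketch below assumes mnc-style names that end in `.tNNN.nc` (for example `state.0000000000.t001.nc`); both that naming pattern and the helper name `group_by_tile` are assumptions for this example, and the actual stitching of tiles into a global array is not shown:

```python
import re
from collections import defaultdict

# Assumed naming convention: <anything>.t<tile number>.nc
TILE_PATTERN = re.compile(r"^(?P<prefix>.+)\.t(?P<tile>\d+)\.nc$")


def group_by_tile(filenames):
    """Map tile number -> sorted list of file names for that tile.

    Files that do not match the assumed tile pattern are ignored.
    """
    groups = defaultdict(list)
    for name in filenames:
        match = TILE_PATTERN.match(name)
        if match is None:
            continue
        groups[int(match.group("tile"))].append(name)
    return {tile: sorted(names) for tile, names in sorted(groups.items())}
```

Each per-tile list could then be opened lazily along time, leaving the harder part, placing the tiles side by side in x and y, to the concatenation logic discussed above.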

But as long as we are discussing a potentially time consuming project, it is worth noting that the best way forward would be to update MITgcm's netCDF output pathway to support singlecpuio, perhaps using parallel netcdf. Then it could finally produce simple, easy-to-process global netCDF files like every other ocean model in the world. I would be very happy to see xmitgcm become obsolete.

braaannigan commented 7 years ago

Thanks for your reply @rabernat. I've been playing around with some netcdf output for a few days and have to admit that I didn't get how tightly bound xarray is to netcdf. I agree that having a singlecpuio for netcdf would be handy.

It seems to me that the optimum workflow for people who want netcdf output will be to use gluemncbig.py to put them together, as it cuts the number of netcdf files that then need to be opened by a few orders of magnitude for runs with lots of tiles. In that case xmitgcm wouldn't be needed to open the dataset, but perhaps xmitgcm could be where analysis packages are developed? I'm thinking about things like vector calculus that can be applied to dataArray objects regardless of how they were read in. Perhaps that's worth raising as a separate issue though.

rabernat commented 7 years ago

> I'm thinking about things like vector calculus that can be applied to dataArray objects regardless of how they were read in

This is a topic of intense discussion by the pangeo group. We would love to have such a package that works not only with mitgcm but all (or most) common ocean models. See the discussion here: https://aospy.hackpad.com/Vector-calculus-operations-pangeo-mosaics-oAPE6Rqvcwt

This was the original goal of my xgcm project: https://github.com/xgcm/xgcm. I branched xmitgcm off from xgcm to focus on the mitgcm-specific issues. Now might be the time to circle back and continue developing xgcm.

In the meantime, @lesommer's oocgcm package has similar goals and is much more mature. https://github.com/lesommer/oocgcm

We would really welcome your contributions on this.

jklymak commented 7 years ago

@braaannigan I've added a "singlecpuio" to MITgcm. See my pull request: https://github.com/altMITgcm/MITgcm66h/pull/15. Comments and testing are welcome. It needs netcdf to be compiled with parallel hdf5. The genmake2 in the pull request should test for this.

rabernat commented 6 years ago

@braaannigan or @jklymak:

Can one of you post a small netCDF file that comes out of MITgcm's mnc package, say for the variable UVEL? I would like to examine one, but I'm too lazy to run a new model myself just for this purpose.

jklymak commented 6 years ago

I’ve not run mnc very often. I have a quick example setup on my github page.

rabernat commented 6 years ago

I'm looking for an actual file, not a setup.

Or, even better, the xarray repr of a Dataset from one of those files.

I am concerned about the proliferation of many different mutually incompatible formats of MITgcm netCDF files.

jklymak commented 6 years ago

OK, I have lots of my own incompatible files from the nf90io package. But you don’t particularly need xmitgcm to read them.

jklymak commented 6 years ago

I.e., the single-file nf90io.

rabernat commented 6 years ago

Does your new package use the same naming conventions as mnc?

Could you post a file here on github? Just drag and drop should work I think.

jklymak commented 6 years ago

Give this a shot. I don't think I've changed anything since I did this.

Note that due to the way diagnostics work, you need to output the 2d and the 3d in two separate files... for now.

statevars.zip

This is a PR though against their experimental repository from the summer: https://github.com/altMITgcm/MITgcm66h/pull/15

rabernat commented 6 years ago

Here is some quick and dirty code to read mnc tiled data with xarray: https://gist.github.com/rabernat/66a495173748eca7c025e9d55a846595