Average and instantaneous files grouped if they have the same time step

MITgcm / xmitgcm

Read MITgcm mds binary files into xarray

http://xmitgcm.readthedocs.io

MIT License


Average and instantaneous files grouped if they have the same time step #325

Closed garrettdreyfus closed 5 months ago

garrettdreyfus commented 1 year ago

Hello!

In using xmitgcm, I have found that if instantaneous and average model output files share the same time step, they are both included in the final Dataset under the same DataArray. E.g. THETA_inst data ends up in ds.THETA.

I did some debugging and found that this was because in the metadata the fldList value for my instantaneous and average snapshots is the same. So when it is read in parse_meta_file it will return the same "vname". I am unsure if this is a quirk of my specific MITgcm simulations or if it is more general.
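To illustrate the collision described above, here is a minimal sketch in plain Python. It does not call xmitgcm; the record layout and the group_by helper are hypothetical stand-ins, and it assumes (as in my runs) that the snapshot file's fldList reports the same field name as the averaged file.

```python
# Hypothetical stand-in for the metadata lookup; this is NOT xmitgcm's code.
# Each tuple is (file prefix, iteration number, fldList entry from the .meta file).
meta_records = [
    ("THETA", 10, "THETA"),
    ("THETA_inst", 10, "THETA"),  # assumed: snapshot reports the same fldList name
]


def group_by(records, use_prefix):
    """Group file prefixes by (variable name, iteration)."""
    groups = {}
    for prefix, iteration, fld_name in records:
        # Choose which name identifies the variable
        vname = prefix if use_prefix else fld_name
        groups.setdefault((vname, iteration), []).append(prefix)
    return groups


# Naming by fldList: both files land under the same key
by_fldlist = group_by(meta_records, use_prefix=False)
# Naming by prefix: the two files stay separate
by_prefix = group_by(meta_records, use_prefix=True)

print(by_fldlist)
print(by_prefix)
```

When the variable name comes from fldList, both files share one key; when it comes from the file prefix, they stay apart.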

The simplest fix, it seems to me, would be to always set vname equal to prefix in the "load_from_prefix" function, which is already done when the filename base doesn't match the prefix. That said, I would defer to someone with more know-how than me on whether this would break other things.

Thank you for this library!

Garrett

garrettdreyfus commented 1 year ago

P.S. I tried to make my suggested fix work and it also required duplicating this for loop to add variations of file names with _inst.

timothyas commented 1 year ago

Hi @garrettdreyfus, sorry for a long delay here. Another option is to use the prefix= option to open_mdsdataset, and create two different datasets: one with averaged data and one with snapshots. For instance, if your MITgcm run outputs averaged data with the form avg_flds.*.meta/data and snapshots like snap_flds.*.meta/data then...

from xmitgcm import open_mdsdataset
avg = open_mdsdataset(data_dir, grid_dir, prefix=["avg_flds"], ...)
snp = open_mdsdataset(data_dir, grid_dir, prefix=["snap_flds"], ...)

and then you can work with the two datasets separately. I think this is also nice because it will be clear which type of data you'll be working with in your workflow. I hope that helps.

garrettdreyfus commented 5 months ago

Hey @timothyas,

Sorry for the ridiculously long delay getting back to you! The problem I was actually running into was that my average and instantaneous fields are named "QUANTITY" and "QUANTITY_inst" (e.g. "THETA" and "THETA_inst"). So setting prefix to "THETA" also pulls in "THETA_inst", no matter what I set the prefix field to.
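A rough sketch of why the names overlap, using only the Python standard library. The loose pattern here is my assumption about how a prefix could be matched against file names, not a confirmed description of what xmitgcm does, and the file names are made-up examples in the usual prefix.iteration.meta form.

```python
import fnmatch
import re

# Example file names; the 10-digit iteration number is an assumed layout
filenames = [
    "THETA.0000000010.meta",
    "THETA_inst.0000000010.meta",
]

prefix = "THETA"

# Assumed loose matching: the prefix followed by anything, ending in .meta
loose = fnmatch.filter(filenames, prefix + "*.meta")

# Stricter matching: the prefix, a dot, a 10-digit iteration number, then .meta
strict_pattern = re.compile(re.escape(prefix) + r"\.\d{10}\.meta")
strict = [name for name in filenames if strict_pattern.fullmatch(name)]

print(loose)
print(strict)
```

With the loose pattern, "THETA" matches both files because "THETA_inst" starts with the same characters; anchoring the pattern on the dot before the iteration number keeps them apart.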

Your response made me realize this may be a problem unique to me because of this file name format choice, so I will close the issue.

Thanks again for your help and patience!

Garrett

timothyas commented 5 months ago

I'm glad to hear you figured that out!