MITgcm / xmitgcm

Read MITgcm mds binary files into xarray
http://xmitgcm.readthedocs.io
MIT License

Errors identifying files in run directory #95

Open rabernat opened 6 years ago

rabernat commented 6 years ago

On the MITgcm mailing list, @mjlosch reported some problems parsing the files in his run directory:

the diagnostics package produces a file cheapaml at iteration 11 and iceDiag at 12. I acknowledge that this is hard to catch in a generic reading software. But when I remove the cheapaml (and iceDiag) files and retry I still get:

OSError: Could not find the expected file prefixes ['W', 'Eta', 'ice_Tice1', 'ice_iceH', 'ice_Qice2', 'ice_snowH', 'S', 'ice_Tsrf', 'ice_Qice1', 'T', 'ice_snowAge', 'ice_fract', 'V', 'VICE', 'U', 'ice_Tice2', 'UICE'] at iternum 12. (Instead found ['ice_Tsrf', 'ice_Qice1', 'T', 'V', 'ice_Tice2', 'Eta', 'Qnet', 'ice_frwAtm', 'Qsw', 'EmPmR', 'ice_flxAtm', 'W', 'PH', 'ice_Tice1', 'ice_Qice2', 'FU', 'ice_snowH', 'VICE', 'U', 'PHL', 'ice_iceH', 'S', 'FV', 'ice_snowAge', 'ice_fract', 'UICE'])

To me this looks like the same list of files but in different order. ignore_unknown_vars=True does not work.

He then elaborated:

cd MITgcm/verification
./testreport -t global_ocean.cs32x15
cd global_ocean.cs32x15/tr_run.icedyn
ipython --pylab
import xmitgcm as xm
d = xm.open_mdsdataset('./', geometry='curvilinear')

gives

OSError: Could not find the expected file prefixes ['ice_Tice2', 'W', 'ice_fract', 'U', 'ice_snowH', 'ice_Qice2', 'ice_Tsrf', 'ice_Qice1', 'Eta', 'ice_Tice1', 'V', 'ice_iceH', 'S', 'ice_snowAge', 'T'] at iternum 72010. (Instead found ['ice_iceH', 'ice_Tice2', 'U', 'ice_fract', 'W', 'ice_flxAtm', 'PHL', 'ice_Qice2', 'ice_frwAtm', 'V', 'ice_Tsrf', 'Eta', 'ice_Tice1', 'ice_snowH', 'ice_Qice1', 'S', 'ice_snowAge', 'T', 'PH'])
import glob
vlist = ['ice_Tice2', 'W', 'ice_fract', 'U', 'ice_snowH', 'ice_Qice2', 'ice_Tsrf', 'ice_Qice1', 'Eta', 'ice_Tice1', 'V', 'ice_iceH', 'S', 'ice_snowAge', 'T']
for v in vlist: print(glob.glob(v+'.*.meta'))
['ice_Tice2.0000072010.meta', 'ice_Tice2.0000072000.meta']
['W.0000072010.meta', 'W.0000072000.meta']
['ice_fract.0000072000.meta', 'ice_fract.0000072010.meta']
['U.0000072010.meta', 'U.0000072000.meta']
['ice_snowH.0000072000.meta', 'ice_snowH.0000072010.meta']
['ice_Qice2.0000072010.meta', 'ice_Qice2.0000072000.meta']
['ice_Tsrf.0000072010.meta', 'ice_Tsrf.0000072000.meta']
['ice_Qice1.0000072010.meta', 'ice_Qice1.0000072000.meta']
['Eta.0000072010.meta', 'Eta.0000072000.meta']
['ice_Tice1.0000072010.meta', 'ice_Tice1.0000072000.meta']
['V.0000072010.meta', 'V.0000072000.meta']
['ice_iceH.0000072010.meta', 'ice_iceH.0000072000.meta']
['S.0000072010.meta', 'S.0000072000.meta']
['ice_snowAge.0000072010.meta', 'ice_snowAge.0000072000.meta']
['T.0000072010.meta', 'T.0000072000.meta']

so basically all variables have the same “frequency” (a record at 72000 and at 72010), still it does not work. I tried this:

d = xm.open_mdsdataset('./',prefix = ['ice_Tice2', 'W', 'ice_fract', 'U', 'ice_snowH', 'ice_Qice2', 'ice_Tsrf', 'ice_Qice1', 'Eta', 'ice_Tice1', 'V', 'ice_iceH', 'S', 'ice_snowAge', 'T'],geometry='curvilinear')

gives

KeyError: "Couln't find metadata for variable ice_Tice2 and `ignore_unknown_vars`==False."

but the meta files are all there (see above)

d = xm.open_mdsdataset('./',prefix = ['ice_Tice2', 'W', 'ice_fract', 'U', 'ice_snowH', 'ice_Qice2', 'ice_Tsrf', 'ice_Qice1', 'Eta', 'ice_Tice1', 'V', 'ice_iceH', 'S', 'ice_snowAge', 'T'],ignore_unknown_vars=True,geometry='curvilinear')

works, but only U,V,W,T,S,Eta are imported and all variables starting with ‘ice_’ are apparently unknown. Why is this so?
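The glob check above looks at one prefix at a time. A complementary view is to group every `.meta` file in the directory by iteration number, which shows the full set of prefixes present at each iteration, including ones that were not in the list being checked. The helper below is a minimal sketch and not part of xmitgcm; the name `prefixes_by_iternum`, the assumption of a 10-digit iteration number in the file name, and the example file names are all illustrative.

```python
import re
from collections import defaultdict

# Matches names such as "ice_fract.0000072010.meta":
# a prefix, a 10-digit iteration number (assumed), then ".meta".
META_NAME = re.compile(r"^(?P<prefix>.+)\.(?P<iternum>\d{10})\.meta$")


def prefixes_by_iternum(filenames):
    """Group file prefixes by iteration number (hypothetical helper)."""
    groups = defaultdict(set)
    for name in filenames:
        match = META_NAME.match(name)
        if match:  # names without an iteration number are skipped
            groups[int(match.group("iternum"))].add(match.group("prefix"))
    return dict(groups)


# Made-up file names for illustration. In a run directory one could pass
# glob.glob("*.meta") instead.
example = ["T.0000072000.meta", "T.0000072010.meta",
           "PH.0000072010.meta", "XC.meta"]
groups = prefixes_by_iternum(example)
for iternum in sorted(groups):
    print(iternum, sorted(groups[iternum]))
```

If the sets differ between iterations, that difference is worth comparing against the two lists in the error message.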

rabernat commented 6 years ago

To me this looks like the same list of files but in different order.

The order should not matter, since the two lists are converted to sets before comparison https://github.com/xgcm/xmitgcm/blob/13352a50c3a28c2fb036728a606ff2806f4bd139/xmitgcm/mds_store.py#L159-L164

We will have to get to the bottom of this...
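As a quick check, and not a diagnosis, the two lists from the iternum 72010 message can be compared as sets directly. This sketch only uses the lists quoted above; it does not reproduce xmitgcm's own logic, and the variable names are chosen here for illustration.

```python
# Lists copied from the error message at iternum 72010 quoted above.
expected = ['ice_Tice2', 'W', 'ice_fract', 'U', 'ice_snowH', 'ice_Qice2',
            'ice_Tsrf', 'ice_Qice1', 'Eta', 'ice_Tice1', 'V', 'ice_iceH',
            'S', 'ice_snowAge', 'T']
found = ['ice_iceH', 'ice_Tice2', 'U', 'ice_fract', 'W', 'ice_flxAtm',
         'PHL', 'ice_Qice2', 'ice_frwAtm', 'V', 'ice_Tsrf', 'Eta',
         'ice_Tice1', 'ice_snowH', 'ice_Qice1', 'S', 'ice_snowAge', 'T',
         'PH']

# Set differences ignore ordering, so only genuine mismatches remain.
missing = sorted(set(expected) - set(found))
extra = sorted(set(found) - set(expected))

print("expected but not found:", missing)
print("found but not expected:", extra)
```

For these two lists nothing expected is missing, but the "found" list contains four additional prefixes (PH, PHL, ice_flxAtm, ice_frwAtm), so the sets are not equal even though every expected prefix is present. Whether that mismatch is what triggers the error still needs to be confirmed in the code.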

rabernat commented 6 years ago

all variables starting with ‘ice_’ are apparently unknown. Why is this so?

Are you using the diagnostics package to create these files? Or are they part of the "native" seaice output?

mjlosch commented 6 years ago

The variables starting with "ice_" are "native" thsice variables. I chose the verification experiment global_ocean.cs32x15/tr_run.icedyn because there is no output from the diagnostics package complicating the file list (as opposed to my first example, verification_other/offline_cheapaml).

rabernat commented 6 years ago

Ok, so that is a standalone issue: xmitgcm cannot read the ice_* variables because it doesn't know what they contain. The metadata has to be added manually, as done here for the KPP native output: https://github.com/xgcm/xmitgcm/blob/master/xmitgcm/variables.py#L404 We would love to have a pull request from you that adds the necessary metadata.
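To illustrate the behaviour reported earlier in the thread (a KeyError without `ignore_unknown_vars`, and silently dropped ice_* variables with it), here is a toy model of a lookup against a table of known variables. It is a sketch of the idea only, not the xmitgcm implementation: `KNOWN_VARIABLES`, `select_known`, and the placeholder metadata are hypothetical names and values.

```python
# Toy lookup table. The real table lives in xmitgcm/variables.py and is
# much longer; the metadata values here are placeholders.
KNOWN_VARIABLES = {
    "T": {"long_name": "placeholder"},
    "S": {"long_name": "placeholder"},
}


def select_known(prefixes, ignore_unknown_vars=False):
    """Keep prefixes that have metadata (hypothetical helper)."""
    kept = []
    for prefix in prefixes:
        if prefix in KNOWN_VARIABLES:
            kept.append(prefix)
        elif not ignore_unknown_vars:
            raise KeyError(f"no metadata for variable {prefix}")
        # otherwise the unknown prefix is silently skipped
    return kept


# With the flag set, the unknown prefix is dropped instead of raising.
kept = select_known(["T", "ice_fract", "S"], ignore_unknown_vars=True)
print(kept)
```

In this model, adding an entry for an ice_* prefix to the table is what makes it readable, which is the role the manual metadata plays.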

We generally prefer the diagnostic output because it is "self describing"; xmitgcm parses available_diagnostics.log to determine everything it needs to know. This is clearly preferable to manually keeping track of the metadata within xmitgcm itself. But we want both possibilities to be supported.

Whatever is happening with the original error you described is trickier. I still don't understand it. Maybe @raphaeldussin can dig in when he has time.

mjlosch commented 6 years ago

Hi Ryan, so it's not really a bug, but a "feature", i.e. there is a list of "known" variables and most packages are not represented in this list. I could start adding "native" output variables to the list (e.g. of the seaice and thsice packages), but eventually this list would become as long or longer than the OrderedDict of state_variables. Is that what you want?

rabernat commented 6 years ago

Is that what you want?

Thanks for providing the seaice state variables in #96. That was a lot of work on your part! It definitely doesn't hurt to have this info in there. The only downside is that we / you are now responsible for maintaining it if it changes. For this reason, it is preferable to work with the diagnostics output.

I am still eager to resolve the original error related to the inconsistent parsing of the "expected file prefixes".