EC-Earth / ece2cmor3

Post-processing and cmorization of ec-earth output
Apache License 2.0
13 stars 6 forks source link

[QA-DKRZ] Complete report with v1.2.0 (historical AOGCM) #497

Closed oloapinivad closed 5 years ago

oloapinivad commented 5 years ago

I have just run the QA-DKRZ on 5 years of simulation from the AOGCM historical. You can find the json report produced here: http://wilma.to.isac.cnr.it/ecearth/diag/CMIP6/chis/infocmor/EC-Earth-Consortium_EC-Earth3_historical_r4i1p1f1.json.old Almost all of them are warnings, and it is unclear to what extent they are limitation of the tool or caveats in our cmorised output.

The only error is reported involves the empty field and is discussed in #471. Also the missing time boundaries in #472 seems to me relevant.

On top of these two, we need to decide which from these warnings we would like to address with the current version, in the future or maybe never: many of them are complaints about the CF data format, and as we discussed with @zklaus, @klauswyser and @uwefladrich they may be not be so relevant. The unused time dimension and the checksum of layout have changed ones seems a bit more worrisome, but please let me know what you think about.

Best, Paolo

zklaus commented 5 years ago

Some comments per annotation:

oloapinivad commented 5 years ago

Thanks a lot for the complete analysis!!! I reply only where I have a bit of competence ;-)

  1. Seems to be a bug in qa-dkrz. CF Section 4.1 clearly states that degrees_north is the correct unit for latitude. @oloapinivad, could you file a bug report with qa-dkrz?

Ok, I will report it.

  1. I think this means that one time step is masked completely. Probably not good. Have a closer look at these variables.
  2. Everything masked. With important variables (psl, ps, tas, ta,...) as well. Not good. Check closely. Note the tables!

At least for 7 this is discussed in #501. It seems to come from the first time step missing. Probably fixing that issue fill fix also 6 since all the variables but sitemptop comes from IFS.

  1. Interesting. @oloapinivad, could you produce the ncdump -h for two files with this problem and attach them here, best in a .txt file?

I tried by myself without spotting anything. The txt is here attached! msftyz.txt

  1. No idea. Maybe a good idea to check shapes of all the files.
  2. See 11.

Checked the file shape without finding any strange behaviour. No clue.

  1. These are flux corrections and could be correct. I don't know if we correct the flux in any way.

Well, these are never used but in DCPP experiments. We don't have any flux correction. They are discussed in #471, I think they should - at least - not be uploaded on the ESGF.

zklaus commented 5 years ago

Thanks a lot for the complete analysis!!!

:beers:

I reply only where I have a bit of competence ;-)

So, everywhere. Good.

  1. Seems to be a bug in qa-dkrz. CF Section 4.1 clearly states that degrees_north is the correct unit for latitude. @oloapinivad, could you file a bug report with qa-dkrz?

Ok, I will report it.

Great, thanks! Do you have an issue number?

At least for 7 this is discussed in #501. It seems to come from the first time step missing. Probably fixing that issue fill fix also 6 since all the variables but sitemptop comes from IFS.

Ok, I put the issue number in the comment above so we can track it.

  1. Interesting. @oloapinivad, could you produce the ncdump -h for two files with this problem and attach them here, best in a .txt file?

I tried by myself without spotting anything. The txt is here attached! msftyz.txt

Hm. I also don't find anything. Could you perhaps also share the pertinent check log? There we will have the individual annotations and not only the summary, allowing us, among other things, to identify the files where this originates.

  1. No idea. Maybe a good idea to check shapes of all the files.
  2. See 11. Checked the file shape without finding any strange behaviour. No clue.

Let's have a look at the check log also for this.

  1. These are flux corrections and could be correct. I don't know if we correct the flux in any way. Well, these are never used but in DCPP experiments. We don't have any flux correction. They are discussed in #471, I think they should - at least - not be uploaded on the ESGF.

Hm, I am not so sure about that. It seems a bit wasteful, true, but it is requested and it seems somebody in the EC-Earth consortium thought it should be provided, so why not? We just need to be aware that this is legitimately all zero.

oloapinivad commented 5 years ago

So I had a crash of conda and reinstall everything from scratch. I re-run the QA-DKRZ and now the cmor error in 5. is gone (which was related to PrePARE when I looked at check-logs) but it seems to have "evolved" in a problem with udunits that is now everywhere in my logs (I have asked for clarification here https://github.com/IS-ENES-Data/QA-DKRZ/issues/14)

In https://github.com/IS-ENES-Data/QA-DKRZ/issues/14 you can find the issue with dimensions discussed in 2.

Attached you can find the check_logs from the most recent QA-DKRZ runs: data have been cmorized with version 1.2.0 and the fix of #501 included. The problems of 10. ,11. and 12. seems to be occurring to all files. EC-Earth-Consortium_EC-Earth3_historical_r4i1p1f1.log

the problem in 7. seems gone after the #501 fix, but we still have some report similar to 6. I visually inspected those files and they seem ok at first glance fine, could it be that QA-DKRZ is complaining about the present of missing values (these fields - at least the most of them - are masked over ocean)?

BTW, I think it would be better to track each warning/error of QA-DKRZ separately since it is hard to keep track of all of them here, considering the evolving reports of QA-DKRZ and versions of ece2cmor (I don't even mention yet the different family of errors I get from the Veg version).

zklaus commented 5 years ago

Hm. The udunits error smell of an installation problem since the unit system is part of the normal udunits installation. Can you try something like the following in an interactive python session?

from cf_units import Unit
print(Unit('K'))
uwefladrich commented 5 years ago

@oloapinivad, check the UDUNITS2_XML_PATH path in your ~/.qa-dkrz/config.txt, I had this misconfigured at some point and saw the same error. Note that the file can have multiple sections, particularly if you re-installed qa-dkrz, make sure the section that is actually used in your case has the correct values.

zklaus commented 5 years ago

I think the issue about cell_methods is also a qa-dkrz bug; I reported it as IS-ENES-Data/QA-DKRZ#15.

uwefladrich commented 5 years ago

w.r.t.

  1. Solved by deactivating CF_732a.

This can be done by editing ~/.qa-dkrz/QA_TABLES/tables/projects/CF/CF_check-list.conf. Either comment out:

# Attribute cell_methods should have key-word <comment:> omitted, if there is \
# no standardised information & CF_732a

or deactivate it like this:

Attribute cell_methods should have key-word <comment:> omitted, if there is \
no standardised information & CF_732a,D

I'll test if we can put all our ignore's in one separate config file.

oloapinivad commented 5 years ago

@uwefladrich thanks a lot!!! I didn't have any UDUNITS2_XML_PATH in the ~/.qa-dkrz/config.txt but adding it now PrePARE is no longer failing, the udunits error is gone!

zklaus commented 5 years ago

@oloapinivad, nice! That might influence some of the other errors, namely 3. Could you update us with a fresh round of logs once it is finished?

oloapinivad commented 5 years ago

I was about to post it!

http://wilma.to.isac.cnr.it/ecearth/diag/CMIP6/chis/infocmor/EC-Earth-Consortium_EC-Earth3_historical_r4i1p1f1.json

Actually it seems that both 2 and 3 are now gone! So the degrees_north was connected to a failure in udunits? Of course 5 and 7 are gone too.

oloapinivad commented 5 years ago

BTW now PrePARE is working fine so we have the wap and zg failure reported in #462

zklaus commented 5 years ago

I think it tried to parse the units string with udunits and then perform a check like ut_are_convertible. But because it didn't load the unit system this didn't work and the correct units couldn't be compared successfully with the found units.

zklaus commented 5 years ago

One comment on the checksum errors (11 and 14): The problem might be related to the fact that some variables in different tables are reported on different sets of pressure levels with the coordinate variable always being called plev. This is organized in the tables by using different dimensions in the variable_entries like plev19, plev7h, ... These can be found in CMIP6_coordinates.json where they all have out_name plev. The problem for me is that I don't have files where this is occuring and that the error message is rather unclear (which is reported as IS-ENES-Data/QA-DKRZ#18).

oloapinivad commented 5 years ago

@zklaus here a couple of ncdump output from a ua file from day and Eday table. The number of vertical levels is indeed different. The full files are quite heavy, but I can provide them if you need them.

< netcdf ua_Eday_EC-Earth3_historical_r4i1p1f1_gr_18510101-18511231 {
---
> netcdf ua_day_EC-Earth3_historical_r4i1p1f1_gr_18510101-18511231 {
4c4
<   plev = 19 ;
---
>   plev = 8 ;
44c44
<       ua:history = "2019-06-26T09:19:14Z altered by CMOR: Reordered dimensions, original order: lat lon plev time." ;
---
>       ua:history = "2019-06-26T09:26:28Z altered by CMOR: Reordered dimensions, original order: lat lon plev time." ;
56c56
<       :creation_date = "2019-06-26T09:19:38Z" ;
---
>       :creation_date = "2019-06-26T09:26:44Z" ;
66c66
<       :history = "2019-06-26T09:19:38Z ; CMOR rewrote data to be consistent with CMIP6, CF-1.7 CMIP-6.2 and CF standards.;\n",
---
>       :history = "2019-06-26T09:16:16Z ; CMOR rewrote data to be consistent with CMIP6, CF-1.7 CMIP-6.2 and CF standards.;\n",
97c97
<       :table_id = "Eday" ;
---
>       :table_id = "day" ;
99a100
>       :tracking_id = "hdl:21.14100/f87be62c-da3a-4e21-a11c-d7d7dbd8d33c" ;
104d104
<       :tracking_id = "hdl:21.14100/a1012c43-b60b-47f9-9b8c-9ac44880ca6e" ;

ua_Eday_EC-Earth3_historical_r4i1p1f1_gr_18510101-18511231.txt ua_day_EC-Earth3_historical_r4i1p1f1_gr_18510101-18511231.txt

zklaus commented 5 years ago

@h-dh provided some clarification in IS-ENES-Data/QA-DKRZ#18. @oloapinivad, maybe you can read that and have a look at your results folder?

oloapinivad commented 5 years ago

So following the discussion https://github.com/IS-ENES-Data/QA-DKRZ/issues/18 we can say that 11 and 14 are not problem in our data but rather a tuning of QA-DKRZ.

To avoid the warnings we could set the QA-DKRZ with something like PT_PATH_INDEX=2,3,6 so that it would be the same for all the experiments and ensemble members but it will distinguish on the tables. Do you agree?

zklaus commented 5 years ago

Probably yes, but let's wait just a bit longer to see what precipitates over there.

uwefladrich commented 5 years ago
  1. IS-ENES-Data/QA-DKRZ#17 As discussed in a different issue I think this is ok since these are instantaneous variables.

Number 9 hasn't shown up with a recent version of qa-dkrz from github. Problem solved?

zklaus commented 5 years ago

My suggestion for PT_PATH_INDEX is as follows

#CMIP[1]/EC-Earth-Consortium[2]/EC-Earth3-Veg[3]/historical[4]/r1i1p1f1[5]/Omon[6]/volo[7]/gn[8]/v20190605[9]
PT_PATH_INDEX=2,3,6,7,8

Rationale:

Does that make sense? Do you agree?

oloapinivad commented 5 years ago

I totally agree on 2,3,6. Do we have any experience of cases for which 7 and 8 are needed?

As long as I understand the distinction is made on frequencies and variables name anyhow, so probably 7 is redundant (we should have encountered the issue already I think since now we have 2,3,4 only). 8 I agree is still possibile, but I don't know if it is ever happening. However, we can keep it to stay on the safe side.

zklaus commented 5 years ago

Wrt T_9, this had been addressed already in 2018. Not sure why the error showed up here at all. Maybe something with the detection of the instantaneous property went wrong?

@oloapinivad, if you can confirm that this error isn't occurring for you any longer, I think we can close the upstream issue.

oloapinivad commented 5 years ago

Wrt T_9, this had been addressed already in 2018. Not sure why the error showed up here at all. Maybe something with the detection of the instantaneous property went wrong?

@oloapinivad, if you can confirm that this error isn't occurring for you any longer, I think we can close the upstream issue.

I have the conda version qa-dkrz-0.6.7-59,master-5fdfa30 and I have T_9 there. So probably it will be solved in a new release since in the Github is gone.

uwefladrich commented 5 years ago

I have the conda version qa-dkrz-0.6.7-59,master-5fdfa30 and I have T_9 there. So probably it will be solved in a new release since in the Github is gone.

I can confirm that I had the error with the conda version (as mentioned above) and that it disappeared when switching to the github install.

Btw. I am putting together a short install-from-github howto, coming soon.

tommibergman commented 5 years ago

I totally agree on 2,3,6. Do we have any experience of cases for which 7 and 8 are needed?

As long as I understand the distinction is made on frequencies and variables name anyhow, so probably 7 is redundant (we should have encountered the issue already I think since now we have 2,3,4 only). 8 I agree is still possibile, but I don't know if it is ever happening. However, we can keep it to stay on the safe side.

Just to comment that we have variables if IFS and TM5 which are on a different grid, but not sure if the case 8 applies since they are also on different tables. But that might be one instance, where it matters.

zklaus commented 5 years ago

I forked the repository and added tags where the conda file conda-recipes/qa-dkrz/meta.yaml changes, which is the way that the tags that are present in upstream have been handled and which agrees with the hash as given in @oloapinivad's version output. You can find it at https://github.com/zklaus/QA-DKRZ

One thing to note is that there has been a change to the detection of instantaneous by means of "...Pt" frequencies, which is likely responsible for the absence of T_9 and could also impact the checksum stuff since perhaps the frequency detection didn't work properly before.

In light of this and seeing as this issue is getting really long I propose that @oloapinivad also moves to the github version (following @uwefladrich's howto) and then we close this issue and move forward with a fresh one.

To summarize what we have learned so far:

Everything else has either been resolved now or should resurface with the github version. I would also like to use the github version to determine the correct PT_PATH_INDEX because the changed frequency detection could impact this.

Ok?

oloapinivad commented 5 years ago

So I installed the github version using the Uwe's to do list (even if with some difficulties) and run a fresh QA-DKRZ with CHECK_MODE=TIME,DATA,CNSTY,CF,DRS,DRS_F,DRS_P that produces a new report. Of course CV is still missing.

Annotations: http://wilma.to.isac.cnr.it/ecearth/diag/CMIP6/chis/infocmor/EC-Earth-Consortium_EC-Earth3_historical_r4i1p1f1.json Logfile: http://wilma.to.isac.cnr.it/ecearth/diag/CMIP6/chis/infocmor/EC-Earth-Consortium_EC-Earth3_historical_r4i1p1f1.log

I have a considerably reduced set of errors (T_9 is gone), but a few new ones. :-) We can go for a new issue I think...

@zkaus, actually your version is the same as the one h-dh so I am wondering if you pushed your commit.

A note on QA-DKRZ speed: it took 50 minutes with 18 cores and 5 years (i.e. about 1350 files)

zklaus commented 5 years ago

@zkaus, actually your version is the same as the one h-dh so I am wondering if you pushed your commit.

I didn't actually add any commits. I merely added tags to existing commits. So if you add my repo as a remote

git remote add zklaus https://github.com/zklaus/QA-DKRZ

and fetch everything, including tags

git fetch --all --tags

you should find the new tags

[a002160@c21236 QA-DKRZ]$ git tag
v0.6.5-31
v0.6.5-32
v0.6.5-34
v0.6.5-35
v0.6.5-36
v0.6.5-37
v0.6.7-30
v0.6.7-31
v0.6.7-49
v0.6.7-50
v0.6.7-51
v0.6.7-53
v0.6.7-54
v0.6.7-55
v0.6.7-56
v0.6.7-58
v0.6.7-59
v0.6.7-60

that will also show up as yellow labels in gitk.

zklaus commented 5 years ago

I now opened the new issue #504 to track further progress.