cemac / CFizer

Tools to make netCDF files CF-compliant (Climate and Forecast metadata convention), initially working with MONC (Met Office NERC Cloud model) output.
BSD 3-Clause "New" or "Revised" License

CFizer output not passing cfchecker tool #58

Closed: AnneBarber1 closed this issue 2 months ago

AnneBarber1 commented 5 months ago

@sjboeing has asked me to confirm that CFizer output passes the CF-checker. I have tested the tool on jasmin-sci1 and got the following error:

(cfchecker2) [abarber@sci1 ~]$ cfchecks d20200128_diagnostic_3d_270000_270000.nc
CHECKING NetCDF FILE: d20200128_diagnostic_3d_270000_270000.nc
=====================
Using CF Checker Version 4.0.0
Checking against CF Version CF-1.7
Using Standard Name Table Version 84 (2024-01-19T15:55:10Z)
Using Area Type Table Version 11 (06 July 2023)
Using Standardized Region Name Table Version 4 (18 December 2018)

ERROR: (2.6.1): This netCDF file does not appear to contain CF Convention data.

------------------
Checking variable: time
------------------
Traceback (most recent call last):
  File "/home/users/abarber/.conda/envs/cfchecker2/bin/cfchecks", line 10, in <module>
    sys.exit(main())
  File "/home/users/abarber/.conda/envs/cfchecker2/lib/python2.7/site-packages/cfchecker/cfchecks.py", line 3097, in main
    inst.checker(file)
  File "/home/users/abarber/.conda/envs/cfchecker2/lib/python2.7/site-packages/cfchecker/cfchecks.py", line 532, in checker
    return self._checker()
  File "/home/users/abarber/.conda/envs/cfchecker2/lib/python2.7/site-packages/cfchecker/cfchecks.py", line 773, in _checker
    self.chkUnits(var,allCoordVars)
  File "/home/users/abarber/.conda/envs/cfchecker2/lib/python2.7/site-packages/cfchecker/cfchecks.py", line 2317, in chkUnits
    if not varUnit.isvalid:
AttributeError: 'Units' object has no attribute 'isvalid'

Steef, I wonder whether this may be related to the conversation with Domantas Dilys that you forwarded to me. The variable ‘time’ has no associated dimension (it is a scalar), as running ‘ncdump -h’ on the file shows:

(cfchecker2) [abarber@sci1 ~]$ ncdump -h d20200128_diagnostic_3d_270000_270000.nc
netcdf d20200128_diagnostic_3d_270000_270000 {
dimensions:
        x = 1280 ;
        y = 1280 ;
        z = 121 ;
        yv = 1280 ;
        zn = 121 ;
        xu = 1280 ;
        number_options = 664 ;
        kvp = 2 ;
        string150 = 150 ;
variables:
        double time ;
                time:standard_name = "time" ;
                time:calendar = "proleptic_gregorian" ;
                time:axis = "T" ;
                time:units = "s since 2020-01-25 00:00:00" ;

Does this sound like a likely culprit to you?

sjboeing commented 5 months ago

Hi Anne,

Domantas used EPIC data, so that is not directly comparable. One issue may be that the metadata declaring the CF convention itself is missing. On some recent 3D files I processed on ARCHER, that is the only error I get, although I do get some additional warnings.

Cheers,

Steef


From: Anne Barber Sent: 22 April 2024 11:00 To: cemac/CFizer Cc: Steven Boeing; Mention Subject: [cemac/CFizer] CFizer output not passing cfchecker tool (Issue #58)

AnneBarber1 commented 5 months ago

Hi @sjboeing, the ncdump of the file shows 'Conventions = "CF-1.10"' in the global attributes, so presumably that is OK?

AnneBarber1 commented 5 months ago

Another thought: the specific error "AttributeError: 'Units' object has no attribute 'isvalid'" suggests that the time unit, "s since 2020-01-25 00:00:00", is invalid. Supposedly this notation adheres to CF-1.10, but I am having difficulty getting the tool to recognise this version, and it keeps defaulting back to CF-1.7, so this could be part of the problem. I'll report back once I've got the CF checker using CF-1.10.

cemaccam commented 5 months ago

This seems odd, as I ran CF Checker on outputs during my initial testing prior to the first release. I think I did run that against CF1.10, though.

@AnneBarber1, I don't think it's that the time variable isn't valid; it's the Units object to which it's been assigned by the checker that lacks the attribute. If the time unit weren't valid, the isvalid attribute would simply be False.

You may well be onto something with time not being in the dimensions. In all the test files I had from MONC outputs, it was. One way to check whether this is the problem would be to use xarray to make time a dimension in the NC file in question; you could do this at the Python command line or in a Jupyter notebook.
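A minimal sketch of that check, assuming the file has been opened with `xarray.open_dataset` and that `time` is a scalar variable as in the ncdump above. The helper name `promote_time_to_dim` is hypothetical; `set_coords` and `expand_dims` are standard xarray methods, and `expand_dims` promotes a scalar coordinate to a length-1 dimension coordinate:

```python
import numpy as np
import xarray as xr


def promote_time_to_dim(ds: xr.Dataset, name: str = "time") -> xr.Dataset:
    """Hypothetical helper: turn a scalar `time` variable into a length-1
    dimension coordinate, adding it as the leading dimension of every
    data variable (existing coordinates are left unchanged)."""
    if name in ds.sizes:
        return ds  # already a dimension; nothing to do
    # Mark the scalar variable as a coordinate so that expand_dims
    # promotes it to a 1-D coordinate holding its single value.
    return ds.set_coords(name).expand_dims(name)


# Small in-memory stand-in for a CFizer output file (illustrative values).
example = xr.Dataset({"th": (("x", "y"), np.zeros((2, 3)))})
example["time"] = 7200.0  # scalar, like `double time ;` in the ncdump
promoted = promote_time_to_dim(example)
print(promoted["th"].dims)  # ('time', 'x', 'y')
```

For a real file, something like `promote_time_to_dim(xr.open_dataset(path)).to_netcdf(new_path)` would write a modified copy to run cf-checker on. Note that with this approach every data variable, including non-spatial ones such as options_database, gains the time dimension.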

If this is the problem, a check could be added to CFizer, looking for variables with an axis attribute and ensuring they have a corresponding dimension.
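A rough sketch of such a check, assuming CFizer has an `xarray.Dataset` in hand before writing. The function name is hypothetical; it flags any variable with an `axis` attribute that is not a 1-D variable along a dimension of its own name, which is how CF defines a coordinate variable:

```python
import xarray as xr


def axis_vars_missing_dimension(ds: xr.Dataset) -> list:
    """Hypothetical check: names of variables that carry a CF `axis`
    attribute but are not coordinate variables, i.e. whose dims are
    not exactly (their own name,)."""
    return sorted(
        str(name)
        for name, var in ds.variables.items()
        if "axis" in var.attrs and var.dims != (name,)
    )


# Illustrative dataset: `x` is a proper coordinate variable, `time` is scalar.
ds = xr.Dataset(coords={"x": ("x", [0.0, 1.0], {"axis": "X"})})
ds["time"] = xr.DataArray(7200.0, attrs={"axis": "T"})
print(axis_vars_missing_dimension(ds))  # ['time']
```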

Another possible source of the error would be a version conflict in the dependencies. CF-checker requires cfunits, but doesn't specify a version, so there's a risk that the version installed doesn't have the Units.isvalid attribute, although that seems unlikely.
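One quick way to rule this in or out is to inspect the environment that runs cfchecks. A sketch, assuming Python 3.8 or later (for `importlib.metadata`) and the PyPI distribution names `cfchecker`, `cfunits` and `netCDF4`; the helper names are hypothetical:

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(dist_name: str):
    """Installed version string of a distribution, or None if absent."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


def units_has_isvalid():
    """True/False for whether cfunits.Units defines `isvalid`;
    None if cfunits cannot be imported (e.g. UDUNITS-2 not found)."""
    try:
        from cfunits import Units
    except (ImportError, OSError):  # FileNotFoundError is an OSError
        return None
    return hasattr(Units, "isvalid")


for dist in ("cfchecker", "cfunits", "netCDF4"):
    print(f"{dist}: {installed_version(dist)}")
print("cfunits.Units has isvalid:", units_has_isvalid())
```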

AnneBarber1 commented 5 months ago

Hi @cemaccam, thanks for getting back to us. I'm curious: how did you manage to get cf-checker to use version 1.10? The online version only lists versions up to 1.8, and the command-line tool doesn't seem to recognise the version when I use the -v argument to specify it, always defaulting back to v1.7:

[Screenshot: cf-checker output, defaulting to CF-1.7]

Adding the missing time dimension has also been recommended by someone on the VAPOR software forum, as I have been unable to view these MONC files using that software.

I've used xarray to add time as a dimension, and cf-checker then gave me a slightly different message:

Checking variable: time WARN: (3): No standard_name or long_name attribute specified

I noticed that the time variable was missing a few attributes so I added these back in using xarray. Below is a screenshot of the dimensions/variables in my modified file, which look as expected:

[Screenshot: dimensions and variables of the modified file]

Unfortunately this doesn't seem to have helped as the cf-checker goes back to the original error at the start of this thread:

[Screenshot: cf-checker output showing the original error]

I'm away on a training course for the bulk of next week but I will continue to look into this when I have time. Thought I would give you both an update in case you can see anything obvious I'm missing.

cemaccam commented 5 months ago

Ah, my fuzzy memory, then. I see CF Checker on GitHub also only goes up to CF version 1.8. I probably just ran it with default settings.

I probably won't have time to get back to it this week, but maybe we can find a time to talk about the files you're processing, as all the MONC test files I had included time as both a variable and dimension.

In reply to Anne Barber's comment of Sat, 27 Apr 2024, 01:17.

AnneBarber1 commented 5 months ago

The data I'm using are from the eurec4a project on Jasmin, under /gws/nopw/j04/eurec4auk/cfizer_testing/eurec4a_jan28/. Do you have access to this directory? If not, I can look into transferring a file over to you. The original MONC files do indeed have time as a dimension; however, after running CFizer they appear to lose this information:

Single eurec4a monc file before CFizer: /gws/nopw/j04/eurec4auk/abarber/monc_test_files

[Screenshot: contents of the MONC file before CFizer]

Running CFizer (note that I used "-r '2020-01-25 00:00+00:00'" as a placeholder argument):

(cfizer) [abarber@sci1 CFizer]$ cfize -v -r '2020-01-25 00:00+00:00' ../../monc_test_files
Vocabulary loaded from /home/users/abarber/.conda/envs/cfizer/lib/python3.9/site-packages/cfizer/vocabulary.yml.
Vocabulary validated.
Process 8711: Initialisation took 0.04363298718817532 seconds.
11:23:53 Main app process id: 8711
Application launched from /gws/nopw/j04/eurec4auk/abarber/CFizer_v53/CFizer
Source directory: ../../monc_test_files
Absolute path: /gws/nopw/j04/eurec4auk/abarber/monc_test_files
11:23:53 Processed files will be saved to: /gws/nopw/j04/eurec4auk/abarber/monc_test_files+processed
11:23:53 Categorising files by dimension
No files with 0 spatial dimensions; deleting group.
No files with 1 spatial dimensions; deleting group.
No files with 2 spatial dimensions; deleting group.
11:23:53 Serial processing 3:3d.
11:23:53 Process 8711: process_large running on group 3d - file d20200128_diagnostic_3d_7200.nc. Title passed in: d20200128_diagnostic_3d_7200.
11:23:53 Process 8711: cfize running on MONC dataset d20200128_diagnostic_3d_7200 with 3 spatial dimensions.
11:23:53 Process 8711: cfizing variables on MONC dataset d20200128_diagnostic_3d_7200 with 3 spatial dimensions.
         Process 8711: MoncDs.cfize took 0.03793283202685416 seconds.
11:23:53 Process 8711: splitting dataset d20200128_diagnostic_3d_7200 by time.
11:23:53 Process 8711: Created new dataset with title, d20200128_diagnostic_3d_7200_5400
11:23:53 Process 8711: Created new dataset with title, d20200128_diagnostic_3d_7200_7200
         Process 8711: split_ds took 0.0033400380052626133 seconds.
split_ds returned datasets with titles d20200128_diagnostic_3d_7200_5400, d20200128_diagnostic_3d_7200_7200
11:23:53 Process 8711: preparing delayed writer for /gws/nopw/j04/eurec4auk/abarber/monc_test_files+processed/d20200128_diagnostic_3d_7200_5400.nc.
         Process 8711: ds_to_nc_dask took 124.13866481510922 seconds.
11:25:57 Process 8711: preparing delayed writer for /gws/nopw/j04/eurec4auk/abarber/monc_test_files+processed/d20200128_diagnostic_3d_7200_7200.nc.
         Process 8711: ds_to_nc_dask took 123.0747651939746 seconds.
11:28:00 Process 8711: Computing writes.
         Process 8711: perform_write took 0.0005707538221031427 seconds.
         Process 8711: perform_write took 0.00032923719845712185 seconds.
         Process 8711: process_large took 247.509493912803 seconds.

SUMMARY
=======
Group 3: source files
           d20200128_diagnostic_3d_7200.nc
          --> split --> cfize -->
           d20200128_diagnostic_3d_7200_7200.nc
           d20200128_diagnostic_3d_7200_5400.nc

         Main app process 8711 took 247.63909478997812 seconds.

add_cf_attrs: CF-1.10, section 2.6.2, recommends references be included as a global attribute. This can be specified in config.yml.
add_cf_attrs: CF-1.10, section 2.6.2, recommends comment be included as a global attribute. This can be specified in config.yml.

After CFizer: /gws/nopw/j04/eurec4auk/abarber/monc_test_files+processed

[Screenshot: contents of the file after CFizer]

Time is no longer a dimension, and the time variable is missing its dimensionality.

AnneBarber1 commented 5 months ago

(Apologies, accidentally clicked close on the issue)

sjboeing commented 5 months ago

Thanks for looking into this in more detail, Anne. It would be good if we could ensure it passes a version of the checker (if simply setting the convention to 1.8 does this, we may opt for that). It would be nice to have a version that I can share with Met Office collaborators; they won't be using it directly but are looking for inspiration.

cemaccam commented 5 months ago

It's odd that CFizer is eating the time dimension. You are testing the same files as I did, so I don't know what's going awry there. It should change the name to simply "time", but definitely not delete it.

I can no longer access Jasmin to test, but may still have some test files on OneDrive. I wonder if the problem is in the "split along time dimension" operation: might it be removing time as a dimension because each resulting dataset has only one time point? This is something you could test in a Jupyter notebook by copying the split algorithm from CFizer with just a few modifications. That would let you query the resulting datasets' dimensions.
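A quick notebook check along those lines, using a toy dataset rather than the real split code (values are illustrative). In xarray, selecting a single point with an integer index drops the dimension and leaves a scalar coordinate, which matches the `double time ;` seen in the ncdump, whereas selecting with a length-1 list keeps it:

```python
import numpy as np
import xarray as xr

ds = xr.Dataset(
    {"th": (("time", "x"), np.arange(6.0).reshape(2, 3))},
    coords={"time": [5400.0, 7200.0]},
)

dropped = ds.isel(time=0)  # integer index: `time` becomes a scalar coordinate
kept = ds.isel(time=[0])   # length-1 list: `time` stays a dimension of size 1

print("time" in dropped.sizes, dropped["time"].ndim)  # False 0
print(kept.sizes["time"], kept["th"].dims)            # 1 ('time', 'x')
```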

Actually, the first step should be to check whether the same problem arises when processing 0d, 1d or 2d files. That'll help diagnose where the problem is happening. It's unfortunate that wrappers don't work on multiprocessing scripts, as otherwise we could just wrap every method in a wrapper that outputs the dimensions.
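For serial runs, or when debugging a single group, a decorator along these lines could print dimensions after each step. It is a hypothetical helper, not part of CFizer, and it only inspects return values that are a Dataset, or a list, tuple or dict of Datasets:

```python
import functools

import numpy as np
import xarray as xr


def log_dims(func):
    """Hypothetical debugging decorator: print the dimension sizes of any
    xarray.Dataset(s) returned by `func`, then pass the result through."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        result = func(*args, **kwargs)
        if isinstance(result, dict):
            candidates = list(result.values())
        elif isinstance(result, (list, tuple)):
            candidates = list(result)
        else:
            candidates = [result]
        for ds in candidates:
            if isinstance(ds, xr.Dataset):
                print(f"{func.__name__}: {dict(ds.sizes)}")
        return result
    return wrapper


@log_dims
def first_time_point(ds):  # stand-in for a CFizer processing step
    return ds.isel(time=0)


toy = xr.Dataset({"th": (("time", "x"), np.zeros((2, 3)))}, coords={"time": [0.0, 1.0]})
out = first_time_point(toy)  # prints "first_time_point: {'x': 3}"
```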

If none of that reveals the problem, I can trawl through the code next week to see whether I can identify the cause.

In reply to Anne Barber's comment of Mon, 29 Apr 2024, 20:06.

AnneBarber1 commented 4 months ago

Hi @cemaccam sorry for the radio silence, I've been away on holiday. I've discovered that the time dimension only disappears when CFizer is run on 3d files i.e. we don't see the same problem for 0d/1d/2d files. I think I am correct in saying that only the 3d files get split by time? The 0d/1d/2d files I tested also had two time frames but these were not split.

I suspect this is a bug in the process_large function within cfize.py. I've added some print statements to the code block between lines 144 and 162 which show that indeed, once the 3d file is split, the resulting files no longer have a time dimension:

Code:

[Screenshot: process_large code with added print statements]

Output:

[Screenshot: output of the print statements]

I'll have another look tomorrow, but it is taking me some time to fully understand the code, so I thought I would raise it here in case you can spot the problem quicker than me.

AnneBarber1 commented 4 months ago

Hi @cemaccam @sjboeing Cameron was correct: the error is to do with the split_ds function in cfize_ds.py (in particular, the line grouped = {point: ds.copy(deep=True) for (point, ds) in dataset.groupby(var)}), which is called by process_large in cfize.py. It is unclear to me exactly why the time dimension is being dropped; I can't see anything online about xarray datasets not being able to have a time dimension with a single timeframe. Not sure if that's something you think we should look into further. For now, I have modified the split_ds function to add the time dimension back in using the expand_dims function. On paper this looks to have worked (if I do an ncdump on the file I can see the new time dimension) but I'm not convinced this has quite worked how I intended as I still get the error we saw when this issue was first opened. I'll keep going with it, but wanted to log where we're at for now.

cemaccam commented 4 months ago

Hi @AnneBarber1 . My apologies also for not getting back to this as I'd intended. Thanks for looking into it. I am moderately certain it didn't always drop the time dimension, but I think I've found the problem and a fix! I don't know if it's a new feature in xarray, but in any case, the current default behaviour for xarray.Dataset.groupby seems to be to remove the dimension along which the arrays have been separated. So, see if this fixes the problem: change the line you highlighted to

grouped = {point: ds.copy(deep=True) for (point, ds) in dataset.groupby(var, squeeze=False)}

Also, have a look at whether you might want to set restore_coord_dims=True. I'm a bit vague on what exactly that will do; I don't think you have multidimensional coordinates in MONC, though.
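Because groupby's squeezing behaviour has changed across xarray releases (the `squeeze` argument has since been deprecated), a version-independent alternative may be worth considering. This sketch indexes each time point with a length-1 list, which always keeps `time` as a dimension. The function name is hypothetical; it returns a dict keyed by time value, mirroring the `grouped` dict in `split_ds`:

```python
import numpy as np
import xarray as xr


def split_by_time(ds: xr.Dataset, dim: str = "time") -> dict:
    """Hypothetical replacement for the groupby-based split: one deep copy
    per time point, each keeping `dim` as a length-1 dimension."""
    return {
        ds[dim].values[i]: ds.isel({dim: [i]}).copy(deep=True)
        for i in range(ds.sizes[dim])
    }


toy = xr.Dataset(
    {"th": (("time", "x"), np.zeros((2, 3)))},
    coords={"time": [5400.0, 7200.0]},
)
parts = split_by_time(toy)
for key, piece in parts.items():
    print(float(key), piece["th"].dims)
```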

AnneBarber1 commented 4 months ago

Hi @cemaccam, thanks for the suggestion. I tried it both with and without restore_coord_dims=True: in both cases the time dimension was retained, but it doesn't solve the original issue. I've also checked and I get the same error for 0d/1d/2d files, none of which ever had their time dimension dropped, so retaining the time dimension isn't going to fix this issue (although it's good that we spotted it and did something about it regardless!)

I'm scrolling back through our earlier messages to see if I can spot anything else worth trying. You mentioned:

I don't think it's that the time variable isn't valid; it's the Units object to which it's been assigned by the checker that lacks the attribute. If the time unit wasn't valid, the isvalid attribute would simply be False. Another possible source of the error would be a version conflict in the dependencies. CF-checker requires cfunits, but doesn't specify a version, so there's a risk that the version installed doesn't have the Units.isvalid attribute, although that seems unlikely.

I've installed the most recent version of CF-checker on ARC4 and tested it there. I initially got the error FileNotFoundError: cfunits requires UNIDATA UDUNITS-2. Can't find the 'udunits2' library, but once I installed the udunits2 library I got further (not sure whether this indicates it's a JASMIN-specific error, though):
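To compare machines (for example JASMIN and ARC4) before installing anything, a small check can report whether the platform's standard library search finds UDUNITS-2. This is a sketch using the standard-library `ctypes.util.find_library`; the helper name is hypothetical, and cfunits' own search logic may look in other locations, so a `None` here is a hint rather than proof:

```python
import ctypes.util


def find_udunits2():
    """Name or path of the UDUNITS-2 shared library as found by the
    platform's standard library search, or None if it is not found."""
    return ctypes.util.find_library("udunits2")


print("udunits2 library:", find_udunits2())
```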

(cfchecker) [lmkk419@login1.arc4 lmkk419]$ cfchecks d20200128_diagnostic_3d_10800_9000.nc
CHECKING NetCDF FILE: d20200128_diagnostic_3d_10800_9000.nc
=====================
Using CF Checker Version 4.1.0
Checking against CF Version CF-1.8
Using Standard Name Table Version 84 (2024-01-19T15:55:10Z)
Using Area Type Table Version 11 (06 July 2023)
Using Standardized Region Name Table Version 4 (18 December 2018)

ERROR: (2.6.1): This netCDF file does not appear to contain CF Convention data.

------------------
Checking variable: time
------------------

------------------
Checking variable: w
------------------
WARN: (2.4): space/time dimensions appear in incorrect order
WARN: (2.4): space/time dimensions appear in incorrect order

------------------
Checking variable: v
------------------
WARN: (2.4): space/time dimensions appear in incorrect order
WARN: (2.4): space/time dimensions appear in incorrect order

------------------
Checking variable: u
------------------
WARN: (2.4): space/time dimensions appear in incorrect order
WARN: (2.4): space/time dimensions appear in incorrect order

------------------
Checking variable: th
------------------
WARN: (2.4): space/time dimensions appear in incorrect order
WARN: (2.4): space/time dimensions appear in incorrect order

------------------
Checking variable: p
------------------
WARN: (2.4): space/time dimensions appear in incorrect order
WARN: (2.4): space/time dimensions appear in incorrect order

------------------
Checking variable: q_vapour
------------------
WARN: (2.4): space/time dimensions appear in incorrect order
WARN: (2.4): space/time dimensions appear in incorrect order

------------------
Checking variable: q_cloud_liquid_mass
------------------
WARN: (2.4): space/time dimensions appear in incorrect order
WARN: (2.4): space/time dimensions appear in incorrect order

------------------
Checking variable: q_rain_mass
------------------
WARN: (2.4): space/time dimensions appear in incorrect order
WARN: (2.4): space/time dimensions appear in incorrect order

------------------
Checking variable: q_cloud_liquid_number
------------------
WARN: (2.4): space/time dimensions appear in incorrect order
WARN: (2.4): space/time dimensions appear in incorrect order

------------------
Checking variable: q_rain_number
------------------
WARN: (2.4): space/time dimensions appear in incorrect order
WARN: (2.4): space/time dimensions appear in incorrect order

------------------
Checking variable: zn
------------------

------------------
Checking variable: z
------------------

------------------
Checking variable: options_database
------------------
WARN: (3): No standard_name or long_name attribute specified

------------------
Checking variable: MONC_timestep
------------------

------------------
Checking variable: x
------------------

------------------
Checking variable: y
------------------

------------------
Checking variable: xu
------------------

------------------
Checking variable: yv
------------------

ERRORS detected: 1
WARNINGS given: 21
INFORMATION messages: 0

I do find it weird that every time I'm seeing the log start with This netCDF file does not appear to contain CF Convention data - that doesn't sound good!

AnneBarber1 commented 4 months ago

I do find it weird that every time I'm seeing the log start with This netCDF file does not appear to contain CF Convention data - that doesn't sound good!

Ah, is this because the CF-checker only goes up to version 1.8 so it can't recognise version 1.10 (which you've used to code up CFizer)? UPDATE: Yes I think this is why. I modified the code to assign a label of CF-1.7 instead of CF-1.10, and CF-checker no longer complained :)

As a side note, I realised why I couldn't get CF-checker to run using different CF versions. Apparently you need to specify them using the specific notation 'CF-1.X' (this wasn't made clear in the documentation!)
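A small sketch that builds the command line in that form, so that scripted runs always pass the version in the expected notation. The helper name is hypothetical; the `-v CF-1.X` usage is as reported in this thread:

```python
import re


def cfchecks_command(path: str, cf_version: str = "1.8") -> list:
    """Build a cfchecks argument list pinned to a CF version, normalising
    "1.8" or "cf-1.8" to the "CF-1.8" form."""
    match = re.fullmatch(r"(?:CF-)?(\d+\.\d+)", cf_version.strip(), flags=re.IGNORECASE)
    if match is None:
        raise ValueError(f"unrecognised CF version: {cf_version!r}")
    return ["cfchecks", "-v", f"CF-{match.group(1)}", path]


print(cfchecks_command("d20200128_diagnostic_3d_10800_9000.nc"))
# ['cfchecks', '-v', 'CF-1.8', 'd20200128_diagnostic_3d_10800_9000.nc']
```

The resulting list can be passed directly to `subprocess.run` to invoke the checker.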

AnneBarber1 commented 4 months ago

So, to summarise: it looks as though CFizer output passes the CF-checker (with some warning messages about the ordering of space/time dimensions and attributes for the options_database variable), BUT this currently only works for CF-1.8 or lower, AND the version of CF-checker itself is also important (i.e. it does not work with the version currently loaded on Jasmin). So if this is a dealbreaker, the code will need to be modified to convert MONC output to a lower CF version. Otherwise, I'm happy to close the issue once I've heard @sjboeing's and @cemaccam's thoughts.
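If the Conventions attribute is ever checked programmatically (for example, to warn when it exceeds what the installed checker supports), the version needs to be compared numerically: as strings, "1.10" sorts before "1.8". A hypothetical sketch, taking CF-1.8 as the newest supported version per the discussion above:

```python
import re


def parse_cf_version(conventions: str):
    """Extract the CF version from a Conventions attribute such as
    "CF-1.10" as a tuple of ints, e.g. (1, 10); None if absent."""
    match = re.search(r"CF-(\d+)\.(\d+)", conventions)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def checker_supports(conventions: str, newest_supported=(1, 8)) -> bool:
    """True if the declared CF version is no newer than `newest_supported`."""
    declared = parse_cf_version(conventions)
    return declared is not None and declared <= newest_supported


print("1.10" < "1.8")               # True: string comparison is misleading
print(checker_supports("CF-1.10"))  # False
print(checker_supports("CF-1.8"))   # True
```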

cemaccam commented 4 months ago

@AnneBarber1 , I'm glad you've worked out how to get it working as expected. The dimension ordering warning is addressed in issue 30, in which I've suggested a fix.
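For reference, CF section 2.4 recommends that dimensions appear in the relative order T, Z, Y, X, with any other dimensions placed to their left. A hypothetical sketch of computing that order from a name-to-axis mapping (the mapping for MONC's dimension names below is an assumption):

```python
AXIS_RANK = {"T": 0, "Z": 1, "Y": 2, "X": 3}


def cf_dim_order(dims, axis_of):
    """Sort dimension names into CF's recommended T, Z, Y, X order.

    Dimensions without a known axis rank first (to the left), and
    Python's stable sort keeps their original relative order."""
    return sorted(dims, key=lambda d: AXIS_RANK.get(axis_of.get(d), -1))


# Assumed axis for each MONC dimension name (illustrative).
monc_axes = {"time": "T", "z": "Z", "zn": "Z", "y": "Y", "yv": "Y", "x": "X", "xu": "X"}
print(cf_dim_order(("time", "x", "y", "zn"), monc_axes))  # ['time', 'zn', 'y', 'x']
```

With xarray, the result could then be applied per variable, e.g. `ds[name].transpose(*cf_dim_order(ds[name].dims, monc_axes))`.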

I think it would be a useful enhancement to have CFizer call CF-checker on each output, to ensure compliance (this could be turned on by a command line option). This was part of the original specification from @sjboeing, but I ran out of time to implement it. If the problem on Jasmin is only caused by its having a lower version in its jaspy environment, can we add cfchecker with the required version to the dependencies in pyproject.toml? That should then update it as part of the build process.
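A minimal sketch of what an opt-in post-write check might do, assuming cfchecks is on the PATH and prints the summary lines seen in the logs above ("ERRORS detected: 1" and so on) to standard output. The function names are hypothetical:

```python
import re
import subprocess

SUMMARY = re.compile(
    r"^(ERRORS detected|WARNINGS given|INFORMATION messages):\s*(\d+)",
    re.MULTILINE,
)


def parse_cfchecks_summary(output: str) -> dict:
    """Map each summary label in cfchecks output to its integer count."""
    return {label: int(count) for label, count in SUMMARY.findall(output)}


def run_cfchecks(path: str, cf_version: str = "CF-1.8") -> dict:
    """Run cfchecks on one file and return its summary counts."""
    result = subprocess.run(
        ["cfchecks", "-v", cf_version, path],
        capture_output=True,
        text=True,
        check=False,  # don't raise on a non-zero exit; rely on parsed counts
    )
    return parse_cfchecks_summary(result.stdout)


sample = "ERRORS detected: 1\nWARNINGS given: 21\nINFORMATION messages: 0\n"
print(parse_cfchecks_summary(sample))
# {'ERRORS detected': 1, 'WARNINGS given': 21, 'INFORMATION messages': 0}
```

A file could then be treated as compliant when the "ERRORS detected" count is 0.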

Given that the CFizer output passes the CF Checker against CF-1.8, the quickest fix would simply be to lower CF_VERSION = "1.10" in startup.py. But for rigour, the change logs for CF-1.9 and CF-1.10 should be checked to ensure none of the changes since 1.8 affect MONC outputs.

sjboeing commented 4 months ago

Thanks both! Changing the version to 1.8 seems the way to go! Let me know if we have a version that we can share publicly.

Cheers,

Steef

AnneBarber1 commented 2 months ago

@sjboeing apologies this has taken me so long to come back to. I'd like to tick this task off so will spend some time focusing on it.

Given the CFizer output passes the CF Checker against CF-1.8, the quickest fix would simply to change down CF_VERSION = "1.10" in startup.py. But for rigour, change logs for CF-1.9 and CF-1.10 should be checked to ensure none of the changes since 1.8 affect MONC outputs.

The version releases for cf-checker seem to be much higher than 1.10; in fact, I can't even see a v1.10. Are you aware of any curiosities in version labelling that might explain this? Otherwise I'm not sure how to get my hands on the change logs for versions 1.9/1.10.

sjboeing commented 2 months ago

Hi Anne,

I think the CF-Checker versioning is separate from NetCDF version checking. The simplest option seems to be to downgrade the NetCDF version to 1.8. Could you check if this works? Thanks for working on this!

Best wishes,

Steef


From: Anne Barber Sent: 15 July 2024 13:59 To: cemac/CFizer Cc: Steven Boeing; Mention Subject: Re: [cemac/CFizer] CFizer output not passing cfchecker tool (Issue #58)

AnneBarber1 commented 2 months ago

Hi Steef,

I don't quite understand what you mean by "the CF-Checker versioning is separate from NetCDF version checking" - could you expand?

If I've understood you correctly, I've already checked this and it works; see this comment above (https://github.com/cemac/CFizer/issues/58#issuecomment-2120697758). We can easily change the version tag in the CFizer output metadata, but long-term we want to check that none of the changes since 1.8 affect MONC outputs.

sjboeing commented 2 months ago

Hi Anne,

On the one hand, there are the versions of the CF conventions (https://cfconventions.org/), which are the "rules" themselves for writing CF-compliant netCDF files. These are e.g. 1.8, 1.9, 1.10.

On the other hand, there are the versions of CF-checker, a software tool that checks compliance against these rules. This versioning is completely separate, so you can use e.g. CF-checker version 4.1.0 to check compliance with CF conventions 1.7 (or 1.8, etc.).
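To make the distinction concrete: the installed cf-checker release and the CF conventions version it checks against are independent inputs. The sketch below builds a cfchecks command line that pins the conventions version with the checker's -v option (confirm the flag with cfchecks -h on your installation). build_cfchecks_command and run_cfchecks are hypothetical helpers; the filename is the one from the original report.

```python
import shlex
import subprocess


def build_cfchecks_command(nc_file: str, cf_version: str = "1.8") -> list:
    """Build a cfchecks invocation that checks against a given CF version.

    The cf-checker release (e.g. 4.1.0) is whatever is installed; the
    CF conventions version (e.g. 1.8) is chosen here via -v.
    """
    return ["cfchecks", "-v", cf_version, nc_file]


def run_cfchecks(nc_file: str, cf_version: str = "1.8") -> int:
    """Run cfchecks and return its exit status (requires cfchecks on PATH)."""
    cmd = build_cfchecks_command(nc_file, cf_version)
    print("Running:", shlex.join(cmd))
    return subprocess.run(cmd).returncode


if __name__ == "__main__":
    cmd = build_cfchecks_command("d20200128_diagnostic_3d_270000_270000.nc")
    print(shlex.join(cmd))  # cfchecks -v 1.8 d20200128_diagnostic_3d_270000_270000.nc
```

Changing only the cf_version argument checks the same file against a different set of rules, without touching which cf-checker release is installed.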

Let me know if this helps.

Steef


From: Anne Barber @.> Sent: 15 July 2024 14:49 To: cemac/CFizer @.> Cc: Steven Boeing @.>; Mention @.> Subject: Re: [cemac/CFizer] CFizer output not passing cfchecker tool (Issue #58)


AnneBarber1 commented 2 months ago

Hi Steef,

I understand what you're saying, thank you. In that case it will involve looking through the pull requests between CF conventions 1.8 and 1.10, as documented here (https://github.com/cf-convention/cf-conventions/releases). Understanding how these changes could impact the CFizer code may not be a trivial task; I will keep you updated as I make progress (and probably create a new issue for this, as we are veering from the initial topic!)

Thanks,

Anne

sjboeing commented 2 months ago

Thanks Anne, but maybe it is just fine to use/check against conventions 1.8. Let me know in case this does not work.

Steef


From: Anne Barber @.> Sent: 16 July 2024 10:28 To: cemac/CFizer @.> Cc: Steven Boeing @.>; Mention @.> Subject: Re: [cemac/CFizer] CFizer output not passing cfchecker tool (Issue #58)


cemaccam commented 2 months ago

I would echo that thinking: given "CFized" datasets pass CF-checker using CF-1.8, it seems a decent bet that the changes in 1.9 and 1.10 have not substantially changed anything CFizer acts on. So an interim solution would be to leave the specification as CF-1.10 but check against CF-1.8. As the next step, I agree it's a good idea to set up any change in CF version as a new issue, with a corresponding branch. That way, if down the line it turns out some critical difference does turn up, there's an easily locatable rewind point.
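A sketch of that interim arrangement, assuming (hypothetically) a small helper that reads the declared version from the Conventions attribute and caps the version used for checking at CF-1.8. Versions are compared as integer tuples rather than strings, because as strings "1.10" sorts before "1.8". The names parse_cf_version, check_version and VERIFIED_CF_VERSION are illustrative, not part of CFizer.

```python
import re

# Highest CF version the CFized output has been confirmed to pass (per this thread).
VERIFIED_CF_VERSION = (1, 8)


def parse_cf_version(conventions: str) -> tuple:
    """Extract the CF version from a Conventions attribute, e.g. "CF-1.10" -> (1, 10).

    Hypothetical helper: looks for the first "CF-<major>.<minor>" token.
    """
    match = re.search(r"CF-(\d+)\.(\d+)", conventions)
    if match is None:
        raise ValueError(f"No CF version found in {conventions!r}")
    return int(match.group(1)), int(match.group(2))


def check_version(conventions: str) -> str:
    """Return the CF version to check against: the declared one, capped at 1.8."""
    declared = parse_cf_version(conventions)
    # Tuples compare numerically element by element, so (1, 10) > (1, 8),
    # whereas the strings compare the other way round: "1.10" < "1.8".
    chosen = min(declared, VERIFIED_CF_VERSION)
    return f"{chosen[0]}.{chosen[1]}"


if __name__ == "__main__":
    print(check_version("CF-1.10"))  # 1.8
    print(check_version("CF-1.7"))   # 1.7
```

When CFizer's declared version is eventually revisited in its own issue and branch, only VERIFIED_CF_VERSION would need to move.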