bopen / c3s-eqc-data-checker

Data quality checker
Apache License 2.0

CF compliance check #6

Open Yassminaa opened 1 year ago

Yassminaa commented 1 year ago

Hi @malmans2, @cricarpi,

I used the tool on ERA5 to perform the CF compliance check. I noticed that no matter which version is specified in the configuration file, the check returns 'passed', even if I write a nonsensical number in the version line (e.g. version=10).

The other point is that, apart from the version definition, the same file does not fully pass in other data checkers, because some mandatory information is missing from the metadata or does not meet the standard. See the attached report, download.nc_metadata_compliance_report.pdf, produced for the same file with this tool: https://podaac-tools.jpl.nasa.gov/mcc/

Thanks,

malmans2 commented 1 year ago

Hi @Yassminaa ,

We check using cfchecker, which is the same software used in the previous version of the data-checker. I think that when unknown versions are specified, that checker only issues a warning if the version is specified in the nc file as well (we don't show warnings to keep the log cleaner, but we could change that behaviour).

To test it yourself, you can use cfchecks from the terminal:

cfchecks --help
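
For scripted runs, the same command can be wrapped from Python. This is only a minimal sketch, assuming cfchecks is installed and on PATH and that its -v option selects the CF version to check against (with auto for auto-detection), as described in the cf-checker README. build_cfchecks_command and run_cfchecks are hypothetical helper names, not part of the data-checker.

```python
import shutil
import subprocess


def build_cfchecks_command(path, version="auto"):
    """Build the argument list for a cfchecks run (hypothetical helper)."""
    # Assumption: "-v" takes a CF version such as "1.6", or "auto".
    return ["cfchecks", "-v", str(version), str(path)]


def run_cfchecks(path, version="auto"):
    """Run cfchecks and return (exit code, combined output)."""
    if shutil.which("cfchecks") is None:
        raise FileNotFoundError("cfchecks not found on PATH")
    result = subprocess.run(
        build_cfchecks_command(path, version),
        capture_output=True,
        text=True,
        check=False,  # a non-zero exit code is a result here, not a crash
    )
    return result.returncode, result.stdout + result.stderr
```

Returning the combined output rather than only the exit code keeps any warnings from the checker visible to the caller.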

I'll look into printing informative warnings and will let you know when it's ready for testing.

Could you please:

  1. Share the file you are using to test (or the parameters to download it through cdsapi)
  2. Let me know if cfchecker is not the right tool and let me know which checker you would like to use

Yassminaa commented 1 year ago

Thank you @malmans2

The download scripts for the files I used are here: download_ERA5.zip

1- For the tool, I honestly haven't used it before, but I can see a number of open issues on its repository, including some about the version: https://github.com/cedadev/cf-checker/issues

2- I performed the check with this online tool, https://podaac-tools.jpl.nasa.gov/mcc/ , which follows these conventions: http://cfconventions.org/Data/cf-conventions/cf-conventions-1.6/build/cf-conventions.html . It seems to work properly for checking the CF conventions (version 1.6).

In the meantime, I will discuss the tool with my colleagues and let you know if there is any preference.

Thank you

malmans2 commented 1 year ago

Thanks @Yassminaa! It would be quite hard to implement podaac. See: https://podaacpy.readthedocs.io/en/latest/index.html I tested the Python software and it is a bit slow, because under the hood it needs access to the website you used yourself. It's also quite hard to customize, it runs many checks other than CF, and a few of the checks that failed with your test dataset are just recommendations (i.e., they are not mandatory for CF compliance).

Let me know what you decide, but it looks like cfchecker is widely used and OK for our purposes.

> Even iI wrote a number is out of logic in the version line (e.g. version=10)

This was quite confusing and I fixed it in our data-checker. Please try the new version. New error:

ERROR    cf_compliance
ERROR      *.grib: version=CF-10 is not available.
ERROR              Available versions: ['CF-1.0', 'CF-1.1', 'CF-1.2', 'CF-1.3', 'CF-1.4', 'CF-1.5', 'CF-1.6', 'CF-1.7', 'CF-1.8'].

Yassminaa commented 1 year ago

Hi @malmans2,

Yes, I confirm that it now recognizes the valid versions [1.1 ... 1.8 (we may also add 1.9 and 1.10)], and the check still passes for all versions. I don't know whether that is expected, because the metadata says '1.6'.
I need to check with other tools, so we may keep the discussion about the CF tool open for a bit longer.

Thanks

malmans2 commented 1 year ago

I did not pick the available versions; those are the versions supported by cfchecks. If users specify a version that is not supported, they will get an error. If the version is inferred automatically instead, the checker uses the version specified in the netCDF attributes or the latest version supported by cfchecker.

BTW, we allow specifying a version mostly for GRIB files, as they don't have the CF version attribute. netCDF files must have a Conventions attribute in order to be CF compliant.
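
The version resolution described above can be sketched roughly as follows. This is an illustrative sketch, not the data-checker's actual code: resolve_cf_version and parse_conventions are hypothetical names, the supported list is copied from the error message earlier in this thread, and falling back to the latest version when the file's Conventions value is missing or unsupported is an assumption.

```python
import re

# Versions listed in the error message above (as reported by cfchecks).
SUPPORTED_VERSIONS = [f"CF-1.{minor}" for minor in range(9)]


def parse_conventions(conventions):
    """Return the first 'CF-x.y' token of a Conventions attribute, or None."""
    match = re.search(r"CF-\d+\.\d+", conventions or "")
    return match.group(0) if match else None


def resolve_cf_version(requested=None, conventions=None):
    """Pick the CF version to check against (hypothetical helper)."""
    if requested is not None:
        # An explicit request must be one of the supported versions.
        version = f"CF-{requested}"
        if version not in SUPPORTED_VERSIONS:
            raise ValueError(
                f"version={version} is not available. "
                f"Available versions: {SUPPORTED_VERSIONS}."
            )
        return version
    # Otherwise infer from the file attribute (e.g. netCDF Conventions).
    inferred = parse_conventions(conventions)
    if inferred in SUPPORTED_VERSIONS:
        return inferred
    # Assumption: fall back to the latest supported version
    # (e.g. GRIB files, which carry no Conventions attribute).
    return SUPPORTED_VERSIONS[-1]
```

For example, resolve_cf_version(conventions="CF-1.6") would select CF-1.6 for the ERA5 netCDF file, while resolve_cf_version(requested="10") would raise an error like the one shown earlier.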