AtMoDat / atmodat_data_checker

This is a python library that contains checks to ensure compliance with the AtMoDat Standard.
https://www.atmodat.de/
Apache License 2.0

double entries in long_summary_recommended.csv #118

Closed atmodatcode closed 2 years ago

atmodatcode commented 2 years ago

When I run atmodat checker v1.2.0 on a directory with multiple netCDF files, it creates duplicate entries in the long_summary_recommended.csv file.

Here is my directory. It contains only two netCDF files.

ls -l mytest/
-rw-r--r-- 1 119  9. Feb 17:48 BSH_simple-Lamb-weather-type-and-gale-calendar_North-Sea_lwtns06_1949.nc
-rw-r--r-- 1 119  9. Feb 17:48 BSH_simple-Lamb-weather-type-and-gale-calendar_North-Sea_lwtns06_1948.nc

I'm using checker version 1.2.0

run_checks --version
ATMODAT Standard Compliance Checker Version: 1.2.0

Now I run the checker on the mytest directory (explicitly stating that it should not consider files in subdirectories, if any):

run_checks -s -pnr mytest/ -op mytest_results/
Running Compliance Checker on the dataset from: mytest/BSH_simple-Lamb-weather-type-and-gale-calendar_North-Sea_lwtns06_1949.nc
2022-02-09 16:55:24.563443 [INFO] :: PYESSV :: Loading vocabularies from /mnt/lustre01/pf/k/k204232/atmodat_data_checker/atmodat_checklib/AtMoDat_CVs/pyessv-archive:
2022-02-09 16:55:24.634617 [INFO] :: PYESSV :: ... loaded: atmodat
Running Compliance Checker on the dataset from: mytest/BSH_simple-Lamb-weather-type-and-gale-calendar_North-Sea_lwtns06_1948.nc
--- 5.1675 seconds for checking 2 files---

If I now look at the long_summary_recommended.csv output, it is obvious that the checker creates redundant output:

more mytest_results/atmodat_checker_output/latest/long_summary_recommended.csv
File,Check level,Global Attribute,Error Message
,,,
BSH_simple-Lamb-weather-type-and-gale-calendar_North-Sea_lwtns06_1949.nc,recommended,crs,global attribute is not present
BSH_simple-Lamb-weather-type-and-gale-calendar_North-Sea_lwtns06_1949.nc,recommended,nominal_resolution,global attribute is not present
BSH_simple-Lamb-weather-type-and-gale-calendar_North-Sea_lwtns06_1949.nc,recommended,source_type,global attribute is not present
BSH_simple-Lamb-weather-type-and-gale-calendar_North-Sea_lwtns06_1949.nc,recommended,crs,global attribute is not present
BSH_simple-Lamb-weather-type-and-gale-calendar_North-Sea_lwtns06_1949.nc,recommended,nominal_resolution,global attribute is not present
BSH_simple-Lamb-weather-type-and-gale-calendar_North-Sea_lwtns06_1949.nc,recommended,source_type,global attribute is not present
,,,
BSH_simple-Lamb-weather-type-and-gale-calendar_North-Sea_lwtns06_1948.nc,recommended,crs,global attribute is not present
BSH_simple-Lamb-weather-type-and-gale-calendar_North-Sea_lwtns06_1948.nc,recommended,nominal_resolution,global attribute is not present
BSH_simple-Lamb-weather-type-and-gale-calendar_North-Sea_lwtns06_1948.nc,recommended,source_type,global attribute is not present
BSH_simple-Lamb-weather-type-and-gale-calendar_North-Sea_lwtns06_1948.nc,recommended,crs,global attribute is not present
BSH_simple-Lamb-weather-type-and-gale-calendar_North-Sea_lwtns06_1948.nc,recommended,nominal_resolution,global attribute is not present
BSH_simple-Lamb-weather-type-and-gale-calendar_North-Sea_lwtns06_1948.nc,recommended,source_type,global attribute is not present

In more detail: it contains the same checker output twice.

grep "BSH_simple-Lamb-weather-type-and-gale-calendar_North-Sea_lwtns06_1948.nc,recommended,crs,global attribute is not present" mytest_results/atmodat_checker_output/latest/long_summary_recommended.csv
BSH_simple-Lamb-weather-type-and-gale-calendar_North-Sea_lwtns06_1948.nc,recommended,crs,global attribute is not present
BSH_simple-Lamb-weather-type-and-gale-calendar_North-Sea_lwtns06_1948.nc,recommended,crs,global attribute is not present
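
To count the repeats for every row at once instead of grepping line by line, something like the following could be used. It is only a sketch and not part of the checker: the function name count_duplicate_rows and the shortened file name a.nc in the sample are made up for illustration, and it assumes the summary file keeps the header and the ",,," separator rows shown above.

```python
import csv
import io
from collections import Counter


def count_duplicate_rows(csv_text):
    """Return {row: count} for every data row that occurs more than once."""
    reader = csv.reader(io.StringIO(csv_text))
    next(reader, None)  # skip the header row
    counts = Counter(
        tuple(row)
        for row in reader
        if any(field.strip() for field in row)  # ignore the ",,," separator rows
    )
    return {row: n for row, n in counts.items() if n > 1}


# Hypothetical sample mimicking the layout of long_summary_recommended.csv
sample = """File,Check level,Global Attribute,Error Message
,,,
a.nc,recommended,crs,global attribute is not present
a.nc,recommended,crs,global attribute is not present
a.nc,recommended,source_type,global attribute is not present
"""

duplicates = count_duplicate_rows(sample)
for row, n in duplicates.items():
    print(n, ",".join(row))
```

For a real run, the text of the summary file would be read from disk and passed in instead of the sample string.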

When I run the atmodat checker on a directory with 6 netCDF files that all lack the same crs global attribute, the checker writes the same output 6 times for each file, so 5 of the 6 entries are redundant duplicates.

Example:

ls -l mytest2/   --> directory with only 6 netCDF files
-rw-r--r-- 1 119  9. Feb 18:05 BSH_simple-Lamb-weather-type-and-gale-calendar_North-Sea_lwtns06_1955.nc
-rw-r--r-- 1 119  9. Feb 18:05 BSH_simple-Lamb-weather-type-and-gale-calendar_North-Sea_lwtns06_1954.nc
-rw-r--r-- 1 119  9. Feb 18:05 BSH_simple-Lamb-weather-type-and-gale-calendar_North-Sea_lwtns06_1953.nc
-rw-r--r-- 1 119  9. Feb 18:05 BSH_simple-Lamb-weather-type-and-gale-calendar_North-Sea_lwtns06_1952.nc
-rw-r--r-- 1 119  9. Feb 18:05 BSH_simple-Lamb-weather-type-and-gale-calendar_North-Sea_lwtns06_1951.nc
-rw-r--r-- 1 119  9. Feb 18:05 BSH_simple-Lamb-weather-type-and-gale-calendar_North-Sea_lwtns06_1950.nc

I let the atmodat checker run on this directory:

run_checks -s -pnr mytest2/ -op mytest2_results/
Running Compliance Checker on the dataset from: mytest2/BSH_simple-Lamb-weather-type-and-gale-calendar_North-Sea_lwtns06_1953.nc
2022-02-09 17:07:53.988118 [INFO] :: PYESSV :: Loading vocabularies from /mnt/lustre01/pf/k/k204232/atmodat_data_checker/atmodat_checklib/AtMoDat_CVs/pyessv-archive:
2022-02-09 17:07:54.315134 [INFO] :: PYESSV :: ... loaded: atmodat
Running Compliance Checker on the dataset from: mytest2/BSH_simple-Lamb-weather-type-and-gale-calendar_North-Sea_lwtns06_1950.nc
Running Compliance Checker on the dataset from: mytest2/BSH_simple-Lamb-weather-type-and-gale-calendar_North-Sea_lwtns06_1954.nc
Running Compliance Checker on the dataset from: mytest2/BSH_simple-Lamb-weather-type-and-gale-calendar_North-Sea_lwtns06_1951.nc
Running Compliance Checker on the dataset from: mytest2/BSH_simple-Lamb-weather-type-and-gale-calendar_North-Sea_lwtns06_1955.nc
Running Compliance Checker on the dataset from: mytest2/BSH_simple-Lamb-weather-type-and-gale-calendar_North-Sea_lwtns06_1952.nc
--- 13.3920 seconds for checking 6 files---

When you grep the created long_summary_recommended.csv file, you see that the same entry shows up six times:

grep "BSH_simple-Lamb-weather-type-and-gale-calendar_North-Sea_lwtns06_1950.nc,recommended,crs,global attribute is not present" mytest2_results/atmodat_checker_output/latest/long_summary_recommended.csv
BSH_simple-Lamb-weather-type-and-gale-calendar_North-Sea_lwtns06_1950.nc,recommended,crs,global attribute is not present
BSH_simple-Lamb-weather-type-and-gale-calendar_North-Sea_lwtns06_1950.nc,recommended,crs,global attribute is not present
BSH_simple-Lamb-weather-type-and-gale-calendar_North-Sea_lwtns06_1950.nc,recommended,crs,global attribute is not present
BSH_simple-Lamb-weather-type-and-gale-calendar_North-Sea_lwtns06_1950.nc,recommended,crs,global attribute is not present
BSH_simple-Lamb-weather-type-and-gale-calendar_North-Sea_lwtns06_1950.nc,recommended,crs,global attribute is not present
BSH_simple-Lamb-weather-type-and-gale-calendar_North-Sea_lwtns06_1950.nc,recommended,crs,global attribute is not present

The same happens in the long_summary_optional.csv file and presumably also in the long_summary_mandatory.csv file.

grep "BSH_simple-Lamb-weather-type-and-gale-calendar_North-Sea_lwtns06_1950.nc,optional,further_info_url,global attribute is not present" mytest2_results/atmodat_checker_output/latest/long_summary_optional.csv
BSH_simple-Lamb-weather-type-and-gale-calendar_North-Sea_lwtns06_1950.nc,optional,further_info_url,global attribute is not present
BSH_simple-Lamb-weather-type-and-gale-calendar_North-Sea_lwtns06_1950.nc,optional,further_info_url,global attribute is not present
BSH_simple-Lamb-weather-type-and-gale-calendar_North-Sea_lwtns06_1950.nc,optional,further_info_url,global attribute is not present
BSH_simple-Lamb-weather-type-and-gale-calendar_North-Sea_lwtns06_1950.nc,optional,further_info_url,global attribute is not present
BSH_simple-Lamb-weather-type-and-gale-calendar_North-Sea_lwtns06_1950.nc,optional,further_info_url,global attribute is not present
BSH_simple-Lamb-weather-type-and-gale-calendar_North-Sea_lwtns06_1950.nc,optional,further_info_url,global attribute is not present
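
As a stopgap, the repeated rows could be filtered out after the run. The sketch below is a workaround under assumptions, not a fix in the checker itself: drop_repeated_rows is a made-up helper name, the file names a.nc and b.nc are placeholders, and it assumes that two fully identical rows are never intentional output.

```python
import csv
import io


def drop_repeated_rows(csv_text):
    """Return csv_text with each data row kept only the first time it appears."""
    reader = csv.reader(io.StringIO(csv_text))
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    seen = set()
    for row in reader:
        key = tuple(row)
        # Keep the empty ",,," rows that separate the per-file blocks
        is_separator = not any(field.strip() for field in row)
        if is_separator or key not in seen:
            writer.writerow(row)
        seen.add(key)
    return out.getvalue()


# Hypothetical sample mimicking the layout of the long_summary_*.csv files
sample = (
    "File,Check level,Global Attribute,Error Message\n"
    ",,,\n"
    "a.nc,optional,further_info_url,global attribute is not present\n"
    "a.nc,optional,further_info_url,global attribute is not present\n"
    ",,,\n"
    "b.nc,optional,further_info_url,global attribute is not present\n"
)

cleaned = drop_repeated_rows(sample)
print(cleaned)
```

The same function could be applied to the text of each summary file in turn, since the duplication shows up in the same form in the recommended and optional summaries.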

Could you please fix this? Thanks!

jkretz commented 2 years ago

Can you give it a try without using the -pnr option?

jkretz commented 2 years ago

It is definitely related to the -pnr option. I will look for a fix.