CDAT / cdms

Catch cdscan errors without triggering runtime failures #49

Open chaosphere2112 opened 7 years ago

chaosphere2112 commented 7 years ago

The following problems cause runtime failures in cdscan rather than being caught gracefully, reported as an error, and followed by a clean exit:

[durack1@oceanonly _logs]$ more 150828_120003_make_cmip5_xml-oceanonly-threads40-PID30552.log | grep "PROBLEM 4"
** 0000001 150828_235226 042462.37s PROBLEM 4 (no outfile) indexing /cmip5_css02/scratch/cmip5/output1/NCAR/CCSM4/rcp26/mon/land/Lmon/r1i1p1/v20120416/mrro **
** 0000002 150828_235226 042535.49s PROBLEM 4 (no outfile) indexing /cmip5_css02/scratch/cmip5/output1/NCAR/CCSM4/rcp26/mon/land/Lmon/r1i1p1/v20120719/mrro **
** 0000003 150828_235226 042544.44s PROBLEM 4 (no outfile) indexing /cmip5_css02/scratch/cmip5/output1/NCAR/CCSM4/rcp26/mon/land/Lmon/r1i1p1/v20121012/mrro **
** 0000004 150829_005141 045984.87s PROBLEM 4 (no outfile) indexing /cmip5_css02/scratch/cmip5/output1/NCAR/CCSM4/rcp45/mon/land/Lmon/r1i1p1/v20120416/mrro **
** 0000005 150829_005141 046003.45s PROBLEM 4 (no outfile) indexing /cmip5_css02/scratch/cmip5/output1/NCAR/CCSM4/rcp45/mon/land/Lmon/r1i1p1/v20120626/mrro **
** 0000006 150829_005141 046047.42s PROBLEM 4 (no outfile) indexing /cmip5_css02/scratch/cmip5/output1/NCAR/CCSM4/rcp45/mon/land/Lmon/r1i1p1/v20121012/mrro **
** 0000007 150829_015437 049842.48s PROBLEM 4 (no outfile) indexing /cmip5_css02/scratch/cmip5/output1/NCAR/CCSM4/rcp60/mon/land/Lmon/r1i1p1/v20120416/mrro **
** 0000008 150829_015437 049848.99s PROBLEM 4 (no outfile) indexing /cmip5_css02/scratch/cmip5/output1/NCAR/CCSM4/rcp60/mon/land/Lmon/r1i1p1/v20120626/mrro **
** 0000009 150829_015437 049861.26s PROBLEM 4 (no outfile) indexing /cmip5_css02/scratch/cmip5/output1/NCAR/CCSM4/rcp60/mon/land/Lmon/r1i1p1/v20121012/mrro **
** 0000010 150829_030739 054018.56s PROBLEM 4 (no outfile) indexing /cmip5_css02/scratch/cmip5/output1/NCAR/CCSM4/rcp85/mon/land/Lmon/r1i1p1/v20120416/mrro **
** 0000011 150829_030739 054033.80s PROBLEM 4 (no outfile) indexing /cmip5_css02/scratch/cmip5/output1/NCAR/CCSM4/rcp85/mon/land/Lmon/r1i1p1/v20120626/mrro **
** 0000012 150829_030739 054104.69s PROBLEM 4 (no outfile) indexing /cmip5_css02/scratch/cmip5/output1/NCAR/CCSM4/rcp85/mon/land/Lmon/r1i1p1/v20121012/mrro **
** 0000013 150830_011805 133894.04s PROBLEM 4 (no outfile) indexing /cmip5_css01/scratch/cmip5/output1/LASG-IAP/FGOALS-s2/past1000/mon/atmos/Amon/r1i1p1/v20130315/hurs **
** 0000014 150830_011805 133897.95s PROBLEM 4 (no outfile) indexing /cmip5_css01/scratch/cmip5/output1/LASG-IAP/FGOALS-s2/past1000/mon/atmos/Amon/r1i1p1/v20130326/hurs **
** 0000015 150830_012636 134436.62s PROBLEM 4 (no outfile) indexing /cmip5_css01/scratch/cmip5/output1/LASG-IAP/FGOALS-s2/past1000/mon/atmos/Amon/r1i1p1/v20130315/ts **
** 0000016 150830_012636 134785.00s PROBLEM 4 (no outfile) indexing /cmip5_css01/scratch/cmip5/output1/LASG-IAP/FGOALS-s2/past1000/mon/atmos/Amon/r1i1p1/v20130315/ta **
** 0000017 150830_101001 166191.95s PROBLEM 4 (no outfile) indexing /cmip5_css02/scratch/cmip5/output1/NASA-GISS/GISS-E2-H/historicalExt/mon/atmos/Amon/r1i1p1/v20120119/vas **
** 0000018 150831_211730 292641.93s DATA PROBLEM 4 (no outfile) indexing /cmip5_css02/data/cmip5/output1/NIMR-KMA/HadGEM2-AO/piControl/mon/atmos/Amon/r1i1p1/zg/1 **
** 0000019 150831_221729 296236.05s DATA PROBLEM 4 (no outfile) indexing /cmip5_css02/data/cmip5/output1/INPE/HadGEM2-ES/historical/mon/atmos/Amon/r5i1p1/tasmax/1 **

[durack1@oceanonly _logs]$ source /usr/local/uvcdat/2015-08-25/bin/setup_runtime.csh
Successfully updated your environment to use UVCDAT
(changes are valid for this session/terminal only)
Version: 2.2.0-304-geb6c9a3
Location: /usr/local/uvcdat/2015-08-25
[durack1@oceanonly _logs]$ which cdscan
/usr/local/uvcdat/2015-08-25/bin/cdscan

[durack1@oceanonly _logs]$ cdscan -x ~/cdscan_test.xml /cmip5_css02/scratch/cmip5/output1/NCAR/CCSM4/rcp26/mon/land/Lmon/r1i1p1/v20120416/mrro/*.nc
Finding common directory ...
Common directory: /cmip5_css02/scratch/cmip5/output1/NCAR/CCSM4/rcp26/mon/land/Lmon/r1i1p1/v20120416/mrro/
Scanning files ...
/cmip5_css02/scratch/cmip5/output1/NCAR/CCSM4/rcp26/mon/land/Lmon/r1i1p1/v20120416/mrro/mrro_Lmon_CCSM4_rcp26_r1i1p1_200501-210012.nc
Setting reference time units to days since 2005-01-01 00:00:00
/cmip5_css02/scratch/cmip5/output1/NCAR/CCSM4/rcp26/mon/land/Lmon/r1i1p1/v20120416/mrro/mrro_Lmon_CCSM4_rcp26_r1i1p1_200601-210012.nc
Setting reference time units to days since 2005-01-01 00:00:00
Traceback (most recent call last):
  File "/usr/local/uvcdat/2015-08-25/bin/cdscan", line 1680, in <module>
    main(sys.argv)
  File "/usr/local/uvcdat/2015-08-25/bin/cdscan", line 1539, in main
    raise RuntimeError, "Variable '%s' is duplicated, and is a function of lat or lon: files %s, %s"%illegalvars[0]
RuntimeError: Variable 'mrro' is duplicated, and is a function of lat or lon: files mrro_Lmon_CCSM4_rcp26_r1i1p1_200501-210012.nc, mrro_Lmon_CCSM4_rcp26_r1i1p1_200601-210012.nc

[durack1@oceanonly _logs]$ cdscan -x ~/cdscan_test.xml /cmip5_css01/scratch/cmip5/output1/LASG-IAP/FGOALS-s2/past1000/mon/atmos/Amon/r1i1p1/v20130315/hurs/*.nc
Finding common directory ...
Common directory: /cmip5_css01/scratch/cmip5/output1/LASG-IAP/FGOALS-s2/past1000/mon/atmos/Amon/r1i1p1/v20130315/hurs/
Scanning files ...
/cmip5_css01/scratch/cmip5/output1/LASG-IAP/FGOALS-s2/past1000/mon/atmos/Amon/r1i1p1/v20130315/hurs/hurs_Amon_FGOALS-s2_past1000_r1i1p1_085001-094912.nc
Setting reference time units to days since 850-01-01 00:00:00
/cmip5_css01/scratch/cmip5/output1/LASG-IAP/FGOALS-s2/past1000/mon/atmos/Amon/r1i1p1/v20130315/hurs/hurs_Amon_FGOALS-s2_past1000_r1i1p1_095001-104912.nc
Setting reference time units to days since 850-01-01 00:00:00
/cmip5_css01/scratch/cmip5/output1/LASG-IAP/FGOALS-s2/past1000/mon/atmos/Amon/r1i1p1/v20130315/hurs/hurs_Amon_FGOALS-s2_past1000_r1i1p1_105001-114912.nc
Setting reference time units to days since 850-01-01 00:00:00
/cmip5_css01/scratch/cmip5/output1/LASG-IAP/FGOALS-s2/past1000/mon/atmos/Amon/r1i1p1/v20130315/hurs/hurs_Amon_FGOALS-s2_past1000_r1i1p1_115001-124912.nc
Setting reference time units to days since 850-01-01 00:00:00
/cmip5_css01/scratch/cmip5/output1/LASG-IAP/FGOALS-s2/past1000/mon/atmos/Amon/r1i1p1/v20130315/hurs/hurs_Amon_FGOALS-s2_past1000_r1i1p1_125001-134912.nc
Setting reference time units to days since 850-01-01 00:00:00
/cmip5_css01/scratch/cmip5/output1/LASG-IAP/FGOALS-s2/past1000/mon/atmos/Amon/r1i1p1/v20130315/hurs/hurs_Amon_FGOALS-s2_past1000_r1i1p1_135001-144912.nc
Setting reference time units to days since 850-01-01 00:00:00
/cmip5_css01/scratch/cmip5/output1/LASG-IAP/FGOALS-s2/past1000/mon/atmos/Amon/r1i1p1/v20130315/hurs/hurs_Amon_FGOALS-s2_past1000_r1i1p1_145001-154912.nc
Setting reference time units to days since 850-01-01 00:00:00
/cmip5_css01/scratch/cmip5/output1/LASG-IAP/FGOALS-s2/past1000/mon/atmos/Amon/r1i1p1/v20130315/hurs/hurs_Amon_FGOALS-s2_past1000_r1i1p1_155001-164912.nc
Setting reference time units to days since 850-01-01 00:00:00
/cmip5_css01/scratch/cmip5/output1/LASG-IAP/FGOALS-s2/past1000/mon/atmos/Amon/r1i1p1/v20130315/hurs/hurs_Amon_FGOALS-s2_past1000_r1i1p1_165001-174912.nc
Setting reference time units to days since 850-01-01 00:00:00
/cmip5_css01/scratch/cmip5/output1/LASG-IAP/FGOALS-s2/past1000/mon/atmos/Amon/r1i1p1/v20130315/hurs/hurs_Amon_FGOALS-s2_past1000_r1i1p1_175001-185012.nc
Setting reference time units to days since 850-01-01 00:00:00
Traceback (most recent call last):
  File "/usr/local/uvcdat/2015-08-25/bin/cdscan", line 1680, in <module>
    main(sys.argv)
  File "/usr/local/uvcdat/2015-08-25/bin/cdscan", line 1539, in main
    raise RuntimeError, "Variable '%s' is duplicated, and is a function of lat or lon: files %s, %s"%illegalvars[0]
RuntimeError: Variable 'hurs' is duplicated, and is a function of lat or lon: files hurs_Amon_FGOALS-s2_past1000_r1i1p1_145001-154912.nc, hurs_Amon_FGOALS-s2_past1000_r1i1p1_155001-164912.nc

[durack1@oceanonly _logs]$ cdscan -x ~/cdscan_test.xml /cmip5_css02/data/cmip5/output1/NIMR-KMA/HadGEM2-AO/piControl/mon/atmos/Amon/r1i1p1/zg/1/*.nc
Finding common directory ...
Common directory: /cmip5_css02/data/cmip5/output1/NIMR-KMA/HadGEM2-AO/piControl/mon/atmos/Amon/r1i1p1/zg/1/
Scanning files ...
/cmip5_css02/data/cmip5/output1/NIMR-KMA/HadGEM2-AO/piControl/mon/atmos/Amon/r1i1p1/zg/1/zg_Amon_HadGEM2-AO_piControl_r1i1p1_000101-010012.nc
Setting reference time units to days since 0001-01-01
/cmip5_css02/data/cmip5/output1/NIMR-KMA/HadGEM2-AO/piControl/mon/atmos/Amon/r1i1p1/zg/1/zg_Amon_HadGEM2-AO_piControl_r1i1p1_010101-020012.nc
Setting reference time units to days since 0001-01-01
/cmip5_css02/data/cmip5/output1/NIMR-KMA/HadGEM2-AO/piControl/mon/atmos/Amon/r1i1p1/zg/1/zg_Amon_HadGEM2-AO_piControl_r1i1p1_020101-030012.nc
Setting reference time units to days since 0001-01-01
/cmip5_css02/data/cmip5/output1/NIMR-KMA/HadGEM2-AO/piControl/mon/atmos/Amon/r1i1p1/zg/1/zg_Amon_HadGEM2-AO_piControl_r1i1p1_030101-040012.nc
Setting reference time units to days since 0001-01-01
/cmip5_css02/data/cmip5/output1/NIMR-KMA/HadGEM2-AO/piControl/mon/atmos/Amon/r1i1p1/zg/1/zg_Amon_HadGEM2-AO_piControl_r1i1p1_040101-050012.nc
Setting reference time units to days since 0001-01-01
/cmip5_css02/data/cmip5/output1/NIMR-KMA/HadGEM2-AO/piControl/mon/atmos/Amon/r1i1p1/zg/1/zg_Amon_HadGEM2-AO_piControl_r1i1p1_050101-060012.nc
Setting reference time units to days since 0001-01-01
/cmip5_css02/data/cmip5/output1/NIMR-KMA/HadGEM2-AO/piControl/mon/atmos/Amon/r1i1p1/zg/1/zg_Amon_HadGEM2-AO_piControl_r1i1p1_060101-070012.nc
Setting reference time units to days since 0001-01-01
Traceback (most recent call last):
  File "/usr/local/uvcdat/2015-08-25/bin/cdscan", line 1680, in <module>
    main(sys.argv)
  File "/usr/local/uvcdat/2015-08-25/bin/cdscan", line 1539, in main
    raise RuntimeError, "Variable '%s' is duplicated, and is a function of lat or lon: files %s, %s"%illegalvars[0]
RuntimeError: Variable 'zg' is duplicated, and is a function of lat or lon: files zg_Amon_HadGEM2-AO_piControl_r1i1p1_020101-030012.nc, zg_Amon_HadGEM2-AO_piControl_r1i1p1_030101-040012.nc

Migrated from: https://github.com/UV-CDAT/uvcdat/issues/1512

dnadeau4 commented 6 years ago

@durack1 The two files have the same time units: time:units = "days since 0301-01-01". The message is very confusing, though.

Can you provide me with something better?

Variable 'zg' is duplicated, and is a function of lat or lon: files zg_Amon_HadGEM2-AO_piControl_r1i1p1_020101-030012.nc, zg_Amon_HadGEM2-AO_piControl_r1i1p1_030101-040012.nc

/cmip5_css02/data/cmip5/output1/NIMR-KMA/HadGEM2-AO/piControl/mon/atmos/Amon/r1i1p1/zg/1/zg_Amon_HadGEM2-AO_piControl_r1i1p1_020101-030012.nc
/cmip5_css02/data/cmip5/output1/NIMR-KMA/HadGEM2-AO/piControl/mon/atmos/Amon/r1i1p1/zg/1/zg_Amon_HadGEM2-AO_piControl_r1i1p1_030101-040012.nc

dnadeau4 commented 6 years ago

There is a similar time overlap between these two files. Actually, the file timestamps gave a clue.

The first date is 2005-01-16 12 in both files.

/cmip5_css02/scratch/cmip5/output1/NCAR/CCSM4/rcp26/mon/land/Lmon/r1i1p1/v20120416/mrro/mrro_Lmon_CCSM4_rcp26_r1i1p1_200501-210012.nc

and

/cmip5_css02/scratch/cmip5/output1/NCAR/CCSM4/rcp26/mon/land/Lmon/r1i1p1/v20120416/mrro/mrro_Lmon_CCSM4_rcp26_r1i1p1_200601-210012.nc
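As a rough illustration of the clue in the file names, the sketch below compares the `YYYYMM-YYYYMM` ranges embedded in two file names and reports whether they overlap. The helper names `date_range` and `ranges_overlap` are hypothetical and are not part of cdscan. This only catches overlaps that are visible in the file names, as in the mrro case; the hurs and zg file names quoted in this issue do not overlap, so those cases would need the time axis inside the files to be inspected instead.

```python
import re

# Matches the trailing "_YYYYMM-YYYYMM.nc" part of a CMIP5-style file name.
RANGE_RE = re.compile(r"_(\d{6})-(\d{6})\.nc$")


def date_range(filename):
    """Return (start, end) as YYYYMM integers parsed from the file name."""
    match = RANGE_RE.search(filename)
    if match is None:
        raise ValueError("no YYYYMM-YYYYMM range in %r" % filename)
    return int(match.group(1)), int(match.group(2))


def ranges_overlap(file_a, file_b):
    """True if the date ranges in the two file names share at least one month."""
    start_a, end_a = date_range(file_a)
    start_b, end_b = date_range(file_b)
    return start_a <= end_b and start_b <= end_a


mrro_a = "mrro_Lmon_CCSM4_rcp26_r1i1p1_200501-210012.nc"
mrro_b = "mrro_Lmon_CCSM4_rcp26_r1i1p1_200601-210012.nc"
if ranges_overlap(mrro_a, mrro_b):
    print("time ranges overlap: %s, %s" % (mrro_a, mrro_b))
```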

durack1 commented 6 years ago

@dnadeau4 this issue was opened because, when cdscan is called from a script, the runtime errors (rather than stdout/stderr output) were causing problems. In most cases that I am aware of, the fault lies with the input data, so the problems are real, but the way cdscan behaves isn't helpful.

doutriaux1 commented 6 years ago

@durack1 you can call cdscan from inside a script

cdms2.cdscan.main(["cdscan","-x","crap.xml","*.nc"])

durack1 commented 6 years ago

@doutriaux1 that's exactly what I am doing, but the issue is that runtime failures then trip the scripts up. I was suggesting that, as cdscan is a script that is often called outside of an active python/cdms session, it should report problems through stdout/stderr rather than raise runtime errors.
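One way a calling script could guard against this today is to wrap the call and turn any exception into a one-line stderr message plus a status code. This is only a minimal sketch: `run_quietly` is a hypothetical helper, and `fake_main` is a stand-in that raises the same kind of error, used here so the example runs without cdms2 installed. In a real script, the function passed in would be the `main` mentioned above.

```python
import sys


def run_quietly(main_func, argv):
    """Call main_func(argv); on failure write one line to stderr and return 1."""
    try:
        main_func(argv)
    except Exception as err:  # deliberately broad: any failure becomes a status code
        sys.stderr.write("cdscan error: %s\n" % err)
        return 1
    return 0


def fake_main(argv):
    # Hypothetical stand-in for the real cdscan main(); it always fails.
    raise RuntimeError("Variable 'zg' is duplicated, and is a function of lat or lon")


status = run_quietly(fake_main, ["cdscan", "-x", "test.xml"])
if status != 0:
    print("scan failed, moving on to the next directory")
```

The calling loop can then log the failure and continue with the next dataset instead of stopping on the traceback.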

dnadeau4 commented 6 years ago

I tripped on this message: Variable 'zg' is duplicated, and is a function of lat or lon:

It does not tell us that this is a time issue, and I did not understand why zg was duplicated. Maybe this would be better: Variable 'zg' has duplicated time and is a function of lat or lon:

dnadeau4 commented 6 years ago

@durack1 this is a runtime error and the program cannot continue. I could raise another kind of exception, such as a ValueError. You will still get the backtrace; that is how Python stops.

durack1 commented 6 years ago

@dnadeau4 the issue for me was that, rather than writing a message to stderr, Python raises a runtime error with a backtrace, which trips up the scripts that were calling it. Is it possible to get Python "binaries" to behave the same way as a compiled binary, i.e. reporting through stdout/stderr?
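For comparison, when a Python script is run as a separate process, an uncaught exception already behaves much like a failing compiled binary: the traceback goes to stderr and the process exits with a non-zero status. The sketch below shows this with a tiny child program standing in for cdscan (the child code and its message are made up for illustration), assuming Python 3.5 or later for `subprocess.run`.

```python
import subprocess
import sys

# Child program standing in for cdscan: it raises an uncaught RuntimeError.
child_code = "raise RuntimeError('duplicated variable')"

result = subprocess.run(
    [sys.executable, "-c", child_code],
    stdout=subprocess.PIPE,
    stderr=subprocess.PIPE,
    universal_newlines=True,  # decode captured output as text
)

if result.returncode != 0:
    # The last line of the traceback carries the exception type and message.
    last_line = result.stderr.strip().splitlines()[-1]
    print("child failed with status %d: %s" % (result.returncode, last_line))
```

Calling the tool this way keeps the failure in the child process, so the parent script only has to check the return code.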