CDAT / cdat

Community Data Analysis Tools

Catch cdscan errors without triggering runtime failures #1512

Open durack1 opened 9 years ago

durack1 commented 9 years ago

The following problems cause cdscan to fail at runtime with an uncaught traceback, rather than being caught gracefully and reported as an error before quitting:

[durack1@oceanonly _logs]$ more 150828_120003_make_cmip5_xml-oceanonly-threads40-PID30552.log | grep "PROBLEM 4"
** 0000001 150828_235226 042462.37s PROBLEM 4 (no outfile) indexing /cmip5_css02/scratch/cmip5/output1/NCAR/CCSM4/rcp26/mon/land/Lmon/r1i1p1/v20120416/mrro **
** 0000002 150828_235226 042535.49s PROBLEM 4 (no outfile) indexing /cmip5_css02/scratch/cmip5/output1/NCAR/CCSM4/rcp26/mon/land/Lmon/r1i1p1/v20120719/mrro **
** 0000003 150828_235226 042544.44s PROBLEM 4 (no outfile) indexing /cmip5_css02/scratch/cmip5/output1/NCAR/CCSM4/rcp26/mon/land/Lmon/r1i1p1/v20121012/mrro **
** 0000004 150829_005141 045984.87s PROBLEM 4 (no outfile) indexing /cmip5_css02/scratch/cmip5/output1/NCAR/CCSM4/rcp45/mon/land/Lmon/r1i1p1/v20120416/mrro **
** 0000005 150829_005141 046003.45s PROBLEM 4 (no outfile) indexing /cmip5_css02/scratch/cmip5/output1/NCAR/CCSM4/rcp45/mon/land/Lmon/r1i1p1/v20120626/mrro **
** 0000006 150829_005141 046047.42s PROBLEM 4 (no outfile) indexing /cmip5_css02/scratch/cmip5/output1/NCAR/CCSM4/rcp45/mon/land/Lmon/r1i1p1/v20121012/mrro **
** 0000007 150829_015437 049842.48s PROBLEM 4 (no outfile) indexing /cmip5_css02/scratch/cmip5/output1/NCAR/CCSM4/rcp60/mon/land/Lmon/r1i1p1/v20120416/mrro **
** 0000008 150829_015437 049848.99s PROBLEM 4 (no outfile) indexing /cmip5_css02/scratch/cmip5/output1/NCAR/CCSM4/rcp60/mon/land/Lmon/r1i1p1/v20120626/mrro **
** 0000009 150829_015437 049861.26s PROBLEM 4 (no outfile) indexing /cmip5_css02/scratch/cmip5/output1/NCAR/CCSM4/rcp60/mon/land/Lmon/r1i1p1/v20121012/mrro **
** 0000010 150829_030739 054018.56s PROBLEM 4 (no outfile) indexing /cmip5_css02/scratch/cmip5/output1/NCAR/CCSM4/rcp85/mon/land/Lmon/r1i1p1/v20120416/mrro **
** 0000011 150829_030739 054033.80s PROBLEM 4 (no outfile) indexing /cmip5_css02/scratch/cmip5/output1/NCAR/CCSM4/rcp85/mon/land/Lmon/r1i1p1/v20120626/mrro **
** 0000012 150829_030739 054104.69s PROBLEM 4 (no outfile) indexing /cmip5_css02/scratch/cmip5/output1/NCAR/CCSM4/rcp85/mon/land/Lmon/r1i1p1/v20121012/mrro **
** 0000013 150830_011805 133894.04s PROBLEM 4 (no outfile) indexing /cmip5_css01/scratch/cmip5/output1/LASG-IAP/FGOALS-s2/past1000/mon/atmos/Amon/r1i1p1/v20130315/hurs **
** 0000014 150830_011805 133897.95s PROBLEM 4 (no outfile) indexing /cmip5_css01/scratch/cmip5/output1/LASG-IAP/FGOALS-s2/past1000/mon/atmos/Amon/r1i1p1/v20130326/hurs **
** 0000015 150830_012636 134436.62s PROBLEM 4 (no outfile) indexing /cmip5_css01/scratch/cmip5/output1/LASG-IAP/FGOALS-s2/past1000/mon/atmos/Amon/r1i1p1/v20130315/ts **
** 0000016 150830_012636 134785.00s PROBLEM 4 (no outfile) indexing /cmip5_css01/scratch/cmip5/output1/LASG-IAP/FGOALS-s2/past1000/mon/atmos/Amon/r1i1p1/v20130315/ta **
** 0000017 150830_101001 166191.95s PROBLEM 4 (no outfile) indexing /cmip5_css02/scratch/cmip5/output1/NASA-GISS/GISS-E2-H/historicalExt/mon/atmos/Amon/r1i1p1/v20120119/vas **
** 0000018 150831_211730 292641.93s DATA PROBLEM 4 (no outfile) indexing /cmip5_css02/data/cmip5/output1/NIMR-KMA/HadGEM2-AO/piControl/mon/atmos/Amon/r1i1p1/zg/1 **
** 0000019 150831_221729 296236.05s DATA PROBLEM 4 (no outfile) indexing /cmip5_css02/data/cmip5/output1/INPE/HadGEM2-ES/historical/mon/atmos/Amon/r5i1p1/tasmax/1 **

[durack1@oceanonly _logs]$ source /usr/local/uvcdat/2015-08-25/bin/setup_runtime.csh
Successfully updated your environment to use UVCDAT
(changes are valid for this session/terminal only)
Version: 2.2.0-304-geb6c9a3
Location: /usr/local/uvcdat/2015-08-25
[durack1@oceanonly _logs]$ which cdscan
/usr/local/uvcdat/2015-08-25/bin/cdscan

[durack1@oceanonly _logs]$ cdscan -x ~/cdscan_test.xml /cmip5_css02/scratch/cmip5/output1/NCAR/CCSM4/rcp26/mon/land/Lmon/r1i1p1/v20120416/mrro/*.nc
Finding common directory ...
Common directory: /cmip5_css02/scratch/cmip5/output1/NCAR/CCSM4/rcp26/mon/land/Lmon/r1i1p1/v20120416/mrro/
Scanning files ...
/cmip5_css02/scratch/cmip5/output1/NCAR/CCSM4/rcp26/mon/land/Lmon/r1i1p1/v20120416/mrro/mrro_Lmon_CCSM4_rcp26_r1i1p1_200501-210012.nc
Setting reference time units to days since 2005-01-01 00:00:00
/cmip5_css02/scratch/cmip5/output1/NCAR/CCSM4/rcp26/mon/land/Lmon/r1i1p1/v20120416/mrro/mrro_Lmon_CCSM4_rcp26_r1i1p1_200601-210012.nc
Setting reference time units to days since 2005-01-01 00:00:00
Traceback (most recent call last):
  File "/usr/local/uvcdat/2015-08-25/bin/cdscan", line 1680, in <module>
    main(sys.argv)
  File "/usr/local/uvcdat/2015-08-25/bin/cdscan", line 1539, in main
    raise RuntimeError, "Variable '%s' is duplicated, and is a function of lat or lon: files %s, %s"%illegalvars[0]
RuntimeError: Variable 'mrro' is duplicated, and is a function of lat or lon: files mrro_Lmon_CCSM4_rcp26_r1i1p1_200501-210012.nc, mrro_Lmon_CCSM4_rcp26_r1i1p1_200601-210012.nc

[durack1@oceanonly _logs]$ cdscan -x ~/cdscan_test.xml /cmip5_css01/scratch/cmip5/output1/LASG-IAP/FGOALS-s2/past1000/mon/atmos/Amon/r1i1p1/v20130315/hurs/*.nc
Finding common directory ...
Common directory: /cmip5_css01/scratch/cmip5/output1/LASG-IAP/FGOALS-s2/past1000/mon/atmos/Amon/r1i1p1/v20130315/hurs/
Scanning files ...
/cmip5_css01/scratch/cmip5/output1/LASG-IAP/FGOALS-s2/past1000/mon/atmos/Amon/r1i1p1/v20130315/hurs/hurs_Amon_FGOALS-s2_past1000_r1i1p1_085001-094912.nc
Setting reference time units to days since 850-01-01 00:00:00
/cmip5_css01/scratch/cmip5/output1/LASG-IAP/FGOALS-s2/past1000/mon/atmos/Amon/r1i1p1/v20130315/hurs/hurs_Amon_FGOALS-s2_past1000_r1i1p1_095001-104912.nc
Setting reference time units to days since 850-01-01 00:00:00
/cmip5_css01/scratch/cmip5/output1/LASG-IAP/FGOALS-s2/past1000/mon/atmos/Amon/r1i1p1/v20130315/hurs/hurs_Amon_FGOALS-s2_past1000_r1i1p1_105001-114912.nc
Setting reference time units to days since 850-01-01 00:00:00
/cmip5_css01/scratch/cmip5/output1/LASG-IAP/FGOALS-s2/past1000/mon/atmos/Amon/r1i1p1/v20130315/hurs/hurs_Amon_FGOALS-s2_past1000_r1i1p1_115001-124912.nc
Setting reference time units to days since 850-01-01 00:00:00
/cmip5_css01/scratch/cmip5/output1/LASG-IAP/FGOALS-s2/past1000/mon/atmos/Amon/r1i1p1/v20130315/hurs/hurs_Amon_FGOALS-s2_past1000_r1i1p1_125001-134912.nc
Setting reference time units to days since 850-01-01 00:00:00
/cmip5_css01/scratch/cmip5/output1/LASG-IAP/FGOALS-s2/past1000/mon/atmos/Amon/r1i1p1/v20130315/hurs/hurs_Amon_FGOALS-s2_past1000_r1i1p1_135001-144912.nc
Setting reference time units to days since 850-01-01 00:00:00
/cmip5_css01/scratch/cmip5/output1/LASG-IAP/FGOALS-s2/past1000/mon/atmos/Amon/r1i1p1/v20130315/hurs/hurs_Amon_FGOALS-s2_past1000_r1i1p1_145001-154912.nc
Setting reference time units to days since 850-01-01 00:00:00
/cmip5_css01/scratch/cmip5/output1/LASG-IAP/FGOALS-s2/past1000/mon/atmos/Amon/r1i1p1/v20130315/hurs/hurs_Amon_FGOALS-s2_past1000_r1i1p1_155001-164912.nc
Setting reference time units to days since 850-01-01 00:00:00
/cmip5_css01/scratch/cmip5/output1/LASG-IAP/FGOALS-s2/past1000/mon/atmos/Amon/r1i1p1/v20130315/hurs/hurs_Amon_FGOALS-s2_past1000_r1i1p1_165001-174912.nc
Setting reference time units to days since 850-01-01 00:00:00
/cmip5_css01/scratch/cmip5/output1/LASG-IAP/FGOALS-s2/past1000/mon/atmos/Amon/r1i1p1/v20130315/hurs/hurs_Amon_FGOALS-s2_past1000_r1i1p1_175001-185012.nc
Setting reference time units to days since 850-01-01 00:00:00
Traceback (most recent call last):
  File "/usr/local/uvcdat/2015-08-25/bin/cdscan", line 1680, in <module>
    main(sys.argv)
  File "/usr/local/uvcdat/2015-08-25/bin/cdscan", line 1539, in main
    raise RuntimeError, "Variable '%s' is duplicated, and is a function of lat or lon: files %s, %s"%illegalvars[0]
RuntimeError: Variable 'hurs' is duplicated, and is a function of lat or lon: files hurs_Amon_FGOALS-s2_past1000_r1i1p1_145001-154912.nc, hurs_Amon_FGOALS-s2_past1000_r1i1p1_155001-164912.nc

[durack1@oceanonly _logs]$ cdscan -x ~/cdscan_test.xml /cmip5_css02/data/cmip5/output1/NIMR-KMA/HadGEM2-AO/piControl/mon/atmos/Amon/r1i1p1/zg/1/*.nc
Finding common directory ...
Common directory: /cmip5_css02/data/cmip5/output1/NIMR-KMA/HadGEM2-AO/piControl/mon/atmos/Amon/r1i1p1/zg/1/
Scanning files ...
/cmip5_css02/data/cmip5/output1/NIMR-KMA/HadGEM2-AO/piControl/mon/atmos/Amon/r1i1p1/zg/1/zg_Amon_HadGEM2-AO_piControl_r1i1p1_000101-010012.nc
Setting reference time units to days since 0001-01-01
/cmip5_css02/data/cmip5/output1/NIMR-KMA/HadGEM2-AO/piControl/mon/atmos/Amon/r1i1p1/zg/1/zg_Amon_HadGEM2-AO_piControl_r1i1p1_010101-020012.nc
Setting reference time units to days since 0001-01-01
/cmip5_css02/data/cmip5/output1/NIMR-KMA/HadGEM2-AO/piControl/mon/atmos/Amon/r1i1p1/zg/1/zg_Amon_HadGEM2-AO_piControl_r1i1p1_020101-030012.nc
Setting reference time units to days since 0001-01-01
/cmip5_css02/data/cmip5/output1/NIMR-KMA/HadGEM2-AO/piControl/mon/atmos/Amon/r1i1p1/zg/1/zg_Amon_HadGEM2-AO_piControl_r1i1p1_030101-040012.nc
Setting reference time units to days since 0001-01-01
/cmip5_css02/data/cmip5/output1/NIMR-KMA/HadGEM2-AO/piControl/mon/atmos/Amon/r1i1p1/zg/1/zg_Amon_HadGEM2-AO_piControl_r1i1p1_040101-050012.nc
Setting reference time units to days since 0001-01-01
/cmip5_css02/data/cmip5/output1/NIMR-KMA/HadGEM2-AO/piControl/mon/atmos/Amon/r1i1p1/zg/1/zg_Amon_HadGEM2-AO_piControl_r1i1p1_050101-060012.nc
Setting reference time units to days since 0001-01-01
/cmip5_css02/data/cmip5/output1/NIMR-KMA/HadGEM2-AO/piControl/mon/atmos/Amon/r1i1p1/zg/1/zg_Amon_HadGEM2-AO_piControl_r1i1p1_060101-070012.nc
Setting reference time units to days since 0001-01-01
Traceback (most recent call last):
  File "/usr/local/uvcdat/2015-08-25/bin/cdscan", line 1680, in <module>
    main(sys.argv)
  File "/usr/local/uvcdat/2015-08-25/bin/cdscan", line 1539, in main
    raise RuntimeError, "Variable '%s' is duplicated, and is a function of lat or lon: files %s, %s"%illegalvars[0]
RuntimeError: Variable 'zg' is duplicated, and is a function of lat or lon: files zg_Amon_HadGEM2-AO_piControl_r1i1p1_020101-030012.nc, zg_Amon_HadGEM2-AO_piControl_r1i1p1_030101-040012.nc

doutriaux1 commented 9 years ago

I'll take a look, but it will most likely be bumped to 3.0.

durack1 commented 9 years ago

It's just a matter of adding this to the trapped errors. It could be done pretty quickly, I think. Maybe @painter1 could also take a look?

doutriaux1 commented 9 years ago

@painter1 any cycles for this? I'll keep it as 2.4 as long as possible.

doutriaux1 commented 8 years ago

@durack1 it does "raise", so it should exit with an error. What do you mean by "gracefully being caught in an error and exit"? That's pretty much what happens here.
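
To illustrate the point about "raise": a minimal sketch, not cdscan itself, showing that an uncaught exception in a Python script writes the traceback to stderr and makes the interpreter exit with a non-zero status. The child script is a stand-in and its message text is only illustrative.

```python
import subprocess
import sys

# Stand-in for a script that raises; this is NOT cdscan, and the message
# is shortened from the one in the tracebacks above.
child = "raise RuntimeError(\"Variable 'mrro' is duplicated\")"

result = subprocess.run(
    [sys.executable, "-c", child],
    capture_output=True,  # collect stdout and stderr separately
    text=True,            # decode bytes to str
)

# An uncaught exception gives a non-zero exit status; the traceback and
# the final "RuntimeError: ..." line go to stderr, not stdout.
print("exit status:", result.returncode)
print("last stderr line:", result.stderr.strip().splitlines()[-1])
```

A calling script can therefore detect this failure from the exit status alone, provided it checks the status and reads stderr as well as stdout.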

durack1 commented 8 years ago

I'll take another look at how I am calling this (and collecting stdout and stderr). As long as these are being sent to the correct output, I should be able to fix this. I'll report back on progress.
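
One way the calling side could trap this case is sketched below. It assumes the caller launches cdscan as a subprocess; `run_and_classify` and `DUPLICATE_RE` are hypothetical names, and the pattern is taken from the RuntimeError text in the tracebacks above. The demonstration uses a stand-in child process rather than cdscan so that it runs anywhere.

```python
import re
import subprocess
import sys

# Matches the message shown in the tracebacks above (hypothetical helper name).
DUPLICATE_RE = re.compile(r"RuntimeError: Variable '(?P<var>[^']+)' is duplicated")


def run_and_classify(cmd):
    """Run cmd, capture both streams, and return a (status, detail) pair."""
    proc = subprocess.run(cmd, capture_output=True, text=True)
    if proc.returncode == 0:
        return ("ok", "")
    match = DUPLICATE_RE.search(proc.stderr)
    if match:
        # Known failure mode: report the offending variable name.
        return ("duplicate variable", match.group("var"))
    # Anything else: fall back to the last line written to stderr.
    lines = proc.stderr.strip().splitlines()
    return ("unclassified failure", lines[-1] if lines else "")


# Stand-in child that fails the same way the cdscan runs above do.
fake = (
    "raise RuntimeError(\"Variable 'hurs' is duplicated, "
    "and is a function of lat or lon\")"
)
status, detail = run_and_classify([sys.executable, "-c", fake])
print(status, detail)
```

With a real installation the command list would be something like `["cdscan", "-x", outfile] + sorted(glob.glob(pattern))`; the `*.nc` wildcard has to be expanded in Python because no shell is involved.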

A question to raise is whether, in the case described above, cdscan should try to generate output, as it does with overlapping times in multiple files, etc. This case raises an error, whereas other similar cases issue a warning instead. The consistency of error versus warning is probably the better issue to pose.
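
The error-versus-warning choice can be sketched as follows. This is not how cdscan is implemented; `report_duplicate` and its `strict` switch are hypothetical and only show the two behaviours side by side, reusing the message format from the tracebacks above.

```python
import warnings


def report_duplicate(varname, file_a, file_b, strict=True):
    """Raise on a duplicated variable, or only warn when strict is False."""
    message = (
        "Variable '%s' is duplicated, and is a function of lat or lon: "
        "files %s, %s" % (varname, file_a, file_b)
    )
    if strict:
        # Current behaviour in the logs above: processing stops here.
        raise RuntimeError(message)
    # Alternative behaviour: flag the problem and let the caller continue.
    warnings.warn(message, RuntimeWarning)


# Non-strict mode records a warning instead of stopping.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    report_duplicate("zg", "a.nc", "b.nc", strict=False)

print(len(caught), caught[0].category.__name__)
```

Whichever behaviour is chosen, applying it uniformly to duplicated variables and overlapping times would let a calling script treat both cases the same way.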