Unidata / thredds

THREDDS Data Server v4.6

RDA CFSR work #354

Open JohnLCaron opened 8 years ago

JohnLCaron commented 8 years ago

CFSR has various non-standard encodings we need to unravel.


ds093.1 NCEP Climate Forecast System Reanalysis (CFSR) Selected Hourly Time-Series Products, January 1979 to December 2010

ds094.1 NCEP Climate Forecast System Version 2 (CFSv2) Selected Hourly Time-Series Products

ds094.2 NCEP Climate Forecast System Version 2 (CFSv2) Monthly Products
ds094.2_dt (ds094.2 diurnal_timeseries Aggregation)

 Group GaussLatLon_880X1760 (Center 0.0000N 180.0000E) total nrecords=7776, ndups=0 (0.000000), nmiss=21 (0.002701)
 Group LatLon_361X720 (Center 0.2500S 180.0000E) total nrecords=864, ndups=0 (0.000000), nmiss=0 (0.000000)

ds094.2_t (ds094.2 timeseries Aggregation)

 Group GaussLatLon_880X1760 (Center 0.0000N 180.0000E) total nrecords=4436, ndups=0 (0.000000), nmiss=160 (0.036069)
 Group LatLon_361X720 (Center 0.2500S 180.0000E) total nrecords=6372, ndups=0 (0.000000), nmiss=8732 (1.370370)

JohnLCaron commented 8 years ago

ds 093.1

Seems OK, except for the staggered grid. Also has two different grids, lat/lon and Gaussian.

referenceDate (46752)
   1979-01-01T00:00:00Z - 2010-12-31T18:00:00Z: count = 46752

table version (2)
       7-0-2-1: count = 44686452
       7-4-2-1: count = 7144392
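
A quick sanity check on the referenceDate count above. This is only a sketch: it assumes one reference time every 6 hours (inferred from the 00Z start and 18Z end, not stated in the listing), and the function name is made up for illustration.

```python
from datetime import datetime, timedelta

def count_reference_times(start, end, step_hours=6):
    """Count reference times from start to end (inclusive) at a fixed step.

    Assumption: the reference times are evenly spaced, step_hours apart.
    """
    span = end - start
    return span // timedelta(hours=step_hours) + 1

n = count_reference_times(datetime(1979, 1, 1, 0), datetime(2010, 12, 31, 18))
print(n)  # 46752
```

Under that assumption the expected count equals the reported count of 46752, so no reference times appear to be missing. The same check gives 7180 for the ds094.1 range 2011-01-01T00:00:00Z to 2015-11-30T18:00:00Z.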

JohnLCaron commented 8 years ago

ds 094.1

index not written or bad?

referenceDate (7180)
   2011-01-01T00:00:00Z - 2015-11-30T18:00:00Z: count = 7180

table version (2)
       7-0-2-1: count = 3904272
       7-4-2-1: count = 254520
2016-01-07T07:14:39.106 -0700 ERROR - Error making partition /glade/p/rda/data/ds094.1
java.lang.ArrayIndexOutOfBoundsException: 5
    at ucar.nc2.grib.collection.PartitionCollectionMutable$VariableIndexPartitioned.finish(PartitionCollectionMutable.java:114) ~[tdm-5.0.jar:5.0.0-SNAPSHOT]
    at ucar.nc2.grib.collection.GribPartitionBuilder.makeDataset2D(GribPartitionBuilder.java:315) ~[tdm-5.0.jar:5.0.0-SNAPSHOT]
    at ucar.nc2.grib.collection.GribPartitionBuilder.createPartitionedIndex(GribPartitionBuilder.java:163) ~[tdm-5.0.jar:5.0.0-SNAPSHOT]
    at ucar.nc2.grib.collection.GribCdmIndex.updatePartition(GribCdmIndex.java:385) ~[tdm-5.0.jar:5.0.0-SNAPSHOT]
    at ucar.nc2.grib.collection.GribCdmIndex.updateDirectoryCollectionRecurse(GribCdmIndex.java:490) [tdm-5.0.jar:5.0.0-SNAPSHOT]
    at ucar.nc2.grib.collection.GribCdmIndex.updateGribCollection(GribCdmIndex.java:344) [tdm-5.0.jar:5.0.0-SNAPSHOT]

JohnLCaron commented 8 years ago

ds 094.2_t

"regular full monthly means". Note center is 60 (NCAR). Did NCAR rewrite these?

referenceDate (59)
   2011-01-01T00:00:00Z - 2015-11-01T00:00:00Z: count = 59

table version (1)
      60-1-2-1: count = 10808

      U0-4-192: count = 464
      U0-4-193: count = 354
      U0-4-196: count = 354
      U0-4-198: count = 708
      U0-5-192: count = 432
      U0-5-193: count = 354
      U0-5-195: count = 708
      U0-5-196: count = 354
      U0-7-193: count = 59

The correct interpretation is probably a monthly average between the reference date and the "time of end of overall time interval".

Octet 50 should probably be ignored:

50: Length of the time range over which statistical processing is done, in units defined by the previous octet == 124

since it gives nonsense intervals.

Possibly it means the number of values going into the average.
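
A small check of the "number of values" reading of octet 50, as a sketch only. It assumes 4 forecast cycles per day (00, 06, 12, 18Z), which is an assumption and not something the record states; `cycles_in_month` is a made-up helper name.

```python
import calendar

def cycles_in_month(year, month, cycles_per_day=4):
    """Number of forecast cycles in a month, assuming a fixed number per day."""
    days = calendar.monthrange(year, month)[1]
    return days * cycles_per_day

# January 2011: 31 days * 4 cycles per day
print(cycles_in_month(2011, 1))              # 124

# A real monthly interval in hours (the declared unit) would be much longer
print(calendar.monthrange(2011, 1)[1] * 24)  # 744
```

So 124 matches the count of 6-hourly cycles in January 2011, while a true interval length in hours would be 744. That is consistent with the value being a count rather than a duration.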

 Center        = (60) United States National Center for Atmospheric Research (NCAR)
 SubCenter     = (1) null
 Master Table  = 2
 Local Table   = 1
 RefTimeSignif = 1 (Start of forecast)
 RefTime       = 2011-01-01T00:00:00Z
 RefTime Fields = 2011-1-1 0:0:0
 ProductionStatus      = 255 (Missing)
 TypeOfProcessedData   = 1 (Forecast products)

(4.8) Product definition template 4.8 - average, accumulation and/or extreme values or other statistically processed values at a horizontal level or in a horizontal layer in a continuous or non-continuous time interval 
  1:                                                                                 PDS length == 70 
  5:                                                                                    Section == 4 
  6:                                                Number of coordinates values after Template == 0 
  8:                                                         Product Definition Template Number == 8 
 10:                                                                         Parameter category == 5 
 11:                                                                           Parameter number == 196 
 12:                                                                 Type of generating process == 2 (table 4.3: Forecast) 
 13:                   Background generating process identifier (defined by originating centre) == 0 (table ProcessId: Table ProcessId code 0 not found) 
 14:         Analysis or forecast generating process identifier (defined by originating centre) == 197 (table ProcessId: Table ProcessId code 197 not found) 
 15:                                                 Hours after reference time of data cut-off == 0 
 17:                                               Minutes after reference time of data cut-off == 0 
 18:                                                            Indicator of unit of time range == 1 (table 4.4: Hour) 
 19:                                                 Forecast time in units defined by octet 18 == 0 
 23:                                                                Type of first fixed surface == 1 (table 4.5: Ground or water surface) 
 24:                                                        Scale factor of first fixed surface == 0 
 25:                                                        Scaled value of first fixed surface == 0 
 29:                                                               Type of second fixed surface == 255 (table 4.5: Missing) 
 30:                                                       Scale factor of second fixed surface == 255 
 31:                                                       Scaled value of second fixed surface == -9999 
 35:                                                Year - time of end of overall time interval == 2011 
 37:                                               Month - time of end of overall time interval == 1 
 38:                                                 Day - time of end of overall time interval == 31 
 39:                                                Hour - time of end of overall time interval == 18 
 40:                                              Minute - time of end of overall time interval == 0 
 41:                                              Second - time of end of overall time interval == 0 
 42: n - number of time range specifications describing the time intervals used to calculate the statistically processed field == 2 
 43:                                 Total number of data values missing in statistical process == 0 
 47: Statistical process used to calculate the processed field from the field at each time increment during the time range == 205 (table 4.10: Table 4.10 code 205 not found) 
 48:        Type of time increment between successive fields used in the statistical processing == 1 (table 4.11: Successive times processed have same forecast time, start time of forecast is incremented) 
 49:         Indicator of unit of time for time range over which statistical processing is done == 1 (table 4.4: Hour) 
 50: Length of the time range over which statistical processing is done, in units defined by the previous octet == 124 
 54:             Indicator of unit of time for the increment between the successive fields used == 1 (table 4.4: Hour) 
 55:           Time increment between successive fields, in units defined by the previous octet == 1 
 59:                                      As octets 47 to 58, next innermost step of processing == -9999 
 71: Additional time range specifications, included in accordance with the value of n. Contents as octets 47 to 58, repeated as necessary == -9999 

JohnLCaron commented 8 years ago

ds 094.2_dt

I'm guessing this is "6-hourly diurnal monthly means (0000, 0600, 1200, and 1800 UTC)". Note center is 60 (NCAR). Did NCAR rewrite these?

referenceDate (144)
   2011-01-01T00:00:00Z - 2013-12-01T18:00:00Z: count = 144

table version (1)
      60-1-2-1: count = 8640

      U0-3-196: count = 864
      U2-0-193: count = 864

Note that the "end of overall time interval" fields are all 0.

Again, "length of the time range" is probably the number of values going into the average. This matches the number of days in the month (for each month I checked, e.g. 28 for Feb 2011).

Each month has 4 reference times (00, 06, 12, 18Z), and each reference time has 6 forecast times (1-6). I'm guessing these are the averages of each corresponding hour over the days of the month. Hmm, I wonder how CF encodes that?
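
To illustrate the guess above, here is a sketch that enumerates the (reference time, forecast time) pairs and the hour of day each one is valid for. It assumes valid hour = reference hour + forecast hour; the names are made up for illustration.

```python
import calendar

REFERENCE_HOURS = (0, 6, 12, 18)   # the 4 reference times per day
FORECAST_HOURS = range(1, 7)       # forecast times 1..6

def valid_hours():
    """Valid hour of day for every (reference hour, forecast hour) pair."""
    return sorted(ref + fh for ref in REFERENCE_HOURS for fh in FORECAST_HOURS)

hours = valid_hours()
print(len(hours), hours[0], hours[-1])   # 24 1 24

# Number of daily values averaged into each slot for Feb 2011
print(calendar.monthrange(2011, 2)[1])   # 28
```

Under that assumption the 24 pairs cover each hour of the day exactly once, and each slot would average one value per day of the month, which agrees with the 28 seen for Feb 2011.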

(4.8) Product definition template 4.8 - average, accumulation and/or extreme values or other statistically processed values at a horizontal level or in a horizontal layer in a continuous or non-continuous time interval 
  1:                                                                                 PDS length == 70 
  5:                                                                                    Section == 4 
  6:                                                Number of coordinates values after Template == 0 
  8:                                                         Product Definition Template Number == 8 
 10:                                                                         Parameter category == 1 
 11:                                                                           Parameter number == 8 
 12:                                                                 Type of generating process == 2 (table 4.3: Forecast) 
 13:                   Background generating process identifier (defined by originating centre) == 0 (table ProcessId: Table ProcessId code 0 not found) 
 14:         Analysis or forecast generating process identifier (defined by originating centre) == 197 (table ProcessId: Table ProcessId code 197 not found) 
 15:                                                 Hours after reference time of data cut-off == 0 
 17:                                               Minutes after reference time of data cut-off == 0 
 18:                                                            Indicator of unit of time range == 1 (table 4.4: Hour) 
 19:                                                 Forecast time in units defined by octet 18 == 0 
 23:                                                                Type of first fixed surface == 1 (table 4.5: Ground or water surface) 
 24:                                                        Scale factor of first fixed surface == 0 
 25:                                                        Scaled value of first fixed surface == 0 
 29:                                                               Type of second fixed surface == 255 (table 4.5: Missing) 
 30:                                                       Scale factor of second fixed surface == 255 
 31:                                                       Scaled value of second fixed surface == -9999 
 35:                                                Year - time of end of overall time interval == 0 
 37:                                               Month - time of end of overall time interval == 0 
 38:                                                 Day - time of end of overall time interval == 0 
 39:                                                Hour - time of end of overall time interval == 0 
 40:                                              Minute - time of end of overall time interval == 0 
 41:                                              Second - time of end of overall time interval == 0 
 42: n - number of time range specifications describing the time intervals used to calculate the statistically processed field == 2 
 43:                                 Total number of data values missing in statistical process == 0 
 47: Statistical process used to calculate the processed field from the field at each time increment during the time range == 195 (table 4.10: Table 4.10 code 195 not found) 
 48:        Type of time increment between successive fields used in the statistical processing == 1 (table 4.11: Successive times processed have same forecast time, start time of forecast is incremented) 
 49:         Indicator of unit of time for time range over which statistical processing is done == 1 (table 4.4: Hour) 
 50: Length of the time range over which statistical processing is done, in units defined by the previous octet == 31 
 54:             Indicator of unit of time for the increment between the successive fields used == 1 (table 4.4: Hour) 
 55:           Time increment between successive fields, in units defined by the previous octet == 1 
 59:                                      As octets 47 to 58, next innermost step of processing == -9999 
 71: Additional time range specifications, included in accordance with the value of n. Contents as octets 47 to 58, repeated as necessary == -9999 

JohnLCaron commented 8 years ago

From Bob Dattore:

See http://rda.ucar.edu/datasets/ds093.2/#!docs and click on "time_ranges.html" for information on how NCEP encoded metadata for monthly mean grids (which is not correct according to the GRIB2 standard)


JohnLCaron commented 8 years ago


So there are really two different quantities here?

1) Average of 124 averages. The inner average is the average of the field over the interval (0,4) hours from init (= reference time?).

2) Average of 124 discrete values, namely the 4-hour forecasts.

There are files for 01, ..., 06, so the complete set has

1) monthly averages of the (0,1), (0,2), ..., (0,6) averages

2) monthly averages of the 1, 2, ..., 6 hour forecasts.

PS: If I parse the records "normally", I get

1) Grib2Pds{ id=5-192 template=8, forecastTime= 0 timeUnit=1 vertLevel=0.000000}
   Grib2Pds8: endInterval=2011-01-31T18:00:00Z
   TimeInterval: statProcessType= 205, timeIncrementType= 1, timeRangeUnit= 1, timeRangeLength= 124, timeIncrementUnit= 1, timeIncrement=4
   TimeInterval: statProcessType= 205, timeIncrementType= 2, timeRangeUnit= 1, timeRangeLength= 4, timeIncrementUnit= 1, timeIncrement=0

2) Grib2Pds{ id=5-192 template=8, forecastTime= 4 timeUnit=1 vertLevel=0.000000}
   Grib2Pds8: endInterval=2011-01-31T22:00:00Z
   TimeInterval: statProcessType= 193, timeIncrementType= 1, timeRangeUnit= 1, timeRangeLength= 124, timeIncrementUnit= 1, timeIncrement=6
   TimeInterval: statProcessType= 205, timeIncrementType= 2, timeRangeUnit= 1, timeRangeLength= 2, timeIncrementUnit= 1, timeIncrement=0

As you can see, there are 2 "time range specifications". I've never understood these very well. It turns out byte 59 is where that second statProcessType is stored. It's probably wrong, at least in case 2?

Apparently I'm not picking up the different stat types and making 2 different variables: because there are two stat types, I am taking the first one (?). See Grib2Pds.getStatisticalProcessType. Does this happen in other datasets?

Worse, Grib2Variable is not disambiguating different statTypes. This is also the case in 4.6, and has been for at least a year. We probably need to check the NCEP datasets to see if this affects them. Adding the statType will mean we have to redo all the ncx files.
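
The disambiguation problem can be sketched as follows. This is not the Grib2Variable implementation: `VariableKey` and `group_records` are hypothetical names, and the records are reduced to the few fields shown in the two parsed records above (id 5-192, template 8, outer stat types 205 and 193).

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class VariableKey:
    """Hypothetical identity of a variable within a collection."""
    category: int
    number: int
    template: int
    stat_type: Optional[int] = None  # outermost statistical process type

def group_records(records, use_stat_type):
    """Group records into variables, optionally keyed by stat type as well."""
    groups = defaultdict(list)
    for rec in records:
        stat = rec["stat_type"] if use_stat_type else None
        key = VariableKey(rec["category"], rec["number"], rec["template"], stat)
        groups[key].append(rec)
    return groups

records = [
    {"category": 5, "number": 192, "template": 8, "stat_type": 205},
    {"category": 5, "number": 192, "template": 8, "stat_type": 193},
]

print(len(group_records(records, use_stat_type=False)))  # 1: both merged
print(len(group_records(records, use_stat_type=True)))   # 2: kept separate
```

Without the stat type in the key, the "average of averages" and the "average of forecasts" collapse into a single variable. Adding it changes the identity of every variable, which is why the existing indexes would have to be rebuilt.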

JohnLCaron commented 8 years ago

From Bob Dattore:

Issue #354: See http://rda.ucar.edu/datasets/ds093.2/#!docs and click on "time_ranges.html" for information on how NCEP encoded metadata for monthly mean grids (which is not correct according to the GRIB2 standard)

The time_ranges.html document applies to ds093.2 (CFSR monthly) and ds094.2 (CFSv2 monthly). The time encodings for the time series files (ds093.1 and ds094.1) should be standard. Let me know if you see something different though and I'll check it out.

JohnLCaron commented 8 years ago


has 3 different stat types:

// 193
// Average of N forecasts (or initialized analyses); each product has forecast period of P1 (P1=0 for initialized analyses);
// products have reference times at intervals of P2, beginning at the given reference time.

// 195:
// Average of forecast accumulations. P1 = start of accumulation period. P2 = end of accumulation period.
// Reference time is the start time of the first forecast, other forecasts at 24-hour intervals.

// 205
// Average of forecast averages. P1 = start of averaging period. P2 = end of averaging period.
// Reference time is the start time of the first forecast, other forecasts at 6-hour intervals. Number in Ave = number of forecast used

Grabbed 195 from NCEP; I assume it's the same.

Probably don't have the time intervals correct yet.
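
As a sketch of how the three local codes could be resolved instead of being reported as not found, here is a minimal lookup table. The dictionary and function names are hypothetical, the descriptions are shortened from the comments above, and whether these definitions apply unchanged to the NCAR-rewritten files is an assumption.

```python
# Local (non-WMO) statistical process codes seen in these records.
LOCAL_STAT_TYPES = {
    193: "Average of N forecasts (or initialized analyses)",
    195: "Average of forecast accumulations",
    205: "Average of forecast averages",
}

def stat_type_name(code):
    """Return a description for a local stat type, or a not-found message."""
    return LOCAL_STAT_TYPES.get(code, f"Table 4.10 code {code} not found")

for code in (193, 195, 205, 200):
    print(code, stat_type_name(code))
```

Codes outside the table fall back to the same kind of "not found" message that appears in the record dumps.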