OpenCDSS / cdss-lib-dmi-hydrobase-rest-java

CDSS HydroBase REST web service client library written in Java
GNU General Public License v3.0

Historical stations - need to implement web services #38

Open smalers opened 2 years ago

smalers commented 2 years ago

The historical web services need to be implemented for surface water and climate stations.

smalers commented 2 years ago

While implementing TSTool features to use CDSS web services for historical station data, I have the following questions and observations. I will update this issue as I work on the implementation. Given that I did not have budget to implement features when web services first came out, the time may have passed for the State to respond to feedback. However, at a minimum, maybe web service documentation could be enhanced to explain nuances and maybe the State has its own issue list internally that could benefit from some of the following. I expect to have features working in the next couple of days and will make decisions along the way. As usual, I will update the TSTool datastore appendix documentation to reflect the implementation.

Observations and Notes

  1. Feedback on MeasType:

    1. surfacewaterstationdatatypes and climatestationdatatypes services both return some of the same "measType" values. Should Streamflow be returned for surface water stations and the others for climate stations? For now I will implement the code as best I can. Note also that some of the data types such as MaxTemp have spaces at the end, resulting in redundant data types.
    Climate station data types:
    
      "measType": "Evap",
      "measType": "FrostDate",
      "measType": "MaxTemp   ",
      "measType": "MaxTemp",
      "measType": "MeanTemp",
      "measType": "MinTemp   ",
      "measType": "MinTemp",
      "measType": "Precip",
      "measType": "Snow",
      "measType": "SnowDepth",
      "measType": "SnowSWE",
      "measType": "Solar",
      "measType": "Streamflow",
      "measType": "VP",
      "measType": "Wind",
    
    Surface water station data types:
    
      "measType": "MaxTemp",
      "measType": "MeanTemp",
      "measType": "MinTemp",
      "measType": "Solar",
      "measType": "Streamflow",
      "measType": "VP",
      "measType": "Wind",
  2. Feedback on dateFormat:

    1. It would be helpful if the API documentation were clearer about defaults. For example, what is the date/time default for dateFormat? Also, for data that are dates, why not make the default format the date? There can be confusion as to whether midnight (which is time zero of a day) corresponds to a computed value over the previous day. Using date format would clear this up.
    2. Perhaps specifying dateFormat should not change modified since that is always a time? In particular if someone is requesting monthly or annual data for an automated data dump, it would be good to check the actual modification time, not the month or year from the modification time.
    3. For month and year data, the date is in parts, but if it were provided in ISO 8601 form, it would be good to default to the precision of the data.
    4. Annual streamflow data are in water year. OK, but that is more constrained than other web services. Maybe have a parameter that would allow returning in calendar year. I am pondering whether TSTool should ask for monthly time series and convert to year interval, or whether the data type should indicate water year.
  3. Feedback on measCount:

    1. Would be good in documentation to indicate what this is. For monthly data it appears to be the number of daily values used in computing statistics. For year interval data it appears to be the same, which implies that year interval data are computed from day data.
  4. Feedback on value:

    1. For time series data, it would be helpful to indicate in the API documentation that only dates with a measurement or calculated value are returned. In other words, value is always a valid number and consuming code does not need to deal with a missing value indicator such as null, NaN (which JSON does not actually support), or -999 (which can be a valid value for some time series).
    2. I am using objects in the code and can handle setting to null for missing so even if I don't understand how missing is handled for everything, hopefully using nulls will work in most cases.
  5. Feedback on climate station frost dates:

    1. The frost date values do not contain the agency. Since the climate station data types don't contain frost dates, I am going to assume that other temperature data are at the same stations to get the agency.
    2. The frost dates records do not contain siteId or any other human-facing identifier. This can be confusing when more than one station's data are read.
  6. Feedback on surfacewaterstationdatatypes service:

    1. Does not allow query by abbrev.
  7. Feedback on climatestations:

    1. Uses latitude when other services use latdecdeg
    2. Uses longitude when other services use longdecdeg
  8. Feedback on surfacewaterstations:

    1. Uses latitude when other services use latdecdeg
    2. Uses longitude when other services use longdecdeg
    3. Why is measUnit included when that is specific to data types?
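
The trailing-space duplicates noted in the MeasType feedback above (for example "MaxTemp   " and "MaxTemp") can be collapsed on the client side. The following is a minimal Python sketch, not the TSTool implementation. It assumes the data type records have already been parsed into a flat list of objects with a measType property. The sample records are made up to mirror the listing above.

```python
import json

def unique_meas_types(records):
    """Return sorted unique measType values with surrounding whitespace removed."""
    return sorted({r["measType"].strip() for r in records if r.get("measType")})

# Hypothetical excerpt shaped like the climate station data type listing above.
sample = json.loads("""
[
  {"measType": "MaxTemp   "},
  {"measType": "MaxTemp"},
  {"measType": "MinTemp   "},
  {"measType": "MinTemp"},
  {"measType": "Precip"}
]
""")

print(unique_meas_types(sample))
```

Stripping before building the set means the padded and unpadded spellings are treated as the same data type.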

Design Approach

It is not possible to fully rely on web services to provide a clean list of data types for use in TSTool and therefore some manual handling in the code is necessary. Some design decisions are as follows:

  1. It appears that the web services data are similar to the previous HydroBase data and therefore I am trying to use similar conventions for both.
  2. Each major service is indicated by a group in the Data Type selector. This makes it relatively straightforward to cross reference TSTool with web services. See the example below for Surface Water Station.
  3. Some of the datatypes being used in web services are indicated by a main measType and then an interval value computed as a statistic, for example for monthly streamflow measType=Streamflow and a data value of minQCfs. In the TSTool world, this combines main data type, statistic, and data units. To make this work in TSTool, I am going to use a data type of Streamflow-Min and set the units as appropriate. See the example below.
  4. The concept of statistic when applied to interval data can be applied generally. However, I need to decide if this makes sense based on some experiments to see what data is available. Things can be confusing. I can implement the functionality for people to access the data, but my explanation of the data may be incomplete or inaccurate if I don't have all the explanation. For example:
    1. For climate station Evap are minimum and maximum values for a month the minimum and maximum daily values in the month? I think yes.
    2. For MaxTemp, it is really daily maximum temperature, so a monthly time series for minValue would be the monthly minimum of daily maximum temperature, right?
    3. Are the month and year interval values for streamflow based on analysis of "instantaneous" or daily values? For example, is month time series minQCfs the minimum of instantaneous streamflow, or minimum of the average daily flows? This is important for peak flow analysis.
    4. **Better documentation explaining the measType and `value` handling in web services would be helpful.**
  5. Units are specified in different ways. Sometimes they are in station object, sometimes in data type object, and sometimes in the time series records. Sometimes, as in monthly streamflow, the units are in the value name like minQCfs. All of this is handled in TSTool but it is not consistent in the services.
  6. Telemetry stations have open-ended list of data types and a lot more variability, so they are the fall-through when listed in TSTool.
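
The combination of measType, statistic, and units described in the design decisions above can be sketched as a lookup table. This is an illustration only, written in Python rather than the Java used by the library. The minQCfs name comes from the discussion above. The other value names and the unit strings are assumptions and should be checked against the actual service responses.

```python
# Map a web service value name to (TSTool statistic suffix, units).
# Only minQCfs is taken from the discussion; the other entries are assumed names.
VALUE_FIELDS = {
    "minQCfs": ("Min", "cfs"),
    "maxQCfs": ("Max", "cfs"),
    "avgQCfs": ("Avg", "cfs"),
    "totalQAf": ("Total", "af"),
}

def tstool_data_type(meas_type, value_field):
    """Return (data type, units), for example ("Streamflow-Min", "cfs")."""
    statistic, units = VALUE_FIELDS[value_field]
    return f"{meas_type.strip()}-{statistic}", units

print(tstool_data_type("Streamflow", "minQCfs"))
```

Keeping the mapping in one table makes the inconsistent placement of units in the services a single point of maintenance in client code.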

image

smalers commented 2 years ago

After implementing the web services, automated tests, and documentation, here are the remaining questions and comments that I have for the State. Other feedback above is still relevant, such as the need for improved documentation for some of the details. The automated tests compare HydroBase database datastore and REST web services datastore results because otherwise it is difficult to know if the REST web service values are as expected.

The following is an example of a Linux command that can be used to check for distinct measType in the data types results:

grep measType /C/Users/sam/Downloads/download-climate-datatypes.json | sort -u
  1. Climate station data types:
    1. Climate station data types still include Streamflow - is this a data issue? From Doug: [Fixed] I confirmed.
    2. Surface water station data types now only includes Streamflow, which is good.
    3. Now that the data types have been cleaned up, TSTool does queries up front to get a unique list of measType to use in data type choice. This seems to be a bit slow for surface water stations. It would be useful to have a service for the unique list of measType each for climate stations and surface water stations to increase performance. From Doug: [Fixed]. Are you using GET api/v2/climatedata/climatestationsdatatypes and GET api/v2/surfacewater/surfacewaterstationdatatypes? They return a unique list of measType for each station. Or are you asking for 2 new services that just return a unique list of measType for ClimateStation / Surface Water Stations regardless of station? I have been doing a query like Doug indicated and now that the response is fast, I retract my request for a new service.
    4. Would it be possible to set the measUnit on frost dates, such as to day? Currently the units are blank. From Doug: [Done] I verified.
    5. The dataSource has CoAgMet and COAGMET. From Doug: [Fixed] I verified. I have not checked to see whether the same station is loaded with both but it would be good to use only one abbreviation.
    6. Climate station data types use dataSource as if an abbreviation and does not have dataSourceAbbrev. This is inconsistent with telemetry stations. I realize that it may be difficult to change at this point. From Doug: [No Action]
  2. Streamflow month and year interval volume from HydroBase does not agree with web services. This appears to be because the factor to convert from cfs to af units is different. To get the automated test to run without warnings I had to multiply the REST web service values by 1.00025216. I recommend that the State confirm that the conversion factor is the same for HydroBase database calculations and web services. From Doug: [Doug] Found that Hydrobase was using 1.984 instead of 1.9835. Have changed views in HydroBase that will take effect in the next database snapshot release. Previous releases will be slightly off. This change will likely break automated tests that I run for TSTool (when I get a new HydroBase) and will change CDSS model input slightly for larger streamflow numbers due to the number of digits used. This type of change is hard on automated testing because it requires running different tests on different versions of the database, which takes resources to configure. This is just the way it is.
  3. Streamflow year interval values are water year. It would be useful to provide a calendar year service. I was able to compare service results by processing HydroBase monthly values into water year. Previous feedback indicates that some data is retrieved from third parties and I get that. From Doug: [No action]
  4. Some data types are the same in climate stations and telemetry stations (if case is ignored), including Evap, Precip, and Solar. This means that TSTool time series identifiers that only use the data type would be ambiguous for Day interval. One solution is to use a location type of climate in the time series identifier as in climate:StationId.DataType.DataSource.DataInterval. However, it appears that climate stations do not have measType values that overlap telemetry stations. Therefore, I am going to try to avoid putting climate: in front of time series identifiers because TSID should be unique. This will have to change if ambiguity results in the future. From Doug: [No Action]. This is mainly a TSTool convention and the State can comment on documentation when I have that done.
  5. Frost dates, dateFormat does not seem to work. I tried using dateFormat=dateOnly since frost dates have precision to day. From Doug: https://dwr.state.co.us/Rest/GET/api/v2/climatedata/climatestationfrostdates/?dateFormat=dateOnly&stationNum=1 is formatting results correctly. My bad on this. It does work. My feedback similar to a previous comment is that in such a case it makes sense that the default date format would just be date (no time) since time is not used.
  6. Monthly statistics:
    1. By experimentation, it appears that values are computed by using the daily values as the sample. That is OK but has some impacts on statistics. From Doug: [Doug] Statistics are computed using the data pulled from NOAA and CoAgMet services, which is daily.
    2. The Total monthly values do not seem to be computed for the following measType: MinTemp, MaxTemp, MeanTemp, SnowDepth, SnowSWE, Solar, VP, Wind. This makes sense for some because the values are instantaneous. However, monthly temperature total such as total of MeanTemp could be used for degree days when evaluating climate change. Total solar radiation for a month may also be useful for consumptive use modeling but I'd have to research the units more. Total wind run is useful to understand how windy a location is and can be used in consumptive use modeling. Kelley Thompson might have opinions on such things. I have the queries working and now need to write automated tests so I'll know soon how the web services compare to HydroBase database queries. From Doug: [Doug] These are new summations that previously have not been produced. As mentioned above, this potential request should be discussed with the CWCB/DWR Team to see if it is something we want to provide.
  7. No year interval for climate stations statistics. Annual data would be useful in some cases, for example for climate change analysis and generally for long-term trend analysis. Water year complicates things. From Doug: [Doug] This potential request should be discussed with the CWCB/DWR Team to see if it is something we want to provide.
    1. Total precipitation.
    2. Mean temperature.
    3. Total evaporation.
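
The water year comparison mentioned above (processing monthly values into water year totals) can be sketched as follows. This is a hedged Python illustration, not the TSTool processing. It assumes the usual October through September water year named for the calendar year in which it ends, and it assumes that only years with all 12 months present should be reported. The monthly values are made up.

```python
from collections import defaultdict

def water_year(year, month):
    """Water year containing a calendar month (October-September, named for the ending year)."""
    return year + 1 if month >= 10 else year

def annual_totals(monthly, by_water_year=True):
    """Sum {(year, month): value} into annual totals, keeping only complete 12-month years."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for (year, month), value in monthly.items():
        key = water_year(year, month) if by_water_year else year
        sums[key] += value
        counts[key] += 1
    return {k: sums[k] for k in sorted(sums) if counts[k] == 12}

# Made-up monthly volumes: 100 per month for Oct-Dec 2016, 10 per month for all of 2017.
monthly = {(2016, m): 100.0 for m in (10, 11, 12)}
monthly.update({(2017, m): 10.0 for m in range(1, 13)})

print(annual_totals(monthly))                       # water year totals
print(annual_totals(monthly, by_water_year=False))  # calendar year totals
```

With this sample, water year 2017 and calendar year 2017 give different totals from the same monthly data, which is why the year type needs to be clear in the data type or documentation.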

Status of Resolving Issues

Overall, my questions have been answered and issues addressed. Remaining issues are related to improving documentation, which the State can do during normal maintenance. Inconsistencies in the API are unfortunately baked in at this point because people are using the API. Perhaps inconsistencies can be addressed in the future if a new version is released. The new time series that I have suggested can be discussed with the CDSS group. Enabling these will require changes to TSTool since I am currently filtering out the statistics that have all nulls in the results so as to not confuse users.

I will keep this issue open until a TSTool release is made because I may find some additional issues as I finalize the tests.

smalers commented 2 years ago

The following are issues found during testing. I added many tests and believe the following are unresolved. Links are provided to the current test command file because the content below may be out of date. I will publish a software release when done putting together the tests so that the State can download and run the tests themselves. All testing was done with HydroBase_20210322. These issues may be indicative of similar issues at other stations. Although it would be possible to compare all stations in HydroBase with all stations in web services, that comparison is beyond the scope of software automated tests.

MeanTemp-Avg.Month Test fails

The following TSTool command file shows differences:

# Test reading a MeanTemp-Avg month interval time series from ColoradoHydroBaseRest web service using a TSID.
# - Compare the resulting time series with that retrieved from HydroBase
# - allow some missing based on database
StartLog(LogFile="Results/Test_TSID_ColoradoHydroBaseRest_CompareHydroBase_MeanTemp-Avg_Month.TSTool.log")
RemoveFile(InputFile="Results/Test_TSID_ColoradoHydroBaseRest_CompareHydroBase_MeanTemp-Avg_Month_out.dv",IfNotFound=Ignore)
# Read the same time series from the web service and HydroBase
SetInputPeriod(InputStart="2015-01",InputEnd="2018-12")
# FTC01 - FORT COLLINS
FTC01.CoAgMet.MeanTemp-Avg.Month~HydroBaseWeb
FTC01.CoAgMet.TempMean.Month~HydroBase
# Make sure that enough data are available in the test data, and some missing
CheckTimeSeriesStatistic(Statistic="NonmissingCount",CheckCriteria="<=",CheckValue1=10,IfCriteriaMet=Warn)
CheckTimeSeriesStatistic(Statistic="MissingCount",CheckCriteria=">",CheckValue1=0,IfCriteriaMet=Warn)
WriteDateValue(TSList=LastMatchingTSID,TSID="FTC01.CoAgMet.MeanTemp-Avg.Month",OutputFile="Results/Test_TSID_ColoradoHydroBaseRest_CompareHydroBase_MeanTemp-Avg_Month_out.dv",Precision=2)
CompareTimeSeries(Tolerance=".01",IfDifferent=Warn)
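
The comparison performed by the command file above can be approximated outside TSTool when investigating a failure. The following Python sketch is not the CompareTimeSeries implementation. The function name, the dictionary representation of a time series, and the treatment of missing values (a value missing in only one series counts as a difference) are all assumptions for illustration. The sample values are made up.

```python
import math

def count_differences(ts1, ts2, tolerance=0.01):
    """Count dates where two {date: value} series differ by more than tolerance.

    A value that is missing (absent, None, or NaN) in only one series counts
    as a difference; a value missing in both series does not.
    """
    def missing(v):
        return v is None or (isinstance(v, float) and math.isnan(v))

    differences = 0
    for date in sorted(set(ts1) | set(ts2)):
        a, b = ts1.get(date), ts2.get(date)
        if missing(a) and missing(b):
            continue
        if missing(a) or missing(b) or abs(a - b) > tolerance:
            differences += 1
    return differences

# Made-up monthly values keyed by year-month.
web = {"2015-01": 30.12, "2015-02": 35.40, "2015-03": None}
database = {"2015-01": 30.125, "2015-02": 35.90, "2015-03": float("nan")}

print(count_differences(web, database))
```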

MeanTemp-Max.Month Test fails

I'm not sure why the following command file fails. I may not be understanding the contents of the data. Monthly statistics computed on daily statistic is confusing.

# Test reading a MeanTemp-Max month interval time series from ColoradoHydroBaseRest web service using a TSID.
# - Compare the resulting time series with that retrieved from HydroBase
# - allow some missing based on database
StartLog(LogFile="Results/Test_TSID_ColoradoHydroBaseRest_CompareHydroBase_MeanTemp-Max_Month.TSTool.log")
RemoveFile(InputFile="Results/Test_TSID_ColoradoHydroBaseRest_CompareHydroBase_MeanTemp-Max_Month_out.dv",IfNotFound=Ignore)
# Read the same time series from the web service and HydroBase
SetInputPeriod(InputStart="2015-01",InputEnd="2018-12")
# FTC01 - FORT COLLINS
FTC01.CoAgMet.MeanTemp-Max.Month~HydroBaseWeb
FTC01.CoAgMet.TempMeanMax.Month~HydroBase
# Make sure that enough data are available in the test data, and some missing
CheckTimeSeriesStatistic(Statistic="NonmissingCount",CheckCriteria="<=",CheckValue1=10,IfCriteriaMet=Warn)
CheckTimeSeriesStatistic(Statistic="MissingCount",CheckCriteria=">",CheckValue1=0,IfCriteriaMet=Warn)
WriteDateValue(TSList=LastMatchingTSID,TSID="FTC01.CoAgMet.MeanTemp-Max.Month",OutputFile="Results/Test_TSID_ColoradoHydroBaseRest_CompareHydroBase_MeanTemp-Max_Month_out.dv",Precision=2)
CompareTimeSeries(Tolerance=".01",IfDifferent=Warn)
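
To illustrate why a monthly statistic computed on a daily statistic is easy to misread, the following Python sketch computes monthly minimum, average, and maximum from daily values. It is an illustration under stated assumptions, not the State's calculation: the daily values are made up, daily mean temperature is assumed to be the average of the daily minimum and maximum, and measCount is assumed to be the number of daily values in the sample.

```python
from statistics import mean

# Made-up daily values for one month (three days only, for brevity).
daily_min = [20.0, 25.0, 30.0]
daily_max = [40.0, 55.0, 50.0]
# Assumed definition of daily mean temperature: average of daily min and max.
daily_mean = [(tmin + tmax) / 2 for tmin, tmax in zip(daily_min, daily_max)]

def monthly_stats(daily):
    """Monthly statistics using the daily values as the sample."""
    return {
        "min": min(daily),
        "avg": mean(daily),
        "max": max(daily),
        "measCount": len(daily),
    }

print(monthly_stats(daily_mean))  # MeanTemp-Min, MeanTemp-Avg, MeanTemp-Max
print(monthly_stats(daily_max))   # MaxTemp-Min, MaxTemp-Avg, MaxTemp-Max
```

In this sample the monthly maximum of daily mean temperature is lower than the monthly maximum of daily maximum temperature, so a test that pairs the wrong series will fail even when both data sources are correct.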

MeanTemp-Min.Month Test fails

I'm not sure why the following command file fails. I may not be understanding the contents of the data. Monthly statistics computed on daily statistic is confusing.

# Test reading a MeanTemp-Min month interval time series from ColoradoHydroBaseRest web service using a TSID.
# - Compare the resulting time series with that retrieved from HydroBase
# - allow some missing based on database
StartLog(LogFile="Results/Test_TSID_ColoradoHydroBaseRest_CompareHydroBase_MeanTemp-Min_Month.TSTool.log")
RemoveFile(InputFile="Results/Test_TSID_ColoradoHydroBaseRest_CompareHydroBase_MeanTemp-Min_Month_out.dv",IfNotFound=Ignore)
# Read the same time series from the web service and HydroBase
SetInputPeriod(InputStart="2015-01",InputEnd="2018-12")
# FTC01 - FORT COLLINS
FTC01.CoAgMet.MeanTemp-Min.Month~HydroBaseWeb
FTC01.CoAgMet.TempMeanMin.Month~HydroBase
# Make sure that enough data are available in the test data, and some missing
CheckTimeSeriesStatistic(Statistic="NonmissingCount",CheckCriteria="<=",CheckValue1=10,IfCriteriaMet=Warn)
CheckTimeSeriesStatistic(Statistic="MissingCount",CheckCriteria=">",CheckValue1=0,IfCriteriaMet=Warn)
WriteDateValue(TSList=LastMatchingTSID,TSID="FTC01.CoAgMet.MeanTemp-Min.Month",OutputFile="Results/Test_TSID_ColoradoHydroBaseRest_CompareHydroBase_MeanTemp-Min_Month_out.dv",Precision=2)
CompareTimeSeries(Tolerance=".01",IfDifferent=Warn)

Snow.Day Test Fails

The following test fails. The HydroBase database has a much shorter period and values are slightly different, although it has more recent data and web services does not.

# Test reading a Snow day interval time series from ColoradoHydroBaseRest web service using a TSID.
# - Compare the resulting time series with that retrieved from HydroBase
# - allow a high number of missing based on database, due to winter, etc.
StartLog(LogFile="Results/Test_TSID_ColoradoHydroBaseRest_CompareHydroBase_Snow_Day.TSTool.log")
RemoveFile(InputFile="Results/Test_TSID_ColoradoHydroBaseRest_CompareHydroBase_Snow_Day_out.dv",IfNotFound=Ignore)
# Read the same time series from the web service and HydroBase
SetInputPeriod(InputStart="1949-04-06",InputEnd="2018-12-31")
# USC00053005 - FORT COLLINS
USC00053005.NOAA.Snow.Day~HydroBaseWeb
USC00053005.NOAA.Snow.Day~HydroBase
# Make sure that enough data are available in the test data, and some missing
CheckTimeSeriesStatistic(Statistic="NonmissingCount",CheckCriteria="<=",CheckValue1=10,IfCriteriaMet=Warn)
CheckTimeSeriesStatistic(Statistic="MissingCount",CheckCriteria=">",CheckValue1=11501,IfCriteriaMet=Warn)
WriteDateValue(TSList=LastMatchingTSID,TSID="USC00053005.NOAA.Snow.Day",OutputFile="Results/Test_TSID_ColoradoHydroBaseRest_CompareHydroBase_Snow_Day_out.dv",Precision=2)
CompareTimeSeries(Tolerance=".01",IfDifferent=Warn)

Snow-Max.Month Test Fails

The Snow-Total.Month test also fails.

The following test fails. The HydroBase database and web service values for last part of the time series are different. I may just need to use a newer version of HydroBase that has updated data.

# Test reading a Snow-Max month interval time series from ColoradoHydroBaseRest web service using a TSID.
# - compare the resulting time series with that retrieved from HydroBase
# - HydroBase does not include the monthly statistic so compute from daily and then compare
StartLog(LogFile="Results/Test_TSID_ColoradoHydroBaseRest_CompareHydroBase_Snow-Max_Month.TSTool.log")
RemoveFile(InputFile="Results/Test_TSID_ColoradoHydroBaseRest_CompareHydroBase_Snow-Max_Month_out.dv",IfNotFound=Ignore)
# Read the same time series from the web service and HydroBase
SetInputPeriod(InputStart="2014-08",InputEnd="2018-05")
# USC00053005 - FORT COLLINS
USC00053005.NOAA.Snow-Max.Month~HydroBaseWeb
USC00053005.NOAA.Snow.Day~HydroBase
SetInputPeriod(InputStart="2014-08-01",InputEnd="2018-05-31")
NewStatisticMonthTimeSeries(TSID="USC00053005.NOAA.Snow.Day",Alias="USC00053005-HydroBase-Month",NewTSID="USC00053005..Snow-Max.Month",Statistic=Max)
# Make sure that enough data are available in the test data, and some missing.
CheckTimeSeriesStatistic(Statistic="NonmissingCount",CheckCriteria="<=",CheckValue1=10,IfCriteriaMet=Warn)
CheckTimeSeriesStatistic(TSList=AllMatchingTSID,TSID="*Month",Statistic="MissingCount",CheckCriteria=">",CheckValue1=0,IfCriteriaMet=Warn)
WriteDateValue(TSList=LastMatchingTSID,TSID="USC00053005.NOAA.Snow-Max.Month",OutputFile="Results/Test_TSID_ColoradoHydroBaseRest_CompareHydroBase_Snow-Max_Month_out.dv",Precision=2)
CompareTimeSeries(TSID1="USC00053005.NOAA.Snow-Max.Month",TSID2="USC00053005-HydroBase-Month",Tolerance=".01",IfDifferent=Warn)

SnowDepth.Day Test fails

The following test fails. The patterns are similar but numbers are different. Do I need an updated HydroBase?

# Test reading a SnowDepth day interval time series from ColoradoHydroBaseRest web service using a TSID.
# - compare the resulting time series with that retrieved from HydroBase
# - allow a high number of missing based on database
StartLog(LogFile="Results/Test_TSID_ColoradoHydroBaseRest_CompareHydroBase_SnowDepth_Day.TSTool.log")
RemoveFile(InputFile="Results/Test_TSID_ColoradoHydroBaseRest_CompareHydroBase_SnowDepth_Day_out.dv",IfNotFound=Ignore)
# Read the same time series from the web service and HydroBase
SetInputPeriod(InputStart="1949-04-06",InputEnd="2018-12-31")
# USC00053005 - FORT COLLINS
USC00053005.NOAA.SnowDepth.Day~HydroBaseWeb
USC00053005.NOAA.SnowCourseDepth.Day~HydroBase
# Make sure that enough data are available in the test data, and some missing
CheckTimeSeriesStatistic(Statistic="NonmissingCount",CheckCriteria="<=",CheckValue1=10,IfCriteriaMet=Warn)
CheckTimeSeriesStatistic(Statistic="MissingCount",CheckCriteria=">",CheckValue1=11501,IfCriteriaMet=Warn)
WriteDateValue(TSList=LastMatchingTSID,TSID="USC00053005.NOAA.SnowDepth.Day",OutputFile="Results/Test_TSID_ColoradoHydroBaseRest_CompareHydroBase_SnowDepth_Day_out.dv",Precision=2)
CompareTimeSeries(Tolerance=".01",IfDifferent=Warn)

SnowDepth-Avg.Month Test Fails

The following test fails, likely because the daily data test fails.

# Test reading a SnowDepth-Avg month interval time series from ColoradoHydroBaseRest web service using a TSID.
# - compare the resulting time series with that retrieved from HydroBase
# - HydroBase does not include the monthly statistic so compute from daily and then compare
StartLog(LogFile="Results/Test_TSID_ColoradoHydroBaseRest_CompareHydroBase_SnowDepth-Avg_Month.TSTool.log")
RemoveFile(InputFile="Results/Test_TSID_ColoradoHydroBaseRest_CompareHydroBase_SnowDepth-Avg_Month_out.dv",IfNotFound=Ignore)
# Read the same time series from the web service and HydroBase.
SetInputPeriod(InputStart="2014-08",InputEnd="2018-05")
# USC00053005 - FORT COLLINS
USC00053005.NOAA.SnowDepth-Avg.Month~HydroBaseWeb
USC00053005.NOAA.SnowCourseDepth.Day~HydroBase
SetInputPeriod(InputStart="2014-08-01",InputEnd="2018-05-31")
NewStatisticMonthTimeSeries(TSID="USC00053005.NOAA.SnowCourseDepth.Day",Alias="USC00053005-HydroBase-Month",NewTSID="USC00053005..SnowCourseDepth-Avg.Month",Statistic=Mean)
# Make sure that enough data are available in the test data, and some missing.
CheckTimeSeriesStatistic(Statistic="NonmissingCount",CheckCriteria="<=",CheckValue1=10,IfCriteriaMet=Warn)
CheckTimeSeriesStatistic(TSList=AllMatchingTSID,TSID="*Month",Statistic="MissingCount",CheckCriteria=">",CheckValue1=0,IfCriteriaMet=Warn)
WriteDateValue(TSList=LastMatchingTSID,TSID="USC00053005.NOAA.SnowDepth-Avg.Month",OutputFile="Results/Test_TSID_ColoradoHydroBaseRest_CompareHydroBase_SnowDepth-Avg_Month_out.dv",Precision=2)
CompareTimeSeries(TSID1="USC00053005.NOAA.SnowDepth-Avg.Month",TSID2="USC00053005-HydroBase-Month",Tolerance=".01",IfDifferent=Warn)

SnowDepth-Max.Month Test Fails

The following test fails, likely because the daily data test fails.

Note that the minimum test passes but that is because all of the values are zero.

# Test reading a SnowDepth-Max month interval time series from ColoradoHydroBaseRest web service using a TSID.
# - compare the resulting time series with that retrieved from HydroBase
# - HydroBase does not include the monthly statistic so compute from daily and then compare
StartLog(LogFile="Results/Test_TSID_ColoradoHydroBaseRest_CompareHydroBase_SnowDepth-Max_Month.TSTool.log")
RemoveFile(InputFile="Results/Test_TSID_ColoradoHydroBaseRest_CompareHydroBase_SnowDepth-Max_Month_out.dv",IfNotFound=Ignore)
# Read the same time series from the web service and HydroBase
SetInputPeriod(InputStart="2014-08",InputEnd="2018-05")
# USC00053005 - FORT COLLINS
USC00053005.NOAA.SnowDepth-Max.Month~HydroBaseWeb
USC00053005.NOAA.SnowCourseDepth.Day~HydroBase
SetInputPeriod(InputStart="2014-08-01",InputEnd="2018-05-31")
NewStatisticMonthTimeSeries(TSID="USC00053005.NOAA.SnowCourseDepth.Day",Alias="USC00053005-HydroBase-Month",NewTSID="USC00053005..SnowCourseDepth-Max.Month",Statistic=Max)
# Make sure that enough data are available in the test data, and some missing.
CheckTimeSeriesStatistic(Statistic="NonmissingCount",CheckCriteria="<=",CheckValue1=10,IfCriteriaMet=Warn)
CheckTimeSeriesStatistic(TSList=AllMatchingTSID,TSID="*Month",Statistic="MissingCount",CheckCriteria=">",CheckValue1=0,IfCriteriaMet=Warn)
WriteDateValue(TSList=LastMatchingTSID,TSID="USC00053005.NOAA.SnowDepth-Max.Month",OutputFile="Results/Test_TSID_ColoradoHydroBaseRest_CompareHydroBase_SnowDepth-Max_Month_out.dv",Precision=2)
CompareTimeSeries(TSID1="USC00053005.NOAA.SnowDepth-Max.Month",TSID2="USC00053005-HydroBase-Month",Tolerance=".01",IfDifferent=Warn)

Solar.Day Test fails

The following test fails.
The HydroBase database and web service values match during the start and end of the period but the middle is totally different. This is odd because other tests for statistics based on the daily data pass.

# Test reading a Solar day interval time series from ColoradoHydroBaseRest web service using a TSID.
# - compare the resulting time series with that retrieved from HydroBase
# - allow a high number of missing based on database
StartLog(LogFile="Results/Test_TSID_ColoradoHydroBaseRest_CompareHydroBase_Solar_Day.TSTool.log")
RemoveFile(InputFile="Results/Test_TSID_ColoradoHydroBaseRest_CompareHydroBase_Solar_Day_out.dv",IfNotFound=Ignore)
# Read the same time series from the web service and HydroBase
SetInputPeriod(InputStart="2015-01-01",InputEnd="2020-07-20")
# FCL01 - FORT COLLINS
FCL01.CoAgMet.Solar.Day~HydroBaseWeb
FCL01.CoAgMet.Solar.Day~HydroBase
# Make sure that enough data are available in the test data, and some missing
CheckTimeSeriesStatistic(Statistic="NonmissingCount",CheckCriteria="<=",CheckValue1=10,IfCriteriaMet=Warn)
CheckTimeSeriesStatistic(Statistic="MissingCount",CheckCriteria=">",CheckValue1=153,IfCriteriaMet=Warn)
WriteDateValue(TSList=LastMatchingTSID,TSID="FCL01.CoAgMet.Solar.Day",OutputFile="Results/Test_TSID_ColoradoHydroBaseRest_CompareHydroBase_Solar_Day_out.dv",Precision=2)
CompareTimeSeries(Tolerance=".1",IfDifferent=Warn)

Wind.Day Test fails

The following test fails. The HydroBase database and web service values match during the start and end of the period but the middle is totally different. This is odd because other tests for statistics based on the daily data pass.

# Test reading a Wind day interval time series from ColoradoHydroBaseRest web service using a TSID.
# - Compare the resulting time series with that retrieved from HydroBase
# - allow a high number of missing based on database, due to winter, etc.
StartLog(LogFile="Results/Test_TSID_ColoradoHydroBaseRest_CompareHydroBase_Wind_Day.TSTool.log")
RemoveFile(InputFile="Results/Test_TSID_ColoradoHydroBaseRest_CompareHydroBase_Wind_Day_out.dv",IfNotFound=Ignore)
# Read the same time series from the web service and HydroBase
SetInputPeriod(InputStart="2015-01-01",InputEnd="2020-07-20")
# FCL01 - FORT COLLINS
FCL01.CoAgMet.Wind.Day~HydroBaseWeb
FCL01.CoAgMet.Wind.Day~HydroBase
# Make sure that enough data are available in the test data, and some missing
CheckTimeSeriesStatistic(Statistic="NonmissingCount",CheckCriteria="<=",CheckValue1=10,IfCriteriaMet=Warn)
CheckTimeSeriesStatistic(Statistic="MissingCount",CheckCriteria=">",CheckValue1=153,IfCriteriaMet=Warn)
WriteDateValue(TSList=LastMatchingTSID,TSID="FCL01.CoAgMet.Wind.Day",OutputFile="Results/Test_TSID_ColoradoHydroBaseRest_CompareHydroBase_Wind_Day_out.dv",Precision=2)
CompareTimeSeries(Tolerance=".1",IfDifferent=Warn)
smalers commented 2 years ago

Doug Stenzel provided a new HydroBase snapshot dated 20220330, which includes fixes for some data load issues and other changes to correct issues identified above. Based on this, I have been fixing tests so that the software checks out. Below are things that I had to deal with.

Snow Course depth is now a double rather than integer - snow data issues are resolved

The issues with snow-related data tests failing have been resolved. I did have to make some software changes as described below.

Previously, the database design used integer for vw_CDSS_SnowCrse.depth and now uses a double. This was causing issues because the TSTool HydroBase code represents null (missing) database values that are treated as integers with the Java Integer.MIN_VALUE, which has the value -2147483648, whereas floating point missing values use Double.NaN. The underlying code was not aware of the database type change, which resulted in -2147483648 being used in the time series. It is difficult to know when such situations arise without release notes for the database or automated tests that point out the issues, and the problem may have been present for a while. In this case, I changed the data value in the object to a double and handle casting from integer for older databases. This could potentially result in minor roundoff issues, but that is probably not a big deal.
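
The handling described above can be sketched in a few lines. This is a Python illustration of the idea, not the TSTool Java code. The function name is hypothetical. The only values taken from the discussion are the integer sentinel -2147483648 (Java Integer.MIN_VALUE) and the use of NaN for missing floating point values.

```python
import math

INT_MISSING = -2147483648  # Java Integer.MIN_VALUE, used as the integer missing value

def depth_as_double(raw):
    """Convert a snow course depth from an old (integer) or new (double) database to a float.

    None (database null) and the integer sentinel both become NaN so that the
    sentinel never appears as a data value in a time series.
    """
    if raw is None or raw == INT_MISSING:
        return math.nan
    return float(raw)

print(depth_as_double(12))           # integer from an older database
print(depth_as_double(12.5))         # double from a newer database
print(depth_as_double(INT_MISSING))  # sentinel becomes NaN
```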

Note that vw_CDSS_SnowCrse.day was also changed from text to integer at some point. Presumably the data load at some point used text for a reason? I changed to integer in the code, which makes more sense. If the day is somehow null in the database, then it may show up later as -2147483648 and should generate an error. There may be an issue on older databases but I tested with recent versions and it works ok. People should not be using old HydroBase databases for snow data since that data can be found online.

Streamflow volume (AF) calculated from CFS is the same in HydroBase and web services - streamflow data issues are resolved

The issues with streamflow-related data tests failing have been resolved. The conversion factor from daily average flow to monthly and yearly volume now seems to be the same for the HydroBase database and web services.
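
For reference, the arithmetic behind the conversion factor can be checked directly. One cfs flowing for one day is 86400 cubic feet, and one acre-foot is 43560 cubic feet. The Python sketch below compares that value with the two rounded factors mentioned earlier in this issue (1.984 and 1.9835). The volume_af function and its parameters are hypothetical names used only for illustration.

```python
SECONDS_PER_DAY = 86400
CUBIC_FEET_PER_ACRE_FOOT = 43560

# Acre-feet produced by 1 cfs flowing for one day.
CFS_DAY_TO_AF = SECONDS_PER_DAY / CUBIC_FEET_PER_ACRE_FOOT

def volume_af(mean_cfs, days, factor=CFS_DAY_TO_AF):
    """Volume in acre-feet for a mean flow in cfs over a number of days."""
    return mean_cfs * days * factor

# Ratio between volumes computed with the two rounded factors.
ratio = volume_af(100.0, 31, 1.984) / volume_af(100.0, 31, 1.9835)

print(CFS_DAY_TO_AF)
print(ratio)
```

The ratio of the two rounded factors is close to the multiplier of 1.00025216 that was needed to get the earlier test to run without warnings, which is consistent with the explanation that the two systems used different factors.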

Wind data issues are resolved

The issues with wind-related data tests failing have been resolved, presumably because the underlying data issues were corrected.

Temperature issues are mostly resolved - need to confirm documentation

Updates to HydroBase have resolved a number of data issues with temperature data, but it would be great to confirm the final result of the work. The main confusion seems to be about monthly mean temperatures. Doug Stenzel provided the following examples of temperature stations:

Here are 2 temperature stations

USC00051401 (CASTLE ROCK)
Min/Max only 1893 - 2022

BRL02 (BURLINGTON SOUTH (#2), 6 MI SE BURLINGTON)   
Min/Max from 1992 - 2022 
Mean from 2015-2022
*Monthly data would use Min/Max up to 2015 and then Mean from then on.

For BRL02 I think he meant Min/Max from 1992 - 2014, and then Mean from 1992 - 2022. I put together a TSTool command file for this station to understand daily mean data, which resulted in the following graph (note the database mean and calculated mean are slightly different on the right side):

image

Based on some other testing, it seems that HydroBase contains daily mean temperature only if it was provided by the data source. For monthly mean time series, such as MeanTemp-Avg.Month, MeanTemp-Min.Month, and MeanTemp-Max.Month, I have the following questions:

I have released TSTool 14.2.2 with the above cleaned up tests and documentation and am moving on. The State can clarify if they have the time and energy to do so. I'll leave this open.