WSWCWaterDataExchange / WaDESchemav0.3

The proposed next iteration of the WaDE schema. Will include more geospatial information, support for NHD indexing, and data quality control indicators.

Suggested changes or additions to consider in the next WaDE update #2

Open amabdallah opened 7 years ago

amabdallah commented 7 years ago

Most of the following suggested changes or additions would make WaDE more consistent and easier to work with, both for communicating its time series data in WaterML formats and for using its data as input to WaMDaM. The suggestions come from experience working with the Observations Data Model (ODM1) http://his.cuahsi.org/documents/ODM1.1DesignSpecifications.pdf and from studying ODM2: https://github.com/ODM2/ODM2. I have already adopted many of these suggestions in the Water Management Data Model (WaMDaM), which consumes data from WaterML and WaDE, among many other sources: https://github.com/WamdamProject/WaMDaM_Schema

I like the ODM2 changes because they came after feedback on ODM1 and long discussions within the ODM2 multi-disciplinary team.

I already discussed many of these suggestions with Sara, but I'm including all of them here for reference, with some elaboration. I understand that it might be too difficult to reflect some of these changes in all the databases and their data migration scripts.

1. Add a "StartMonth" field for water or irrigation years?

Colorado's irrigation year starts in November and Utah's water year starts in October. It would be good to have the start month, especially if users want to aggregate data over regional areas. StartMonth values would be like "October" or "November".
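
To illustrate why the start month matters when aggregating across states, here is a minimal Python sketch. The function name `reporting_year` is hypothetical, and labeling the year by the calendar year in which it ends is an assumption (it follows the usual USGS water-year convention and may not match every agency's irrigation year).

```python
from datetime import date

MONTHS = ["January", "February", "March", "April", "May", "June",
          "July", "August", "September", "October", "November", "December"]


def reporting_year(d: date, start_month: str = "January") -> int:
    """Return the water/irrigation year that contains date d.

    Assumption: a year that starts in a month other than January is
    labeled by the calendar year in which it ends.
    """
    start = MONTHS.index(start_month) + 1  # 1-based month number
    if start == 1:
        return d.year  # plain calendar year
    return d.year + 1 if d.month >= start else d.year


# The same date can fall in different reporting years in two states.
same_day = date(2016, 10, 15)
utah_year = reporting_year(same_day, "October")       # water year starting in October
colorado_year = reporting_year(same_day, "November")  # irrigation year starting in November
```

With these assumptions, a regional aggregation would need to group records by the reporting year computed per agency, not by the raw calendar year.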

2. Change ValueType into MethodType?

ODM2 has changed ValueType into MethodType, and WaMDaM adopted the MethodType concept too. I like this term because it quickly tells the user what category the method falls in, like "measured" or "estimated". The other existing method metadata would then better explain how the measurement or estimate was made. Here is the current full list of MethodType values in ODM1, ODM2, and WaMDaM: http://vocabulary.odm2.org/methodtype/

3. Change DataType into "variable", "parameter", or "ParameterType"?

Right now, ODM1 and WaterML use "DataType" to refer to the concept of "aggregation statistic" (see point 4 below), which has values like "Average" or "Cumulative", while WaDE uses DataType to mean AVAILABILITY, SUPPLY, USE, etc. ODM2 (and possibly the next version of WaterML) has renamed the "DataType" concept to AggregationStatistic. WaMDaM adopted the new term too. So it's worth considering a revision of "DataType", which I suggest changing to "parameter" or "variable". Such a name would fit nicely with the idea of publishing much of the WaDE data in WaterML. Of course each database may call the term whatever works best within its own context, but consistency is good. It's totally fine to keep it as is, but it's worth considering.

4. Add three new fields for time series metadata

Consider this example: water use in a polygon or site is 100 cfs per month. Does that mean an average monthly use or a cumulative monthly use? In WaDE it makes sense for it to be cumulative, but when working with multiple data sources it gets tricky to assume anything. So it should be stated like this:

100 cfs [cumulative] over [1] [month]

Similarly, if the value is reported in acre-feet, it would be like this:

100 acre-feet [cumulative] over [1] [month]

The words in [ ] above have these technical names in ODM1 and ODM2:

AggregationStatisticCV: "A vocabulary for describing the calculated statistic associated with recorded observations. The aggregation statistic is calculated over the time aggregation interval associated with the recorded observation". The full list is here: http://vocabulary.odm2.org/aggregationstatistic/ It would clearly say whether the values are "cumulative" or "average" water delivery/use.

AggregationInterval: 1. In WaDE the interval value will most likely be [1] all the time. It's useful for "detailed" or raw data that is reported at finer intervals, like every 15 min (if any).

AggregationIntervalUnit: the aggregation unit (e.g., day, month, year).

Note: ODM2 has fields for aggregation over x and y in space. But I think for WaDE, aggregation in time seems to be more important to capture.
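
As a rough sketch of how the three fields in point 4 could travel with each amount value, here is a small Python record. The class name `AmountValue` and its attribute names are hypothetical and only mirror the ODM-style terms above; this is not the actual WaDE schema.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AmountValue:
    """An amount plus the time-aggregation metadata suggested in point 4."""
    value: float
    unit: str                                 # e.g. "acre-feet" or "cfs"
    aggregation_statistic: str                # e.g. "Cumulative" or "Average"
    aggregation_interval: float = 1           # length of the interval
    aggregation_interval_unit: str = "month"  # e.g. "day", "month", "year"

    def describe(self) -> str:
        # Render the value in the bracketed form used in the example above.
        return (f"{self.value:g} {self.unit} "
                f"[{self.aggregation_statistic.lower()}] over "
                f"[{self.aggregation_interval:g}] "
                f"[{self.aggregation_interval_unit}]")


monthly_use = AmountValue(100, "acre-feet", "Cumulative")
summary = monthly_use.describe()
```

Keeping the statistic and interval next to the value means a consumer that merges several sources does not have to guess whether a number is an average or a total.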

5. Change the Data Category values from "Summary" to "Aggregated" and from "Detailed" to "Sites" (or "Specific", or "site-specific").

This suggestion is lower priority, but it's worth considering if possible; it depends on how the update will be implemented.

The "Aggregated" category is easier to understand because the data is aggregated over space (the polygon) and also over time (month or year). Likewise, "Sites" makes it easier to understand that the data is about a specific location in space. Since we are updating the schema and possibly the local databases,

wadewswc commented 7 years ago

Revisions to schema based on these suggestions:

  1. YearTypeStartDate is a new data element for any "year" type used by the agency. Default is a calendar year.
  2. This adjustment was made. MethodType is now a child of the Method data element.
  3. This was not made. Some of the WaDE datatypes are hardcoded into the extraction.
  4. These have been added to any data element that provides an aggregated amount value.
  5. This adjustment has been made to the XML schema; however, some table names remain the same to assist agencies that have already developed scripts for importing their data into WaDE tables (e.g. DETAIL_ALLOCATION, D_ALLOCATION_LOCATION, etc.)

amabdallah commented 7 years ago

Sounds great! thanks.

A few more semantic suggestions for more intuitive meanings. I understand that it might be time-consuming to reflect the changes in the functions etc., but it's worth considering them now or later. The use of "Code" for values that mix numbers and text follows ODM1 and ODM2.

1. Change ReportID to ReportCode? "ID" suggests numeric values, while a code could be text, numeric, or mixed.

2. Change ReportUnitID to ReportAreaCode or ReportSiteCode? The term "unit" can be confused with the unit of measurement, and "ID" suggests numeric values. "Area" sounds more accurate, since it represents the subarea, HUC, or county. In this case, we might want to change the REPORTING_UNIT table name to REPORTING_AREA.

3. Change ORGANIZATION_ID to ORGANIZATION_CODE? Similar to the above suggestions, the ID now seems to take character values, and "code" would be more intuitive here.

4. Change Start Date and End Date to Start Month and End Month? The Start Date and End Date values seem to hold start and end months. The change would help indicate that the data is about months within the same year. I'm not sure if there are other use cases that need a full date.

5. Add a lookup unit to the water summary tables? Although the unit is hard-coded now as acre-feet, it would be more consistent, and easier for others to follow in the future, if it is there as a column.

amabdallah commented 7 years ago

A suggestion based on earlier discussions
Add a new column to the LU_BENEFICIAL_USE table and call it something like this:
USGS_WaterUse_Category

Adding it would help in publishing more standardized WaterML water use variables and tagging them with a controlled vocabulary in the CUAHSI system.

So it requires a one-time effort to map each local LU_BENEFICIAL_USE value in each state to a USGS category. I already have a spreadsheet of all the LU_BENEFICIAL_USE values for six states. We can map them and send the mappings to each state for feedback and approval, if possible.

If a USGS category is approved for each local term, then WaterML would still use both the local term and the USGS term. If a WaterML user asks for data for a USGS term, it would return all the local terms registered with it. That way we still preserve the originality of each state's data as far as its local beneficial uses.
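
The lookup described above could work roughly like this Python sketch. The sample rows and the function name `local_terms_by_category` are hypothetical, made up for illustration only; the real mapping would come from the spreadsheet once each state approves it.

```python
from collections import defaultdict

# Hypothetical rows: (state, local LU_BENEFICIAL_USE term, USGS_WaterUse_Category)
MAPPING = [
    ("UT", "Irrigation", "Irrigation"),
    ("CO", "Crop Irrigation", "Irrigation"),
    ("UT", "Municipal", "Public Supply"),
]


def local_terms_by_category(rows):
    """Index the local terms under the USGS category they are registered with."""
    index = defaultdict(list)
    for state, local_term, usgs_category in rows:
        # The local term is kept as-is, so each state's wording is preserved.
        index[usgs_category].append((state, local_term))
    return dict(index)


index = local_terms_by_category(MAPPING)
irrigation_terms = index["Irrigation"]  # every local term registered with this category
```

A request for a USGS term then resolves to the list of (state, local term) pairs, and the data can still be reported under each state's own beneficial use name.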

amabdallah commented 7 years ago

Actually, also add another column like this one, or shorter if possible: USGS_WaterUse_Category_Description

It's good to have a description of what each USGS_WaterUse_Category includes.