PecanProject / bety

Web-interface to the Biofuel Ecophysiological Traits and Yields Database (used by PEcAn and TERRA REF)
https://www.betydb.org
BSD 3-Clause "New" or "Revised" License
16 stars 38 forks source link

Date and Time columns need cleanup and clarification in traits and yields tables #347

Open gsrohde opened 9 years ago

gsrohde commented 9 years ago

The traits table

  1. This has nine date/time-related columns, often containing conflicting information. They are date, time, date_year, date_month, date_day, time_hour, and time_minute plus the two related columns dateloc and timeloc.
  2. The Rails app no longer displays or edits either the date or time column values. For many rows, only the date column and not the date_year, date_month, or date_day columns contain any information about the date.
  3. On the other hand, the bulk upload uses the date column rather than the date_year, date_month, and date_day columns.
  4. In spite of date having data type timestamp, the "time-of-day" portion of this value seems largely to have been ignored: the minutes and seconds of the date value is always zero, and only two rows have an hour value other that 0, 5, or 6. (0, 5, and 6 correspond to the UTC hour value when it is midnight in time zone UTC, CDT, and CST, respectively.) Even when the value of the time column is something like "21:17:00", the time portion of the date column is usually something like "05:00:00".
  5. date_year often does not match extract year from date, even when they are both non-NULL. Often they differ by 1 (in either direction). Similarly for date_month and date_day.
  6. dateloc: The BETYdb Data Entry Workflow document lists meanings only for these values: 5, 6, 7, 8, 9, 95, 96, and 97, though it does mention that fractional value may be used for levels not precisely defined by one of the integers listed. The Rails interface dropdown offers in addition the value "4" with explanation "time of day, e.g. morning, afternoon", though it is not clear if there is any convention for designating which time of day other than by making a note in the notes field. It also offers "5.5" with explanation "week" and "6.5" and "7.5" with no explanatory text. Significantly, it does not provide any means of entering the values 95, 96, or 97. The values actually occurring in the database are 0, 1, 4, 5, 5.1, 5.2, 5.5, 6, 6.5, 7, 7.5, 8, 9, and 96. "0" and "1" clearly require correction (or we must say what they are to mean). "5.1" and "5.2" seem overly precise.
  7. timeloc: The BETYdb Data Entry Workflow document lists meanings only for these values: 1, 2, 3, 4, and 9. The Rails interface dropdown offers in addition the value "2.5" with the explanation "quarter-hour". It does not include the "4 - time-of-day" value. The values actually occurring in the database are 0, 1, 1.5, 2.5, 3, 6, and 9. "0", "1.5", and "6" clearly require correction (or we must say what they are to mean), and if we are to allow "2.5", it should be mentioned in the Data Entry document.
  8. There is no documentation (that I can find) about how time values are to be entered or interpreted vis-à-vis time zone and daylight-savings-time status--whether they are meant to be local time at the site where the trait was measured, BETYdb server time at the time when the trait was measured, or UTC time at the time when the trait was measured.

    The yields table

  9. This has five date-related columns: date, date_year, date_month, and date_day plus the related column dateloc. date_year, date_month, and date_day are "new" and so are currently always NULL.
  10. Unlike the date column in the traits table, yields.date has datatype date, meaning it contains no time-of-day information.
  11. As with traits, the bulk upload uses the date column rather than the date_year, date_month, and date_day columns.
  12. As with traits, the BETYdb Rails interface offers a dateloc value of "4" as an option, even though this is not an accepted value according to the The BETYdb Data Entry Workflow document. (However, it seems more appropriate to allow this for yields than for traits, since yields don't have time-related columns where this information could be stored.) The BETYdb Rails interface offers all the accepted values except 95, 96, and 97.
  13. The database currently contains only documented values for dateloc (except for "4", which is, however, offered as a choice by the BETYdb Rails interface.)
robkooper commented 9 years ago

Not sure why the traits and yields table have the date split in many sections. Discussion in the PEcAn meeting led us to the conclusion that the dates should be in UTC. This I think requires the following field to be set config.time_zone to UTC

gsrohde commented 9 years ago

@robkooper The motivation for this apparently was to have a way of handling the case where the year is not known (or is immaterial) but there is information about the month and/or day. My preference would have been to use a real SQL date type and use special values (e.g. 1000-09-16) perhaps in combination with a boolean flag field saying that the year is not significant to handle this use case. This gives us all the built-in functionality associated with bona fide SQL dates, though I'm not in a position to evaluate how important this is for our purposes. If you prefer this too and make a strong case for it, maybe we can get rid of the 3-column date and 2-column time representations and do things the SQL way.

If we go this route, this also begs the question of whether to continue having both a date and a time field or whether to combine the information into a single field. (There are likely cases where a time of day is given but a date is not.)

robkooper commented 9 years ago

I'd rather set the year to 0 or 9999, with the millennia runs mike keeps talking about 9999 might be a better value (have to check to make sure postgresql allows that).

gsrohde commented 9 years ago

It does.