PecanProject / bety

Web-interface to the Biofuel Ecophysiological Traits and Yields Database (used by PEcAn and TERRA REF)
https://www.betydb.org
BSD 3-Clause "New" or "Revised" License
16 stars 38 forks source link

Add a mandatory time-offset field to the forms on the New and Edit Traits pages. #356

Open gsrohde opened 8 years ago

gsrohde commented 8 years ago

The user will be required to choose a time-zone offset when entering a trait that includes date and/or time information. The time will be stored as UTC time--the date and time stored will be calculated from the date and time entered in the form and the offset entered.

Requiring the user to explicitly choose a time-zone offset will help prevent the user from blithely assuming the time entered will accurately represent the time specified in the data source (e.g., the citation).

[A more comprehensive fix would be to add a timezone_name column to the sites table. These would be daylight-savings-agnostic names such as 'America/Chicago', which corresponds to an offset of -05:00:00 in the summer and -06:00:00 in the winter: If we have a site_id--for example, 76 for "EBI Energy farm" in Urbana--and a timestamp (without time zone) value--say, for example, 2015-08-17 14:31:26 corresponding to local time at the that site, then we could update the column date of type timestamp without time zone in the traits table in a row with id 22116 with the following UPDATE statement:

UPDATE traits t
    SET date = CAST(CONCAT('2015-08-17 14:31:26', ' ', (SELECT timezone_name FROM sites s WHERE s.id = t.site_id)) AS timestamp with time zone) AT TIME ZONE 'UTC'
WHERE t.id = 22116;

After this, the value of column date for this row will be 2015-08-17 19:31:26, the UTC time corresponding to local time '2015-08-17 14:31:26' at the site.

Something like this code could be used to process the forms on the New and Edit Traits pages, though even in this case, we might wish to require the user to verify that the time he/she entered was local time at the site. See related issue #348 for how to populate the sites.timezone_name column.]

gsrohde commented 8 years ago

@dlebauer I forgot that we haven't agreed on which date and time fields to use. This complicates this issue. Right now the Web app (excepting the bulk upload wizard) only touches and displays the five split-out columns (date_year, date_month, date_day, time_hour, and time_minute) and leaves the date and time columns alone (new rows set these two columns to null). See issue #357. I could implement the timezone field for now using the split-out columns, but I'd rather not get into the logic of figuring out if and how the date should change when the offset is applied if I don't have to since this could so much more easily be done using a bona fide datetime (timestamp) database type.

I guess I'll start by working on the front-end stuff and worry about the machinery later.

gsrohde commented 8 years ago

@dlebauer , @robkooper

I've put a draft "proof-of-concept" implementation of a new trait-editing interface on pecandev. In addition to adding a timezone offset field to the New and Edit forms, I've revised the way dateloc and timeloc values are handled and how dates and times are edited and displayed. The guiding principle was that the user should see and edit what the data represents rather than the somewhat arbitrary database representations used for storing that information. (But I might have made some false assumptions about what exactly dateloc and timeloc mean--see below.)

The implementation also changes to using only the date column--it ignores the time, date_year, date_month, date_day, time_hour, and time_minute columns. See Github issue #357.

A few notes:

  1. I've removed the dateloc and timeloc fields from the Viewing Trait and New Trait pages. Instead, when creating a new trait, these are computed automatically according to which date/time fields are filled in and which are left blank. For example, if only a year is selected, a dateloc value of 8 is stored; if only a month is selected, a dateloc value of 96 is stored. On the "Viewing" page, the date value of "2001-01-01 00:00:00" would display as "2001", "Winter 2001", "January 2001", or "2001 Jan 1" according to whether the dateloc value is 8, 7, 6, or 5. (See date-timeloc_values_and_timestamp_values.xlsx for the schema I have used for representing date/time information using a combination of the date, dateloc, and timeloc attributes. I may possibly have deviated from this schema somewhat. In particular, I didn't implement the time-of-day timeloc value.)
  2. I've left the dateloc and timeloc fields on the edit page for now but plan to remove them eventually. They are ignored when submitting the form--only the date and time fields are used when deciding what dateloc and timeloc values to store. But I left these fields on the edit page for now so that you can easily see what dateloc and timeloc result from various choices of date and time field values.
  3. Validations: A timezone offset must be selected unless the hour is left blank. If all time fields are blank, a time offset of zero is used in computing the date. (Conversely, an offset shouldn't be allowed if the hour is blank, but this hasn't been implemented yet.) If a site having location information is given for the trait, and if a timezone offset is selected, it is checked that it is within a few hours of what one would expect based on the site longitude. However, an offset of zero is allowed no matter what the site longitude in case a user wishes to enter the time in UTC instead of the time at site.
  4. Times for existing trait rows are always shown in UTC since we don't store any time zone information. This is another reason to allow an offset of zero in the Edit Trait form: we shouldn't require the user to convert the time back to local time before doing an update. (My preference would be to always use site-local time instead whenever a trait with site information is being displayed. This could easily be done by adding a timezone attribute to the sites table. These should be daylight-savings-time agnostic names like 'America/Chicago', not abbreviations such as 'CST' or 'CDT' or the numerical offsets themselves, which are not constant for a given site but vary with the time of year.)

Additional comments:

It wasn't until I was nearly done implementing this draft that it occurred to me that I was equating "Level of Confidence" with "precision". So perhaps I have gone off on the wrong track. But the way in which these attributes are describe in the Data Entry document suggest that perhaps "precision" is a more accurate description for how these attributes are used and that "Level of Confidence" is a bit a a misnomer here.

Even if "precision" is the more apt interpretation, there is another sense in which I may not have captured the intended use of these attributes. For example, the Data Entry document defines dateloc 7 as "season", but perhaps I took this too literally and the meaning should be more something like "give or take a month or two". While "2001-05-07 give or take a month or two" is roughly equivalent to "Spring" (perhaps with an implied likelihood that the actual date was closer to 2001-05-07 than to 2001-03-21 or 2001-06-21), perhaps we should be allowed to express something like "2001-06-21 give or take a month or two" which translates into something closer to "Late Spring or Early Summer".

(See also my discussion of the dateloc and timeloc attributes in items 6 and 7 at the top of Github issue #347.)

dlebauer commented 8 years ago
dlebauer commented 8 years ago
Date                                     Time
   year   month/season    day               hour    min
robkooper commented 8 years ago

@PecanProject/pecan-developers how does this fit with the discussions we had about time in PEcAN?

gsrohde commented 8 years ago

@dlebauer @robkooper I've put a slightly revised draft version of the traits pages on Pecandev.

This version no longer has a time-offset selector; instead, it always displays and edits using site time, or, if the site time zone is unknown, then it uses UTC. (Presence of a site is now validated, so for newly-entered traits this case only arises when site.time_zone for the trait site is NULL.) Dates are always stored in UTC, however, as before.

A few messy issues I haven't dealt with... Most of these have to do with time-offset issues that potentially could result in the stored time being an hour off. Even if we ignore these issues, I think we are still better off compared to the current situation, where in many cases it is not clear what offsets were used.

  1. If a user uses an indeterminate date (any dateloc value other than 5—in other words, the year or the day and possibly the month is missing), then the time offset to use can't be determined definitively, since it can't be determined whether or not Daylight Savings Time is in effect. What actually happens is based on date resulting from the "special value" numbers used for filling in missing values (e.g., 9996-04-01 for "Spring" of an unspecified year. Recall the user never actually sees this date but, in this example, would see only "Spring" instead; the exception is that data returned using the API will only contain the raw column values).
  2. The PostGIS world timezone data that I used for populating the new time_zone column of the traits table was just something I got off the Internet somewhere. But it seemed a reasonable place to start. But see next item.
  3. I have been assuming that a time zone can be definitively determined from a latitude and longitude, and that PostgreSQL (or the system data it uses) knows enough about possibly-evolving Daylight Savings Time rules that given a time zone and a date/time value, it knows exactly what offset to use. But I now realize what a rash assumption this was. Time zones weren't even widely used until the 1800's, and in the 20th century, time zone boundaries have not remained constant. Consider the following:
  4. Interesting corner case: The time zone offset for zone 'America/New_York' is currently 5 hours in the winter time. But before noon on November 18, 1883 it was only 4 hours, 56 minutes, and 2 seconds. So noon actually occurred twice that day. So if you try to enter the date/time "1883-11-18 12:00" for some trait in time zone 'America/New_York', the Rails app will reject it. (12:04 only occurred once, so "1883-11-18 12:04" can be entered.) On the other hand, ambiguous times arising during the chane-over to Daylight time can be entered. See next item.
  5. Certain date/times are ambiguous when only displayed with Daylight-Savings-Time-agnostic time zone names. For example, "2015 Nov 1 01:30 (America/New_York)" could mean either 5:30 or 6:30 UTC. (The current implementation stores 5:30 UTC if you enter this date/time.)
  6. As far as I can tell, we never allowed timeloc value of 1 (second) to be entered through the Web interface, and there isn't a seconds field, so the value for seconds can only be 00. If you edit an existing trait having timeloc = 1, it will automatically get changed to 2, assuming you don't touch the date or time values.
  7. I haven't implemented the "time-of-day" timeloc value.

This list is probably not exhaustive. Please try out the draft version on Pecandev and see if you find other issues. Let me know which, if any, of these issues (or others you may find) are worth addressing.

dlebauer commented 8 years ago

@gsrohde Lets wait for @robkooper and @mdietze to give feedback on the demo at pecandev.igb.illinois.edu/bety

1) .... data returned using the API will only contain the raw column values

Can we

3, 4) changing time zones

good question. I think for practical purposes it should be sufficient to document the definition of time zones that we use. I am not aware of any cases where precision better than an hour would be necessary during the 1800's. If it is, then whomever is entering data can convert to the modern equivalent.

gsrohde commented 8 years ago

@dlebauer When you say "let's wait", do you mean don't do any more work on this issue, e.g., wait to do the time-of-day implementation?

Regarding the API, you should be able to return any view of the data you want (and I'm using "view" here in a general sense—I don't mean SQL view).

dlebauer commented 8 years ago

I mean wait for feedback from others before deploying this.

On Thu, Oct 29, 2015 at 4:06 PM, Scott Rohde notifications@github.com wrote:

@dlebauer https://github.com/dlebauer When you say "let's wait", do you mean don't do any more work on this issue, e.g., wait to do the time-of-day implementation?

Regarding the API, you should be able to return any view of the data you want (and I'm using "view" here in a general sense—I don't mean SQL view).

— Reply to this email directly or view it on GitHub https://github.com/PecanProject/bety/issues/356#issuecomment-152326881.

mdietze commented 8 years ago

I agree that the proposed plan seems considerably better than the current default. None of the cases of concern look like things that will be particularly common or problematic.

One point of clarification, however, is how do you handle non-point sites that span multiple time zones? I'm assuming/hoping these default to UTC?

gsrohde commented 8 years ago

Re non-point sites—yes, that’s the plan.

From: Michael Dietze notifications@github.com<mailto:notifications@github.com> Reply-To: PecanProject/bety reply@reply.github.com<mailto:reply@reply.github.com> Date: Friday, October 30, 2015 at 4:31 PM To: PecanProject/bety bety@noreply.github.com<mailto:bety@noreply.github.com> Cc: Scott Rohde srohde@illinois.edu<mailto:srohde@illinois.edu> Subject: Re: [bety] Add a mandatory time-offset field to the forms on the New and Edit Traits pages. (#356)

I agree that the proposed plan seems considerably better than the current default. None of the cases of concern look like things that will be particularly common or problematic.

One point of clarification, however, is how do you handle non-point sites that span multiple time zones? I'm assuming/hoping these default to UTC?

— Reply to this email directly or view it on GitHubhttps://urldefense.proofpoint.com/v2/url?u=https-3A__github.com_PecanProject_bety_issues_356-23issuecomment-2D152656006&d=BQMCaQ&c=8hUWFZcy2Z-Za5rBPlktOQ&r=oJsSpSeia9tFOpzHxC6EpYUew62qA42qdvjRrEjQywI&m=6vehQ_sB-pjj2v2iuQV3rrQeSp6tD9xAijFJXAatsoA&s=RadSbvRJ968iJnSzhVOweEdMddpqKCzOUUGl-S_lTCw&e=.

gsrohde commented 8 years ago

@dlebauer @robkooper @mdietze

I've updated the draft version of the Traits pages on Pecandev. This version has a time-of-day selection option (morning, mid-day, afternoon, night). Please take a look and take if for a test run if you get the chance.

A few notes:

  1. I realized there is a problem with using named seasons: There may be some ambiguity if the site is in an equatorial region or in the southern hemisphere. I'm not sure what the best solution is. (One possible solution: Always use and display Northern Hemisphere seasons, and be explicit in the display—that is, instead of showing a season as, say "Spring", show it as "Spring (in the Northern Hemisphere)".
  2. If you try to edit an existing trait where the stored date, dateloc, and timeloc values are inconsistent (vis-à-vis our new conventions for representing incomplete date/time values), you will get the "We're sorry, but something went wrong" page. This problem will go away once I convert all existing trait date/time information to use the correct representations.
ankurdesai commented 8 years ago

The standard for seasons in the climate community is based on months:

DJF MAM JJA SON

These are recognized acronyms for doing seasonal analysis.


Ankur R Desai, Associate Professor University of Wisconsin - Madison, Atmospheric and Oceanic Sciences http://flux.aos.wisc.edu http://flux.aos.wisc.edu/ desai@aos.wisc.edu mailto:desai@aos.wisc.edu O: +1-608-520-0305 / M: +1-608-218-4208

On Nov 5, 2015, at 5:15 PM, Scott Rohde notifications@github.com wrote:

@dlebauer https://github.com/dlebauer @robkooper https://github.com/robkooper @mdietze https://github.com/mdietze I've updated the draft version of the Traits pages on Pecandev. This version has a time-of-day selection option (morning, mid-day, afternoon, night). Please take a look and take if for a test run if you get the chance.

A few notes:

I realized there is a problem with using named seasons: There may be some ambiguity if the site is in an equatorial region or in the southern hemisphere. I'm not sure what the best solution is. (One possible solution: Always use and display Northern Hemisphere seasons, and be explicit in the display—that is, instead of showing a season as, say "Spring", show it as "Spring (in the Northern Hemisphere)". If you try to edit an existing trait where the stored date, dateloc, and timeloc values are inconsistent (vis-à-vis our new conventions for representing incomplete date/time values), you will get the "We're sorry, but something went wrong" page. This problem will go away once I convert all existing trait date/time information to use the correct representations. — Reply to this email directly or view it on GitHub https://github.com/PecanProject/bety/issues/356#issuecomment-154226327.

gsrohde commented 8 years ago

@dlebauer @robbenyang @mdietze

If no one objects, I'll change displays "Spring", "Summer", "Autumn", and "Winter" to "Season: MAM", "Season: JJA", "Season: SON", and "Season: DJF", respectively.

mdietze commented 8 years ago

Sounds good

dlebauer commented 8 years ago

+1

gsrohde commented 8 years ago

OK, I've changed the season display on Pecandev.

Also, I updated the Advanced Search page to display LOC-aware dates. Note that I'm now converting dates to the site timezone (instead of to UTC) and including display of the time zone; even though times aren't displayed in this table, there may be cases where the timezone used for display (UTC, server timezone, or site timezone) may affect what day is displayed—or in some cases, even the month or the year.

To do:

The 5.5 dateloc value is used fairly frequently. Other fractional values are used less frequently:

ebi_production=# select dateloc, count(*) from traits group by dateloc order by dateloc;
 dateloc | count 
---------+-------
    0.00 |  2195
    1.00 |    99
    4.00 |  1532
    5.00 |  5532
    5.10 |     2
    5.20 |    53
    5.50 |  2273
    6.00 |  4060
    6.50 |   608
    7.00 |  1106
    7.50 |   130
    8.00 |  4398
    9.00 |   540
   96.00 |     4
         |  9885
(15 rows)

ebi_production=# select dateloc, count(*) from yields group by dateloc order by dateloc;
 dateloc | count 
---------+-------
    4.00 |     3
    5.00 |  3976
    5.50 |   431
    6.00 |  1456
    6.50 |    19
    7.00 |   101
    7.50 |     1
    8.00 |  2062
    9.00 |    75
   95.00 |     2
   96.00 |    30
         |   452
(12 rows)
gsrohde commented 8 years ago

@dlebauer @robkooper @mdietze

Yields pages (and supporting code) are updated on Pecandev.

Also, I've changed the CSV output for traits, yields, and advanced-search results:

To sum up, I've added LOC-aware date/time columns to the generated CSV download files, so now there is a lot of redundant information in the these files. I mostly haven't yet tried to get rid of columns. But we probably should. Please advise!

Other than some tidying-up, I think this issue is mostly done, pending deployment. Cleanup of data/time data for yields is in a separate issue: https://github.com/PecanProject/bety-qaqc/issues/4. An issue also needs to be added for much thornier cleaning up of date/time data for traits.

dlebauer commented 8 years ago

If pretty_date and pretty_time show the level of precision indicated by LOC, is there any need to keep the raw_date or even the LOC-aware year,month,day,hour,minute fields?

gsrohde commented 8 years ago

Not that I know of. I'm happy to get rid of any redundant information we can.

Possibly it might be easier to do some programmatic computations on raw_date than it would be on pretty_date. I don't think I'm in a position to evaluate the merits of this since I know little about how downloaded data will be used.

Note, however, that when we defined the SQL view traits_and_yields_view, we included both a column for the trait or yield date and columns for the year and month extracted from that date. I'm not sure why this was done, but I'm sure I must have done this at your direction. So I thought there might be some reason for keeping the split-out components of dates and times.

gsrohde commented 8 years ago

@ankurdesai Is it common to say something like "DJF 2011"? And if so, would this refer to the season December 2011–February 2012 or the season December 2010–February 2011?

ankurdesai commented 8 years ago

@gsrohde Yes and it would mean Dec 2010-Feb 2011.