adlnet / xAPI-Spec

The xAPI Specification describes communication about learner activity and experiences between technologies.
https://adlnet.gov/projects/xapi/
899 stars 405 forks source link

Not rejecting extreme future timestamp statements to prevent issues due to clock errors is overkill #1072

Open garemoko opened 6 years ago

garemoko commented 6 years ago

https://github.com/adlnet/xAPI-Spec/blob/master/xAPI-Data.md#247-timestamp

A Learning Record Provider MUST NOT use a future time for a "timestamp" property in a Statement. An LRS SHOULD* NOT reject a timestamp for having a greater value than the current time, to prevent issues due to clock errors.

As a Learning Record Consumer, Watershed the Learning Analytics Platform sometimes has to handle statements with timestamps in the far off future. The LRC doesn't have any good options to deal with the data, since if we show it we're showing bad data and if we hide it we're potentially still showing bad data but it's just less obvious. Just removing the statements from reports rather than correcting the timestamp risks showing inaccurate information because you're missing the statement with whatever the timestamp should have been.

What we'd really like to do is return an error to the learning record provider when the statement comes in to the LRS, but the SHOULD* NOT requirement above strongly discourages us from doing that.

Clock errors are real, but should be limited in scale. I.e. I think we could agree that a clock difference of greater than 1 hour indicates a serious problem that deserves an error response. I therefore suggest that we change the above requirement to:

An LRS SHOULD* NOT reject a timestamp for having value up to one hour greater than the current time, to prevent issues due to clock errors.

And for clarity also add:

An LRS SHOULD* reject any statement with a timestamp more than one hour greater than the current time.

I'm open to using a different period than 1 hour (either longer or shorter) - the bad timestamps we are seeing are weeks, months and years into the future. I wouldn't want to go lower than 15 minutes.

fugu13 commented 6 years ago

Even with the star it's only a SHOULD NOT, so we shouldn't change it until a substantial version ticks (keeping this issue around and labeling with next version should be enough reminder).

I'm tentatively against naming a specific time window, and if we do it should be more than an hour, but I do think there's room for allowing bizarre future timestamps to be rejected. Or possibly the right reaction is auto-voiding? Worth thinking about, because then the data is preserved.

A future version might want to consider allowing an LRS to set timestamps in the future to the current timestamp as well.

garemoko commented 6 years ago

@fugu13 What use case do you have in mind for auto-voiding or allowing future timestamps?

My thinking is that (aside from the clock-drift scenario) this is always bad data and should be rejected with an error so that the LRP can fix it.

brianjmiller commented 6 years ago

I agree that whatever is decided would have to go in a major release. I can understand the softening of the original SHOULD* NOT, and potentially flipping it to a MAY reject but going to a SHOULD* to reject forces you into a spot where you can never accept that kind of data and I don't think we have the foresight to know all use cases. Just cause we can't come up with one now, doesn't mean we won't have one in the future. Allowing rejection should be enough, then let the implementations dictate on whether that becomes the de facto standard. If all the implementations reject then the data should be sane.

Another workaround for Watershed right now (read: no spec changes) is to put in your own setting for at what point in the future you toggle between reporting on timestamp and just deciding to use stored instead. It may not be an intended purpose of stored but it effectively fits your use case.

I'm generally in favor of rejecting bad data before it hits the LRS, but forcing rejection of bad data based on values that the LRS can't really interpret seems to go against what has historically been done in the spec.

brianjmiller commented 6 years ago

The other thing that will have to be considered here is timestamp in SubStatements particularly in their future tense use case (ref: https://github.com/adlnet/xAPI-Spec/blob/master/xAPI-Data.md#substatements). A SubStatement being used to indicate the future might contain a timestamp to suggest a time by which that statement must occur, or could provide when it is expected to occur.

garemoko commented 6 years ago

Using the stored property isn't going to help us because a lot of the time that has no relationship to when the tracked event occurred. A lot of the time data is translated in from CSV files containing data about experiences happening over a long time period. Stored and timestamp can also vary significantly when data comes in from another LRS.

The LRS already rejects statements based on values, for example a scaled score of 2 would be rejected as not being between -1 and 1.

brianjmiller commented 6 years ago

Yep, stored is just one possible workaround, seemed worth mentioning.

The value outside of a range isn't an interpretation of the value as 2 is distinctly not in the range -1 to 1. A timestamp of "future" is to a degree not an interpretation, but specifically adding a range and because of clock drift it becomes more of an interpretation than does an integer comparison.

andyjohnson commented 6 years ago

If we do make a change to only an hour, I'd want to be sure that the Time Zone issue is also included as a part of whichever version this change occurs in. On the call, a day was suggested as the time if we weren't considering updating the spec for time zone discrepancies.

andyjohnson commented 6 years ago

Per the 2/14/18 call, @garemoko 's suggestion is good for a patch version, however, this issue would not force a patch and the group would still be targeting at least a minor patch for the next revision.

DataBeeGood commented 5 years ago

Still a newbie :) I am trying to understand the problem so please correct me... but systems are my thing.... I apologize for the wordiness in advance...

My understanding of the use of timestamp with activity statements is that the intent of xAPI is to be decidedly past tense in its structure... recording what has happened. I cannot see why any statements would be intentionally future dated... ever.

Moving from statement generation to the process of sharing statements and interoperability....

I believe we should provide clarity by specifying timestamp generated by the CREATE or POST system which owns the statement vs. an accessing system which is handling a local copy or may transmit a message to the statement owner... who may be able to revise the malformed original statement...

What about future dates?..... Sub-statements in xAPI seem to be able to handle speculation and expectations of future actions, but these are NOT the same as the xAPI Activity Statement itself. HTTP REST would interpret this as a snapshot recorded at the time of the Activity Statement.

The Activity statement cannot be in the future by definition, but it may have a sub-statement to inform us of other details which may be important in the analysis. The reference to time in the sub-statement should not be confused with a statement's timestamp.

Thanks in advance for any correction of anything I am not grasping...

garemoko commented 5 years ago

@DataBeeGood I think you've mostly grasped things. The issue and what I'm proposing is that yes we need to allow some tolerance for future timestamps due to precision issues (i.e. disagreements between systems as to when the past stops and the future begins). However that tolerance should not be infinite and it would be helpful if LRSs were required to reject statements with timestamps many years in the future.

It would be over restrictive to specify that the timestamp has to be the datetime of a particular system event though, since the timestamp is supposed to represent the date the learner did the thing. While the xapi data may be generated in real-time, it may also be generated and sent some time (sometimes years) after the learner did the thing. Records being translated to xAPI may not even be digital.

The stored property is the datetime of when the record was stored in the LRS.

Timestamps in Substatements are a separate thing.

DataBeeGood commented 5 years ago

Thanks for the feedback @garamoko!

2.1.3 GET Statements https://github.com/adlnet/xAPI-Spec/blob/master/xAPI-Communication.md#partthree

The LRS MUST include the header "X-Experience-API-Consistent-Through", in ISO 8601 combined date and time format, on all responses to Statements Resource requests, with a value of the timestamp for which all Statements that have or will have a "stored" property before that time are known with reasonable certainty to be available for retrieval. This time SHOULD take into account any temporary condition, such as excessive load, which might cause a delay in Statements becoming available for retrieval. It is expected that this will be a recent timestamp, even if there are no recently received Statements.>

So what does this MUST actually do?

garemoko commented 5 years ago

That's not related to the statement timestamp property in anyway. It's related to the statement stored property.

There are two statement properties that contain a value with the data type of ISO 8601 timestamp. One is called "timestamp" and one is called "stored". These should not be confused, even though they are both timestamps. This issue is about the property called "timestamp".

I started to write an explanation of what the requirement you quoted does require, but ended up repeating exactly what the requirement says. Are there any specific bits that are unclear? Perhaps you could ask any questions about that requirement in a new issue since it's not related to this issue.

thomasturrell commented 3 years ago

@garemoko did you consider simply altering the timestamp in the statement?

Future timestamps could simply be interpreted as the current timestamp.

I believe that as things stand the spec would allow an LRS to alter a future timestamp to the current timestamp.

2.3.1 Statement Immutability

Statements are immutable (they cannot be changed). The following are exceptions or areas not covered by this rule:

Potential or required assignments of properties during LRS processing ("id", "authority", "stored", "timestamp", "version").

However maybe I'm being too creative in my reading of the spec!

I believe that clock synchronisation errors are less likely with TLS between the provider and the LRS.

brianjmiller commented 3 years ago

I think you're being overly creative in your reading of the spec :-). The intent of that portion was to highlight that the LRS itself has to set the timestamp value to the current time when it is not present in the statement as provided by the LRP. As with the id, the other three are always set by the LRS (with the exception of a very specific, and defined, use case). So the timestamp is really only changeable from not present to present because it is required to be present, it isn't that a pre-defined value itself can be altered to a new one.