Closed ajtritt closed 6 years ago
doesn't this also necessitate a units field and expanding the allowed dtype?
"since this will explicitly specify the offset to obtain relative timestamps, eliminating the need for APIs to guess based on range." can you clarify this point?
instead of adding a new field, wouldn't it be more simple to deprecate session_start_time
and replace it with timestamps_reference_time
as an optional top-level field? however, if people want to use POSIX timestamps (the use case that motivated this), this still necessitates expanding the dtype to include int64 & adding a units field for timestamps, no?
Fwiw, I think that every dimension should have a required unit spec, including time
@neuromusic timestamps already has a unit field. Right now it is sent to the constant 'Seconds'. That would need to be changed to support this proposal. We would need to allow ints as well.
Getting rid of session_start_time
does not preserve backwards compatibility. By adding this field, we preserve the theme of having timestamps relative to a global start.
Regarding my statement about API's guessing.... storing the offset provides a mechanism for explicitly specifying what stored timestamps mean. I am trying to strike the balance of keeping relative timestamps as default, but allowing users to specifying absolute time. By specifying timestamps_offset
, users can store absolute time, which they may want to do queries or analyses on. However, this breaks the concept of storing timestamps relative to a start time. By storing timestamps_offset
, a user is effectively saying "subtract this number from all timestamps to get timestamps relative to the global start time". By explicitly storing this number, we avoid the situation where a users or API has to inspect timestamps to determine which frame they are in. Under the rules I have laid out, It will always be the case that timestamps - _timestampsoffset will always produce timestamps that are relative to session_start_time
.
"Under the rules I have laid out, It will always be the case that timestamps - timestamps_offset will always produce timestamps that are relative to session_start_time
."
Currently, timestamps will always produce timestamps that are relative to session_start_time
. This proposal adds the overhead of an additional field to resolve this data.
We are already in the middle backwards compatibility breaking changes to the schema, no? ElectrodeTable, etc. I'm happy to break backwards compatibility for increased clarity and generalization.
And to clarify, all timestamps are relative, always. The need here is not absolute timestamps per se but timestamps relative to an explicit session-agnostic date.
session_start_time
may be off from timestamp[0] by seconds or minutes or hoursmy updated proposal:
session_start_time
timestamps_reference_time
as an optional top-level field, which all timestamps in the file are referenced tothis has the added benefit that a user using POSIX timestamps does not need to compute the delta between Jan 1970 and session_start_time
simply to write a file, as they would need to in your proposal.... they can just drop the POSIX reference time into the relevant field and put an experimentally-relevant value into session_start_time
I am a little unclear about what you are proposing. Can you tell me how one would store posix timestamps using this scheme?
session_start_time = None
timestamps_reference_time = None
# then, for a given timeseries...
timestamps = <array of float64>
timestamps.dtype = float64
timestamps.unit = 'second'
session_start_time = None
timestamps_reference_time = '1970-01-01T00:00:00Z'
# then for a given TimeSeries....
timestamps = <an array of int64>
timestamps.datatype = int64
timestamps.unit = 'POSIX64'
session_start_time = '2017-08-31T10:40:01-07:00'
timestamps_reference_time = '2017-08-07T00:00:00Z'
# then for a given TimeSeries....
timestamps = <an array of float64>
timestamps.datatype = float64
timestamps.unit = 'second'
in this case, even though we designated that the session started on 8/31 using local time, all timestamps are relative to GMT at an earlier date (perhaps the animal's birth date? or first day of training? or some other experimentally-relevant time that the experimenter wants to reference their timestamps to)
In the current format, time encodings that are not relative to the global clock would be stored in the sync
group of TimeSeries
http://nwb-schema.readthedocs.io/en/latest/format.html#id94 . As such, I think it might make sense to define these structures within sync
. In this way TimeSeries
itself would not change (i.e.., always have relative times) and a specific spec would be added to the sync
folder to describe absolute times. To make the sync
group easier to extend it would probably be useful to add a neurodata_type: LabTime
group that people can extend to describe lab-specific time encodings.
@oruebel I don't understand. sync
addresses a totally separate issue of tracking provenance information related to synchronizing timestamps between timeseries acquired from different hardware. AFTER sync'ing you need to put everything in the same time reference, which what this issue is concerned with.
in my proposal, all timestamps are still relative to the global clock, we're just being more explicit about the the global clock's reference.
@neuromusic Thanks for clarifying. So then session_start_time
and timestamps_reference_time
are both optional, and there are three rules for how they get interpreted?
yes, both optional.
three rules?
Yeah, the three scenarios you listed and their meanings
rule 1: session_start_time
is metadata. while machine-readable, the specification offers no guidance on how it should be interpreted beyond what the experimenter describes, however, by convention, this should be the datetime of the experiment.
rule 2: timestamps_reference_time
is interpreted as the ISO8601 time where timestamp=0.
rule 3: if timestamps_refernece_time
is empty, then the experiment has no ISO8601 reference & its timestamps can only be referenced internally to the NWB file
n.b. I keep saying "file" but I know we want to break free from HDF5 eventually... should I say "container"? "session"?
What I like about Andrews proposal is that it does not unnecessarily break backwards compatibility. If a user does not specify timestamps_offset
then all times are relative to session_start_time
as they have always been. It only adds additional functionality without breaking existing functionality.
Refining the specification of session_start_time
in terms of defining exactly how it should be stored is I think in general helpful and has been raised in #50
After thinking about this and doing some math, these two proposals are one and the same in terms of how they get used. Here's my explanation for why I think that...
To explain this, let me define some short hands I will use below.
abs
:= absolute timestamps
rel
:= relative timestamps
ts_off
= timestamps_offset
trt
:= timestamps_reference_time
sst
:= session_start_time
ts
:= timestamps
POSIX(xyz)
:= the posix time representation of an ISO8601 encoded timestamps
timestamps_offset
abs = ts + sst - ts_off
rel = ts - ts_off
If a user has saved absolute timestamps i.e. ts == abs
and ts_off == POSIX(sst)
, then
rel = ts - tsoff
If a users has saved relative timestamps i.e. ts == rel
and ts_off == 0
, then
abs = ts + sst
timestamps_reference_time
abs = ts + trt
rel = ts + trt - sst
If a users has saved absolute timestamps i.e. ts == abs
and trt == 1970-01-01T00:00:00Z
, then
rel = ts - POSIX(sst)
If a users has saved relative timestamps i.e. ts == rel
and trt=sst
, then
abs = ts + trt
With that said, the issue becomes one of human interpretability, so, I think the decision is up to those who will be interpreting.
@ajtritt that makes sense.
After looking at it a bit more, I think I like the timestamps_offset
approach better, mainly because the meaning/behavior of session_start_time
does not change compared to the current approach if ``timestamps_offset=0. In contrast, with
timestamps_reference_timewe change the name of the reference and, hence, require folks that are used to the concept of times relative to
session_start_timeto change to use
timestamps_reference_timeand set
timestamps_reference_time=session_start_time```
Either way, I think session_start_time
should remain a required field.
Regarding backwards compatibility, in the current schema, there is only an implicit link between session_start_time
and the reference time for timestamps: "Timestamps here have all been corrected to the common experiment master-clock. Time is stored as seconds and all timestamps are relative to experiment start time." It is not clear that session_start_time
is what is being referenced here. The implicit link between the two is simply a convention that I suspect is largely unused.
So the change I propose does not actually break backwards compatability but instead makes what has been implicit explicit.
The key disadvantage of Proposal 1 is that two values are necessary to represent "absolute" time (both session_start_time
and timestamps_offset
). Further, the offset must be computed. To represent POSIX time, this means that there is now a burden on the person creating the file to compute the time delta (in POSIX timestamps) between Jan 1 1970 and session_start_time
. e.g.
session_start_time = '2017-08-31T10:40:01-07:00'
timestamps_offset = <some computed int64 representing the delta between sst & Jan 1 1970>
# then for a given TimeSeries....
timestamps = <an array of int64>
timestamps.datatype = int64
timestamps.unit = 'POSIX64'
While both proposals represent the data, if I want to save my file with a different session_start_time
, then I need to recompute my timestamps_offset
.
Under both proposals, on reading, I can't assume that my timestamps are in POSIX64... I need to check what they are referenced to. Under proposal 1, I need to subtract (or add?) the timestamps_offset from session_start_time and check it against '1970-01-01T00:00:00Z'. Under Proposal 2, I just have to match the string saved in timestamps_reference_time
against '1970-01-01T00:00:00Z'
in general, NWB needs fewer required fields, but I'm willing to save that argument for another day
Just to add my final two cents, I don't feel at all strongly about this. I'm perfectly happy for my lab to effectively ignore session_start_time by setting it to '1970-01-01T00:00:00Z' and using the (implicit) assumption (which probably should be made explicit) that session start time is the reference time for all timestamps. But if adding a field is helpful to others, that's fine with me, and I don't have a preference as to how it is done.
My final comment is that I think that if any of us were presented with code that implemented either solution, we would quickly adjust to it and then not think much more about it. As soon as the Technical Advisory Board for NWB is fully formed they can deal with these issues, but until then my suggestion would be for Andrew and Oliver to make a decision that makes sense to them and to move on so we can keep the development going.
Time is stored as seconds and all timestamps are relative to experiment start time." It is not clear that session_start_time is what is being referenced here. The implicit link between the two is simply a convention that I suspect is largely unused.
I understood it like that and also create NWB files following this convention. It was a quite a hassle to base all measurement data on this common time zero of experiment start.
how do I get on this technical advisory board?
Right now the plan is for Lydia to the the Allen representative (I think), so I'd suggest touching base with her or Christoph.
My understanding of the issue at play here is that there are two distinct temporal frames of reference under discussion: 1) RealTime (i.e. our shared experience of the world, and ignoring any effects from special relativity), and ExperimentTime. The “timestamps” field of the nwb file provides a coordinate system for ExperimentTime. The original spec requires that a required point, the beginning of the experiment (session_start_time), provide a common reference point to relate this ExperimentTime coordinate system (timestamps[0]) with RealTime, and there has been discussion on this thread about using either ISO_8601 or POSIX as the coordinate system for RealTime.
More controversy arises from the modeling community that claims that this RealTime/ExperimentTime relationship is not relevant for their data, and therefore want to make this session_start_time attribute optional. I agree with this.
Also, I want to point out that the units on the “timestamps“ field are only necessary if 1) there is no RealTime/ExperimentTime correspondence AND 2) there is not other common reference point between the RealTime coordinate system (either ISO_8601 or POSIX) and “timestamps”. This means that no units are required if both session_start_time and something like session_end_time are provided (and of course the value of timestamps[0] and timestamps[-1] coincide with these two fields).
My preference is to adopt ISO_8601 as the coordinate system for RealTime via session_start_time, but make its inclusion optional, as well as a new optional session_stop_time. Also, I would be fine making the “timestamps” unit optional, however if it is not specified then both session_start_time and session_stop_time must be provided. This is a backwards compatible change that increases flexibility for building the files, while ensuring that “timestamps” is always a unitful quantity.
If we cannot agree on a coordinate system for RealTime, then an additional piece of metadata that specifies either ISO_8601 or POSIX will be necessary, required any time session_start_time is required.
@nicain to clarify, the discussion on whether to use POSIX or ISO8601 for "RealTime" reference is #50.
This issue was prompted by the feature request by @lfrank (now retracted, I guess?) to support POSIX values (instead of seconds relative to experiment start) on the "ExperimentTime" timestamps themselves.
@ajtritt you are absolutely right that the two proposals are the same wrt to the data they represent. in fact, if I were implementing this in a python class, I would hide the implementation in hidden attributes and expose BOTH approaches as properties of the class, use the math you outlined to query either, and have setters and getters that either compute the timestamp_offset
if timestamps_reference_time
is passed or compute timestamps_reference_time
if timestamp_offset
is passed.
so practically, pynwb should be able to both proposals, regardless of what is decided here w/r/t to the schema.
so this discussion is about the
session_start_time
as a the reference to map "experiment time" to "real world time"from the perspective of someone who needs to write these files and coerce my data into the spec, I find the explicit approach of Proposal 2 to be far more intuitive. Further, Proposal 1 only meets needs 1 & 2.
again, we've already made a decision to break backward compatibility with the schema, hence the "2.0" designation. if there is ever an appropriate time to break more things in the pursuit of clarity and improvement, this is it.
With NWB 2.x spec being finalized, and now that solid ISO8601 support is in pynwb ( via https://github.com/NeurodataWithoutBorders/pynwb/pull/641 ), it seems like a good time to try to reach a decision on this issue!
I would like to throw my support to @neuromusic 's proposal ("proposal 2") for an additional, optional timestamps_reference_time
field that encodes the timestamp zero time as an ISO8601 string. (As opposed to "Proposal 1": an optional field representing an offset (in seconds) that is subtracted from the session_start_time
to obtain the reference time).
I believe we can achieve backwards-compatibility by declaring that: "if timestamps_reference_time
is not specified, then the timestamp reference time is taken to be the session_start_time
".
I (quite strongly) prefer the specification of a reference time (rather than an offset from session_start_time) as being simpler to use and more intuitive during both file write and read. For instance:
session_start_time
, file_create_date
, and timestamps_reference_time
can be provided by passing in a timezone-aware datetime object, which leads to clear and expressive code: (with my apologies to PEP8 for alignment!).# For a file that uses POSIX timestamps
from datetime import datetime
from dateutil import tz
session_start_time = datetime(2017, 10, 31, 9, 59, 0, tzinfo=tz.gettz('US/Pacific'))
timestamps_reference_time = datetime(1970, 1, 1, 0, 0, 0, tzinfo=tz.tzutc())
file_create_date = datetime.now(tz.tzlocal())
session_start_time
should be made optional (e.g. for simulation data). I don't have any objection to this.@tjd2002 Proposal 2 from Andrew's post in https://github.com/NeurodataWithoutBorders/nwb-schema/issues/49#issuecomment-326391755?
@t-b Yes.
Proposal 1 = "Timestamps_offset" Proposal 2 = "Timestamps_reference_time"
I can live with both proposals and would also be fine with Proposal 2. ISO8601 strings to the rescue!
Okay, proposal 2 it is... timestamps_reference_time
as an ISO8601 string!
Any feelings on what timestamps_reference_time
defaults to? I'm guessing most people will just want to store relative (to session_start_time
) timestamps, so timestamps_reference_time
can default to session_start_time
.
@t-b @tjd2002 thoughts?
Okay, proposal 2 it is.
I agree.
timestamps_reference_time
can default tosession_start_time
.
Yes, based on the discussion in https://github.com/NeurodataWithoutBorders/nwb-schema/issues/50
Great that we seem to have a consensus!
timestamps_reference_time
can default tosession_start_time
.
This behavior at the API level sounds works for us and affords a level of backwards compatibility.
Am I correct that this means that timestamps_reference_time
will become a required field in the spec, with a default value set by the API at write time? For those who wanted to make session_start_time
optional, this might be unwelcome news (though it's not a lot worse for them than the current situation).
@tjd2002 Just a FYI. According to http://neurodatawithoutborders.github.io/development_plan this monday (11/05/2018) is the cut off date for schema changes.
Thanks @t-b! I will take this on and make my best effort to have a PR ready for review on Monday.
Sent from my phone
On Nov 3, 2018, at 7:41 AM, Thomas Braun notifications@github.com wrote:
@tjd2002 Just a FYI. According to http://neurodatawithoutborders.github.io/development_plan this monday (11/05/2018) is the cut off date for schema changes.
— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub, or mute the thread.
Users would like to store timestamps in both absolute and relative time. To achieve this, the following modification is proposed:
Proposal Add an additional top-level field called _timestampsoffset - the time difference by which timestamps are offset from being relative to the file's global starting time i.e.
session_start_time
. This value is 0 by default i.e. timestamps are directly relative tosession_start_time
. If this value is not 0, then to obtain timestamps that are directly relative tosession_start_time
, one must subtract this offset.Advantages
This allows users to store timestamps in absolute POSIX time without having to set
session_start_time
to "00:00:00 1 January 1970" i.e. the unix epoch. Saving timestamps as absolute POSIX timestamps can be done by settingtimestamps_offset
to the POSIX timestamp equivalent ofsession_start_time
. Saving timestamps as relative tosession_start_time
(the current specification) can be achieved by settingtimestamps_offset
to 0.This also makes the specification easier to develop against, since this will explicitly specify the offset to obtain relative timestamps, eliminating the need for APIs to guess based on range.
Disadvantages
Modifications