Dash-Industry-Forum / Guidelines-TimingModel

DASH-IF implementation guidelines: the DASH timing model

Clarify relationship of leap seconds to durations #32

Closed sandersaares closed 4 years ago

sandersaares commented 6 years ago

IOP 4.1 says:

A leap second is added to UTC every 18 months on average. A service provider should take into account the considerations in RFC 7164 [50]. The MPD time does not track leap seconds. If these occur during a live service they may advance or retard the media against the real time.

The statement "the MPD time does not track leap seconds" is a bit confusing there. I assume by "track" it is meant that a leap second occurring is not signaled in the manifest. That is fine, of course. However, the role of leap seconds should still be clarified.

This is especially relevant if you anchor your live stream at the Unix epoch, for example. Since 1972, 27 leap seconds have been inserted, so over such a period this may have a real interoperability impact.

Wall-clock timestamps in the MPD use UTC. Since UTC uses leap seconds, this means these timestamps include leap seconds. That is fine and simple.

However, there are also durations! What happens if you take a wall-clock timestamp T and add a duration D? I can see two possibilities here:

  1. Durations do not include leap seconds.
  2. Durations do include leap seconds.

What is more, I can find arguments to support both interpretations! This makes me wonder about what implementations really do (as you may guess, some dubious numbers triggered my looking into this in the first place).

At first glance, I would think "of course they include leap seconds - when measuring duration, leap seconds are no different from other seconds". And that makes sense from many viewpoints.

But do durations really come into being by something actually measuring elapsed time? Or are they the arithmetic difference of a start and end timestamp? This begins to matter because the vast majority of timestamp formats do not count leap seconds internally! For example, Unix timestamps do not - you simply have to add "leap seconds between 1970 and timestamp value" to any Unix timestamp to arrive at wall clock time! Any duration derived from subtracting two Unix timestamps that cross leap seconds will be too small by the number of crossed leap seconds!
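To make the arithmetic concrete, here is a minimal Python sketch. It assumes a hand-written and deliberately incomplete leap second table (only the insertions at the ends of 2015-06-30 and 2016-12-31); a real implementation would need an authoritative, up-to-date list. The names `LEAP_SECOND_INSERTIONS`, `unix_ts` and `elapsed_si_seconds` are illustrative only.

```python
from datetime import datetime, timezone


def unix_ts(year, month, day, hour=0, minute=0, second=0) -> int:
    """Unix timestamp of a UTC calendar time (leap seconds are not counted)."""
    return int(datetime(year, month, day, hour, minute, second,
                        tzinfo=timezone.utc).timestamp())


# Hypothetical, incomplete table: the Unix timestamp of the UTC midnight that
# immediately followed each inserted leap second (23:59:60).
LEAP_SECOND_INSERTIONS = [
    unix_ts(2015, 7, 1),
    unix_ts(2017, 1, 1),
]


def elapsed_si_seconds(t1: int, t2: int) -> int:
    """Seconds that actually elapse between Unix timestamps t1 <= t2:
    the naive difference plus one for every leap second crossed."""
    naive = t2 - t1
    crossed = sum(1 for ts in LEAP_SECOND_INSERTIONS if t1 < ts <= t2)
    return naive + crossed
```

Across the 2016 leap second, the naive difference between 23:59:59 and the following midnight is 1 second, while 2 seconds actually elapse.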

Therefore, I request that IOP include additional guidance for leap second handling by specifying whether durations should include them or not.

~The implementations I have access to appear to ignore leap seconds in durations, apparently being a straight Unix timestamp difference. Based on this (tiny) sample size, I would recommend we define durations as not including any leap seconds, with the latter being added when durations are resolved to wall-clock time (so any T1+D=T2 arithmetic has to check T1 and T2 and add any leap seconds that came between).~

Edit: I actually do not have enough information to really form an opinion on which way it should be, so I cross out the last paragraph.

Edit 2: see detailed CR proposal in later comments.

sandersaares commented 6 years ago

After throwing this around in my head and prototyping a few presentations with different timing, I see the following as making the most sense to me:

I would view it in a positive light if IOP were to be amended in this spirit, especially with a view toward warning users of the dangers of using Unix timestamps without leap second compensation.

TobbeEdgeware commented 6 years ago

I think that a warning is definitely apt, but apart from the actual crossing of leap second time intervals, I don't see that we need to change anything since Unix timestamps are aligned minute by minute with UTC.

Most DASH implementations use SegmentNumber so there must be an exact average duration of the segments. This should translate into a fixed number of segments/day (or other time interval) and it should be the same for UTC and Unix time which I think it can.

The tricky part is the moment a leap second is introduced, and I think IOP should concentrate on defining the preferred behaviour at that point, given that an encoder typically has a fixed GOP duration, which may need to be changed.

sandersaares commented 6 years ago

First draft of CR is ready for review: http://files.groupspaces.com/dashif/files/2090031/CR-leapseconds-v1.docx

I feel this might be too abstract and would benefit from more explicit guidance on real world use cases. I request input on what exactly those use cases might be.

sandersaares commented 6 years ago

Second draft with better focus and a suggested algorithm for synthesizing a leap second aware clock: http://files.groupspaces.com/dashif/files/2092643/CR-leapseconds-v2.docx

> Most DASH implementations use SegmentNumber so there must be an exact average duration of the segments. This should translate into a fixed number of segments/day (or other time interval) and it should be the same for UTC and Unix time which I think it can.

Leap seconds are as real as any other seconds, so a day containing a leap second does in fact contain slightly more segments than other days.
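A small worked example of the segment counting: with a fixed segment duration, the number of segments in a UTC day follows from the length of that day in SI seconds. The Python sketch below assumes a hypothetical, incomplete set of leap second dates; the function names are illustrative.

```python
from datetime import date

# Hypothetical, incomplete set of UTC dates whose last minute had a 61st
# second (23:59:60). A real service would need the full, current list.
LEAP_SECOND_DAYS = {date(2015, 6, 30), date(2016, 12, 31)}


def seconds_in_utc_day(day: date) -> int:
    """Length of a UTC day in SI seconds: 86400, or 86401 with a leap second."""
    return 86400 + (1 if day in LEAP_SECOND_DAYS else 0)


def segments_in_utc_day(day: date, segment_duration_s: float) -> float:
    """How many fixed-duration segments fit into the given UTC day."""
    return seconds_in_utc_day(day) / segment_duration_s
```

With 2-second segments, an ordinary day holds 43200 segments while 2016-12-31 holds 43200.5, so a scheme that assumes the same segment count for every day cannot stay aligned with UTC across a leap second.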

sandersaares commented 6 years ago

The Community Review continues but there has already been a fair bit of active discussion, so I summarize the received feedback here. Broadly speaking, comments fell into the following categories.

Terminology and understandability

Even among people in the know, timing- and clock-related terminology results in confusion. Some points that came up in discussion:

Terminology in the text needs to be carefully reviewed and possibly some explanations and references added to avoid confusion on these and similar points.

Interoperability with nonconforming systems

Even though availability in the DASH standard is firmly rooted in UTC and real time, there exist implementations that instead use leap-second-unaware timelines (e.g. Unix time), as @TobbeEdgeware pointed out. What should be done about this? What happens if you mix leap-second-aware and leap-second-unaware implementations?

This topic has so far not resulted in active discussion and there is currently no clear indication whether anything meaningful can be done about this. At minimum, we need to consider it a fact that different timing models are used by current implementations, even if unintentionally.

Browser-based DASH clients seem to be most affected by this, as JavaScript Date objects are not leap second aware. It was mentioned in discussion that broadcast clients may be in a better state as broadcast systems have stronger roots in real time (they had to track real time even before DASH was a thing) but to my knowledge the timing behavior of broadcast DASH clients has not been experimentally explored.

Even if nothing else actionable comes up, it is worth describing the interactions between the two types of implementations in IOP (e.g. do clients go too far in the future or too far in the past in problematic situations?).

poolec commented 5 years ago

I've been considering the leap second problem on and off for a while, and again recently in connection with the work on low latency live in DVB and DASH-IF.

In low latency services, a one-second error in segment availability will eat into the client's buffer quite a bit and affect the reliability of the service so it's worth thinking about.

I have a proposal below that I think solves the issue much more simply than previous proposals because it has very little complexity for the client and can be introduced without making anything worse for any existing server/client.

But first, returning to the actual problem, I think there are 3 issues that need to be considered, some more serious than others:

  1. Ambiguity. Today, clients are generally not leap second aware and consequently MPDs for long-running streams tend to work around this, meaning UTC times in MPDs may not actually be UTC times in the most precise sense and a client considering them to be may encounter problems with real-world services, even before the "next" leap second occurs.
  2. Playback errors. Anyone watching a stream when a leap second occurs may encounter temporary problems due to erroneous availability time calculations.
  3. Erosion of client buffer. For most (all?) clients, availability time errors will occur after a leap second occurs and affect the stream from then on, for all new sessions.

In more detail:

  1. Although the spec says availabilityStartTime is a UTC time, if there's a leap second between the availabilityStartTime and now, almost all clients (possibly even all clients) will calculate segment availability times that are later than the actual segment availability because they will use 'standard' leap second unaware APIs. Because of this, I've seen packagers subtract a number of seconds from the availabilityStartTime so that it's correct from the point of view of a client that is not leap-second aware. It is hard to see all clients changing quickly so we need a solution that can cope with this situation but at the same time allow new clients to calculate availability correctly.
  2. If clients schedule segment requests against a UTC clock, they may have trouble at the time the leap second occurs. In a way, this is actually the least concerning issue because leap seconds occur at midnight and only every couple of years: there are probably other errors affecting streaming reliability with greater impact than this!
  3. This is the biggest issue for low latency streams as the loss of a second of segment availability is likely to be a significant proportion of the client buffer.
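The effect described in point 1, and the packager workaround, can be sketched in Python as follows. This assumes a hypothetical stream with MPD@availabilityStartTime at 2012-01-01T00:00:00Z, after which three leap seconds were inserted (at the ends of 2012-06-30, 2015-06-30 and 2016-12-31). Python's `datetime` arithmetic stands in for the 'standard' leap second unaware APIs, and the function names are made up for illustration.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical stream parameters (illustration only).
AVAILABILITY_START = datetime(2012, 1, 1, tzinfo=timezone.utc)  # MPD@availabilityStartTime
LEAP_SECONDS_SINCE_START = 3  # ends of 2012-06-30, 2015-06-30 and 2016-12-31


def naive_availability(ast: datetime, media_offset_s: float) -> datetime:
    """What a leap-second-unaware client computes: datetime arithmetic
    treats every UTC day as exactly 86400 seconds long."""
    return ast + timedelta(seconds=media_offset_s)


def true_availability(ast: datetime, media_offset_s: float, leaps: int) -> datetime:
    """UTC label of the instant media_offset_s real (SI) seconds after ast,
    given that `leaps` leap seconds were inserted in between."""
    return ast + timedelta(seconds=media_offset_s - leaps)


def packager_shifted_start(ast: datetime, leaps: int) -> datetime:
    """Workaround seen in practice: advertise an earlier availabilityStartTime
    so that leap-second-unaware clients compute the true availability times."""
    return ast - timedelta(seconds=leaps)
```

For a media offset that lands after the end of 2016, `naive_availability` is 3 seconds later than `true_availability`, and passing the shifted start time into the naive calculation removes that error, at the cost of the MPD no longer carrying the true UTC start time.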

I think a good way to address this is by providing some simple additional information in the MPD that will allow clients that recognise it to calculate availability times precisely and which won't upset any existing clients. Doing that avoids the need for a client to independently acquire lists of leap second occurrences, or perform more complex date calculations.

How I'd see it working is with a new element something like this (assuming this would ultimately be done in MPEG; options in other namespaces also possible):

<TimeInformation availabilityStartTimeTAIOffset="34" publishTimeTAIOffset="37" nextOffsetChange="2020-01-01T00:00:00" postChangeTAIOffset="38"/>

Firstly, the new element explicitly states the offset from TAI (international atomic time) that has been used in the indicated availabilityStartTime and also in the MPD publishTime. With this information, the client can simply apply the difference between the two as a correction to the availabilityStartTime and from then on use standard leap second unaware APIs for calculating time differences or scheduling actions.

Secondly, the new element can indicate when the offset to TAI will next change (i.e. when the next leap second occurs) and what it will be at that point. A client can compare the current time with that next change time and use the appropriate offset. (We can reasonably assume that a source of time provided by a UTCTiming element is a true source of UTC and will reflect the leap second when it happens).
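A client-side reading of the proposed element might look like the Python sketch below. `TimeInformation` and its attributes are only proposed here, not part of any published schema, and the attribute values are the illustrative ones from the example above. Treating a `nextOffsetChange` value without a zone designator as UTC is an assumption of this sketch, as are the function names.

```python
import xml.etree.ElementTree as ET
from datetime import datetime, timedelta, timezone

# The proposed (not standardized) element, with the example values from above.
SNIPPET = ('<TimeInformation availabilityStartTimeTAIOffset="34" '
           'publishTimeTAIOffset="37" nextOffsetChange="2020-01-01T00:00:00" '
           'postChangeTAIOffset="38"/>')


def parse_utc(value: str) -> datetime:
    """Parse an xs:dateTime; a value without a zone designator is taken as UTC."""
    dt = datetime.fromisoformat(value.replace("Z", "+00:00"))
    return dt if dt.tzinfo else dt.replace(tzinfo=timezone.utc)


def corrected_availability_start(ast: datetime, info: ET.Element,
                                 now: datetime) -> datetime:
    """Shift MPD@availabilityStartTime by the leap seconds inserted since then,
    so that ordinary leap-second-unaware arithmetic can be used afterwards."""
    ast_offset = int(info.get("availabilityStartTimeTAIOffset"))
    current_offset = int(info.get("publishTimeTAIOffset"))
    change = info.get("nextOffsetChange")
    if change is not None and now >= parse_utc(change):
        # The announced leap second has happened: use the post-change offset.
        current_offset = int(info.get("postChangeTAIOffset"))
    return ast - timedelta(seconds=current_offset - ast_offset)
```

Before the change time the correction is 37 - 34 = 3 seconds; from the change time on it is 38 - 34 = 4 seconds.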

This approach has a number of advantages:

and a client that understands the TimeInformation element would handle both correctly. The first is the most correct but the second is more likely to be what's done today in order to handle existing clients.

Any comments on this?

I see it being best handled at an MPEG meeting but would welcome any feedback here first.

sandersaares commented 5 years ago

This seems like a reasonable mechanism to enable the required information to be described to clients.

I propose a few trivial changes to the mechanism:

From an ecosystem interoperability point of view, the fact remains that regardless of mechanism, the service and client must speak the same language in terms of leap seconds. We need to specify in IOP guidelines what that language should be. I would be happy to have a SHALL there stating that <TimeInformation> must always be present in dynamic MPDs if MPEG makes it happen.

sandersaares commented 5 years ago

There is also the additional consideration that the base offset of UTC from TAI (before any leap seconds ever took place) is 10 seconds. This can be confusing.

As an alternative proposal, I would consider naming them just @sumOfLeapSeconds and @nextSumOfLeapSeconds, that is, just a value adding up the leap seconds between MPD@availabilityStartTime and now. This reduces the ambiguity, I think.
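The relationship between the two ways of expressing the same information can be shown in a few lines of Python (function names are illustrative). A TAI offset contains the 10-second base offset on top of the leap seconds, while a sum of leap seconds between MPD@availabilityStartTime and now is simply the difference of two TAI offsets, in which the base offset cancels out.

```python
# TAI - UTC was exactly 10 seconds when the current leap second scheme began
# on 1972-01-01, before any leap second had been inserted.
TAI_UTC_BASE_OFFSET_S = 10


def leap_seconds_in_tai_offset(tai_offset_s: int) -> int:
    """Number of leap seconds (since 1972) contained in a TAI - UTC offset."""
    return tai_offset_s - TAI_UTC_BASE_OFFSET_S


def sum_of_leap_seconds(ast_tai_offset_s: int, now_tai_offset_s: int) -> int:
    """Value for the suggested @sumOfLeapSeconds: leap seconds inserted between
    MPD@availabilityStartTime and now. The base offset cancels out."""
    return now_tai_offset_s - ast_tai_offset_s
```

With the values from the earlier example, an offset of 37 corresponds to 27 leap seconds rather than 37, and `sum_of_leap_seconds(34, 37)` gives 3.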

poolec commented 5 years ago

Good ideas which help simplify things - thanks.

So we'd effectively be saying that the information has to be 'right' at the time the MPD is published (hence MPD@publishTime is definitely real UTC) and that the other attributes indicate the leap second offset for availabilityStartTime relative to publication time.

I think perhaps names like @availabilityStartLeapOffset might be even more descriptive, though whatever the attribute were called, I think that it will only be with carefully worded semantics that it will be fully understandable.

I just checked the DASH schema to look for anything else that is expressed as an absolute date/time. There are only three occurrences in total:

  1. MPD@availabilityStartTime - covered by the proposal already.
  2. MPD@publishTime - not required in the simplified case where an offset is used.
  3. MPD@availabilityEndTime - can be ignored, I think: firstly, it's not used in a dynamic MPD; secondly, I don't think errors of small numbers of seconds in the end time of a static MPD would be an issue if they occurred, and the availabilityEndTime is likely to reflect a 'round number' in UTC time; finally, the most correct thing to do would be to interpret it according to the client's system clock or UTCTiming element clock at the time, so no special treatment would be needed in any case.

So I think it is just @availabilityStartTime that needs to be addressed.

Here's a proposed definition for the attributes. Maybe we can improve it here and then write it up as a contribution for the next MPEG meeting.

@availabilityStartLeapOffset: the number of seconds applying at the time of MPD publication that a client would need to subtract from MPD@availabilityStartTime in order to perform segment availability calculations without considering leap seconds further.

@nextAvailabilityStartLeapOffset: the number of seconds that will apply from the time of the next leap second (indicated by the TimeInformation@nextLeapChangeTime) that a client would need to subtract from MPD@availabilityStartTime in order to perform segment availability calculations without considering leap seconds further.

NOTE: In order to play dynamic MPDs correctly in cases where leap seconds have occurred since the time of MPD@availabilityStartTime, clients that perform segment availability calculations using time functions that assume a constant day length of 86400 seconds need to apply the relevant offset in order to calculate segment availability correctly. Clients should use TimeInformation@availabilityStartLeapOffset if the current time is before TimeInformation@nextLeapChangeTime and TimeInformation@nextAvailabilityStartLeapOffset otherwise.

NOTE: In the event that the leap offset has already been accounted for in the value of MPD@availabilityStartTime, TimeInformation@availabilityStartLeapOffset should still be present but set to zero. From this signalling, a client can understand that no further leap second correction is required.

sandersaares commented 5 years ago

I agree with your analysis and proposed naming. However, we need to broaden the scope a bit since availability is not the only place where leap seconds are relevant. True, the root of the timing is MPD@availabilityStartTime but this value is also used for non-availability-related purposes (a more useful name might have been MPD@startTime but the benefit of hindsight is what it is).

As such, describing this logic as part of availability calculations would be misleading. For example, MPD@availabilityStartTime plus period durations define when a period starts. This is also a place where leap seconds need to be accounted for. Possibly there might be more that do not immediately jump to my mind.

Some wording that focuses on MPD@availabilityStartTime without actually constraining it to actual availability time calculations would be better here. Maybe just replacing "segment availability calculations" with "timing calculations" would be sufficient? Nothing significantly better comes to me at the moment.

poolec commented 5 years ago

Makes sense. "Timing calculations" seems to cover it. I've tidied up the wording a little in a few places and now have this:

@availabilityStartLeapOffset: the number of seconds applying at the time of MPD publication that a client would need to subtract from MPD@availabilityStartTime in order to perform timing calculations without further consideration of leap seconds.

@nextAvailabilityStartLeapOffset: the number of seconds that will apply from the time of the next leap second (indicated by the TimeInformation@nextLeapChangeTime) that a client would need to subtract from MPD@availabilityStartTime in order to perform timing calculations without further consideration of leap seconds.

NOTE: In order to play dynamic MPDs correctly in cases where leap seconds have occurred since the time of MPD@availabilityStartTime, clients that perform timing calculations using functions that assume a constant day length of 86400 seconds need to apply the relevant offset in order to operate correctly. Clients should use TimeInformation@availabilityStartLeapOffset if the current time is before TimeInformation@nextLeapChangeTime and TimeInformation@nextAvailabilityStartLeapOffset otherwise.

NOTE: In the event that the leap offset has already been accounted for in the value of MPD@availabilityStartTime, TimeInformation@availabilityStartLeapOffset should still be present but set to zero. From this signalling, a client can understand that no further leap second correction is required.

sandersaares commented 5 years ago

How do leap seconds interact with MPD anchors of the form #t=posix:1234567890, given that Unix/POSIX time does not recognize leap seconds?

poolec commented 5 years ago

If the time specified in #t=posix:... is no older than the most recent leap second, then this will be accurate for a leap-second aware client that understands the new MPEG leap second signalling.

If the time specified is older than that, there would be an error of one or more seconds in the start or end time of the region of the live stream being referenced.

I think this is a corner case that is not worth worrying about since live streams rarely have a timeshift buffer that is long enough to span a leap second (unlike the gap between availabilityStartTime and 'now' which often spans several) and a one-second error in a playback start time for fragment URL-initiated seeks backwards into a live stream that happen only for a short period every 2-3 years feels like a low priority issue to consider!
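The size of that error can be illustrated with a Python sketch. It assumes a hypothetical stream anchored at 2012-01-01, an incomplete leap second list containing only the three insertions since then, and a client that applies the leap offset valid now to every wall-clock time; all names are made up for illustration.

```python
from datetime import datetime, timezone


def unix(year: int, month: int, day: int) -> int:
    """Unix timestamp of UTC midnight on the given date (leap seconds not counted)."""
    return int(datetime(year, month, day, tzinfo=timezone.utc).timestamp())


# Hypothetical scenario: each leap second is identified by the Unix timestamp
# of the midnight that followed it (ends of 2012-06-30, 2015-06-30, 2016-12-31).
AST = unix(2012, 1, 1)  # MPD@availabilityStartTime
LEAP_INSERTIONS = [unix(2012, 7, 1), unix(2015, 7, 1), unix(2017, 1, 1)]


def leaps_between(t1: int, t2: int) -> int:
    """Leap seconds inserted in the interval (t1, t2]."""
    return sum(1 for ts in LEAP_INSERTIONS if t1 < ts <= t2)


def client_media_position(anchor: int, now: int) -> int:
    """Client applies the leap offset valid *now* to every wall-clock time."""
    corrected_ast = AST - leaps_between(AST, now)
    return anchor - corrected_ast


def true_media_position(anchor: int) -> int:
    """Real seconds elapsed between availabilityStartTime and the anchor."""
    return anchor - AST + leaps_between(AST, anchor)
```

An anchor after the most recent leap second maps exactly, while one from mid-2016 is off by one second because of the leap second at the end of that year.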

sandersaares commented 4 years ago

I will attempt to reference the new text in DASH in our timing model chapter of implementation guidelines.