jminor opened this issue 6 years ago
TIL that the Python % operator and the C % operator are not the same thing. So I withdraw my flag of a bug in the timecode conversion routine.
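For reference, here is a quick sketch of the difference; the two operators agree for positive operands and diverge when the signs differ, which is where timecode arithmetic with negative offsets can go wrong:

```python
# Python's % uses floored division, so the remainder takes the sign of the
# divisor; C's % (C99) truncates toward zero, so it takes the sign of the
# dividend.
print(-7 % 3)   # Python:  2
print(7 % -3)   # Python: -2
# The same expressions in C evaluate to -1 and 1 respectively.

# math.fmod matches C's truncating behavior:
import math
print(math.fmod(-7, 3))   # -1.0
print(math.fmod(7, -3))   #  1.0
```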
Creating a unit test to expose issues sounds like a great first step.
Here is some excellent background reading.
https://randomascii.wordpress.com/2012/02/13/dont-store-that-in-a-float/
The conclusion is that double precision will store a time value with constant precision at the microsecond level for 136 years.
So what's left is to work out at what frame precision breaks down, for each frame rate. Since the NTSC rates cannot be represented exactly in floating point (30000/1001 == 29.9700299700299700...), several values should be checked beyond the usual integral multiples of 24, 25, and 30 - at least 25000/1001, 30000/1001, 60000/1001, and 120000/1001.
Since float operations potentially lose a little significance after every operation, the next question is: at the point precision breaks down, how much significance is lost for every operation?
https://benjaminjurke.com/content/articles/2015/loss-of-significance-in-floating-point-computations/
It might turn out that double is good enough, and it's kind of feeling like it, but at least we would know definitively.
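As a starting point, here is a rough sketch of the kind of check described above; the rates and time horizons are only examples, and a real unit test would assert instead of print:

```python
# For each rate, convert a frame number to seconds as a double and back,
# and report whether the round trip lands on the same frame.
RATES = [24.0, 25.0, 30.0,
         24000.0 / 1001.0, 25000.0 / 1001.0,
         30000.0 / 1001.0, 60000.0 / 1001.0, 120000.0 / 1001.0]

HORIZONS = {
    "1 hour": 60 * 60,
    "1 day": 60 * 60 * 24,
    "1 year": 60 * 60 * 24 * 365,
    "100 years": 60 * 60 * 24 * 365 * 100,
}

def round_trips(frame, rate):
    seconds = frame / rate                 # frame -> seconds (double)
    return round(seconds * rate) == frame  # seconds -> frame

for rate in RATES:
    for label, seconds in HORIZONS.items():
        frame = int(seconds * rate)        # roughly the last frame at this horizon
        print(f"{rate:10.6f} fps, {label:>9}: round trips == {round_trips(frame, rate)}")
```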
Float and int aren't the only options. The time implementation I'm working on (slowly, sorry!) internally stores values as a 64-bit integer and an 8-bit comma-position value. This means I can mix 24/30/29.97 fps with perfect precision and no loss. I think it's in my GitHub branch (I've been busy with other stuff and learning Python, so things have slowed down).
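If I am reading that description right, it is essentially a fixed-point scheme: an integer tick count plus a small field saying where the radix point (the "comma") sits. A minimal sketch of the idea, with names invented here rather than taken from that branch:

```python
from dataclasses import dataclass

@dataclass
class FixedPointTime:
    """Time as a 64-bit tick count plus an 8-bit 'comma position' giving
    the number of decimal digits to the right of the radix point."""
    ticks: int
    scale: int

    def to_seconds(self) -> float:
        # Lossy, for display only; the stored value stays exact.
        return self.ticks / (10 ** self.scale)

# With scale=6 the ticks count microseconds:
one_second = FixedPointTime(ticks=1_000_000, scale=6)
print(one_second.to_seconds())  # 1.0

# Whether a given frame duration (e.g. 1001/30000 s for NTSC) is exact in
# this scheme depends on the tick base chosen; exactness is not automatic.
```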
I still think the time description we have is the right one; we just need to not use floats. This requires a proper implementation of the JSON parsing that doesn't convert values into floats. It makes sense to me that you should be able to write down the frame rate as you understand it and have the implementation do the right thing, even if that means a lot of work inside that implementation.
One potential (or partial) solution could be #214
See also: https://nomore2398.com for background on drop frame timecode.
This came up in discussion again recently - specifically around the question of how much error is present from using a double-precision float instead of int/int for rate.
Could someone check my rough calculation here:
```python
rate = 30000. / 1001.                  # NTSC rate as a double
seconds = 30 * rate                    # scale 30 up by the rate...
frames = seconds / rate                # ...and back down again
frames                                 # => 30.000000000000004, not exactly 30
error = frames - 30
error * 60 * 60 * 24 * 365 * 100000    # => 0.0112...
```
So if you had media with a running time of 100,000 years it would be off by 0.01 frames.
@jminor your calculation is ballpark in line with my seat-of-the-pants microsecond-precision-at-100-years estimate earlier in the thread. I am reading your analysis as a vote against the need for int/int.
The anecdotes and evidence I have gathered basically boil down to variations of an argument along the lines of "I had a timeline over a day long, and a couple of hours into the second day, a frame was lost, when using a double based time representation". This isn't because of insufficient resolution, it's because of frame snapping, and heuristics around frame snapping.
The int/int versus float/float argument always comes down to the fact that one must, in the end, snap a time to a frame for display. Snapping a floating point value to a frame is an irreversible operation, which means you can never go from a frame back to the source time again. The argument for int/int reflects the idea that, assuming reasonable quantization (e.g. from a 24 fps timeline back to a 24 fps source, or to a 30 fps source via a pull-up), the snapping heuristic is avoided and the operation is therefore reversible, because a known algorithm can undo it.

This argument does not hold up in a modern context, because the mix of frame rates is no longer as simple as it was when it was dictated by projection and broadcast standards. Chris' flicks proposal is partly meant as parody, to show the absurdity of an integer base. The argument that the existence of pull-up, pull-down, drop frame and so on is a good solution for reversibility doesn't hold water in the face of modern arbitrary frame rates coming from any number of advanced imaging technologies.
I propose that the current representation is actually technically sufficient for our needs, and that the missing piece is that we have not specified how temporal sampling and reconstruction work: given a temporally heterogeneous set of composed clips (clips superimposed on a timeline whose rates all differ), we need to explain precisely which sample index (image number in an image sequence, frame number in a mov) corresponds to a given temporal coordinate on the timeline.
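To make that missing piece concrete, the unspecified part is roughly a function like the one below, and the specification work is pinning down the rounding policy it uses. This is only an illustration, not a proposed API:

```python
import math

def sample_index_at(timeline_seconds: float, media_rate: float,
                    media_start_seconds: float = 0.0) -> int:
    """Which media sample (frame) is displayed at a given timeline
    coordinate. The rounding policy (floor here) is exactly the part
    a spec would need to pin down."""
    local = timeline_seconds - media_start_seconds
    return math.floor(local * media_rate)

# Two superimposed clips at different rates map the same timeline
# coordinate to different sample indices:
t = 10.0
print(sample_index_at(t, 24000 / 1001))  # 239
print(sample_index_at(t, 30000 / 1001))  # 299
```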
int/int still comes up as an opinion in our Slack channel. Why don't we add an agenda item for this issue to the next TSC meeting, with the intent of opening the floor to anyone who's got a compelling reason for int/int, and recording any discussion here. If a compelling int/int argument does not arise, we could then proceed to close this issue with a note on the decision, as no compelling argument for int/int has been posted on this thread in four years.
Edit follows:
As of autumn 2021, I am in the int/int camp. The reason is that analysis of any given time value does not show issues under float/float, as noted throughout this thread, but IEEE float math is not reversible, whereas carefully constructed int/int math is. Simply put, under IEEE float math, `b` is not guaranteed to be equal to `a * b / a`. Any number of other theoretically idempotent constructions can be written where equality is violated. So whereas double/double seems in principle fine for a file format, per this thread, it doesn't hold up for a library that expects to do math with the values.
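A concrete demonstration of that reversibility point, contrasting doubles with exact rational math (Python's fractions module is used here purely as an illustration of int/int arithmetic):

```python
from fractions import Fraction

a = 30000 / 1001   # NTSC rate as a double
b = 30.0
print(b * a / a == b)   # False here, per the round-trip calculation earlier in the thread

a_exact = Fraction(30000, 1001)
b_exact = Fraction(30)
print(b_exact * a_exact / a_exact == b_exact)   # True: exact math is reversible
```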
I wanted to add to this conversation following a discussion; the issue of reversibility is of particular interest.
For example, about a year ago I filed #830, in which `otioconvert` had produced output that `otiostat` found irreversible. We cared less about whether "23.976023976023978" or "23.976023976023979" was a more accurate femtosecond-level representation of a specific rational number than about whether we could take those strings and understand them to be approximations of 24000/1001.
We'd love to use OTIO as an agnostic and open format usable by a wide spectrum of upstream providers. Within these upstream sources and our downstream consumers, the relevant tools treat framerate as a rational number. OTIO's requirement that it be rasterized as a floating-point string is a point of friction and potential error. If someone issued us a schema-valid OTIO file rendered with "23.976024627685547" (24000/1001 as a binary32 value), we'd get an INVALID_TIMECODE_RATE thrown because that value is absent from https://github.com/PixarAnimationStudios/OpenTimelineIO/blob/466a9b51aeaab2aae20b46620eff6ec49c9106ba/src/opentime/rationalTime.cpp#L41, even though it's "more accurate" than the 23.976 listed there.
As a result, we can't rely in production on heuristic string matches for something as fundamental to our ecosystem as framerate representation.
@msheby That specific bit of code is problematic in that it isn't parameterized with an epsilon. Irrespective of the float representation problems under discussion in this issue, would adding an epsilon to that check be of immediate use? Also, good catch on that bit of code!
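For what it's worth, here is a sketch of what an epsilon-parameterized version of that check could look like. The rate list and tolerance below are placeholders for illustration, not the actual values in rationalTime.cpp:

```python
SMPTE_RATES = [12.0, 24000 / 1001, 24.0, 25.0, 30000 / 1001, 30.0,
               48.0, 50.0, 60000 / 1001, 60.0]   # placeholder list

def nearest_smpte_rate(rate, epsilon=1e-4):
    """Return the closest known timecode rate if it is within epsilon,
    otherwise None (and the caller would raise INVALID_TIMECODE_RATE)."""
    best = min(SMPTE_RATES, key=lambda r: abs(r - rate))
    return best if abs(best - rate) <= epsilon else None

print(nearest_smpte_rate(23.976024627685547))  # 23.976023976023978 (24000/1001)
print(nearest_smpte_rate(23.5))                # None
```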
I happened to run across the WebCodec W3C standard which, interestingly, uses unsigned long long (microseconds) for durations, and doubles for framerate: https://w3c.github.io/webcodecs/#dom-videoencoderconfig-framerate
It makes sense in the context of the codec's practical calculations, although I am surprised :)
> framerate, of type [double](https://webidl.spec.whatwg.org/#idl-double)
> The expected frame rate in frames per second, if known. This value, along with the frame [timestamp](https://w3c.github.io/webcodecs/#dom-videoframe-timestamp), should be used by the video encoder to calculate the optimal byte length for each encoded frame. Additionally, the value should be considered a target deadline for outputting encoding chunks when [latencyMode](https://w3c.github.io/webcodecs/#dom-videoencoderconfig-latencymode) is set to [realtime](https://w3c.github.io/webcodecs/#dom-latencymode-realtime).
Oh I ran into this Int vs Float thing in the AVFoundation conversion code, and would absolutely 100% get behind an Int/Int representation to avoid floating point errors.
A hearty +1 from me please!
FWIW, here are some unit tests I wrote which attempt to resolve some similar issues with the CoreMedia vs OTIO conversions (if it's remotely helpful).
Currently RationalTime is represented as a float value and a float rate. Unfortunately there are cases where floating point arithmetic leads to precision issues, rounding and undesirable drift.
Some other systems represent rate as an int/int rational number. For example 30000/1001 vs 29.97.
We would like to experiment with this alternate representation of rate for OTIO. In order to do so, we propose two concrete steps:
By doing this we can ensure that we won't break anything by making this switch and we can clearly explain to everyone (ourselves included) the difference between the two and the reasons for choosing one over the other.
Comments?
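As a footnote on the 30000/1001 vs 29.97 distinction in the proposal above, here is a small illustration using Python's Fraction as a stand-in for an int/int rate (not a proposed API):

```python
from fractions import Fraction

exact   = Fraction(30000, 1001)   # int/int NTSC rate
nominal = Fraction("29.97")       # the rounded label often written down

# Frame count after 24 hours under each interpretation of "29.97 fps":
seconds = 24 * 60 * 60
print(seconds * exact)                     # Fraction(2592000000, 1001), ~2589410.59 frames
print(seconds * nominal)                   # 2589408 frames exactly
print(float(seconds * (exact - nominal)))  # ~2.59 frames of disagreement per day
```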