keighrim opened this issue 1 year ago
Another issue with the current usage of `timeUnit`, raised by @owencking over a Slack chat.
Owen King 15:57 Sometimes you use frames. Other times it's milliseconds. How do you choose?
Keigh Rim 15:57 arbitrarily. up to developer’s decision.
(...)
Owen King 16:01 Suppose I want to ensure that we can return to and reproduce that exact frame repeatedly. Does it matter whether we identify it by frame number or millisecond?
Keigh Rim 16:06 Well, as long as the video file is the same, I don’t believe the math will be different.
(...)
Owen King 16:10 Well, I was wondering if different software might do the math differently. Frame number should be unambiguous. But maybe for millisecond some will choose the frame before and some will choose the frame after?
(...)
Owen King 17:08 I can tell you what was on my mind. Over at GBH we're talking about whether it's better to save image files of successfully identified frames of interest, or just to keep the timestamps for the relevant frames (as in MMIF annotations). I was trying to figure out if there were downsides to not keeping any image files and always assuming we can extract the relevant stills on an as-needed basis. Just trying to think through the pros and cons of the two approaches.
To me, the primary sources of the problem here are that:

1. Currently, MMIF doesn't specify nor validate the "number"-typed values in discussion. For example, even when a dev says the unit for the `TimeFrame` is `seconds`, it could still have any fraction below the decimal point. For reference, the ISO standard doesn't specify a limit below the DP, but the Python implementation uses microseconds (not milliseconds). In the worst case, where the developer rounded all seconds up or down to whole numbers (not "rounding", but "ceiling" or "flooring"), we should expect up to a 29-frame difference between the actual frame number and the frame number recovered from the second value (a quick sketch of this worst case follows the shell transcript below).
2. Different software actually does the math differently. On top of that, computers are really bad at doing math on fractional numbers precisely (to be fair, they are really, really, really good at doing it very quickly). For reference, for "29.97" NTSC-based videos, ffmpeg will tell you the FPS is `29.97`, while opencv pushes it to the limit of double-precision FP and gives you `29.97002997002997`. This difference will result in up to 13 frames of difference for a ~4-hr-long 29.97-fps video (240 * 60 * 30 total frames), or up to 4 frames of difference for 1-hr videos.
$ ffmpeg -i cpb-aacip-77-074tnfhr.mp4 2>&1 | grep -i fps
Stream #0:0[0x1](und): Video: h264 (Main) (avc1 / 0x31637661), none(progressive), 480x360, 659 kb/s, SAR 1:1 DAR 4:3, 29.97 fps, 29.97 tbr, 30k tbn (default)
$ python -c 'import cv2; print(cv2.VideoCapture("cpb-aacip-77-074tnfhr.mp4").get(cv2.CAP_PROP_FPS))' 2>/dev/null
# (...)
29.97002997002997
$ python -c 'a=29.97; b=29.97002997002997; print(set([abs(int(i * a) - int(i * b)) for i in range(240*60*30) if int(i * a) != int(i * b)]))'
{1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13}
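For the first point, here is a quick sketch (mine, not from the thread) of the worst case where whole-second flooring meets frame-number recovery:

```python
# sketch only: floor fractional seconds to whole seconds, then convert back to frames
fps = 29.97
worst = max(
    f - int(int(f / fps) * fps)   # frame -> floored seconds -> back to a frame number
    for f in range(60 * 60 * 30)  # roughly one hour of frames
)
print(worst)  # roughly a full second's worth of frames (~29-30 at 29.97 fps) can be lost
```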
The second point is less of a problem, and we can probably live with 3-4 frame differences for 1-hr videos. We can also somewhat mitigate it by forcing a rounded-to-hundredths fps (consistently 29.97) in the SDK code (which I already did last week in the develop branch), hoping the devs are lazy enough not to implement their own fps calculation.
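A minimal sketch of that kind of mitigation (not the actual SDK change, just the idea): round whatever the decoder reports to two decimals, so every app does its math with the same 29.97.

```python
import cv2

# reusing the test file name from the transcript above purely as an example
cap = cv2.VideoCapture("cpb-aacip-77-074tnfhr.mp4")
fps = round(cap.get(cv2.CAP_PROP_FPS), 2)   # 29.97002997002997 -> 29.97
```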
The real problem is the first one, where the developer can choose to use any number below the DP, and the MMIF spec doesn't validate it. As Owen pointed out, frame numbers are unambiguous and concise. But we can't force the usage of frame nums because `timeUnit` is not just for videos, but also for audio (and alignment of the two, as in forced-aligner apps). A few options I can think of to mitigate this:

1. Drop `seconds` and always use `milliseconds` or `microseconds` as integers. This will reduce inaccuracy coming from FP calculation, as well as from the conversion. (With 29.97 fps, the duration of one frame is ~33,367 microseconds. This is a huge tolerance compared to what we have now with the usage of seconds.)
2. Only use frame numbers to encode times.

In any case, I strongly suggest we drop `seconds`, either because of its coarse (or even crude) granularity or because of its unrestricted number of decimal places.
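A quick sanity check of that tolerance claim (my own sketch, not from the SDK): at 29.97 fps, integer microseconds are more than fine-grained enough to round-trip every frame number in a 4-hour video.

```python
fps = 29.97
us_per_frame = 1_000_000 / fps            # ~33,367 microseconds per frame

def frame_to_us(frame: int) -> int:
    return round(frame * us_per_frame)

def us_to_frame(us: int) -> int:
    return round(us / us_per_frame)

# every frame in a ~4-hour 30-fps video survives the round trip
assert all(us_to_frame(frame_to_us(f)) == f for f in range(4 * 60 * 60 * 30))
```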
@kelleyl , @snewman-aa you guys have plenty of experience with video-related CLAMS apps, and I'd like to hear your thoughts on this.
I see no reason not to drop `seconds`, and I'm not even sure why we would need to continue supporting `milliseconds` either. I think I might prefer the second option of only using frame numbers to encode times. A lot of apps use models that ultimately have frame-number output that we then convert to `milliseconds` in the MMIF output, only to then convert them back to frames for the model in the next app.
I think this unit conversion may have been the number one source of bugs in my apps so far, so I definitely wouldn't mind removing it altogether, though the conversion utils are still convenient for development.
Today I Learned: frame count numbers are not unambiguous. There is `nb_frames`, and then `nb_read_frames`, in the FFmpeg world, and opencv either uses a possibly inaccurate `nb_frames` value or just estimates using fps * duration math.
https://stackoverflow.com/questions/31472155
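For what it's worth, the same ambiguity is easy to reproduce from Python (my own sketch; the file name is just reused from the transcript above): the header value reported by `CAP_PROP_FRAME_COUNT` and the count obtained by actually decoding every frame can differ.

```python
import cv2

cap = cv2.VideoCapture("cpb-aacip-77-074tnfhr.mp4")
header_count = int(cap.get(cv2.CAP_PROP_FRAME_COUNT))  # header/metadata value, analogous to nb_frames
decoded_count = 0
while cap.read()[0]:                                    # decode every frame, analogous to nb_read_frames
    decoded_count += 1
print(header_count, decoded_count)                      # the two can disagree for some files
```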
The second proposal works better for videos, but I don't think there's a universal way to convert frame numbers to audio-compatible timecodes in a reliable way.
The more I'm thinking about this, the more I'm leaning toward using milli- or microseconds as the only time unit, and making all the time-related values typed as `int`. A few advantages:

1. `start` and `end` values in `annotation.property` are then single-typed (character offsets and time points), and we can actually validate them in the JSON schema (although this means we will have a "breaking" new version).
2. Compared to a timecode notation such as `HH:MM:SS.fff`, which is not a number and hence will add some burden to converter code pieces to parse string values into numbers before going into the math, plain integers need no parsing at all (but I admit that timecodes might be the most intuitive notation for human readers).

A (small) problem with using microseconds as the only time unit: 1 microsecond is a millionth of a second, meaning 1 hour is 3,600,000,000 microseconds. Since the maximum "simple" signed int on 32-bit systems is 2^31 - 1 = 2,147,483,647, on these systems a simple integer data type only works up to ~35-min-long media (~70 min with an unsigned int).
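Spelling out that arithmetic:

```python
INT32_MAX = 2**31 - 1                   # 2,147,483,647
print(INT32_MAX / 1_000_000 / 60)       # ~35.8 minutes of media if the unit is signed 32-bit microseconds
print((2**32 - 1) / 1_000_000 / 60)     # ~71.6 minutes with an unsigned 32-bit int
print(INT32_MAX / 1_000 / 3600)         # ~596 hours if the unit is milliseconds instead
```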
This is indeed a very "small" problem, considering that (e.g. in Python) `int` data can be arbitrarily long integers.
We (@jarumihooi, @owencking, and I) talked about this issue in yesterday's meeting, and we (in general) agreed to use either integer milli- or microseconds as THE standard time unit to appear in all MMIF output of future versions of CLAMS apps. I just wonder if that decision would cause any trouble in terms of GBH's downstream processing workflow (PBCore conversion, relational DB operations, etc.), so summoning @mrharpo @afred @foo4thought .
Another "feature" that can be helpful for human readers of MMIF is having altStart
and altEnd
fields along with start
/end
to hold secondary, more human-readable format. Having alternative anchors could also benefit other anchors types, for example, holding unicode codepoint counts for text data. Discussion around alt
-anchors would probably desire its own issue.
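Purely as an illustration of the idea (neither `altStart` nor `altEnd` exists in the current spec; the values below are made up):

```python
# hypothetical TimeFrame properties pairing machine-friendly integer milliseconds
# with human-readable alternates; not part of the current MMIF spec
properties = {
    "start": 3723500,             # integer milliseconds
    "end": 3730170,
    "altStart": "01:02:03.500",   # secondary, human-readable rendering of the same point
    "altEnd": "01:02:10.170",
}
```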
At the moment, it seems we've reached a consensus on going forward with all gold (and MMIF??) datasets using the ISO 8601 standard `hr:mn:sc.msc` with a DOT.
I forgot to add the reasoning about audio not having frames to the aapb-annotations/readme.md, but otherwise I think this could be closed until further business needs require revisiting this or more digits of precision.
Note: while a decision was made, raws and golds currently do not follow the convention yet, as that requires changing both tools and datasets for compatibility.
I'll keep this open until the `mmif-python` library has `altStart`/`altEnd` props, or that topic is opened as a new issue. For data in the aapb-annotations repo, we should probably open new issues to keep track of required updates and the updating progress of `process.py` and golds files.
Currently, do all the outputs from apps to MMIF record this in the ISO time format?
I believe that is the higher concern on the issue as it relates to older aapb-annotations repo projects.
Going forward, if it's an in-house annotation tool, it should use this format where possible. If it's an out-of-house annotation tool, or it is not possible, then the `process.py` scripts should convert the time format when converting to gold. (Is it worth converting the raw as well before it is stored in the aapb-annotations repo, simply to reduce the variety of formats?)
Apps should then all expect ISO time format and output that way too.
As for the previous projects, reconverting the time format is likely a lot of work and may cause apps or currently working code to fail. How do we decide whether this is worth doing or not?
Ideally, all the "currently" working evaluation code should pull the gold data by permalinks. So unless we re-write the aapb-annotations git history entirely, they should keep working in the future, regardless of how we change the format of existing (old) project files.
That said, obviously none of the current CLAMS apps use the ISO format for time notation, since officially the only valid time units are seconds, milliseconds, and frame numbers (https://mmif.clams.ai/vocabulary/Region/v1/), and the specification hasn't been updated based on the above discussion.
What I'm not very sure about in implementing this is the `altStart` and `altEnd` fields. Since the `start` and `end` fields in MMIF have a very special meaning and are used not just for time-based annotations but for all sorts of other annotation types, allowing `alt` fields with another special meaning only for time-related annotation types just doesn't make a lot of sense to me. And giving different special meanings to other `start` and `end` values (e.g., character offsets for text-based annotations) will require quite a long discussion that we can't afford now.
Another aspect of the problem we need to consider is that MMIF is NOT INTENDED FOR HUMAN READERS. For human readers, we have mmif-viz. Hence, if we need some "human-friendly feature" such as ISO time notation, mmif-viz is probably the right place to implement it, not the mmif-python SDK.
So going back to the other questions:

> Apps should then all expect ISO time format and output that way too.

I don't think so. Apps should all expect integer-valued `start` and `end` in milliseconds in the future, and should output integers at the same granularity. They can output the ISO notation along with the integers: apps can use `altStart` and `altEnd` to output the ISO format, but that must be entirely optional. Thus, apps must not expect or rely on the ISO format, especially not in the `start`/`end` fields.
> if it's an in-house annotation tool, it should use this format where possible. ... the `process.py` scripts should convert the time format when converting to gold.

Totally agree that any future in-house annotation environment (for human annotators) must use the ISO time format. Not quite sure about the second part, though. Again, gold files are mostly for machine consumption. We can add the ISO format as human-friendly columns next to machine-friendly start/end columns in the "gold" format, and writing that conversion (just once) in the `process.py` shouldn't be that hard. On the other hand, omitting machine-friendly columns and forcing all the downstream software that wants to use the gold data to convert ISO strings to actual numbers on its own seems like an almost sinful decision to me.
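For reference, that one-time conversion in a `process.py` could be as small as something like this (my own sketch, with a hypothetical helper name):

```python
def timecode_to_ms(tc: str) -> int:
    """Convert an 'HH:MM:SS.fff' timecode string to integer milliseconds."""
    hh, mm, ss = tc.split(":")
    return round((int(hh) * 3600 + int(mm) * 60 + float(ss)) * 1000)

assert timecode_to_ms("01:02:03.500") == 3_723_500
```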
> Is it worth converting the raw as well before it is stored in the aapb-annotations repo?

I guess this completely depends on how the annotation tool and the "upload" steps are implemented. If you're referring to existing raw files from older projects, I wouldn't bother re-formatting them.
> As for the previous projects, reconverting the time format is likely a lot of work and may cause apps or currently working code to fail. How do we decide whether this is worth doing or not?

I'm not sure whether in this part you were referring to the "raw" or the "gold" files of older projects. However, it will be worth re-formatting the "gold" files, because we will probably be re-using them over and over in the future (old versions of the "gold" files are still accessible via git history and GitHub permalinks, so no worry about losing anything). So having those files up-to-date with what we decide should be worth the effort, so that the gold data can be used for any purpose (mostly evaluation).
Just for future reference, all "gold" files for older annotation projects (the 2022-23 season) are now updated to use the ISO time format via https://github.com/clamsproject/aapb-annotations/pull/77.
Currently, we allow `seconds`, `milliseconds`, and `frames` as possible values of the `timeUnit` metadata property. We might want to re-visit this metadata property so that `ISO8601` or other `timecode` formats can be allowed as well, and so that `timeUnit` can be validated in the JSON map (by enumerating all possible values explicitly in the vocabulary, instead of having a generic `string` type).
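For example, a schema fragment along these lines (illustrative only, written as a Python dict; the actual MMIF schema may organize this differently) would let the validator reject unknown units:

```python
# illustrative only; not the actual MMIF JSON schema
time_unit_fragment = {
    "timeUnit": {
        "type": "string",
        "enum": ["milliseconds"],   # enumerate whatever units the spec finally settles on
    }
}
```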