axiomatic-systems / Bento4

Full-featured MP4 format, MPEG DASH, HLS, CMAF SDK and tools
http://www.bento4.com

mp4fragment not fragmenting fixed I-frame interval properly #236

Open Daiz opened 6 years ago

Daiz commented 6 years ago

So I'm trying to use mp4fragment and mp4dash to generate fMP4 HLS streams with four video variants (360p, 480p, 720p, 1080p), but I'm running into an issue that ends up with ERROR: video tracks are not aligned.

All the streams have been encoded with I-frames at fixed 192 frame intervals - I use a qpfile with x264 to ensure this, and I've verified that they are indeed in place in all variants with ffprobe. The problem is that for better quality I still allow the encoder to insert additional keyframes within those 192 frame segments where necessary, and this seems to cause issues for mp4fragment. Instead of proper constant 192 frame/sample fragments, mp4fragment (with --verbosity 3) gives slightly varying fragment lengths, like so:

...
fragment: track ID 1
 192 samples
 constant sample duration: no
fragment: track ID 1
 191 samples
 constant sample duration: no
fragment: track ID 1
 191 samples
 constant sample duration: no
fragment: track ID 1
 194 samples
 constant sample duration: no
fragment: track ID 1
 192 samples
 constant sample duration: no
...

This happens with basically all the variants. On top of that, mp4dash seems to have issues with the fragment duration too (which is 8 seconds at 192 frames at 23.976 fps), as it reports auto-detected fragment duration too large, using default when parsing/extracting the video tracks.

Would it be possible to specify fragment duration in frames with mp4fragment? Shouldn't mp4dash allow 192 sample fragments? mp4hls with 8 second segment duration works with my source files just fine, but it doesn't support generating (f)MP4 files, so right now I'm kinda stuck here...

barbibulle commented 6 years ago

When mp4fragment emits the warning auto-detected fragment duration too large, it is an indication that it couldn't find a regular I-frame interval in the file. You can check exactly where your I-frames are in the file by using: mp4info --show-samples <video-file>. You should see an '[S]' marker for all sync samples, with an '<I>' marker next to it to indicate an I-frame. If you see frames marked '<I>' but not '[S]', those are non-sync I-frames. Some encoders generate those, but since they're not 'sync' frames, they can't be used as the first frame of a fragment.
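A minimal sketch of that check, assuming the listing uses the line layout quoted elsewhere in this thread (for example `[004800] size= 73066 duration=   834 [S] <I>`). The names `parse_samples` and `non_sync_i_frames` are hypothetical helpers, not part of Bento4.

```python
import re

# Line layout assumed from the sample listings quoted in this thread, e.g.
# "[004800] size= 73066 duration=   834 [S] <I>".
SAMPLE_RE = re.compile(
    r"\[(?P<index>\d+)\]\s+size=\s*(?P<size>\d+)\s+duration=\s*(?P<duration>\d+)"
    r"\s*(?P<sync>\[S\])?\s*(?P<type><\w+>)?"
)

def parse_samples(text):
    """Turn an `mp4info --show-samples` listing into a list of dicts."""
    samples = []
    for line in text.splitlines():
        m = SAMPLE_RE.search(line)
        if m is None:
            continue  # skip track headers, "..." and other non-sample lines
        samples.append({
            "index": int(m.group("index")),
            "duration": int(m.group("duration")),
            "sync": m.group("sync") is not None,
            "type": (m.group("type") or "").strip("<>"),
        })
    return samples

def non_sync_i_frames(samples):
    """Indices of I-frames that are not flagged as sync samples."""
    return [s["index"] for s in samples if s["type"] == "I" and not s["sync"]]

LISTING = """\
[023422] size=  6329 duration=  1000     <I>
[023423] size=  6271 duration=  1000 [S] <I>
[023424] size=  3622 duration=  1000     <P>
[023425] size=  6590 duration=  1000 [S] <I>
"""

samples = parse_samples(LISTING)
print("non-sync I-frames:", non_sync_i_frames(samples))
print("sync samples:", [s["index"] for s in samples if s["sync"]])
```

Any frame the first list reports cannot start a fragment, so only the frames in the second list are candidate fragment boundaries.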

Daiz commented 6 years ago

mp4fragment is not what's emitting the auto-detected fragment duration too large, it's mp4dash - though I suspect the reasons are the same.

Anyway, I re-ran my test for regular 192-frame I-frame intervals on the output of mp4info --show-samples for all my video variants (you can find said output for the variants here), and the results came out as expected - all the I-frames at regular 192 frame intervals are sync I-frames. Now, there are additional sync I-frames as well (as per encoding while allowing additional I-frames in-between the 192 intervals for better overall quality, as mentioned), which I guess is what's messing mp4fragment up, and why I'd really like to be able to explicitly tell it to only use the I-frames at 192 frame intervals, since it really shouldn't be an issue in itself to have additional I-frames in the middle of fragments...

Though weirdly enough, mp4fragment does say found regular I-frame interval: 192 frames (at 23.976 frames per second), but then when you look at the --verbosity 3 output, you can see that it's clearly messing it up anyway with inconsistent sample counts for fragments:

...
fragment: track ID 1
 192 samples
 constant sample duration: no
fragment: track ID 1
 191 samples
 constant sample duration: no
fragment: track ID 1
 193 samples
 constant sample duration: no
fragment: track ID 1
 189 samples
 constant sample duration: no
fragment: track ID 1
 195 samples
 constant sample duration: no
fragment: track ID 1
 192 samples
 constant sample duration: no
...

Daiz commented 6 years ago

Looking into the logs further I'm realizing that x264 with a qpfile doesn't really work like I thought it would, and as a result of using both a qpfile and a max keyint of 192 there's a lot of bits like this where you have two I-frames in a row (where 4801 is at the regular 192 interval), so I'm not too surprised mp4fragment is getting slightly confused:

[004800] size= 73066 duration=   834 [S] <I>
[004801] size= 60304 duration=   834 [S] <I>

Still, I'd expect it to be able to do consistent 192 frame fragments even if a fragment would end on an I-frame, as it could always happen on occasions with the qpfile method even if I increase max keyint.

Daiz commented 6 years ago

Tried re-encoding my files with --keyint infinite (since the qpfile enforces IDR frames at regular 192 frame intervals) and as expected mp4fragment still runs into issues with not getting a constant 192 sample fragment duration due to additional keyframes landing near the 192 interval ones.

barbibulle commented 6 years ago

I think the root of the problem here is that while you do have I-frames at regular intervals in terms of frame count, the frame durations are not constant (in the example you link to, they vary between 834 and 835), so the interval isn't constant in terms of time. Since mp4fragment uses a fragment target duration expressed as a time value, it won't exactly align on the frame-based interval. That isn't a problem if the target frame boundary is close enough for the jitter to be negligible, but if there happens to be another I-frame very close, it might be closer to the time boundary than the one you'd want.

In general, I wouldn't recommend having a file with non-constant frame durations. It is always a source of potential trouble. With MP4, the frame durations are expressed in terms of the media timescale unit, which you can choose arbitrarily. In the example you link to, the media timescale chosen is 20,000 (meaning 20,000 units per second), so with a frame rate of 23.976 (which is a common frame rate), that leads to a duration of 834.1675.., which isn't an integer, so the encoder that produced the file has to alternate between 834 and 835 to keep the average at 834.1675.. Choosing a media timescale of 23976 in that case would give you a frame duration of exactly 1000, which would make everything work smoothly.
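The timescale arithmetic above can be checked with exact rational numbers. This is only a sketch: `frame_duration` is a hypothetical helper, and the frame rate is taken as exactly 23.976, as in the explanation above.

```python
from fractions import Fraction

def frame_duration(timescale, fps):
    """Exact duration of one frame, in media timescale units.

    `fps` is passed as a string so that "23.976" is read as an exact decimal.
    """
    return Fraction(timescale) / Fraction(fps)

for timescale in (20000, 23976):
    d = frame_duration(timescale, "23.976")
    kind = "integer" if d.denominator == 1 else "not an integer"
    print(f"timescale {timescale}: {float(d):.4f} units per frame ({kind})")
```

With a timescale of 20000 the per-frame duration is fractional, so stored durations have to alternate between 834 and 835. With 23976 every frame is exactly 1000 units long.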

One thing I will do for sure in the next release is to add some detection logic in mp4fragment to issue a warning if it detects a file with non-constant frame durations. It may be possible to also add support for specifying the fragment target duration in terms of number of frames, but that's always tricky, because, for one thing, that only applies to the video track (and the file may contain other tracks, so for non-video tracks it would have to revert to a time-based fragmentation logic).

Daiz commented 6 years ago

I suspected those uneven durations could be causing the issues... and after a bit of fiddling (i.e. muxing the raw H.264 video streams with mp4mux) I managed to remux the video in a way that allows me to get constant sample durations, and with it mp4fragment actually manages to do proper 192 frame splits on the files, subsequently enabling mp4dash to function as well. Thanks!

Daiz commented 5 years ago

Here I thought this issue was fully behind me... but I seem to be running into it again. So far I've been working with purely 23.976 FPS footage, but for a change, there's something that's actually running at 29.970. No biggie, I just make sure timescale is set to 29970 for the MP4s and all the frames get a consistent duration of 1000. But now mp4fragment is producing fragments of uneven length again:

found regular I-frame interval: 192 frames (at 29.970 frames per second)
fragment: track ID 1 [fragment 001 - should start on frame 00001]
 192 samples
 constant sample duration: yes
fragment: track ID 1 [fragment 002 - should start on frame 00193]
 192 samples
 constant sample duration: yes

...

fragment: track ID 1 [fragment 120 - should start on frame 23041]
 192 samples
 constant sample duration: yes
fragment: track ID 1 [fragment 121 - should start on frame 23233]
 190 samples
 constant sample duration: yes
fragment: track ID 1 [fragment 122 - should start on frame 23425, but starts on 23423]
 194 samples
 constant sample duration: yes
fragment: track ID 1 [fragment 123 - should start on frame 23617]
 192 samples
 constant sample duration: yes

Output of mp4info --show-samples on the source MP4 file I'm trying to fragment:

Track 1:
  media:
    sample count: 57325
    timescale:    29970
    duration:     57325000 (media timescale units)
  frame rate (computed): 29.970

...

[023040] size=  2895 duration=  1000     <P>
[023041] size= 19826 duration=  1000 [S] <I>
[023042] size= 10243 duration=  1000     <P>
...
[023232] size=  3279 duration=  1000     <P>
[023233] size= 25994 duration=  1000 [S] <I>
[023234] size=  1751 duration=  1000     <P>
...
[023422] size=  6329 duration=  1000     <I>
[023423] size=  6271 duration=  1000 [S] <I> << what seems to make mp4fragment mess up
[023424] size=  3622 duration=  1000     <P>
[023425] size=  6590 duration=  1000 [S] <I>
[023426] size=  4090 duration=  1000     <P>
...
[023616] size=  2057 duration=  1000     <P>
[023617] size= 32699 duration=  1000 [S] <I>
[023618] size=  1731 duration=  1000     <P>

As before, I've been encoding these with infinite keyint, allowing x264 to insert keyframes where it deems fit while also enforcing keyframes every 192 frames. So far everything has worked out just fine with lots and lots of 23.976 FPS footage, yet for some reason a nearby keyframe seems to be messing mp4fragment up at 29.970 FPS, even though it reports found regular I-frame interval: 192 frames (at 29.970 frames per second). Any ideas on how to get mp4fragment to just make the regular 192-frame fragments that it detects just fine, preferably without re-encoding all the video?
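To locate keyframes like the one flagged above, one option is to list every sync sample that does not sit on the 192-frame grid, along with how far it is from the nearest grid boundary. This is a sketch using the sync frame numbers from the listing above; `off_grid_sync_frames` is a hypothetical helper and says nothing about how mp4fragment itself chooses boundaries.

```python
def off_grid_sync_frames(sync_frames, interval, first_frame=1):
    """Return (frame, distance) pairs for sync frames off the regular grid.

    `distance` is the number of frames to the nearest grid boundary, where
    boundaries sit at first_frame, first_frame + interval, and so on.
    """
    result = []
    for frame in sync_frames:
        offset = (frame - first_frame) % interval
        if offset:
            result.append((frame, min(offset, interval - offset)))
    return result

# Sync samples ([S]) taken from the mp4info listing quoted above.
sync_frames = [23041, 23233, 23423, 23425, 23617]
print(off_grid_sync_frames(sync_frames, 192))  # [(23423, 2)]
```

A small distance, such as the 2 frames reported here, marks an extra keyframe close enough to a boundary that it could plausibly be picked instead of the intended one.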

(On an extra note, I accidentally muxed the exact same video at 23.976 FPS initially which worked just fine with mp4fragment and mp4dash... except obviously the video and audio were completely out of sync. Then I fixed things to make it mux at 29.970 FPS and now it's breaking as described, even though the keyframe positions and sample durations are exactly the same - only the framerate and timescale are different.)

And to reiterate as to why exactly this is a problem, it's because I'm trying to mux multiple resolution variants (360p, 480p, 720p, 1080p) together with mp4dash, but due to this uneven fragmentation issue for the variants I keep getting ERROR: video tracks are not aligned and not being able to mux things together as a result.

barbibulle commented 5 years ago

Would it be possible to share a link to the file so that we can try and replicate the issue locally and see what's going on?

tab1293 commented 4 years ago

"One thing I will do for sure in the next release is to add some detection logic in mp4fragment to issue a warning if it detects a file with non-constant frame durations."

@barbibulle have you made any progress implementing detection logic for MP4 containers with non-constant frame durations? I am looking for a programmatic method to determine if a particular video stream needs to be re-encoded in order to support fragmentation for DASH packaging.
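Until such a warning exists, one rough way to detect non-constant frame durations programmatically is to scan the output of mp4info --show-samples <video-file>. The sketch below assumes the sample line layout quoted earlier in this thread. `duration_histogram` and `has_constant_durations` are hypothetical helpers, not Bento4 features, and a negative result only tells you the durations vary (earlier in the thread a remux with mp4mux was enough to fix that, without re-encoding).

```python
import re
from collections import Counter

# Sample lines are assumed to look like
# "[004800] size= 73066 duration=   834 [S] <I>".
DURATION_RE = re.compile(r"^\s*\[\d+\]\s+size=\s*\d+\s+duration=\s*(\d+)")

def duration_histogram(listing):
    """Count how many samples carry each distinct duration value."""
    counts = Counter()
    for line in listing.splitlines():
        m = DURATION_RE.match(line)
        if m:
            counts[int(m.group(1))] += 1
    return counts

def has_constant_durations(listing):
    """True when every sample in the listing has the same duration."""
    return len(duration_histogram(listing)) == 1

# Illustrative listing, made up for this example in the mp4info line layout.
uneven = """\
[000001] size= 73066 duration=   834 [S] <I>
[000002] size= 60304 duration=   835     <P>
[000003] size=  1200 duration=   834     <P>
"""
print(dict(duration_histogram(uneven)))
print(has_constant_durations(uneven))
```

The histogram also shows how many distinct duration values occur, for instance two alternating values versus widely varying ones.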

barbibulle commented 4 years ago

@tab1293 mp4fragment already sort-of warns you when it can't find a regular repeating frame interval. Ex:

mp4fragment input.mp4 output.mp4
found regular I-frame interval: 1238 frames (at 23.976 frames per second)
auto-detected fragment duration too large, using default

In this example, the only interval detected was a very long interval (by chance), which was deemed too large, so it printed a warning.

Are you looking for something different from that?

tab1293 commented 4 years ago

@barbibulle I see that warning on most fragmentation runs. Even when I re-encode the video stream with FFmpeg using -x264-params 'keyint=${keyInt}:min-keyint=${keyInt}:no-scenecut', that warning message is still printed. Although I see the warning, mp4dash is still able to successfully package the fragmented file and the DASH stream plays fine with RxPlayer (seeking is OK too).

I am trying to determine if and when a re-encoding of the video stream is necessary in order to guarantee that mp4fragment and mp4dash will output a playable DASH stream. My assumption was that a non-fixed I-frame interval would break playback, specifically around seeking capabilities. Maybe my assumption is wrong?

Any insight around this topic would be appreciated. I am trying to avoid re-encoding an H264 stream when it is not necessary. Thanks.