androidx / media

Jetpack Media3 support libraries for media use cases, including ExoPlayer, an extensible media player for Android
https://developer.android.com/media/media3
Apache License 2.0

HLS: Support TTML/IMSC subtitles (in mp4 segments) #588

Open RicFlinn opened 1 year ago

RicFlinn commented 1 year ago

I'm trying to figure out a way to make time-based subtitles captured from a live DASH stream display correctly after converting the stream to HLS. I'm using ExoPlayer v2.18.6.

My app converts a live DASH stream into HLS. Audio and video work as expected, but the DASH stream also includes TTML-formatted subtitle segments, and those do not display. The app captures audio, video, and subtitle segments from the live DASH stream, beginning at a random time within the stream, then creates a set of HLS playlists from these segments to play back the recording.

The TTML subtitle segments contain "begin" and "end" times for the display text relative to the start of the live DASH stream. A typical subtitle segment includes something like:

<p begin="489:02:17.897" end="489:24:18.698" region="Region_187">
    <span style="Style_184_0">WHAT SHOULD I DO?</span>
</p>
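
For reference, a TTML clock-time value like the begin attribute above converts to microseconds as shown in this minimal sketch (a hypothetical helper for illustration, not ExoPlayer's actual TTML parser, and ignoring frame-based clock formats):

static long ttmlClockTimeToUs(String clockTime) {
  // Split "HHH:MM:SS.mmm" into hours, minutes, seconds, and milliseconds.
  String[] parts = clockTime.split("[:.]");
  long hours = Long.parseLong(parts[0]);
  long minutes = Long.parseLong(parts[1]);
  long seconds = Long.parseLong(parts[2]);
  long millis = Long.parseLong(parts[3]);
  return ((hours * 3600 + minutes * 60 + seconds) * 1000 + millis) * 1000;
}

// ttmlClockTimeToUs("489:02:17.897") == 1760537897000, i.e. roughly 489 hours
// past the start of the live stream.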

When playing the recording as a DASH stream, the subtitles display correctly, even when played back as a static DASH stream, i.e. not a dynamic stream using clock timing relative to when the stream was originally played. This means the subtitle decoder must be getting the timing data from the A/V stream and displaying the subtitles at the correct time based on it. (Side note: I see other evidence that this is true; for example, ExoPlayer's timeline bar shows 489:20:17 as the start time.)

When playing the segments as an HLS stream, however, this internal timing reference is apparently NOT used, and the subtitles do not display, because the playback start time is 000:00:00 and the first subtitle is not due to display until some much later time (489 hours, in the above example).

If I manually adjust the times in the segments, I get mixed results on whether or not they display. For example, if the above subtitle is the first one at the beginning of the stream, I can adjust it to look like this:

<p begin="000:00:00.000" end="000:00:00.801" region="Region_187">
    <span style="Style_184_0">WHAT SHOULD I DO?</span>
</p>

In some cases the caption displays correctly; in others it may just blink on and off again. My hunch is that I'm not adjusting the times correctly according to when they should be displayed based on the A/V timing. Assuming that the first subtitle should start at time 0 rarely works as desired.

I'm wondering if there is a way to extract the display time value from the MP4 data and use that as a reference start time, and then adjust the subtitle times by this value. I'm not sure what this value would be called or where to look for it; I don't know much about how the A/V renderers operate. I considered using the DASH timing information (availability start time, current time, etc.), but I don't think this will be accurate enough.
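
Digging a little, I think the value I'm describing is the baseMediaDecodeTime in each fragmented-MP4 segment's tfdt box, expressed in the track's timescale (so it has to be divided by the timescale from the init segment to get seconds). A rough sketch of pulling it out of a segment file (simplified box walk for illustration; assumes 32-bit box sizes and skips error handling):

import java.io.DataInputStream;
import java.io.EOFException;
import java.io.FileInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;

public class TfdtReader {
  public static void main(String[] args) throws IOException {
    try (DataInputStream in = new DataInputStream(new FileInputStream(args[0]))) {
      while (true) {
        long size;
        String type;
        try {
          size = Integer.toUnsignedLong(in.readInt()); // 32-bit box size
          byte[] fourcc = new byte[4];
          in.readFully(fourcc);
          type = new String(fourcc, StandardCharsets.US_ASCII);
        } catch (EOFException e) {
          break; // End of file, no tfdt found.
        }
        if (type.equals("moof") || type.equals("traf")) {
          continue; // Containers: their children start immediately, so descend.
        }
        if (type.equals("tfdt")) {
          int versionAndFlags = in.readInt();
          int version = versionAndFlags >>> 24;
          long baseMediaDecodeTime =
              version == 1 ? in.readLong() : Integer.toUnsignedLong(in.readInt());
          System.out.println("baseMediaDecodeTime = " + baseMediaDecodeTime);
          return;
        }
        in.skipBytes((int) (size - 8)); // Skip the payload of boxes we don't need.
      }
    }
  }
}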

Here is an example of an HLS stream (extracted from a live DASH stream) with TTML subtitles: https://1drv.ms/u/s!Ar-5IRkcwwk9gddDSA3TSrBPLus2Sw?e=QhVpC4

icbaker commented 1 year ago

Thanks for the report, and for the nice self-contained repro example - that makes it much easier to investigate!

I think I've got an understanding of what's going on - but I'm not yet sure of the best way to fix it.

Looking at https://datatracker.ietf.org/doc/html/draft-pantos-hls-rfc8216bis#appendix-A it seems that IMSC/TTML support was added to the HLS spec after the initial RFC was published, so it's possible that ExoPlayer just doesn't really support TTML in HLS very well at the moment.
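
For context, that appendix signals IMSC subtitles as an fMP4 subtitle rendition; a multivariant playlist referencing one looks roughly like this (illustrative values only; stpp.ttml.im1t is the codec string for IMSC1 text profile in MP4):

#EXT-X-MEDIA:TYPE=SUBTITLES,GROUP-ID="subs",NAME="English",LANGUAGE="en",URI="subtitles.m3u8"
#EXT-X-STREAM-INF:BANDWIDTH=2000000,CODECS="avc1.64001f,mp4a.40.2,stpp.ttml.im1t",SUBTITLES="subs"
video.m3u8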

Here's a summary of my understanding:

  1. Your media has MP4 segments with MP4 sample timestamps that start at roughly 780345657454 microseconds (and this matches across the video and text segments)
  2. The MP4 text segments contain TTML data inside each sample, which also has timing information in it (as you've highlighted above). These match roughly with the sample timestamps.
    • So each TTML cue has two timestamps: the MP4 sample timestamp, and the timestamp from the TTML data itself.
  3. When playing HLS streams, there's a component in ExoPlayer called TimestampAdjuster which (in simple terms) modifies the timestamps of the media to start at zero by subtracting the required offset, based on the first timestamp in the media.
    • This currently only operates on the MP4 sample timestamps - it doesn't touch the TTML timestamps.
  4. ExoPlayer considers the TTML timestamps to be 'absolute' and effectively ignores the outer timestamp when deciding whether a subtitle should be shown on screen. This is set here: https://github.com/androidx/media/blob/1.1.1/libraries/extractor/src/main/java/androidx/media3/extractor/mp4/AtomParsers.java#L1143

Combining these, we end up only considering the TTML timestamp - and the subtitles never show up, because the player thinks they're due to be shown very far in the future.
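
To make that concrete with the numbers from (1), the arithmetic looks something like this (illustrative values and variable names, not actual ExoPlayer code):

// Both clocks start out roughly in agreement:
long mp4SampleTimeUs = 780345657454L; // MP4 sample timestamp
long ttmlCueTimeUs = 780345657897L;   // parsed from the TTML "begin" attribute

// TimestampAdjuster rebases the MP4 sample timestamps to start near zero:
long offsetUs = -780345657454L;
long adjustedSampleTimeUs = mp4SampleTimeUs + offsetUs; // ~0

// But the TTML timestamps are treated as absolute and left untouched, so the
// renderer compares ttmlCueTimeUs (hundreds of hours) against a playback
// position near zero, and the cue never becomes due for display.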

I've made a hacky local change that changes the behaviour described in (4), by setting Format.subsampleOffsetUs so that the TTML timestamps have the 'offset' subtracted from them before deciding if they should be shown. This results in subtitles being shown on screen at the right time.

Some questions I haven't resolved (not expecting you to answer, just noting down where I got to):

RicFlinn commented 1 year ago

Thank you for looking into this, Mr. Baker. I believe your assessment is correct.

You ask good questions too; I'm not sure what the expected behavior of a live HLS stream with TTML subtitles would be, but it seems like the case I'm running into could be common.

Also, regarding DASH streams, isn't the sample offset already being taken into account? In the repro example I provided, I also included a static DASH manifest that can be used to play these segments, and the TTML subtitles do indeed display. But I also notice that the start time displayed by the progress bar is not 0; it's a large value that (I assume) correlates with the initial media timestamp. It seems like DASH and HLS are dealing with media timestamps differently.

Thanks again. Please let me know if I can assist in any way.

RicFlinn commented 1 year ago

Checking in on this issue. Any progress?

You mentioned a local hack you made to get subtitles to display; is this something I could test to see if it works in my live streaming cases?

icbaker commented 1 year ago

The hacky change I made looks a bit like this in FragmentedMp4Extractor - specifically inserting code after this line:

https://github.com/androidx/media/blob/282171cb6f469c85d1b5eee1e07f571b5f3021da/libraries/extractor/src/main/java/androidx/media3/extractor/mp4/FragmentedMp4Extractor.java#L1414

// Tracks whether the text track format has been re-emitted with the offset set.
private boolean updatedTextSubsampleOffset = false;

private boolean readSample(ExtractorInput input) throws IOException {
  ...
  if (timestampAdjuster != null) {
    if (!updatedTextSubsampleOffset) {
      // Re-emit the track format with subsampleOffsetUs set to the offset the
      // TimestampAdjuster applies to the MP4 sample timestamps, so the TTML
      // timestamps inside each sample get shifted by the same amount.
      output.format(
          track
              .format
              .buildUpon()
              .setSubsampleOffsetUs(timestampAdjuster.getTimestampOffsetUs())
              .build());
      updatedTextSubsampleOffset = true;
    }
    sampleTimeUs = timestampAdjuster.adjustSampleTimestamp(sampleTimeUs);
  }
  ...
}

This is not a good fix, because it may change the format of the track after the media period has been fully prepared, which is not permitted and may result in unexpected behaviour.

A better fix would ensure that this offset adjustment is made before the media period is fully prepared - but I haven't investigated exactly how to wire that up.

RicFlinn commented 1 year ago

Thanks for sharing the hack; FWIW I did include it in my build and tried it out on several of the HLS (converted from DASH) live streams we need to support, and it appears to work as desired: the subtitles display and appear as intended. Limited test cases, but it didn't appear to break anything at least.

I appreciate your efforts on this, let me know if I can be of any use.

RicFlinn commented 10 months ago

I've continued testing the hack on various streams (DASH converted to HLS, typically), and though it does seem to work in the majority of cases, we've found that certain DRM streams don't play with this hack in place. I haven't looked into the exact scenario that causes it to fail, but video freezes on a frame while audio continues to play.

Probably not a big surprise, since it's definitely not a proposed fix, but I thought I should share my findings.

icbaker commented 10 months ago

I wonder if you still see issues with video freezing if you finesse the hack slightly to only trigger on text tracks. Something like:

if (!updatedTextSubsampleOffset && track.type == C.TRACK_TYPE_TEXT) {
  // Same code as suggested above
}

I don't think that should make a major difference, but it also seems more correct to only do this for text tracks, so it seems worth a quick try.

RicFlinn commented 10 months ago

Thanks Ian, I gave this a quick try and yes, it does seem to fix the issue I was seeing. I haven't noticed any other issues but have a bit more testing I can do.