androidx / media

Jetpack Media3 support libraries for media use cases, including ExoPlayer, an extensible media player for Android
https://developer.android.com/media/media3
Apache License 2.0

Value returned by Player.getCurrentPosition incorrect after scrubbing in HLS video #356

Closed (micahrollinsmlb closed this issue 1 year ago)

micahrollinsmlb commented 1 year ago

Media3 Version

ExoPlayer 2.18.6

Devices that reproduce the issue

Oculus Quest 2, Galaxy Note 20

Devices that do not reproduce the issue

No response

Reproducible in the demo app?

Yes

Reproduction steps

  1. [Optional] Update the demo app to skip backwards by 30 seconds instead of 5 seconds
  2. [Optional] Update the demo app to keep the video controls/playhead visible to make it easier to watch the playhead
  3. [Optional] Update the demo app to automatically seek to 920000ms into the video (15:20 in the timeline)
  4. Play the attached media (https://vr-assets.mlb.com/AWS/DEV/6da78dc5-2142-4bd6-adc9-b654d4750635/camA.m3u8) in the demo app
  5. If Repro Step 3 was omitted, skip to approximately 15:20
  6. Allow to play until 15:40, right when the announcer says: "It's a chance for him to spend a..."
  7. Skip backward 30 seconds, which should take the playhead to 15:10
  8. Allow to play normally back until 15:40 observing the content of the audio/video in relation to the playhead

Expected result

The content of the audio/video is consistent for any point in the timeline

Actual result

It does not reproduce 100% of the time, but it is fairly consistent when timed right: the content of the audio/video will be off from the current position as reported by Player.getCurrentPosition. In my screen capture, for example, after I scrubbed backward, you can hear the announcer say "...is very comfortable in this venue" when the timeline says 15:10. However, if you watch the source media (camA.m3u8), the announcer doesn't say that until 15:12, almost 15:13. Furthermore, prior to the backward scrub, you can hear the crack of the bat at 15:37, but after the backward scrub, the crack of the bat occurs at 15:35. This inconsistency is not limited to the audio; it's just harder to demonstrate the video inconsistency on a phone screen in the demo app. In VR, the video inconsistency is very noticeable.

The first three steps in the Repro Steps are optional; the bug will manifest without them. However, reproducing this bug is much easier with those changes.

Once the audio/video gets out of sync with the timeline, it stays out of sync until another scrub is performed, at which point it usually goes back in sync with the correct timeline.

Media

The source media is: https://vr-assets.mlb.com/AWS/DEV/6da78dc5-2142-4bd6-adc9-b654d4750635/camA.m3u8

I made a screen recording of this issue on a Galaxy Note 20: https://drive.google.com/file/d/1V5ouTLwnIAmEX_F7X9A6LTOt_Ip86ycx/view?usp=sharing

Bug Report

rohitjoins commented 1 year ago

Hello @micahrollinsmlb,

Thank you for reporting this issue and providing detailed instructions to replicate it. I can verify that this occurs when using the specified media source.

@tianyif, could you kindly investigate this further? I could not find a similar issue with the media items existing in our demo app.

micahrollinsmlb commented 1 year ago

Hello again, @tianyif! I was curious if you had any update on this issue?

tianyif commented 1 year ago

Hi @micahrollinsmlb,

Sorry for the late update! It took us a long time to figure out this issue. But don't worry, we find this issue interesting and we are happy to share the update!

First, we have a class, TimestampAdjuster, which adjusts sample timestamps according to timestampOffsetUs while the media is being extracted. That is to say, we don't assume the media always starts from time 0, but from a specific offset, and we use TimestampAdjuster to map each sample timestamp to another timestamp relative to that offset. timestampOffsetUs is supposed to be calculated when the timeUs of the first sample in the segment containing the seek position is passed to the method adjustSampleTimestamp(long).

For example, if you seek to the position 15:10, the index of the video segment containing that position is 228. While debugging, we found that the start time of segment 228 is 908908000 microseconds; however, the first timeUs passed to adjustSampleTimestamp(long) is 911143266, which sets timestampOffsetUs in the TimestampAdjuster to -2235266 (that is, 908908000 - 911143266). The following timeUs values passed to adjustSampleTimestamp(long) are 908974733, 909108200, 909041466, ...

While we don't expect the timeUs values passed into adjustSampleTimestamp(long) to be in increasing order (because frames can be predicted and bidirectional), the first timeUs of 911143266 looks very suspicious. We then found that this 911143266 doesn't come from a sample, but from the emsg, which provides signalling for generic events related to the media presentation time.

That is to say, we set timestampOffsetUs wrongly using the timestamp from the emsg, which differs by ~2s from the desired timestamp 908908000. The consequence is that a real sample with timestamp 908974733 (at ~15:09) is adjusted to 906739467 (at ~15:07). In turn, a real sample at ~15:10 is adjusted to ~15:08, and the player won't play it since it thinks this sample is still too early. Only when the real sample is at ~15:12, and so has an adjusted time of ~15:10, does the player start to play it.
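To make the arithmetic above concrete, here is a minimal, self-contained Java sketch. It is not the ExoPlayer implementation: the class and method names (TimestampOffsetExample, offsetUs, adjust, toClock) are made up for illustration, and the only assumptions are the ones described above, namely that the offset is the desired start time minus the first timestamp seen, and that every later timestamp has that offset added to it. The numbers are the ones quoted from the debugging session.

```java
import java.util.Locale;

/** Illustrative arithmetic only; names here are hypothetical, not ExoPlayer APIs. */
public class TimestampOffsetExample {
    static final long SEGMENT_START_US = 908_908_000L;     // start of segment 228
    static final long EMSG_TIME_US = 911_143_266L;         // first timeUs seen (from the emsg)
    static final long FIRST_SAMPLE_TIME_US = 908_974_733L; // first real sample timeUs

    /** Offset that maps the first timestamp seen onto the desired start time. */
    static long offsetUs(long desiredStartUs, long firstTimeUs) {
        return desiredStartUs - firstTimeUs;
    }

    /** Applies a previously computed offset to a sample timestamp. */
    static long adjust(long timeUs, long offsetUs) {
        return timeUs + offsetUs;
    }

    /** Formats microseconds as m:ss, rounded to the nearest second. */
    static String toClock(long timeUs) {
        long totalSeconds = (timeUs + 500_000L) / 1_000_000L;
        return String.format(Locale.ROOT, "%d:%02d", totalSeconds / 60, totalSeconds % 60);
    }

    public static void main(String[] args) {
        // Offset derived from the emsg timestamp (the buggy case): -2235266
        long badOffsetUs = offsetUs(SEGMENT_START_US, EMSG_TIME_US);
        // The first real sample is shifted about 2s early: 906739467
        long adjustedUs = adjust(FIRST_SAMPLE_TIME_US, badOffsetUs);

        System.out.println("offset from emsg: " + badOffsetUs);
        System.out.println("sample at " + toClock(FIRST_SAMPLE_TIME_US)
            + " is reported as " + toClock(adjustedUs));

        // For contrast, the offset if the first real sample had initialized it.
        long sampleOffsetUs = offsetUs(SEGMENT_START_US, FIRST_SAMPLE_TIME_US);
        System.out.println("offset from first sample: " + sampleOffsetUs);
    }
}
```

The roughly 2s gap between the two offsets matches the gap observed in the report, where content heard at 15:12 in the source is presented at 15:10 after the seek.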

This is a bug in our HLS code: it takes the timestamp from the emsg to calculate timestampOffsetUs. To mitigate this, we have to turn off the flag FLAG_ENABLE_EMSG_TRACK. Unfortunately, our code has no public entry point for setting this flag. There is a tricky workaround:

ExoPlayer.Builder playerBuilder =
    new ExoPlayer.Builder(/* context= */ this)
        .setMediaSourceFactory(
            new HlsMediaSource.Factory(new DefaultDataSource.Factory(this))
                .setExtractorFactory(
                    new HlsExtractorFactory() {
                      private final DefaultHlsExtractorFactory defaultHlsExtractorFactory =
                          new DefaultHlsExtractorFactory();

                      @Override
                      public HlsMediaChunkExtractor createExtractor(
                          Uri uri,
                          Format format,
                          @Nullable List<Format> muxedCaptionFormats,
                          TimestampAdjuster timestampAdjuster,
                          Map<String, List<String>> responseHeaders,
                          ExtractorInput sniffingExtractorInput,
                          PlayerId playerId)
                          throws IOException {
                        // Delegate to the default factory, but with the metadata stripped from the format.
                        return defaultHlsExtractorFactory.createExtractor(
                            uri, format.buildUpon().setMetadata(null).build(), muxedCaptionFormats,
                            timestampAdjuster, responseHeaders, sniffingExtractorInput, playerId);
                      }
                    }));

Notice that in the return statement we call format.buildUpon().setMetadata(null). With the metadata absent, isFmp4Variant(Format) returns false, which in turn leaves the flag FLAG_ENABLE_EMSG_TRACK turned off.

I will leave this issue open and mark it as a bug, since we should fix it. Thanks so much for reporting this issue!

tianyif commented 1 year ago

I forgot to mention this in my last reply: timestampOffsetUs can be initialized by either an audio segment or a video segment (whichever comes first), and once it’s set, it is shared by both video and audio segments afterwards. That’s why the audio is also inaccurate after seeking.
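The sharing behaviour described above can be illustrated with a small stateful model. This is only a sketch under stated assumptions: SimpleAdjuster is a hypothetical stand-in, not the real TimestampAdjuster, and it models just one property, that the first timestamp passed in fixes the offset for every later caller. The audio timestamp used in main is invented for illustration; the other numbers come from the debugging session.

```java
/** Simplified model of a shared adjuster; SimpleAdjuster is hypothetical, not an ExoPlayer class. */
public class SharedAdjusterSketch {

    static final class SimpleAdjuster {
        private final long desiredFirstTimeUs;
        private boolean initialized;
        private long offsetUs;

        SimpleAdjuster(long desiredFirstTimeUs) {
            this.desiredFirstTimeUs = desiredFirstTimeUs;
        }

        /** The first call fixes the offset; all later calls reuse it. */
        long adjust(long timeUs) {
            if (!initialized) {
                offsetUs = desiredFirstTimeUs - timeUs;
                initialized = true;
            }
            return timeUs + offsetUs;
        }

        long offsetUs() {
            return offsetUs;
        }
    }

    public static void main(String[] args) {
        // One adjuster instance is shared by the audio and video paths.
        SimpleAdjuster shared = new SimpleAdjuster(908_908_000L);

        // The emsg timestamp arrives first and initializes the offset.
        shared.adjust(911_143_266L);

        // Video and audio samples that follow both inherit the same offset.
        long videoUs = shared.adjust(908_974_733L);
        long audioUs = shared.adjust(908_950_000L); // hypothetical audio timestamp

        System.out.println("shared offset: " + shared.offsetUs());
        System.out.println("adjusted video sample: " + videoUs);
        System.out.println("adjusted audio sample: " + audioUs);
    }
}
```

In this model, whichever timestamp reaches the adjuster first determines the offset for both tracks, which is consistent with the audio and the video being shifted by the same amount after the seek.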

micahrollinsmlb commented 1 year ago

That is really great to hear! Thank you for the hard work in tracking that down!