Closed micahrollinsmlb closed 1 year ago
Hello @micahrollinsmlb,
Thank you for reporting this issue and providing detailed instructions to replicate it. I can verify that this occurs when using the specified media source.
@tianyif, could you kindly investigate this further? I could not find a similar issue with the media items existing in our demo app.
Hello again, @tianyif! I was curious if you had any update on this issue?
Hi @micahrollinsmlb,
Sorry for the late update! It took us a long time to figure out this issue, but it is an interesting one and we are happy to share what we found.
First, we have a class TimestampAdjuster, which adjusts sample timestamps according to timestampOffsetUs while the media is being extracted. That is to say, we don't assume the media always starts from time 0; it may start at a specific offset, and we use TimestampAdjuster to map each sample timestamp to a timestamp relative to that offset. This timestampOffsetUs is supposed to be calculated when the timeUs of the first sample in the segment containing the seek position is passed to the method adjustSampleTimestamp(long).
For example, if you seek to position 15:10, the index of the video segment containing that position is 228. While debugging, we found that the start time (in microseconds) of segment 228 is 908908000. However, the first timeUs passed to adjustSampleTimestamp(long) is 911143266, which sets the timestampOffsetUs in the TimestampAdjuster to -2235266. The following timeUs values passed to adjustSampleTimestamp(long) are 908974733, 909108200, 909041466, ...
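To make the arithmetic above concrete, here is a minimal sketch of a "first timestamp wins" adjuster. It is a simplified model written for this explanation, not the real TimestampAdjuster: the class name TimestampOffsetSketch and its constructor parameter are assumptions, and only the numbers come from the debugging session.

```java
/** Simplified model of a first-timestamp-wins adjuster; NOT the real ExoPlayer class. */
public class TimestampOffsetSketch {
    private static final long UNSET = Long.MIN_VALUE;

    // Timestamp that the first adjusted value should map to (the segment start time).
    private final long desiredFirstTimestampUs;
    private long timestampOffsetUs = UNSET;

    public TimestampOffsetSketch(long desiredFirstTimestampUs) {
        this.desiredFirstTimestampUs = desiredFirstTimestampUs;
    }

    /** The first call fixes the offset; every later call reuses it. */
    public long adjustSampleTimestamp(long timeUs) {
        if (timestampOffsetUs == UNSET) {
            timestampOffsetUs = desiredFirstTimestampUs - timeUs;
        }
        return timeUs + timestampOffsetUs;
    }

    public long getTimestampOffsetUs() {
        return timestampOffsetUs;
    }

    public static void main(String[] args) {
        TimestampOffsetSketch adjuster = new TimestampOffsetSketch(908_908_000L);
        // The first value to arrive is the emsg timestamp, not a sample timestamp.
        adjuster.adjustSampleTimestamp(911_143_266L);
        System.out.println(adjuster.getTimestampOffsetUs()); // prints -2235266
        // Real samples that follow are shifted by the same offset.
        System.out.println(adjuster.adjustSampleTimestamp(908_974_733L)); // prints 906739467
    }
}
```

In this model, whichever value reaches adjustSampleTimestamp(long) first decides the offset for everything that follows, which is why one unexpected first value is enough to shift the whole timeline.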
While we don't expect the timeUs values passed into adjustSampleTimestamp(long) to be in increasing order (because frames can be predicted and bidirectional), the first timeUs of 911143266 looks very suspicious. We then found that this 911143266 doesn't come from a sample, but from the emsg, which provides signalling for generic events related to the media presentation time.
That is to say, we set timestampOffsetUs wrongly using the timestamp from the emsg, which differs by ~2s from the desired timestamp 908908000. As a consequence, a real sample with timestamp 908974733 (at ~15:09) is adjusted to 906739467 (at ~15:07). In turn, a real sample at ~15:10 is adjusted to ~15:08, and the player won't play it because it thinks the sample is still too early. When the real sample is at ~15:12, its adjusted time is ~15:10, so the player starts playing from it.
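The shift can be checked with a few lines of arithmetic. The sketch below is only an illustration: the class SeekShiftSketch and its helper names are hypothetical, and the only inputs taken from the debugging session are the offset -2235266 and the sample timestamp 908974733.

```java
/** Illustrative arithmetic only; class and method names are hypothetical. */
public class SeekShiftSketch {
    // Offset that was wrongly derived from the emsg timestamp.
    public static final long OFFSET_US = -2_235_266L;

    /** Formats microseconds as m:ss, rounded to the nearest second. */
    public static String toClock(long timeUs) {
        long totalSeconds = (timeUs + 500_000L) / 1_000_000L;
        return String.format("%d:%02d", totalSeconds / 60, totalSeconds % 60);
    }

    /** Timestamp the player sees for a sample with the given real timestamp. */
    public static long adjusted(long realTimeUs) {
        return realTimeUs + OFFSET_US;
    }

    /** Real media time of the content rendered when the player reports positionUs. */
    public static long realTimeAtReportedPosition(long positionUs) {
        return positionUs - OFFSET_US;
    }

    public static void main(String[] args) {
        System.out.println(toClock(908_974_733L));           // prints 15:09
        System.out.println(toClock(adjusted(908_974_733L))); // prints 15:07
        long seekPositionUs = 910_000_000L;                  // 15:10
        System.out.println(toClock(realTimeAtReportedPosition(seekPositionUs))); // prints 15:12
    }
}
```

The last line matches the report: when the timeline says 15:10, the content actually comes from about 15:12 in the source media.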
This is a bug in our HLS code, which takes the timestamp from the emsg to calculate timestampOffsetUs. To mitigate this, we have to turn off the flag FLAG_ENABLE_EMSG_TRACK. Unfortunately, our code has no public entry point for setting this flag. There is a tricky workaround:
    ExoPlayer.Builder playerBuilder =
        new ExoPlayer.Builder(/* context= */ this)
            .setMediaSourceFactory(
                new HlsMediaSource.Factory(new DefaultDataSource.Factory(this))
                    .setExtractorFactory(
                        new HlsExtractorFactory() {
                          // Delegate to the default factory, but with the metadata stripped.
                          private final DefaultHlsExtractorFactory defaultHlsExtractorFactory =
                              new DefaultHlsExtractorFactory();

                          @Override
                          public HlsMediaChunkExtractor createExtractor(
                              Uri uri,
                              Format format,
                              @Nullable List<Format> muxedCaptionFormats,
                              TimestampAdjuster timestampAdjuster,
                              Map<String, List<String>> responseHeaders,
                              ExtractorInput sniffingExtractorInput,
                              PlayerId playerId)
                              throws IOException {
                            return defaultHlsExtractorFactory.createExtractor(
                                uri,
                                format.buildUpon().setMetadata(null).build(),
                                muxedCaptionFormats,
                                timestampAdjuster,
                                responseHeaders,
                                sniffingExtractorInput,
                                playerId);
                          }
                        }));
Notice that in the return line we call format.buildUpon().setMetadata(null). With the metadata absent, isFmp4Variant(Format) returns false, which in turn turns off the flag FLAG_ENABLE_EMSG_TRACK.
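Roughly, the chain from "no metadata" to "no emsg track" can be pictured like this. This is a simplified stand-in, not the actual DefaultHlsExtractorFactory code: the class EmsgFlagSketch, the List<String> used in place of the Format metadata, and the bit value of the flag are all assumptions made for illustration.

```java
import java.util.List;

/** Simplified stand-in for the flag selection; NOT the real ExoPlayer code. */
public class EmsgFlagSketch {
    // Illustrative bit value; the real constant lives in the fMP4 extractor.
    public static final int FLAG_ENABLE_EMSG_TRACK = 1 << 2;

    /**
     * Hypothetical model of isFmp4Variant(Format): the list stands in for the
     * variant information carried in the Format metadata, and null stands in
     * for metadata that was cleared with setMetadata(null).
     */
    public static boolean isFmp4Variant(List<String> variantInfos) {
        return variantInfos != null && !variantInfos.isEmpty();
    }

    /** The emsg track is only enabled when the format looks like an fMP4 variant. */
    public static int extractorFlags(List<String> variantInfos) {
        return isFmp4Variant(variantInfos) ? FLAG_ENABLE_EMSG_TRACK : 0;
    }

    public static void main(String[] args) {
        System.out.println(extractorFlags(List.of("variant"))); // prints 4
        System.out.println(extractorFlags(null));               // prints 0
    }
}
```

With the flag off, no emsg timestamp reaches the adjuster in this model, so the first value it sees comes from a real sample.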
I will leave this issue open and mark this issue as a bug since we should fix it. Thanks so much for reporting this issue!
I forgot to mention this in the last reply: the timestampOffsetUs can be initialized by either an audio segment or a video segment (whichever comes first), and once it's set, it is shared by both video and audio segments afterwards. That's why the audio is also inaccurate after seeking.
That is really great to hear! Thank you for the hard work in tracking that down!
Media3 Version
ExoPlayer 2.18.6
Devices that reproduce the issue
Oculus Quest 2, Galaxy Note 20
Devices that do not reproduce the issue
No response
Reproducible in the demo app?
Yes
Reproduction steps
Expected result
The content of the audio/video is consistent for any point in the timeline
Actual result
It's not 100% repro, but with some decent consistency when timed right, the content of the audio/video will be off from the current position as reported by Player.getCurrentPosition. In my screen capture, for example, after I scrubbed backward, you can hear the announcer say "...is very comfortable in this venue" when the timeline says 15:10. However, if you watch the source media (camA.m3u8), the announcer doesn't say that until 15:12, almost 15:13. Furthermore, prior to the backward scrub, you can hear the crack of the bat at 15:37, but after the backward scrub, the crack of the bat occurs at 15:35. This inconsistency is not limited to the audio; it's just harder to demonstrate the video inconsistency on a phone screen in the demo app. In VR, the video inconsistency is very noticeable.
The first three steps in the Repro Steps are optional; the bug will manifest without them. However, reproducing this bug is much easier with those changes.
Once the audio/video gets out of sync with the timeline, it will stay out of sync until another scrub is performed, at which point it usually goes back in sync with the correct timeline.
Media
The source media is: https://vr-assets.mlb.com/AWS/DEV/6da78dc5-2142-4bd6-adc9-b654d4750635/camA.m3u8
I made a screen recording of this issue on a Galaxy Note 20: https://drive.google.com/file/d/1V5ouTLwnIAmEX_F7X9A6LTOt_Ip86ycx/view?usp=sharing
Bug Report
adb bugreport
to dev.exoplayer@gmail.com after filing this issue.