androidx / media

Jetpack Media3 support libraries for media use cases, including ExoPlayer, an extensible media player for Android

Make audio processors MediaItem aware #418

Open Tolriq opened 1 year ago

Tolriq commented 1 year ago

My app supports gapless playback and everything works fine.

But my app also supports Replay Gain, which it applies from extracted metadata. The gain is applied in onTracksChanged, and this works nicely for the first song (and also when playing albums, since the replay gain is constant across the album).

The problem arises when users want to play random songs with different replay gain values. In that case, because of the pre-buffering done for gapless playback, onTracksChanged is called too late and the replay gain is applied late (up to 2 seconds for medium-quality MP3). This is obviously not nice.

Since they are playing random tracks there's no need for gapless playback, and they would rather have a small delay before the next song than the wrong volume for a few seconds.

The question is how I can achieve that (disable gapless playback, so that I can apply the replay gain at the proper moment) with normal usage of ExoPlayer and media sources.

I'd like to avoid writing a completely different playback path that bypasses ExoPlayer playlists and handles everything manually.

tonihei commented 1 year ago

Could you point out how exactly you apply the replay gain once you have the metadata? Is this a simple call to setVolume() or something more advanced than that? And also, where exactly do you get the metadata from - is it Format.metadata?

The reason for these questions is that waiting for onTracksChanged may work in practice, but strictly speaking it is also too late, because you likely miss a few audio samples at the beginning of the stream. I wonder if whatever needs to be done should be implemented as an AudioProcessor or as an adaptation/wrapper of an existing audio processor. Audio processors unfortunately don't see the full Format object, only a simplified view of it, but we may be able to make some adaptations to provide the full Format (including the needed metadata). The advantage is not only that the first start should be better, but also that all transitions (gapless/prebuffered or not) can apply the right effect accurately to each audio sample.
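For reference, a minimal sketch (imports omitted) of what such a processor sees today: its configure step only gets AudioProcessor.AudioFormat, i.e. sample rate, channel count and encoding, never Format.metadata. The ReplayGainAudioProcessor name and setGainDb() setter are hypothetical, not existing media3 API:

    // Sketch (not an existing media3 class): a replay-gain processor built on
    // BaseAudioProcessor. Its configure step only ever sees AudioProcessor.AudioFormat,
    // i.e. sample rate / channel count / encoding, never Format.metadata.
    public final class ReplayGainAudioProcessor extends BaseAudioProcessor {

      private volatile float gainDb; // has to be fed from outside, e.g. by a wrapping AudioSink

      public void setGainDb(float gainDb) {
        this.gainDb = gainDb;
      }

      @Override
      protected AudioFormat onConfigure(AudioFormat inputAudioFormat) {
        // Only PCM parameters are available here; no track metadata.
        return inputAudioFormat; // same output format; gain would be applied in queueInput()
      }

      @Override
      public void queueInput(ByteBuffer inputBuffer) {
        // Real code would scale the PCM samples by gainDb here; this sketch just copies them.
        replaceOutputBuffer(inputBuffer.remaining()).put(inputBuffer).flip();
      }
    }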

Tolriq commented 1 year ago

Yes, I get the replay gain values from Format.metadata. (But I could also get them earlier if necessary.)

Most of the time I use an AudioProcessor for that, but there are no calls to configure (or anything else) on a media change if the codec and everything else stay the same. (It's also an issue, for example, when enabling/disabling an audio processor on demand; currently we can only bypass it by doing a simple buffer copy.) The other approach I sometimes use is applying an Android DynamicsProcessing effect at a higher level, when I also apply some EQ, but this is even less precise.

If there were a way to do it with an AudioProcessor then I would always use it, but this is currently not the case.

For the first song, or when pressing next (so a new prepare), this works perfectly: the metadata is extracted before the codec is initialized and applied correctly before the first audio sample, or at least the flush happens soon enough to ensure no wrong sample is played.

But with gapless playback we are not talking about one audio sample: up to 2 seconds of audio may already have passed through the processor, and the flush is way too late to recover. With FLAC or high-resolution media it is a lot less, but with mid-quality MP3 it is 2 seconds.

Anyway, if there's no way to disable the current gapless behavior, an audio processor solution would be nice. For this need and some others (like enabling/disabling a processor), having a way at the processor level to know when there's a media change would help a lot. (Including a way to disable the processor based on the media, for a small performance gain and to avoid a useless buffer copy.)

Tolriq commented 11 months ago

Bump? I'm currently applying the replay gain via ffmpeg as a fallback solution, but this is not efficient at all. A native solution in the future would be better.

Samrobbo commented 11 months ago

AudioProcessor is the ideal place to handle this replay gain.

It sounds to me like the problem is around the fact that AudioProcessor instances do not get the configure call when the media item changes. This is an intentional design because avoiding configuring/flushing the DefaultAudioSink means there are no audio discontinuities.

@tonihei - You're more familiar with ExoPlayer components. Is there functionality in the audio renderers to force the sink to be configured when the media item changes? I can't determine what the trigger for MediaCodecAudioRenderer/DecoderAudioRenderer to configure the underlying sink is; it all seems to be based on codec reuse.

Once/if @tonihei confirms this, my recommendation would be for @Tolriq to pass in a custom ForwardingAudioSink (wrapping a DefaultAudioSink) and intercept the AudioSink#configure call. Use the Format passed in to check for replay gain values in the metadata, and pass that information to a custom AudioProcessor. DefaultAudioSink#configure will then reconfigure and flush all AudioProcessor instances, which will make the new replay gain value active.
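A minimal sketch of that recommendation (imports omitted), assuming a ReplayGainAudioProcessor like the one sketched earlier and a hypothetical helper that parses the gain out of Format.metadata:

    // Sketch only: wrap the real sink, read Format.metadata in configure(), and push
    // the parsed replay gain into the custom processor before delegating.
    final class ReplayGainForwardingAudioSink extends ForwardingAudioSink {

      private final ReplayGainAudioProcessor replayGainProcessor;

      ReplayGainForwardingAudioSink(AudioSink delegate, ReplayGainAudioProcessor processor) {
        super(delegate);
        this.replayGainProcessor = processor;
      }

      @Override
      public void configure(Format inputFormat, int specifiedBufferSize, int[] outputChannels)
          throws AudioSink.ConfigurationException {
        // Hypothetical helper: parse REPLAYGAIN_* values from inputFormat.metadata.
        replayGainProcessor.setGainDb(parseReplayGainDb(inputFormat));
        // The wrapped DefaultAudioSink will (re)configure and flush its processors,
        // so the new gain takes effect from the first sample of the new stream.
        super.configure(inputFormat, specifiedBufferSize, outputChannels);
      }

      private static float parseReplayGainDb(Format format) {
        // format.metadata may be null; parsing REPLAYGAIN_* tags (Vorbis comments,
        // ID3 TXXX frames, ...) is container dependent and omitted here.
        return 0f;
      }
    }

The same processor instance would be installed in the wrapped DefaultAudioSink (e.g. via DefaultAudioSink.Builder.setAudioProcessors), and the wrapping sink supplied to the player from a custom RenderersFactory, for example by overriding DefaultRenderersFactory.buildAudioSink.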

tonihei commented 11 months ago

AudioProcessor instances do not get the configure call when the media item changes

AudioSink.configure() is already called for every stream transition, I believe. DefaultAudioSink then creates and configures new AudioProcessors for the new Format.

This means @Samrobbo's suggestion should work, I think: wrap AudioSink, forward all calls to a DefaultAudioSink, and additionally set the configuration on your audio processor instance from within AudioSink.configure.

Tolriq commented 11 months ago

I can try that after the holidays, but I'm not sure about the flush on configure. Currently the flush does not happen when the format is the same, which is why we can't just disable the custom audio processor when it's not needed. But if the sink's configure is called even with the same format, I guess I can force it anyway.

Same question about gapless: should I take special steps when the replay gain is constant and gapless playback is wanted, to avoid an unwanted flush and audible glitches?

Tolriq commented 11 months ago

Ok so @tonihei @Samrobbo I did a quick test.

configure on the AudioSink is properly called between tracks even when the format is the same, and it is called at the proper time, so it could have worked.

But the Format passed to that function has no metadata at all, no id, nothing I can use to extract or access the already-extracted replay gain data. Maybe the metadata is just not copied when making that call, and it's an easy fix on the ExoPlayer side? Is there any diagram of how all those classes are called during playback, so I could take a look?

Tolriq commented 11 months ago

Actually got my answer:

https://github.com/androidx/media/blob/main/libraries/exoplayer/src/main/java/androidx/media3/exoplayer/audio/MediaCodecAudioRenderer.java#L560

And

https://github.com/androidx/media/blob/main/libraries/exoplayer/src/main/java/androidx/media3/exoplayer/audio/DecoderAudioRenderer.java#L446

The metadata is not passed down.

Adding:

              .setMetadata(inputFormat.metadata)

To DecoderAudioRenderer and

              .setMetadata(format.metadata)

To MediaCodecAudioRenderer

Seems to fix all the cases, including when using external libraries like ffmpeg.

I'm not sure about the potential impacts / side effects, so I'd like confirmation before submitting a PR.
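For context, a rough approximation (not copied verbatim from the renderer source) of the output Format builder in MediaCodecAudioRenderer where the extra line would go:

    // Approximate shape of the raw-audio output Format the renderer hands to the sink
    // (format = input Format, mediaFormat = decoder output MediaFormat, pcmEncoding
    // computed earlier); only the .setMetadata(...) line is the proposed addition.
    Format audioSinkInputFormat =
        new Format.Builder()
            .setSampleMimeType(MimeTypes.AUDIO_RAW)
            .setPcmEncoding(pcmEncoding)
            .setEncoderDelay(format.encoderDelay)
            .setEncoderPadding(format.encoderPadding)
            .setChannelCount(mediaFormat.getInteger(MediaFormat.KEY_CHANNEL_COUNT))
            .setSampleRate(mediaFormat.getInteger(MediaFormat.KEY_SAMPLE_RATE))
            .setMetadata(format.metadata) // proposed: carry the extracted metadata through
            .build();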

Samrobbo commented 11 months ago

@tonihei Can you advise?

Tolriq commented 11 months ago

So I'm running late because of my luggage, but configuring the audio processor at that moment works perfectly, including with gapless playback.

It would be really nice if there were a proper way to ensure that the metadata is passed down to AudioSink.configure.

Tolriq commented 10 months ago

Bump.

At least to get confirmation that by doing this in my fork I won't hit side effects I haven't seen yet.

tonihei commented 10 months ago

Ah, sorry, I didn't see the problem you highlighted when reading your post above.

Adding more identifying fields to this Format should work, I believe. In fact, you can probably also copy id, label, language, selectionFlags and roleFlags, which have similar identifying and describing roles: they shouldn't influence the technical handling of the audio, but may be useful to know.
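Sketched out, that would mean extending the builder shown above with something like the following (inputFormat being the renderer's input Format):

    // Sketch: copy identifying/descriptive fields from the input Format onto the
    // Format handed to the AudioSink, in addition to the metadata.
    formatBuilder
        .setMetadata(inputFormat.metadata)
        .setId(inputFormat.id)
        .setLabel(inputFormat.label)
        .setLanguage(inputFormat.language)
        .setSelectionFlags(inputFormat.selectionFlags)
        .setRoleFlags(inputFormat.roleFlags);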

Tolriq commented 10 months ago

Ok perfect thanks.

Should I make a PR, since it might be helpful for others too?

tonihei commented 10 months ago

Yes, please file a PR and we'll merge it in.

Tolriq commented 10 months ago

@tonihei done : https://github.com/androidx/media/pull/594

BTW, unrelated, but there are tons of nullability issues in 1.2 (I never really checked that part of the code before). For example, https://github.com/androidx/media/blob/main/libraries/exoplayer/src/main/java/androidx/media3/exoplayer/audio/MediaCodecAudioRenderer.java#L536: the mediaFormat is explicitly @Nullable but is then accessed without a null check. I guess the issue is just the annotation, but it produces tons of yellow warning lines everywhere :)

tonihei commented 10 months ago

BTW, unrelated, but there are tons of nullability issues

We run the Checker Framework's nullability checks internally to ensure there aren't any issues. However, some files are currently still excluded from the check because we never got around to ensuring proper annotations everywhere. MediaCodecAudioRenderer is one of these files, and that's probably why you are seeing all these warnings. The Android Studio warnings are also not always useful, because the checks run by the lint checker in Android Studio are not the same as those applied by the static code analysis tool.

Tolriq commented 8 months ago

It seems ExoPlayer does not extract metadata from WAV (probably normal, since there's no official way), but even when I do have the replay gain values I can't find a way to pass them down to the AudioSink.

Is there anything I'm missing that would let me pass down at least an ID or something, so I can look up the proper data for the related media?

Tolriq commented 7 months ago

@tonihei @Samrobbo Bump on this one. I now have another case where I get the replay gain data outside of the tags, and I can't figure out a way to pass anything down to the AudioProcessor so it can actually get the data.

tonihei commented 7 months ago

@Samrobbo @christosts I believe there are some efforts to make the audio processing pipeline "MediaItem-aware" in order to change effects based on the current item. We already pass down the current item to the audio renderer, but I'm not sure how/if this ends up in the actual audio processors. Could you comment on what's already planned (or what could be done)?

christosts commented 6 months ago

Could you comment on what's already planned (or what could be done?)

No concrete plans yet, the work is TBD. I think our first approach would be to use awareness of the MediaItem in the audio renderer and then have custom logic that sits on top of the audio processors inside a custom audio sink implementation. So, unless we hit a dead end, I'd say we will not change the AudioProcessors, at least not yet.

Tolriq commented 6 months ago

@christosts @tonihei you both seem to say the audio renderer is MediaItem aware, but I can't see anything MediaItem related, or a way to access it, in MediaCodecAudioRenderer and DecoderAudioRenderer.

If there's a way to access the MediaItem there, I can use it to augment the metadata passed down to the sink; maintaining a fork for such a constrained change works for me.

christosts commented 6 months ago

you both seem to say the audio renderer is MediaItem aware but I can't see anything MediaItem related or a way to access it

Apologies, this was only added in Media3 1.2.0. I'm copying the answer from another comment I posted recently:

As of version 1.2.0, there are two changes in Renderer:

With a Timeline and a MediaPeriodId, you can get the MediaItem currently being played, e.g. with:

Timeline.Period period = getTimeline().getPeriodByUid(mediaPeriodId, new Timeline.Period());
MediaItem mediaItem = getTimeline().getWindow(period.windowIndex, new Timeline.Window()).mediaItem;
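For illustration, a rough sketch (imports omitted; the override signature is assumed from the 1.2.0 APIs, it is not code from this thread) of how a custom audio renderer could capture the current MediaItem when a new stream starts:

    // Sketch: remember the current MediaItem when the renderer switches streams, so it
    // can later be used to augment the Format/metadata handed to the AudioSink.
    @Override
    protected void onStreamChanged(
        Format[] formats, long startPositionUs, long offsetUs, MediaSource.MediaPeriodId mediaPeriodId)
        throws ExoPlaybackException {
      super.onStreamChanged(formats, startPositionUs, offsetUs, mediaPeriodId);
      Timeline timeline = getTimeline();
      if (!timeline.isEmpty()) {
        // Note: getPeriodByUid expects the period uid, hence mediaPeriodId.periodUid.
        Timeline.Period period =
            timeline.getPeriodByUid(mediaPeriodId.periodUid, new Timeline.Period());
        MediaItem mediaItem =
            timeline.getWindow(period.windowIndex, new Timeline.Window()).mediaItem;
        // Look up per-item replay gain here, keyed e.g. by mediaItem.mediaId.
      }
    }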

Tolriq commented 6 months ago

@christosts Thanks a lot, I'll run some tests. Is replaceStream / onStreamChanged called before onOutputFormatChanged or drainOutputBuffer() for the ffmpeg decoder?

I'm trying to figure out the proper place to handle things for both renderers, so that passing the data to the sink without a flush is possible.

christosts commented 6 months ago

replaceStream / onStreamChanged are called as soon as the renderer is told it's reading a new stream of samples, which is before onOutputFormatChanged (I'm assuming each audio item has one format, i.e. there's no audio format switching as with DASH/HLS).

That said, when replaceStream / onStreamChanged is called, the decoder may still have samples queued from the previous media item. The ffmpeg extension can have up to 16 samples queued, so you need to tune when the new changes apply. I don't see an onOutputFormatChanged for the ffmpeg extension; you mentioned earlier that you were able to apply the new configuration as part of AudioSink.configure(), is that right?

Tolriq commented 6 months ago

Yes, with https://github.com/androidx/media/pull/594 the metadata is now available at the configure call for the processor.

I just need a way to augment that metadata from the media item, for the cases where ExoPlayer does not extract what I need or where I get it from the server.

Maybe there's another place where I can call the sink's configure again without triggering sample loss.

The whole point is to avoid samples played at the wrong volume, which can be hard on the ears with large replay gain values.

Tolriq commented 2 months ago

So I finally started to investigate this, and it seems to work (it needs to use currentMediaPeriodId.periodUid and not the mediaPeriodId directly).

I can access the proper media item before the sink.configure call and augment the metadata.

But since this is quite intrusive to keep in a fork, do you think there's a way to negotiate an official solution, @christosts?

For example, simply adding a mediaId to the Format and passing it down; this looks like the least intrusive way to pass the data without an API change.

I can imagine that custom audio processors having access to the media item could solve a few issues like this one (replay gain), but it could also allow per-media-item settings, like converting to mono or applying a specific balance, etc.
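As a sketch of that "least intrusive" idea (hypothetical, not an agreed design): the renderer could stamp the current MediaItem's mediaId onto the Format it hands to the sink, so a processor or wrapping sink can correlate per-item data at configure time:

    // Hypothetical: tag the output Format with the current item's mediaId so downstream
    // components can look up per-item data (replay gain, mono downmix, balance, ...).
    Format taggedFormat = outputFormat.buildUpon().setId(currentMediaItem.mediaId).build();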

tonihei commented 2 months ago

I'll reassign this to @Samrobbo, who probably has the most insight into the audio processing plans. The general request sounds reasonable, and I think there were already other plans to make the audio processors MediaItem-aware.