androidx / media

Jetpack Media3 support libraries for media use cases, including ExoPlayer, an extensible media player for Android
https://developer.android.com/media/media3
Apache License 2.0

[Effect] How to get the video position of a rendering frame in drawFrame? #1600

Closed DeweyReed closed 1 month ago

DeweyReed commented 1 month ago

Thanks for your excellent work on media3! The project opens up so many possibilities for Android development.

I'm applying some time-consuming processing to a video, like face mesh recognition. Existing solutions are too complicated or slow, so I decided to generate all the data in advance and reuse it. To reuse the data, I need to know which frame an effect is drawing.

I tried to use presentationTimeUs, but the value changes even if the video is paused. I also tried to access Player.currentPosition, but there appears to be a delay between the current state of the player and the frame the effect is processing.

In conclusion, I have some data for each frame but don't know how to get the frame index in the drawFrame method of a custom effect.

droid-girl commented 1 month ago

Hi @DeweyReed , Let me share some potential solutions you might want to look into:

  1. Check out issue #1551 for details on how to decode frames in advance
  2. You might want to consider running Transformer in two rounds. The first round enables ANALYZER_MODE (see the demo app for reference). Then you can use your analysis data in the second round to apply effects.

For the frame index and how to sync it correctly with the effect, @claincly could you advise here?

claincly commented 1 month ago

Hello @DeweyReed,

Unfortunately counting frames in effects would not be reliable - frames could be dropped (by the decoder) before they arrive in the effect pipeline.

In general, presentationTimestampUs should be used as a reliable way to identify frames (provided you don't change the timestamps in your effect or use speed adjustment).
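To illustrate the difference between counting frames and keying on the timestamp, here is a minimal, library-free Kotlin sketch. It does not use any Media3 API: `analysisByTimeUs`, `lookupByCounter` and `lookupByTimestamp` are hypothetical names, and the 40,000 us spacing and the dropped frame are made-up example values.

```kotlin
// Hypothetical per-frame analysis results, keyed by presentation time in microseconds.
// The 40_000 us spacing (25 fps) is an assumed example value.
val analysisByTimeUs: Map<Long, String> = mapOf(
    0L to "data-0",
    40_000L to "data-1",
    80_000L to "data-2",
    120_000L to "data-3",
)

// Counter-based lookup: assumes the n-th frame seen is the n-th frame analyzed.
fun lookupByCounter(framesSeenSoFar: Int): String? =
    analysisByTimeUs.values.elementAtOrNull(framesSeenSoFar)

// Timestamp-based lookup: independent of how many frames were seen before.
fun lookupByTimestamp(presentationTimeUs: Long): String? =
    analysisByTimeUs[presentationTimeUs]

fun main() {
    // Simulate the decoder dropping the frame at 40_000 us before it reaches the effect.
    val deliveredTimesUs = listOf(0L, 80_000L, 120_000L)
    deliveredTimesUs.forEachIndexed { counter, timeUs ->
        println(
            "t=$timeUs counter=${lookupByCounter(counter)} " +
                "timestamp=${lookupByTimestamp(timeUs)}"
        )
    }
}
```

After the dropped frame, the counter-based lookup returns the data of the wrong frame, while the timestamp-based lookup still returns the matching entry.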

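I wonder:

  • What's your set up? Do you process the video with Transformer and then use the data to support previewing the video using either CompositionPlayer or ExoPlayer?
  • Do you use many videos in succession?
  • What specifically do you see with presentationTimestampUs?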

I tried to use presentationTimeUs, but the value changes even if the video is paused.

This is not unexpected - when paused we just stop consuming frames from the frame processor, but the frame processor might still be processing frames. The values shouldn't change too many times though.

DeweyReed commented 1 month ago

Thank you for the fast replies!

I've checked some related issues on getting and processing every frame. Unfortunately, my processing is slower than rendering, so using the frame cache effect still blocks playback too much.

What's your set up? Do you process the video with Transformer and then use the data to support previewing the video using either CompositionPlayer or ExoPlayer?

Yes.

For simplicity, my example will be applying face recognition and locating faces on every video frame in a preview.

  1. I decided to run two rounds. First, I run a transformer using an effect that processes each frame, generates the necessary data, and stores the data somewhere. Let's say we get 30 data entries for a one-second, 30-frame video.
    • The new analyzer mode can make sure that all data is collected.
  2. When previewing a video, I call setVideoEffects with another effect that uses the 30 data entries from the previous step.
  3. Inside the drawFrame of the effect, I can reuse the data without waiting for recognition. However, I'm struggling to know which of the 30 entries I should use.
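The two rounds described above can be sketched without any Media3 dependency as a store that is filled in the first round and read in the second. This is only an illustration under stated assumptions: `FaceBox` and `FrameDataStore` are hypothetical types, the loops stand in for the Transformer run and for drawFrame calls, and keying on the presentation timestamp follows the suggestion made earlier in the thread.

```kotlin
// Hypothetical analysis result for one frame (for example, a face bounding box).
data class FaceBox(val left: Float, val top: Float, val right: Float, val bottom: Float)

// Hypothetical store shared between the analysis round and the preview round.
class FrameDataStore {
    private val boxesByTimeUs = HashMap<Long, FaceBox>()

    // Round 1: called once per frame from the analysis effect.
    fun record(presentationTimeUs: Long, box: FaceBox) {
        boxesByTimeUs[presentationTimeUs] = box
    }

    // Round 2: called from the preview effect's drawFrame with the frame's timestamp.
    fun lookup(presentationTimeUs: Long): FaceBox? = boxesByTimeUs[presentationTimeUs]

    val size: Int get() = boxesByTimeUs.size
}

fun main() {
    val store = FrameDataStore()
    val frameDurationUs = 1_000_000L / 30 // assumed 30 fps, one-second clip

    // Round 1 (stand-in for the Transformer run): record one entry per frame.
    for (i in 0 until 30) {
        store.record(i * frameDurationUs, FaceBox(i.toFloat(), 0f, i + 10f, 10f))
    }

    // Round 2 (stand-in for drawFrame during preview): look up by timestamp.
    println(store.lookup(5 * frameDurationUs))
    println("entries: ${store.size}")
}
```

The sketch assumes the first round finishes before the preview starts, so the store is not written and read concurrently.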

Do you use many videos in succession?

No. Only one video.

What specifically do you see with presentationTimestampUs?

Initially, I counted frames, but frames can get dropped, as you described. Then, I used presentationTimeUs to calculate indices, but I realized that the value increases when I turn the screen on and off when the video is paused. From my understanding of the API, the value changes whenever rendering happens, even if the frame doesn't change.

claincly commented 1 month ago

Thanks for your detailed description.

Could you post the specific timestamps that you see lost or changed when turning the screen on and off? My understanding is that the timestamp of a single frame doesn't change, so it's safe to match your data with the presentation time in effects.

Another factor is what you do when the screen is off. For example, in the ExoPlayer demo app, the player is released:

https://github.com/androidx/media/blob/b01c6ffcb3fca3d038476dab5d3bc9c9f2010781/demos/main/src/main/java/androidx/media3/demo/main/PlayerActivity.java#L182

DeweyReed commented 1 month ago

Here is the simple code that can reproduce the situation. The sample doesn't handle any lifecycle event.

val player = ExoPlayer.Builder(this).build()
player.setMediaItem(MediaItem.fromUri(getResourceUri(R.raw.video)))
player.setVideoEffects(listOf(GlEffect { _, useHdr -> EmptyProgram(useHdr) }))
player.prepare()
binding.playerView.player = player

The empty program copies the frame and logs the timestamp.

private class EmptyProgram(
    useHdr: Boolean,
) : BaseGlShaderProgram(useHdr, 1) {
    private val glProgram: GlProgram

    init {
        try {
            glProgram = GlProgram(VERTEX_SHADER, FRAGMENT_SHADER)
        } catch (e: IOException) {
            throw VideoFrameProcessingException(e)
        } catch (e: GlUtil.GlException) {
            throw VideoFrameProcessingException(e)
        }

        glProgram.setBufferAttribute(
            "aFramePosition",
            GlUtil.getNormalizedCoordinateBounds(),
            GlUtil.HOMOGENEOUS_COORDINATE_VECTOR_SIZE
        )

        val identityMatrix = GlUtil.create4x4IdentityMatrix()
        glProgram.setFloatsUniform("uTransformationMatrix", identityMatrix)
        glProgram.setFloatsUniform("uTexTransformationMatrix", identityMatrix)
    }

    override fun configure(inputWidth: Int, inputHeight: Int): Size {
        return Size(inputWidth, inputHeight)
    }

    override fun drawFrame(inputTexId: Int, presentationTimeUs: Long) {
        Log.d("EmptyProgram", "$presentationTimeUs")
        try {
            glProgram.use()
            glProgram.setSamplerTexIdUniform("uTexSampler", inputTexId, 0)
            glProgram.bindAttributesAndUniforms()

            GLES20.glDrawArrays(GLES20.GL_TRIANGLE_STRIP, 0, 4)
        } catch (e: GlUtil.GlException) {
            throw VideoFrameProcessingException(e, presentationTimeUs)
        }
    }

    override fun release() {
        super.release()
        try {
            glProgram.delete()
        } catch (e: GlUtil.GlException) {
            throw VideoFrameProcessingException(e)
        }
    }

    companion object {
        private const val VERTEX_SHADER = """attribute vec4 aFramePosition;
uniform mat4 uTransformationMatrix;
uniform mat4 uTexTransformationMatrix;
varying vec2 vTexSamplingCoord;

void main() {
  gl_Position = uTransformationMatrix * aFramePosition;
  vec4 texturePosition = vec4(aFramePosition.x * 0.5 + 0.5,
                              aFramePosition.y * 0.5 + 0.5, 0.0, 1.0);
  vTexSamplingCoord = (uTexTransformationMatrix * texturePosition).xy;
}"""
        private const val FRAGMENT_SHADER = """precision highp float;
uniform sampler2D uTexSampler;
varying vec2 vTexSamplingCoord;

void main() {
   gl_FragColor = texture2D(uTexSampler, vTexSamplingCoord);
}"""
    }
}

The player doesn't play automatically, so the video pauses at the start. However, the timestamps will increase if I turn the screen on and off several times.

1000000000000
1000000040000
1000000080000 // OFF
1000000120000 // ON
1000000160000 // OFF
1000000200000 // ON

claincly commented 1 month ago

The player doesn't play automatically

That's intended - you could use setPlayWhenReady(true) or just call play(), just FYI.

However, the timestamps will increase if I turn the screen on and off several times.

This is expected - every time you turn on the screen, we render a new frame to the screen, as you can see from the constant 40_000 us increment between timestamps.

I wonder where you log the timestamp? The timestamps should not have the 1000000000000 offset now (the offset was removed in an earlier release). I.e. you should see 0, 40000, 80000, etc if you are on the up-to-date release.
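As a quick check of the numbers discussed here, the following plain Kotlin sketch takes the timestamps from the log posted earlier in the thread, removes the offset, and computes the step between consecutive frames. `withoutOffset`, `deltas` and `LOGGED_OFFSET_US` are hypothetical names, and treating the offset as a fixed constant is an assumption based only on that log.

```kotlin
// Timestamps copied from the log earlier in this thread.
val loggedTimesUs = listOf(
    1_000_000_000_000L, 1_000_000_040_000L, 1_000_000_080_000L,
    1_000_000_120_000L, 1_000_000_160_000L, 1_000_000_200_000L,
)

// Offset visible in the log; treating it as a fixed constant is an assumption.
const val LOGGED_OFFSET_US = 1_000_000_000_000L

// Removes the offset so the values can be compared with offset-free timestamps.
fun withoutOffset(timesUs: List<Long>): List<Long> = timesUs.map { it - LOGGED_OFFSET_US }

// Differences between consecutive timestamps, i.e. the per-frame step.
fun deltas(timesUs: List<Long>): List<Long> = timesUs.zipWithNext { a, b -> b - a }

fun main() {
    println(withoutOffset(loggedTimesUs)) // [0, 40000, 80000, 120000, 160000, 200000]
    println(deltas(loggedTimesUs))        // [40000, 40000, 40000, 40000, 40000]
    println(1_000_000L / 40_000L)         // 25, i.e. one frame every 1/25 s
}
```

Each screen-on or screen-off event in the log therefore advanced the output by exactly one frame of a 25 fps video.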

DeweyReed commented 1 month ago

Thanks for your patience.

You're right. I didn't start the video because I wanted to demonstrate that the timestamp of a video frame can change when the video is paused. Since the timestamp can differ for the same frame, I'd like to know how to get the video playback position to reuse the generated analysis data in an effect.

I log the timestamp in the drawFrame method of the empty program.

    override fun drawFrame(inputTexId: Int, presentationTimeUs: Long) {
        Log.d("EmptyProgram", "$presentationTimeUs")
        try {

I'm using the recent 1.4.0. I can get the big number with physical devices and emulators.

claincly commented 1 month ago

Since the timestamp can differ for the same frame

Right - in that case they are different frames, if the timestamps differ. I suspect the frames looked very similar, but they are distinct.

I'm using the recent 1.4.0. I can get the big number with physical devices and emulators.

Hmm, could you try using the main branch and see? I'm not sure if 1.4.0 included the fix to timestamps, but if the offset still appears on main, it's a bug on our side.

DeweyReed commented 1 month ago

After using the main branch, the numbers start from zero, as expected.

I suppose the answer to my original question is no. I can't get a reliable video position when rendering a frame in an effect. If so, could you suggest other approaches to preview effects with a long processing time?

andrewlewis commented 1 month ago

I suppose the answer to my original question is no. I can't get a reliable video position when rendering a frame in an effect. If so, could you suggest other approaches to preview effects with a long processing time?

Are you sure the timestamps aren't reliable? After recent fixes, I think you should get the exact same presentation times passed to effects in analysis mode (and Transformer in general) as the ones you get when you play using setVideoEffects, assuming no frames are dropped during the previewing step. If frames are dropped during preview (due to playback not being able to keep up), timestamps and corresponding frames will still be passed in together but you will get a subset of them.
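Since the preview may deliver only a subset of the analyzed frames, a lookup keyed on the timestamp still works where a counter would not. Below is a hedged Kotlin sketch of such a lookup. `TimestampIndex` is a hypothetical helper, not a Media3 class, and the tolerance is a purely defensive assumption: according to the comment above, the timestamps should match exactly, in which case a tolerance of 0 is enough.

```kotlin
import java.util.TreeMap
import kotlin.math.abs

// Hypothetical lookup table filled during the analysis pass.
class TimestampIndex<T>(private val toleranceUs: Long) {
    private val entries = TreeMap<Long, T>()

    fun put(presentationTimeUs: Long, value: T) {
        entries[presentationTimeUs] = value
    }

    // Returns the entry closest to the timestamp, or null if none is within toleranceUs.
    fun find(presentationTimeUs: Long): T? {
        val below = entries.floorEntry(presentationTimeUs)
        val above = entries.ceilingEntry(presentationTimeUs)
        val best = listOfNotNull(below, above)
            .minByOrNull { abs(it.key - presentationTimeUs) }
            ?: return null
        return if (abs(best.key - presentationTimeUs) <= toleranceUs) best.value else null
    }
}

fun main() {
    val index = TimestampIndex<String>(toleranceUs = 1_000L)
    listOf(0L, 40_000L, 80_000L, 120_000L).forEach { index.put(it, "data@$it") }

    // The preview delivers only a subset because one frame was dropped.
    for (timeUs in listOf(0L, 80_000L, 120_000L)) {
        println("$timeUs -> ${index.find(timeUs)}")
    }
}
```

Returning null when nothing is close enough lets the effect skip the overlay for that frame instead of drawing data that belongs to a different frame.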

However, the timestamps will increase if I turn the screen on and off several times.

This is probably because the surface gets recreated every time, and we don't have a way to render the last decoder output frame. See this old bug for a bit more info: https://github.com/google/ExoPlayer/issues/6688. Is this behavior problematic for your use case?

DeweyReed commented 1 month ago

It turned out that after changing the surface type of PlayerView from surface_view to texture_view, the time is stable across locking and unlocking the screen. Now, I can use the analyzed data to preview the effect.

Thanks for your patient explanation! Have a great day! :D