Hi @DeweyReed,

Let me share some potential solutions you might want to look into. For the frame index and how to sync it correctly with the effect, @claincly could you advise here?
Hello @DeweyReed,
Unfortunately, counting frames in effects would not be reliable - frames could be dropped (by the decoder) before they arrive in the effect pipeline. In general, `presentationTimestampUs` should be used as the reliable way to identify frames (given you don't change timestamps in your effect, or use speed adjustment).
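For illustration, a minimal sketch of keying per-frame data by timestamp (`FaceData`, `DataLookupEffect`, and `analysisByTimestampUs` are hypothetical names; only the `BaseGlShaderProgram` overrides are the real API), assuming the data was recorded in an earlier pass:

```kotlin
// Hypothetical per-frame analysis result.
data class FaceData(val boundingBoxes: List<android.graphics.RectF>)

// Keying by presentation timestamp instead of frame index is drop-tolerant:
// a frame dropped by the decoder never reaches drawFrame, so its entry is
// simply never looked up, and every surviving frame still matches its data.
class DataLookupEffect(
    useHdr: Boolean,
    private val analysisByTimestampUs: Map<Long, FaceData>,
) : BaseGlShaderProgram(useHdr, 1) {

    override fun configure(inputWidth: Int, inputHeight: Int) = Size(inputWidth, inputHeight)

    override fun drawFrame(inputTexId: Int, presentationTimeUs: Long) {
        val faces = analysisByTimestampUs[presentationTimeUs]
        // ... copy inputTexId to the output (as EmptyProgram does below) and
        // overlay `faces` when present ...
    }
}
```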
I wonder - what's your setup? Do you process the video with Transformer and then use the data to support previewing the video using either CompositionPlayer or ExoPlayer? Do you use many videos in succession? What specifically do you see with `presentationTimestampUs`?

> I tried to use `presentationTimeUs`, but the value changes even if the video is paused.
This is not unexpected - when paused we just stop consuming frames from the frame processor, but the frame processor might still be processing frames. The values shouldn't change too many times though.
Thank you for the fast replies!

I've checked some related issues on getting and processing every frame. Unfortunately, my processing is slower than the rendering, so using the frame cache effect still blocks playback too much.
> What's your setup? Do you process the video with Transformer and then use the data to support previewing the video using either CompositionPlayer or ExoPlayer?
Yes.
For simplicity, my example will be applying face recognition and locating faces on every video frame in a preview. I call `setVideoEffects` with another effect that utilizes the 30 data entries generated in the previous step. In `drawFrame` of the effect, I can reuse the data without waiting for recognition (the two-pass flow is sketched below). However, I'm struggling to know which entry out of the 30 I should use.
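A rough sketch of this two-pass flow (`AnalysisProgram`, `DataLookupEffect`, `recognizeFaces`, `analysisByTimestampUs`, and the free variables are hypothetical; `Transformer`, `EditedMediaItem`, `Effects`, `GlEffect`, and `setVideoEffects` are the real APIs), assuming one data entry is stored per frame timestamp:

```kotlin
// Shared store: filled during pass 1, read during pass 2.
val analysisByTimestampUs = java.util.concurrent.ConcurrentHashMap<Long, FaceData>()

// Pass 1 - analysis: run the video through Transformer with an effect whose
// drawFrame runs recognition and stores one entry per timestamp, e.g.
//   analysisByTimestampUs[presentationTimeUs] = recognizeFaces(inputTexId)
val analysisEffect = GlEffect { _, useHdr -> AnalysisProgram(useHdr, analysisByTimestampUs) }
val editedItem = EditedMediaItem.Builder(MediaItem.fromUri(videoUri))
    .setEffects(Effects(/* audioProcessors= */ emptyList(), listOf(analysisEffect)))
    .build()
Transformer.Builder(context).build().start(editedItem, outputFilePath)

// Pass 2 - preview (after pass 1 completes): reuse the recorded data.
player.setVideoEffects(
    listOf(GlEffect { _, useHdr -> DataLookupEffect(useHdr, analysisByTimestampUs) })
)
```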
> Do you use many videos in succession?

No. Only one video.
> What specifically do you see with `presentationTimestampUs`?
Initially, I counted frames, but frames can get dropped, as you described. Then, I used `presentationTimeUs` to calculate indices, but I realized that the value increases when I turn the screen on and off while the video is paused. From my understanding of the API, the value changes whenever rendering happens, even if the frame doesn't change.
Thanks for your detailed description.

Could you post the specific timestamps you see lost or changed when turning the screen on/off? My understanding is that the timestamp of a single frame wouldn't change, and it's safe to match your data with the presentation time in effects.

Another factor is what you do when the screen is off. For example, in the ExoPlayer demo app, the player is released when the activity stops.
Here is simple code that reproduces the situation. The sample doesn't handle any lifecycle events.
```kotlin
val player = ExoPlayer.Builder(this).build()
player.setMediaItem(MediaItem.fromUri(getResourceUri(R.raw.video)))
player.setVideoEffects(listOf(GlEffect { _, useHdr -> EmptyProgram(useHdr) }))
player.prepare()
binding.playerView.player = player
```
The empty program copies the frame and logs the timestamp.
```kotlin
private class EmptyProgram(
    useHdr: Boolean,
) : BaseGlShaderProgram(useHdr, 1) {

    private val glProgram: GlProgram

    init {
        try {
            glProgram = GlProgram(VERTEX_SHADER, FRAGMENT_SHADER)
        } catch (e: IOException) {
            throw VideoFrameProcessingException(e)
        } catch (e: GlUtil.GlException) {
            throw VideoFrameProcessingException(e)
        }
        glProgram.setBufferAttribute(
            "aFramePosition",
            GlUtil.getNormalizedCoordinateBounds(),
            GlUtil.HOMOGENEOUS_COORDINATE_VECTOR_SIZE
        )
        val identityMatrix = GlUtil.create4x4IdentityMatrix()
        glProgram.setFloatsUniform("uTransformationMatrix", identityMatrix)
        glProgram.setFloatsUniform("uTexTransformationMatrix", identityMatrix)
    }

    override fun configure(inputWidth: Int, inputHeight: Int): Size {
        return Size(inputWidth, inputHeight)
    }

    override fun drawFrame(inputTexId: Int, presentationTimeUs: Long) {
        Log.d("EmptyProgram", "$presentationTimeUs")
        try {
            glProgram.use()
            glProgram.setSamplerTexIdUniform("uTexSampler", inputTexId, 0)
            glProgram.bindAttributesAndUniforms()
            GLES20.glDrawArrays(GLES20.GL_TRIANGLE_STRIP, 0, 4)
        } catch (e: GlUtil.GlException) {
            throw VideoFrameProcessingException(e, presentationTimeUs)
        }
    }

    override fun release() {
        super.release()
        try {
            glProgram.delete()
        } catch (e: GlUtil.GlException) {
            throw VideoFrameProcessingException(e)
        }
    }

    companion object {
        private const val VERTEX_SHADER = """attribute vec4 aFramePosition;
            uniform mat4 uTransformationMatrix;
            uniform mat4 uTexTransformationMatrix;
            varying vec2 vTexSamplingCoord;
            void main() {
              gl_Position = uTransformationMatrix * aFramePosition;
              vec4 texturePosition = vec4(aFramePosition.x * 0.5 + 0.5,
                  aFramePosition.y * 0.5 + 0.5, 0.0, 1.0);
              vTexSamplingCoord = (uTexTransformationMatrix * texturePosition).xy;
            }"""

        private const val FRAGMENT_SHADER = """precision highp float;
            uniform sampler2D uTexSampler;
            varying vec2 vTexSamplingCoord;
            void main() {
              gl_FragColor = texture2D(uTexSampler, vTexSamplingCoord);
            }"""
    }
}
```
The player doesn't play automatically, so the video pauses at the start. However, the timestamps will increase if I turn the screen on and off several times.
```
1000000000000
1000000040000
1000000080000 // OFF
1000000120000 // ON
1000000160000 // OFF
1000000200000 // ON
```
> The player doesn't play automatically

That's intended - you could use `setPlayWhenReady(true)` or just call `play()`, just FYI.
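For instance:

```kotlin
player.prepare()
player.playWhenReady = true // or equivalently: player.play()
```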
> However, the timestamps will increase if I turn the screen on and off several times.

This is expected - every time you turn on the screen, we render a frame to the screen, as you can see from the constant 40_000 us increment between timestamps (one frame interval of a 25 fps video).

I wonder where you log the timestamp? The timestamps should not have the 1000000000000 offset now (the offset was removed in an earlier release). I.e., you should see 0, 40000, 80000, etc. if you are on an up-to-date release.
Thanks for your patience.

You're right. I didn't start the video because I wanted to demonstrate that the timestamp of a video frame can change when the video is paused. Since the timestamp can differ for the same frame, I'd like to know how to get the video playback position so I can reuse the generated analysis data in an effect.

I log the timestamp in the `drawFrame` method of the empty program.
```kotlin
override fun drawFrame(inputTexId: Int, presentationTimeUs: Long) {
    Log.d("EmptyProgram", "$presentationTimeUs")
    try {
```
I'm using the recent 1.4.0 release. I get the big numbers on both physical devices and emulators.
> Since the timestamp can differ for the same frame

Right - if the timestamps differ, they are different frames. I suspect the frames looked very similar, but they are distinct.
> I'm using the recent 1.4.0 release. I get the big numbers on both physical devices and emulators.

Hmm, could you try using the `main` branch and see? I'm not sure if 1.4.0 included the fix to timestamps, but if the offset is still there on `main`, it's a bug on us.
After using the `main` branch, the numbers start from zero, as expected.

I suppose the answer to my original question is no - I can't get a reliable video position when rendering a frame in an effect. If so, could you suggest other approaches to previewing effects with a long processing time?
> I suppose the answer to my original question is no - I can't get a reliable video position when rendering a frame in an effect. If so, could you suggest other approaches to previewing effects with a long processing time?

Are you sure the timestamps aren't reliable? After recent fixes, I think you should get the exact same presentation times passed to effects in analysis mode (and Transformer in general) as the ones you get when you play using `setVideoEffects`, assuming no frames are dropped during the previewing step. If frames are dropped during preview (due to playback not being able to keep up), timestamps and corresponding frames will still be passed in together, but you will get a subset of them.
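Since preview may drop frames but never remaps timestamps, exact-match lookups against the analysis data still hit. If you want an extra defensive fallback anyway (a sketch, not something the library requires; `FaceData` is the hypothetical type from earlier), a floor lookup is cheap:

```kotlin
import java.util.TreeMap

// Filled during the analysis pass, keyed by presentationTimeUs.
val analysis = TreeMap<Long, FaceData>()

// Exact match first; otherwise fall back to the closest earlier frame's entry.
fun dataFor(presentationTimeUs: Long): FaceData? =
    analysis[presentationTimeUs] ?: analysis.floorEntry(presentationTimeUs)?.value
```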
> However, the timestamps will increase if I turn the screen on and off several times.
This is probably because the surface gets recreated every time, and we don't have a way to render the last decoder output frame. See this old bug for a bit more info: https://github.com/google/ExoPlayer/issues/6688. Is this behavior problematic for your use case?
It turned out that after changing the surface type of `PlayerView` from `surface_view` to `texture_view`, the time is stable across locking and unlocking the screen. Now I can use the analyzed data to preview the effect.
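For reference, that switch is a single attribute on `PlayerView` in the layout XML (a sketch; the id and sizing are placeholders):

```xml
<!-- surface_type accepts surface_view (the default), texture_view, and others. -->
<androidx.media3.ui.PlayerView
    android:id="@+id/playerView"
    android:layout_width="match_parent"
    android:layout_height="match_parent"
    app:surface_type="texture_view" />
```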
Thanks for your patient explanation! Have a great day! :D
Thanks for your excellent work on media3! The project opens up so many possibilities for Android development.
I'm applying some time-consuming processes to a video, like face mesh recognition. Existing solutions are too complicated or slow, so I decided to generate all the data in advance and reuse it. To reuse the data, I need to know which frame an effect draws to.

I tried to use `presentationTimeUs`, but the value changes even if the video is paused. I also tried to access `Player.currentPosition`, but there appears to be a delay between the current state of the player and the ongoing effect.

In conclusion, I have some data for each frame but don't know how to get the frame index in the `drawFrame` method of a custom effect.