INDExOS / media-for-mobile

Media for Mobile

CommandProcessor.process() never ends (race condition and EGL error: 0x3003) #69

Open sjvc opened 6 years ago

sjvc commented 6 years ago

I'm using GLCapture to record a video, but I noticed that sometimes (about 50% of the time) the generated video is not readable.

Navigating through the source code and debugging, I noticed that sometimes commandProcessor.process() inside CapturePipeline.java never ends, because commandProcessor.stop() has not been called. If commandProcessor.process() never ends, then pipeline.release() is not called. pipeline.release() is where sources and sinks are closed (where file writing is finished). So the result is that commandProcessor.process() is stuck in while (!stopped) doing nothing, and the files are not readable.

So, where is commandProcessor.stop() called? In Pipeline.java, when the sink is set (setSink), a stop listener is assigned to it, and commandProcessor.stop() is called when the sink is stopped. The sink is an instance of MuxRender, and I saw that onStopListener.onStop() is executed once MuxRender.drain has been called as many times as there are plugins connected to it. In our case there are 2 (video and audio). To sum up, commandProcessor.stop() is not called because MuxRender.drain is only called once.
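
To make the counting behaviour concrete, here is a minimal sketch in plain Java. It is only a model of what is described above, not the library's MuxRender: the class names SketchMuxSink and DrainCountDemo are hypothetical, and a plain Runnable stands in for the stop listener.

```java
// Simplified model of the drain counting described above.
// SketchMuxSink is a hypothetical stand-in, NOT the library's MuxRender.
class SketchMuxSink {
    private final int connectedPlugins; // e.g. 2: one video encoder, one audio encoder
    private final Runnable onStop;      // stands in for the stop listener
    private int drainCalls = 0;

    SketchMuxSink(int connectedPlugins, Runnable onStop) {
        this.connectedPlugins = connectedPlugins;
        this.onStop = onStop;
    }

    // The listener fires only once every connected plugin has drained.
    void drain() {
        drainCalls++;
        if (drainCalls == connectedPlugins) {
            onStop.run();
        }
    }
}

class DrainCountDemo {
    // Returns true if the stop listener fired after the given number of drain() calls.
    static boolean stoppedAfter(int plugins, int drains) {
        boolean[] stopped = {false};
        SketchMuxSink sink = new SketchMuxSink(plugins, () -> stopped[0] = true);
        for (int i = 0; i < drains; i++) {
            sink.drain();
        }
        return stopped[0];
    }

    public static void main(String[] args) {
        System.out.println(stoppedAfter(2, 1)); // only audio drained: listener not fired
        System.out.println(stoppedAfter(2, 2)); // audio and video drained: listener fired
    }
}
```

With two plugins and a single drain() call the listener never runs, which matches the stuck while (!stopped) loop.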

Who has to call MuxRender.drain? It's PullDataCommandHandler.handle (command handlers are executed in CommandProcessor.process()). PullDataCommandHandler.handle calls output.getFrame(), and if the result is Frame.EOF, then input.drain is called. The input is always MuxRender, and the outputs are AudioEncoder and VideoEncoder. I can say in advance that the problem is with video, so I will focus on video.

When everything works correctly, it goes like this: to stop video capturing, we call GLCapture.stop(), which calls CapturePipeline.stop(), which calls Pipeline.stop(), which calls CaptureSource.stop() (mediaSource.stop()), which adds Command.EndOfFile to its commandQueue. When that command is processed, DrainCommandHandler is executed, so VideoEncoder.drain() is executed. The next time PullDataCommandHandler is executed to pass data from VideoEncoder to MuxRender, it calls getFrame() on VideoEncoder, which responds with Frame.EOF() (because it is drained), so that command handler calls MuxRender.drain(). And this is all we need: MuxRender.drain() is called, so onStopListener is called, commandProcessor.stop() is called, pipeline.release() is called, all the work is stopped and the files are finished correctly.

So, when does it not work correctly? Let's see: to stop video capturing, we call GLCapture.stop(), which calls CapturePipeline.stop(), which calls Pipeline.stop(), which calls CaptureSource.stop() (mediaSource.stop()), which adds Command.EndOfFile to its commandQueue. But this time, before that command is processed, the queue processes PushSurfaceCommandHandler (to pass data from CaptureSource to VideoEncoder), which calls CaptureSource.getFrame(), which returns Frame.EOF() and clears the commandQueue. As the commandQueue has been cleared, Command.EndOfFile is never processed, so VideoEncoder is not drained, PullDataCommandHandler won't call MuxRender.drain(), its onStopListener won't be called, and commandProcessor will keep processing forever... and the video file is not readable.
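
The ordering problem can be reproduced deterministically in a small model. This is a sketch under assumptions, not the library's code: SketchCaptureSource and RaceDemo are hypothetical names, commands are plain strings, and "HasData" is just an illustrative label for a pending push command that was queued before stop().

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical stand-in for the capture source described above.
class SketchCaptureSource {
    final Deque<String> commandQueue = new ArrayDeque<>();
    private final boolean clearOnEof; // true models the commandQueue.clear() call
    private boolean stopped = false;

    SketchCaptureSource(boolean clearOnEof) {
        this.clearOnEof = clearOnEof;
    }

    void stop() {
        stopped = true;
        commandQueue.add("EndOfFile");
    }

    String getFrame() {
        if (stopped) {
            if (clearOnEof) {
                commandQueue.clear(); // also drops the pending EndOfFile command
            }
            return "EOF";
        }
        return "frame";
    }
}

class RaceDemo {
    // Returns true if the EndOfFile command is still processed after stop().
    static boolean endOfFileProcessed(boolean clearOnEof) {
        SketchCaptureSource source = new SketchCaptureSource(clearOnEof);
        source.commandQueue.add("HasData"); // push command queued before stop()
        source.stop();                      // EndOfFile is queued behind it

        boolean sawEndOfFile = false;
        while (!source.commandQueue.isEmpty()) {
            String command = source.commandQueue.poll();
            if (command.equals("HasData")) {
                source.getFrame();          // push handler pulls a frame and gets EOF
            } else if (command.equals("EndOfFile")) {
                sawEndOfFile = true;        // the encoder would be drained here
            }
        }
        return sawEndOfFile;
    }

    public static void main(String[] args) {
        System.out.println(endOfFileProcessed(true));  // queue cleared: EndOfFile lost
        System.out.println(endOfFileProcessed(false)); // no clear: EndOfFile processed
    }
}
```

In the model, EndOfFile is lost only when a push command is handled between stop() and the EndOfFile command, which would explain why the failure is intermittent.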

How to solve it? In Issue #41, @jhognon suggests removing the commandQueue.clear() call from CaptureSource.java to solve this race condition. It worked, because then Command.EndOfFile is processed and everything goes fine. Is it the correct solution? Well, we don't know. But it works.

I executed it several times and it seemed to be working, but then it happened again: commandProcessor never ends, and the file is unreadable. But this time the generated file is 0KB. Debugging the project again, I found that an error is generated after calling EGL14.eglCreateContext in InputSurface.java. So no video data is processed, and of course MuxRender.drain is never called. The generated error was EGL error: 0x3003 (BAD_ALLOC), which is generated if there are not enough resources to allocate the new context.

So, now the problem is that there are not enough resources to call EGL14.eglCreateContext. This leads to a question: is EGL14.eglDestroyContext being called to free resources? I found it's called inside InputSurface.release(). InputSurface is instantiated in Surface.java. And... guess what? Surface never calls InputSurface.release(), so EGL14.eglDestroyContext is never called... I added a call to inputSurface.release() (and outputSurface.release()) inside Surface.release(), and now everything is working like a charm!
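
The leak can be illustrated without Android by modelling contexts as a limited budget. This is only a sketch: SketchEglContextPool, SketchSurface and EglLeakDemo are hypothetical names, the capacity of 3 is an arbitrary number chosen for the example (real limits depend on the device and driver), and no real EGL calls are made. Only the two constants are the standard EGL values.

```java
// Hypothetical model of a driver that can only hold a few live contexts.
class SketchEglContextPool {
    static final int EGL_SUCCESS = 0x3000;
    static final int EGL_BAD_ALLOC = 0x3003;

    private final int capacity;
    private int liveContexts = 0;

    SketchEglContextPool(int capacity) {
        this.capacity = capacity;
    }

    int createContext() {
        if (liveContexts >= capacity) {
            return EGL_BAD_ALLOC; // nothing left to allocate
        }
        liveContexts++;
        return EGL_SUCCESS;
    }

    void destroyContext() {
        if (liveContexts > 0) {
            liveContexts--;
        }
    }
}

// Hypothetical stand-in for a surface that owns one context.
class SketchSurface {
    private final SketchEglContextPool pool;
    private final boolean releaseInputSurface; // true models the added release() call
    final int createResult;

    SketchSurface(SketchEglContextPool pool, boolean releaseInputSurface) {
        this.pool = pool;
        this.releaseInputSurface = releaseInputSurface;
        this.createResult = pool.createContext();
    }

    void release() {
        if (releaseInputSurface && createResult == SketchEglContextPool.EGL_SUCCESS) {
            pool.destroyContext();
        }
    }
}

class EglLeakDemo {
    // Returns the 1-based index of the first session that fails, or -1 if none fails.
    static int firstFailingSession(boolean releaseInputSurface, int sessions, int capacity) {
        SketchEglContextPool pool = new SketchEglContextPool(capacity);
        for (int i = 1; i <= sessions; i++) {
            SketchSurface surface = new SketchSurface(pool, releaseInputSurface);
            if (surface.createResult != SketchEglContextPool.EGL_SUCCESS) {
                return i;
            }
            surface.release();
        }
        return -1;
    }

    public static void main(String[] args) {
        System.out.println(firstFailingSession(false, 10, 3)); // leaks until creation fails
        System.out.println(firstFailingSession(true, 10, 3));  // contexts are returned
    }
}
```

Without the release, each recording session keeps its context alive, so creation starts failing once the budget is used up. That fits the observation that the 0KB files only appeared after several runs.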

But what if there is another error that makes commandProcessor.process() run forever? To ensure it won't happen again, I'm stopping the pipeline's commandProcessor inside the CapturePipeline.stop() method. With this call, I can be sure that all the processing stops and the file is readable (so I reverted the removal of commandQueue.clear() inside CaptureSource.java, because that modification is no longer needed).
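
The idea behind this workaround can be sketched as a loop guarded by a flag that stop() sets directly, instead of waiting for the sink's listener. SketchCommandProcessor and StopDemo are hypothetical names and the loop body is a placeholder; this is not the library's CommandProcessor.

```java
// Hypothetical stand-in for a processor whose loop runs until it is stopped.
class SketchCommandProcessor {
    // volatile so the worker thread sees the update made by stop()
    private volatile boolean stopped = false;

    void process() {
        while (!stopped) {
            try {
                Thread.sleep(1); // placeholder for handling queued commands
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return;
            }
        }
    }

    void stop() {
        stopped = true;
    }
}

class StopDemo {
    // Returns true if process() returns within the timeout after stop() is called.
    static boolean finishesAfterStop() {
        SketchCommandProcessor processor = new SketchCommandProcessor();
        Thread worker = new Thread(processor::process);
        worker.start();
        processor.stop(); // explicit stop, independent of any drain count
        try {
            worker.join(2000);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
        return !worker.isAlive();
    }

    public static void main(String[] args) {
        System.out.println(finishesAfterStop());
    }
}
```

Note that a forced stop like this ends the loop whether or not the encoders have been drained, so it guarantees termination but not that all pending data has been written.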

I know it's not the correct solution, just a workaround until someone finds the correct one. I pushed these modifications to my GitHub: https://github.com/sjvc/media-for-mobile

Is anyone maintaining this library? Please, take a look at this issue. Let me know if I can help.

gzgang commented 6 years ago

@sjvc Hi, do you know how to improve the output video quality when compressing video? If I increase the bitrate, it doesn't work as expected.

sjvc commented 6 years ago

I don't know, sorry @gzgang

anonym24 commented 6 years ago

@gzgang Same problem here. videoFormat.setVideoBitRateInKBytes(maxBitRate); just sets the maximum allowed bitrate, not the recording bitrate. Did you find a solution?

doggan commented 6 years ago

This didn't work for me. It causes 0kb videos to be generated.

The other solution (commenting out commandQueue.clear()) works most of the time, but it does occasionally time out, causing the "Cannot stop capture thread" error to appear.

I've modified the m4m source to allow exporting videos larger than 1280 resolution. The timeout issues seem to happen more frequently for larger videos.

Thanks for the attempt though.