bytedeco / javacv

Java interface to OpenCV, FFmpeg, and more

Inefficient Java2DFrameConverter #1172

Open · b005t3r opened 5 years ago

b005t3r commented 5 years ago

I've been testing converting Frame instances to BufferedImage instances and noticed it's very slow because, no matter what, every pixel channel is copied manually from the Frame's buffer to the BufferedImage, even when the pixel layout and values don't need to be altered at all.

I created a small example with a streamlined approach that can be used instead when no gamma conversion is required and the Frame's pixel format matches. It won't work for every pixel format, obviously, but it works for BGR/RGB formats. You can comment/uncomment the parts where the frame <-> image conversion is done and test it on your machine.

My results are:

// Imports assume the JavaCV 1.5+ package layout.
import static org.bytedeco.ffmpeg.global.avutil.AV_PIX_FMT_BGR24;

import java.awt.Transparency;
import java.awt.color.ColorSpace;
import java.awt.image.*;
import java.nio.ByteBuffer;

import org.bytedeco.javacv.*;

public class FrameGrabberTest {
    public static final int PIXEL_FORMAT = AV_PIX_FMT_BGR24;

    public static final String IN_PATH = "in.mp4";
    public static final String OUT_PATH = "out.mp4";

    private static Java2DFrameConverter converter = new Java2DFrameConverter();

    public static void main(String[] args) throws FrameGrabber.Exception, FrameRecorder.Exception {
        FFmpegFrameGrabber grabber = new FFmpegFrameGrabber(IN_PATH);
        grabber.setOption("hwaccel", "videotoolbox"); // does nothing in the current ffmpeg version
        grabber.setVideoOption("threads", "auto");
        grabber.setPixelFormat(PIXEL_FORMAT);
        grabber.start();

        FFmpegFrameRecorder recorder = new FFmpegFrameRecorder(OUT_PATH, grabber.getImageWidth(), grabber.getImageHeight());
        recorder.setFormat(grabber.getFormat());
        recorder.setFrameRate(grabber.getFrameRate());
        recorder.setVideoBitrate(grabber.getVideoBitrate());
        recorder.setVideoCodecName("h264_videotoolbox");
        recorder.setVideoOption("threads", "auto");
        recorder.start();

        int frameCount = 0;
        long startTime = System.currentTimeMillis();

        Frame inFrame;
        while ((inFrame = grabber.grabFrame(false, true, true, false)) != null) {
            BufferedImage image = createBufferedImage(inFrame); // faster conversion
            //BufferedImage image = converter.convert(inFrame); // JavaCV conversion

            /* process image */

            Frame outFrame = createFrame(image); // faster conversion
            //Frame outFrame = converter.convert(image); // JavaCV conversion
            recorder.record(outFrame, PIXEL_FORMAT);

            ++frameCount;
        }

        float secs = (System.currentTimeMillis() - startTime) / 1000.0f;

        System.out.println("processing " + frameCount + " frames took " + (Math.round(secs * 1000) / 1000.0f) + " (" + (Math.round(frameCount / secs * 1000) / 1000.0f) + " fps)");

        recorder.stop();
        grabber.stop();
    }

    private static byte[] framePixels = null;

    // Wraps the frame's pixels in a BufferedImage via a plain buffer copy,
    // skipping the per-channel conversion (assumes 8-bit BGR data).
    public static BufferedImage createBufferedImage(Frame frame) {
        ByteBuffer buffer = (ByteBuffer) frame.image[0].position(0);

        if(framePixels == null)
            framePixels = new byte[buffer.limit()];

        buffer.get(framePixels);

        ColorSpace cs = ColorSpace.getInstance(ColorSpace.CS_sRGB);

        ColorModel cm = new ComponentColorModel(cs, false, false, Transparency.OPAQUE, DataBuffer.TYPE_BYTE);
        WritableRaster wr = Raster.createWritableRaster(new ComponentSampleModel(DataBuffer.TYPE_BYTE, frame.imageWidth, frame.imageHeight, frame.imageChannels, frame.imageStride, new int[] {2, 1, 0}), null);
        byte[] bufferPixels = ((DataBufferByte) wr.getDataBuffer()).getData();

        System.arraycopy(framePixels, 0, bufferPixels, 0, framePixels.length);

        return new BufferedImage(cm, wr, false, null);
    }

    private static byte[] imagePixels = null;

    // Copies the image's pixels straight into a new Frame's buffer
    // (assumes a 3-channel, 8-bit image backed by a DataBufferByte).
    public static Frame createFrame(BufferedImage image) {
        DataBufferByte imageBuffer = (DataBufferByte) image.getRaster().getDataBuffer();

        if(imagePixels == null)
            imagePixels = new byte[imageBuffer.getData().length];

        System.arraycopy(imageBuffer.getData(), 0, imagePixels, 0, imagePixels.length);

        Frame frame = new Frame(image.getWidth(), image.getHeight(), 8, 3);
        ByteBuffer frameBuffer = (ByteBuffer) frame.image[0].position(0);

        frameBuffer.put(imagePixels);

        return frame;
    }
}
saudet commented 5 years ago

It would be great if you could send a pull request for that.

b005t3r commented 5 years ago

Honestly, this project is huge and takes forever to build, so I probably won't be sending any pull requests; it would take way too much time to share such a simple thing.

Also, after giving it a second thought, I don't think it's a good idea to incorporate this into Java2DFrameConverter, because it's impossible to tell, from the client's point of view, whether a given format will be converted efficiently.

I can prepare something like a FastBufferedImageFrameConverter, which would handle only a few input formats but would work efficiently for all of them. I can share the code here; I don't think it's going to be anything complex.

saudet commented 5 years ago

JavaCV itself doesn't even take 10 minutes to build; see Travis CI, for example: https://travis-ci.org/bytedeco/javacv You're doing something wrong. How are you trying to build it?

There are cases in Java2DFrameConverter that can be reduced to simple buffer copies. We just need to put a couple of conditions in an if/else statement and that's it. We don't need to create any new API for that.
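
For illustration, a minimal sketch of what such a fast-path check could look like (the helper name and the exact set of conditions are assumptions, not the converter's actual code):

// Hypothetical check: a plain buffer copy is only safe when the Frame's
// layout already matches the BufferedImage's raster layout exactly.
static boolean isDirectCopyable(Frame frame, BufferedImage image, double gamma) {
    return gamma == 1.0
            && frame.imageDepth == Frame.DEPTH_UBYTE
            && frame.imageChannels == 3
            && image.getType() == BufferedImage.TYPE_3BYTE_BGR
            && image.getWidth() == frame.imageWidth
            && image.getHeight() == frame.imageHeight;
}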

b005t3r commented 5 years ago

I don't know, I simply called cppbuild.sh for only ffmpeg.

There are cases in Java2DFrameConverter that can be reduced to simple buffer copies. We just need to put a couple of conditions in an if/else statement and that's it. We don't need to create any new API for that.

That's not what I'm talking about. It would still be slow in some cases, and you wouldn't be able to tell (from the client's point of view). I'd rather have something that works fast but doesn't handle every format (so I can always fall back to a different, slower converter, knowing the cost) than a black box that works but gives no way of telling whether I'll get 60 FPS or 6 FPS of processing performance.

This part - knowing how fast it's going to be - is actually crucial for me (I'm working on real-time processing for television).

saudet commented 5 years ago

I don't know, I simply called cppbuild.sh for only ffmpeg.

No need to run that, use the snapshots: http://bytedeco.org/builds/

This part - knowing how fast it's going to be - is actually crucial for me (I'm working on real-time processing for television).

We could put the conditions in a public method to let users call it and see if they're on the "fast path", and/or we could log messages about what it's doing, and/or include a failIfNotFastConvert() method, and/or an optional boolean parameter that does the same, etc. There are plenty of options that don't sacrifice user-friendliness!

b005t3r commented 5 years ago

No need to run that, use the snapshots: http://bytedeco.org/builds/

We already discussed that, and I've got no idea how to do it - it's incredibly convoluted, not exactly a simple "get the most recent version from GitHub" :)

saudet commented 5 years ago

Let's see, where is that bug report at Gradle... Ah, here it is, closed as invalid: https://github.com/gradle/gradle/issues/2882 So, well, you'll need to learn Maven either way at some point. It doesn't look like the authors of Gradle care much about this use case. Maven works pretty well, you know, it's not that complicated!

b005t3r commented 5 years ago

Once there's a step by step tutorial I can follow, let me know :)

saudet commented 5 years ago

For JavaCV itself, it's easy; the snapshot repository is already in its pom.xml file. Just run git clone https://github.com/bytedeco/javacv, then cd javacv, then mvn install, and you're done!

saudet commented 5 years ago

And then we can also run mvn install in the platform subdirectory. That will download all you need to make it work with Gradle as well.

LoveraSantiago commented 5 years ago

Taking advantage of this topic on optimizing Java2DFrameConverter:

I would like to ask whether there is any particular reason for Java2DFrameConverter to return the same BufferedImage instance every time.

If not, I would suggest that this method always return a new BufferedImage instance.

I came across the following scenario:

I'm converting a BufferedInputStream from an .h264 file into BufferedImage instances that I display on a JavaFX screen inside an ImageView component. The Java2DFrameConverter conversions are much faster than displaying the images in the ImageView (I'm using a 10 fps video), so I decided to use the native Flow APIs (Publisher, Subscriber): the BufferedImage produced for each frame is submitted to a Publisher, and the JavaFX screen is the Subscriber that consumes those images at a rate conforming to a given fps.

The problem appeared when the BufferedImages were consumed by my ImageView: it always displayed the same image. At first I thought it was some threading problem in the JavaFX world, since each BufferedImage had to be transformed into an Image object to be displayed in the ImageView. Only after a while did I realize that each converted frame pointed to the same BufferedImage instance, which had already been overwritten before I consumed it.

As a workaround, for each converted frame I created a new BufferedImage, copied into it the contents of the one returned by Java2DFrameConverter, and put that in the Publisher. Everything then worked as expected on the JavaFX screen.

I don't know if the problem I had is enough justification for my proposal, but it documents a scenario where the current behavior has a real impact.

Even so, thank you immensely for JavaCV. It's a great library.

b005t3r commented 5 years ago

If not, I would suggest that this method always return a new BufferedImage instance.

But why? It makes no sense to always allocate a new image if the existing one can be reused - it saves memory. If for some reason you need a new instance, simply draw the converted image onto a newly allocated one - problem solved :)

saudet commented 5 years ago

Yes, and that's what cloneBufferedImage() is for.
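
For example (a minimal sketch; converter, frame, and publisher stand in for the objects from the scenario above):

// The converter reuses its internal BufferedImage between calls,
// so clone it before handing it off to another thread or queue.
BufferedImage shared = converter.convert(frame);
BufferedImage independent = Java2DFrameConverter.cloneBufferedImage(shared);
publisher.submit(independent); // safe: no longer aliased by the converter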

saudet commented 5 years ago

There are also a couple of copy() methods we can use when we already have a target object: http://bytedeco.org/javacv/apidocs/org/bytedeco/javacv/Java2DFrameConverter.html#copy-org.bytedeco.javacv.Frame-java.awt.image.BufferedImage-
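
Roughly (a sketch; assumes the frame's format is compatible with the target image):

// Copy the frame's pixels into a preallocated BufferedImage instead of
// allocating a new one for every frame.
BufferedImage target = new BufferedImage(frame.imageWidth, frame.imageHeight,
        BufferedImage.TYPE_3BYTE_BGR);
Java2DFrameConverter.copy(frame, target);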

LoveraSantiago commented 5 years ago

OK, I'm convinced. Thank you!

salamanders commented 5 years ago

Also jumping on this thread: sometimes I'd like to do stuff with the pixels but don't actually need the BufferedImage. Is there any way to support a memory-light (and hopefully CPU-light) callback processor?

Something like myFFmpegFrameGrabber.grabImage().processPixels { (x: Int, y: Int, r: UByte, g: UByte, b: UByte) -> /* stuff... */ }-ish.

b005t3r commented 5 years ago

It would be great to have something like that - a graphics context for drawing without fetching data from native memory - but I guess it would be a pain to implement :)

To have something not so memory-heavy, you can keep an instance of BufferedImage in the right format and reuse its underlying buffer (so no memory allocation except for one frame; after that the buffer is reused). Of course, your Frame instance has to be in the right format as well.

I use something like that (modified code I posted above):

    private BufferedImage createBufferedImage(Frame frame, BufferedImage image) {
        ByteBuffer buffer = (ByteBuffer) frame.image[0].position(0);

        if(image == null) {
            ColorSpace cs = ColorSpace.getInstance(ColorSpace.CS_sRGB);

            ColorModel cm = new ComponentColorModel(cs, false, false, Transparency.OPAQUE, DataBuffer.TYPE_BYTE);
            // this assumes BGR format
            DataBuffer dataBuffer = new DataBufferByte(buffer.limit());
            WritableRaster wr = Raster.createWritableRaster(new ComponentSampleModel(DataBuffer.TYPE_BYTE, frame.imageWidth, frame.imageHeight, frame.imageChannels, frame.imageStride, new int[] {2, 1, 0}), dataBuffer, null);
            byte[] bufferPixels = ((DataBufferByte) wr.getDataBuffer()).getData();

            buffer.get(bufferPixels);

            return new BufferedImage(cm, wr, false, null);
        }
        else {
            WritableRaster wr = image.getRaster();
            byte[] bufferPixels = ((DataBufferByte) wr.getDataBuffer()).getData();

            buffer.get(bufferPixels);

            return image;
        }
    }

And this for putting BufferedImage pixels back into Frame:

    private Frame createFrame(BufferedImage image, Frame frame) {
        DataBufferByte imageBuffer = (DataBufferByte) image.getRaster().getDataBuffer();
        int stride = imageBuffer.getData().length / image.getHeight();

        if(frame == null || frame.imageWidth != image.getWidth() || frame.imageHeight != image.getHeight())
            frame = new Frame(image.getWidth(), image.getHeight(), 8, 3, stride);

        ByteBuffer frameBuffer = (ByteBuffer) frame.image[0].position(0);

        frameBuffer.put(((DataBufferByte) image.getRaster().getDataBuffer()).getData());
        frameBuffer.position(0);

        return frame;
    }
b005t3r commented 5 years ago

The only issue you might have is drawing onto this kind of BufferedImage - it's very slow, much, much slower than drawing onto an instance obtained from createCompatibleImage():

    GraphicsDevice gd = GraphicsEnvironment.getLocalGraphicsEnvironment().getDefaultScreenDevice();
    BufferedImage image = gd.getDefaultConfiguration().createCompatibleImage(width, height, Transparency.OPAQUE);

I'm currently experimenting (haven't tested it yet) with doing the drawing with OpenCV's drawing functions, so instead of creating a BufferedImage I'll be creating a new Mat and drawing to it. Actually, maybe this would be possible even without copying the data back and forth, simply by passing the Frame's buffer to one of Mat's constructors. @saudet Would that be possible?

b005t3r commented 5 years ago

Actually, this code suggests it might be possible:

https://github.com/bytedeco/javacv/blob/master/src/main/java/org/bytedeco/javacv/OpenCVFrameConverter.java#L172-L177
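
For reference, a minimal sketch of wrapping a Frame's buffer in a Mat along those lines (based on the linked converter code; package names assume the JavaCV 1.5+ artifacts, and this is untested):

import java.nio.ByteBuffer;

import org.bytedeco.javacpp.BytePointer;
import org.bytedeco.javacv.Frame;
import org.bytedeco.opencv.opencv_core.Mat;
import static org.bytedeco.opencv.global.opencv_core.CV_8UC;

class FrameToMat {
    // Wraps the Frame's native buffer directly; the Mat shares the same
    // memory, so no pixel data is copied (assumes 8-bit data, where the
    // stride in elements equals the step in bytes).
    static Mat wrap(Frame frame) {
        ByteBuffer buffer = (ByteBuffer) frame.image[0].position(0);
        return new Mat(frame.imageHeight, frame.imageWidth, CV_8UC(frame.imageChannels),
                new BytePointer(buffer), frame.imageStride);
    }
}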

salamanders commented 5 years ago

I'm just reading data, so I'm happy even if drawing is slow.

saudet commented 5 years ago

@salamanders What you're looking for are indexers: http://bytedeco.org/news/2014/12/23/third-release/
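
In Java, that might look roughly like this (a sketch; assumes an 8-bit BGR frame):

import org.bytedeco.javacpp.indexer.UByteIndexer;
import org.bytedeco.javacv.Frame;

class PixelReader {
    // Reads pixels straight from the Frame's native memory,
    // without ever materializing a BufferedImage.
    static long sumRed(Frame frame) {
        UByteIndexer idx = frame.createIndexer();
        long sum = 0;
        for (long y = 0; y < idx.rows(); y++) {
            for (long x = 0; x < idx.cols(); x++) {
                sum += idx.get(y, x, 2); // channel 2 is red, assuming BGR order
            }
        }
        idx.release();
        return sum;
    }
}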

salamanders commented 5 years ago

@saudet reading it... You aren't joking, that is exactly what I was looking for. Thank you!!

salamanders commented 5 years ago

/**
 * Iterate over every pixel.  Good for extracting things, or summing frames into a buffer.
 */
fun Frame.processPixels(f: (x: Long, y: Long, r: Int, g: Int, b: Int) -> Unit) {
    val bgrIdx: UByteIndexer = createIndexer()
    val rows = bgrIdx.rows()
    val cols = bgrIdx.cols()

    // TODO: Coroutines?
    Parallel.loop(0, rows.toInt()) { looperFrom, looperTo, _ ->
        for (rowNum in looperFrom.toLong() until looperTo.toLong()) {
            for (colNum in 0 until cols) {
                // Are we sure this is always BGR?
                val b = bgrIdx.get(rowNum, colNum, 0)
                val g = bgrIdx.get(rowNum, colNum, 1)
                val r = bgrIdx.get(rowNum, colNum, 2)
                f(colNum, rowNum, r, g, b)
            }
        }
    }
    bgrIdx.release()
}
saudet commented 4 years ago

@b005t3r BTW, it's now easier to use the snapshots with Gradle: http://bytedeco.org/builds/ There's also a Gradle plugin we can use to make it even easier: https://github.com/bytedeco/gradle-javacpp#the-platform-plugin So please do send pull requests with your enhancements! Thanks