Video brainstorming - Githubissues

whymarrh commented 9 years ago

(A little bit of this discussion exists on Asana, I'm moving it here.)

I've managed to cobble together a simple program that extracts video frames from the H264 output of the camera we have:

#include <iostream>
#include <cstdio>
#include <fstream>
#include <vector>

extern "C" {
#include <libavcodec/avcodec.h>
#include <libavformat/avformat.h>
#include <libavutil/avutil.h>
}

int idx = 0;
void process_frame(const AVFrame& frame, const int& x, const int& y)
{
    FILE *file;
    char filename[32];

    sprintf(filename, "images/ppm_frame%03d.ppm", idx++);
    file = fopen(filename, "wb");
    if (file == NULL) {
        return;
    }

    fprintf(file, "P6\n%d %d\n255\n", x, y);
    for (int i = 0; i < y; i++) {
        fwrite(frame.data[0] + i * frame.linesize[0], 1, x * 3, file);
    }
    fclose(file);
}

void decode_frame(AVCodecContext& codec_context, AVFrame& frame, uint8_t& data, int size)
{
    AVPacket packet;
    av_init_packet(&packet);
    packet.data = &data;
    packet.size = size;

    int got_picture;
    int len = avcodec_decode_video2(&codec_context, &frame, &got_picture, &packet);
    if (len < 0) {
        std::cerr << "Error while decoing a frame" << std::endl;
    }

    if (got_picture == 0) {
        return;
    }

    process_frame(frame, codec_context.width, codec_context.height);
}

int update(
    AVCodecContext& codec_context,
    AVCodecParserContext& parser,
    bool& needs_more_data,
    std::vector<uint8_t>& buffer
) {
    needs_more_data = false;
    if (buffer.size() == 0) {
        needs_more_data = true;
        return false;
    }

    uint8_t* data = NULL;
    int size = 0;
    int len = av_parser_parse2(
        &parser, &codec_context,
        &data, &size, &buffer[0], buffer.size(), 0, 0, AV_NOPTS_VALUE);

    if (size == 0 && len >= 0) {
        needs_more_data = true;
        return false;
    }

    if (len) {
        AVFrame* frame = av_frame_alloc();
        decode_frame(codec_context, *frame, buffer[0], size);
        buffer.erase(buffer.begin(), buffer.begin() + len);
        return true;
    }

    return false;
}

int read_buffer(uint8_t* input_buffer, std::vector<uint8_t>& buffer)
{
    int bytes_read = (int) std::fread(input_buffer, 1, 16384, stdin);
    if (bytes_read) {
        std::copy(input_buffer, input_buffer + bytes_read, std::back_inserter(buffer));
    }

    return bytes_read;
}

int main()
{
    uint8_t input_buffer[16384 + FF_INPUT_BUFFER_PADDING_SIZE];
    std::vector<uint8_t> data_buffer;
    av_register_all();

    AVCodec* codec = avcodec_find_decoder(AV_CODEC_ID_H264);
    if (!codec) {
        std::cerr << "Error: cannot find the H264 codec\n";
        return -1;
    }

    AVCodecContext* codec_context = avcodec_alloc_context3(codec);
    if (codec->capabilities & CODEC_CAP_TRUNCATED) {
        codec_context->flags |= CODEC_FLAG_TRUNCATED;
    }

    if (avcodec_open2(codec_context, codec, NULL) < 0) {
        std::cerr << "Error: could not open codec\n";
        return -1;
    }

    AVCodecParserContext* parser = av_parser_init(AV_CODEC_ID_H264);
    if (!parser) {
        std::cerr << "Error: cannot create a H264 parser\n";
        return -1;
    }

    while (1) {
        bool needs_more_data = false;
        while (!update(*codec_context, *parser, needs_more_data, data_buffer)) {
            if (needs_more_data) {
                read_buffer(input_buffer, data_buffer);
            }
        }
    }
}

It's almost useless, but it does illustrate the ability to (quite robustly) extract the video frames from the camera. With this, I think a few options exist:

Write the video decoder in C++ (or C) using libav and wrap it in Java using:
- jnr-ffi;
- JNA;
- or JNI
Write an external program that decodes the video and sends data to the main control software via some form of IPC (anything really, maybe continue using datagrams?)
Something else that is better than what's listed here

whymarrh commented 8 years ago

An example using JCodec to pull frames (as images) from H264 data:

import org.jcodec.codecs.h264.H264Decoder;
import org.jcodec.codecs.h264.MappedH264ES;
import org.jcodec.common.DemuxerTrack;
import org.jcodec.common.NIOUtils;
import org.jcodec.common.model.ColorSpace;
import org.jcodec.common.model.Packet;
import org.jcodec.common.model.Picture;
import org.jcodec.scale.ColorUtil;
import org.jcodec.scale.Transform;

import java.awt.image.BufferedImage;
import java.awt.image.DataBufferByte;
import java.io.File;
import java.nio.ByteBuffer;
import java.util.HashMap;
import javax.imageio.ImageIO;

public class Video {
    private static void toBufferedImageCropped(Picture src, BufferedImage dst) {
        byte[] data = ((DataBufferByte) dst.getRaster().getDataBuffer()).getData();
        int[] srcData = src.getPlaneData(0);
        int dstStride = dst.getWidth() * 3;
        int srcStride = src.getWidth() * 3;
        for (int line = 0, srcOff = 0, dstOff = 0; line < dst.getHeight(); line++) {
            for (int id = dstOff, is = srcOff; id < dstOff + dstStride; id += 3, is += 3) {
                data[id] = (byte) srcData[is];
                data[id + 1] = (byte) srcData[is + 1];
                data[id + 2] = (byte) srcData[is + 2];
            }
            srcOff += srcStride;
            dstOff += dstStride;
        }
    }

    public static void toBufferedImage(Picture src, BufferedImage dst) {
        byte[] data = ((DataBufferByte) dst.getRaster().getDataBuffer()).getData();
        int[] srcData = src.getPlaneData(0);
        for (int i = 0; i < data.length; i++) {
            data[i] = (byte) (srcData[i] + 128);
        }
    }

    public static BufferedImage toBufferedImage(Picture src) {
        if (src.getColor() != ColorSpace.RGB) {
            Transform transform = ColorUtil.getTransform(src.getColor(), ColorSpace.RGB);
            Picture rgb = Picture.create(src.getWidth(), src.getHeight(),
                ColorSpace.RGB, src.getCrop());
            transform.transform(src, rgb);
            src = rgb;
        }

        BufferedImage dst = new BufferedImage(src.getCroppedWidth(),
            src.getCroppedHeight(), BufferedImage.TYPE_3BYTE_BGR);

        if (src.getCrop() == null)
            toBufferedImage(src, dst);
        else
            toBufferedImageCropped(src, dst);

        return dst;
    }

    public static void main(String[] args) throws Exception {
        H264Decoder decoder = new H264Decoder();
        DemuxerTrack videoTrack = new MappedH264ES(NIOUtils.fetchFrom(new File(args[0])));
        Packet packet = null;
        int i = 0;
        while ((packet = videoTrack.nextFrame()) != null) {
            ByteBuffer data = packet.getData();
            Picture buf = Picture.create(720, 480, ColorSpace.YUV420);
            Picture out = decoder.decodeFrame(data, buf.getData());
            ImageIO.write(toBufferedImage(out), "png",
                new File(String.format("image%03d.png", i++)));
        }
        System.out.printf("%d images created%n", i);
    }
}

It might be possible to do something similar to display frames from a stream in a Swing component.

whymarrh commented 8 years ago

More video ideas:

MJPEG

With this we get a stream of pictures (on the simpler end of the implementation-complexity-scale) and we can render each image to the screen as quickly as is practical—this comes with questionable latency.
H.264

We need a decoder for this format. We certainly could try to use something like jcodec, JavaFX, jlibav, etc. but I think they all come with significant downsides. jlibav is relatively old, unmaintained, and lacks documentation; JavaFX and jcodec both don't seem to expose APIs that would allow use to stream data into them*.

* JavaFX supports streaming via HLS only.^[1]
Wrap H.264 in a "proper" streaming protocol

This approach may increase the ease at which we can display H.264 (e.g. we could use JavaFX) but I can't imagine this approach not adding latency. Protocols include but are not limited to RTSP, MPEG-DASH, or HLS (but chances are none of thsese will result in acceptable latency). (Though apparently a latency of below 6 frames, 240 ms at 25 fps, has been achieved with HLS.^[2])

We could also (reaching into the bottom of the proverbial barrel):

Transcode the video on the Pi into another format
Stream H.264 and transcode it on the topside
Stream the raw video data (I've no idea what we could do with these):
- 'yuv' - YUV420 format
- 'rgb' - 24-bit RGB format
- 'rgba' - 32-bit RGBA format
- 'bgr' - 24-bit BGR format
- 'bgra' - 32-bit BGRA format

Admittedly all of the above ideas stem from the continuing assumption that we want control of the video stream (e.g. overlays, start/stop, snapshots, etc.) and to have it "tightly" integrated into the control software.

arandell93 commented 8 years ago

Preamble: I did some research which involves using gstreamer, as these seem to be able to produce low-latency streams at high resolutions according to what I have read. The following are information/resources I was able to find related to this approach. I apologize if some of these things seem complicated/impossible given out setup, I don't understand much of the actual bash/batch/code being implemented.

Alternative Approaches to Video: Admittedly this is not the tightly integrated control we had originally designed for, but I propose that we have two options for giving us the same features from an end-user point of view using this approach that should make life easier for you folks:

1) As we will be having several video steams on one monitor, the configuration of those screens will have an impact of the empty space we have around the edges of the monitor. As has been discussed during topsides design brainstorming, the screen configuration is desired to be as below:

Option 1: This configuration requires that we have the video streaming window in full screen mode with the video streams placed as shown, then having the GUI window open ON TOP of the video streaming window. Since there will be no video behind the GUI, it effectively makes an overlay without us having to mess with any plugins. To the end user this is the same as an overlay.

It is important to note that the ability to move around the video windows is required so that when flying in FWD / AFT mode we can have the corresponding primary camera change. The other two windows do not matter as much, so they could be ignored in terms of switching. So, let's say in this scenario that the top smaller window and the large window would need to be swapped when the pilot requests (via a button on the joystick) a control mode swap from FWD to AFT. To make this easier, I assume that it is easier to have both of those windows being streamed at 1280x720 and just scaled when put in the smaller location, rather than changing the stream resolution of the video on the fly. The other two windows can be steamed at 640x360.

Option 2: This configuration eliminates the need to swap the locations of video streams at the sacrifice of smaller primary windows. It also requires that two GUI windows are created, one for general status information (control modes, power consumption, etc) and the other for payload information.

I propose that we treat this approach (regardless of using option 1 or 2) to video similar to how we treated the HD camera system for Old Polina. That is, the video streaming to the pilots is to be uninterrupted and dedicated. For the purposes of measurement, a second application would be handled by the co-pilot (on the dedicated laptop display) which pulls the video stream at the highest resolution possible (if we can have this application stream 1080p while the other applications stream a different resolution that would be wonderful (and may help with latency to the pilot display), if not then we will deal with a set of 720p streams, though if we use one of the smaller windows for measurement we'll need to have those at 720 streaming and scaled in their window on the pilot display. Similar to OP, the latency on the screen used for measurement would not be important, since it's only going to be used to capture one image (frame) at a time.

The main concern using this approach is whether or not we can have two applications both reading the same video stream. Is this possible? Is is possible to read different resolutions from the same stream (ie: can we stream from the Pi at 1080 and then pick what resolution to decode at for each application without one impacting the other) As a sort of last-resort alternative, the copilot could take a screen shot of the entire second monitor and measure using that image, however, this would only impact the precision when using the smaller stream windows.

Information on streaming with low latency using Gstreamer:

Excellent 'cheat sheet' for how to do various things with Gstreamer (overlay, multiple streams, windowing, etc) http://wiki.oz9aec.net/index.php/Gstreamer_cheat_sheet

More information on setting up the stream from the Rpi: http://raspberrypi.stackexchange.com/questions/26675/modern-way-to-stream-h-264-from-the-raspberry-cam

This video also shows some really promising results using netcat (I don't know what this is, but it may be worth looking into) https://www.youtube.com/watch?v=sYGdge3T30o

https://www.youtube.com/watch?v=lNvYanDLHZA

https://www.raspberrypi.org/forums/viewtopic.php?t=44987&p=356960

http://pi.gbaman.info/?p=150

whymarrh commented 8 years ago

After quite a bit of searching, I found Humble Video, gives us the ability to decode H.264 from Java (see also #13). It is still yet to be determined if we can achieve the latency that we want, esp. considering the inherent overhead in using Swing + Java + JNI. With that, we need to continue exploring the use of an external video player e.g. GStreamer or MPlayer.

The main concern using [an external video player alongside video in the control software] is whether or not we can have two applications both reading the same video stream. Is this possible?

It would be possible, yes. We could have two connections outgoing from each Pi, doubling our outbound traffic, or have a proxy of sorts on the topside that hands off the stream to two different applications. It would need a bit of thought, but it is certainly possible in some form. Though I had previously said so, it would not be dependent on our choice of transport protocol.

Is is possible to read different resolutions from the same stream (ie: can we stream from the Pi at 1080 and then pick what resolution to decode at for each application without one impacting the other)

More research is needed on this—it's an interesting question. As a poorer alternative we could downsample (?) the 1080p data.

As a sort of last-resort alternative, the copilot could take a screen shot of the entire second monitor and measure using that image, however, this would only impact the precision when using the smaller stream windows.

We would need to figure out how this may impact image correction should we need to do it.

whymarrh commented 8 years ago

I just noticed that MPlayer in turn uses FFmpeg,^[1] furthering my suspicion that our largest contributor to video latency is the rendering pipeline (and using/not using hardware acceleration where possible).

http://www.mplayerhq.hu/design7/news.html

Mplayer 1.2 is compatible with the recent FFmpeg 2.8 release. The tarball already includes a copy of FFmpeg, so you don't need to fetch it separately.

whymarrh commented 8 years ago

Does the input latency play a role in this discussion? We haven't quantified it yet, but I'm wondering if the time it takes for us to read joystick input, send it to the Pi, and have the pi write those values to the thrusters affects the output (video) ideal latency? Does input and output need to be synchronized?

cal-pratt commented 8 years ago

Yeah, that's a fair point. Right now the latency would a sum of:

the joystick polling speed. > ~10ms
udp event latency. > ~3ms ...just ballparking. need to verify this
the sleep duration in the SystemManager. currently 100ms but could be easily adjusted to >10ms.

So I'd say the system latency is 25ms tops.

Note: the ping latency in a wired LAN system is usually > ~500 microseconds. So the Java overhead on the UDP event really shouldn't be much more than a millisecond or two

arandell93 commented 8 years ago

My suspicion is that the Arduino's looping rate will be the largest factor in the input latency. Depending on the number of devices we're communicating with, it may require some major optimization. Devices include: Analog Input reads (Up to 7ish right now) DIgital Input Reads (Up to 4 or 5) TTL to Motor Controllers - 4 devices I2C to IMU (3 devices: Accel/Magnetometer, Gyro and Pressure Sensor). I2C to 3 DC/DC Converters (Approx 10 words to read back, 2 to write) I2c to 6 thrusters (5 values to read, 1 to write per thruster)

whymarrh commented 8 years ago

I've spent a bit of time crudely profiling the video components in #13 in an attempt to track down latency, and have come to the conclusion that 300ms is nearing the lower bound of the setup (if it isn't the lower bound itself). The paint times for a frame are ~18ms and it takes about ~30ms to decode a frame—the latter of which is particularly rigid and outside of our control (in that setup). The OpenGL property cut render times nearly in half to get us to the 18ms from ~40ms. With that, I don't feel that the paint times are the bottleneck.

With the full mission specification released we know that we won't need software measurement. That said, I think that there is still a case to be made for an integrated video player, since it will allow us easier setup/installation, complete control over starting/stopping the feed without needing to restart the player, control over changing resolutions, etc.

arandell93 commented 8 years ago

I agree with your sentiments regarding this year's scope. While it would be nice to have an integrated player, if we can't get the latency we want at least we know we're okay for this year to use an external application and get very low latency. I'd suggest we keep looking for improvements, but it's definitely no longer a priority and we should focus on streamlining the ROV control stuff now, particularly optimizing the Arduino's 'scan time'.

whymarrh commented 8 years ago

Some info here might be of interest: raspberrypi/userland#243

arandell93 commented 8 years ago

My understanding of this seems to be that with a 1080p30 video feed the Pi has an inherent latency of somewhere just under 100ms. (Though these numbers seemed to be pulled out of a hat, so their validity is questionable at best). The other latency is introduced through our network/topsides, so it's important that we keep those as low as possible. I suspect we will be capable of this and should see improvements over what we have as yet seen.

It should be noted that the VGA upsizing talked about in that thread when referring to 720p60 as having lower latency is not valid for us, since we need to capture 1080 in order to get a wide (as possible) FOV. We're already losing out on FOV from the croping of 5MP (native sensor res) down to the ~2MP that is 1920x1080.

arandell93 commented 8 years ago

However, it may be required that we down-sample the 1080p signal for display on our topsides, since we won't have the full screen for one video feed. I assume this won't add any noticeable latency but it's going to need to be tested.

whymarrh commented 8 years ago

I'm going to close this in favour of #41 and #82—I think we're on a solid path with mpv and raspivid. Let's also open up a new issue and narrow the scope to UI-related discussion for the video windows.

EasternEdgeRobotics / Software_2017

Video brainstorming #2