Very High Latency with FFmpegFrameGrabber.setVideoFrameNumber(lastFrame)

jainvijay commented 3 years ago

I am trying to grab the last frame of a video and when I set the video frame number to the last frame, it has to a really huge latency.

2021-09-08 12:18:02.154 [main] INFO VideoFrameExtractor#getBufferedImageWithSize() - Latency for setVideoFrameNumber = 123689 ms

try (FFmpegFrameGrabber frameGrabber = new FFmpegFrameGrabber(mediaFile)) {
    frameGrabber.startUnsafe();
    int lastFrameIndex = frameGrabber.getLengthInVideoFrames() - 1;
    frameGrabber.setVideoFrameNumber(lastFrameIndex);
    Frame frame = frameGrabber.grabImage();
    frameGrabber.releaseUnsafe();
} catch(Exception e) {
    log.error("Failed to extract FrameImages");
}

The problematic video: https://gcdn.2mdn.net/videoplayback/id/75efee055f336ba7/itag/15/source/doubleclick_dmm/ctier/L/ip/0.0.0.0/ipbits/0/expire/3764707174/sparams/id,itag,source,ctier,ip,ipbits,expire/signature/B6C6DC0D594223A684DA84FEB4235A503AFA5DD1.A007DA81FEF13E870A626B4F3967237959BF8561/key/ck2/file/file.mov

I am currently using version 1.5 but I have also tried it latest version 1.5.6, it still doesn't work. Can someone help me debug this issue.

ffprobe file.mov -show_streams  -select_streams v  -print_format json
ffprobe version 4.3.2 Copyright (c) 2007-2021 the FFmpeg developers
  built with Apple clang version 12.0.0 (clang-1200.0.32.29)
  configuration: --prefix=/usr/local/Cellar/ffmpeg/4.3.2 --enable-shared --enable-pthreads --enable-version3 --enable-avresample --cc=clang --host-cflags= --host-ldflags= --enable-ffplay --enable-gnutls --enable-gpl --enable-libaom --enable-libbluray --enable-libdav1d --enable-libmp3lame --enable-libopus --enable-librav1e --enable-librubberband --enable-libsnappy --enable-libsrt --enable-libtesseract --enable-libtheora --enable-libvidstab --enable-libvorbis --enable-libvpx --enable-libwebp --enable-libx264 --enable-libx265 --enable-libxml2 --enable-libxvid --enable-lzma --enable-libfontconfig --enable-libfreetype --enable-frei0r --enable-libass --enable-libopencore-amrnb --enable-libopencore-amrwb --enable-libopenjpeg --enable-librtmp --enable-libspeex --enable-libsoxr --enable-libzmq --enable-libzimg --disable-libjack --disable-indev=jack --enable-videotoolbox
  libavutil      56. 51.100 / 56. 51.100
  libavcodec     58. 91.100 / 58. 91.100
  libavformat    58. 45.100 / 58. 45.100
  libavdevice    58. 10.100 / 58. 10.100
  libavfilter     7. 85.100 /  7. 85.100
  libavresample   4.  0.  0 /  4.  0.  0
  libswscale      5.  7.100 /  5.  7.100
  libswresample   3.  7.100 /  3.  7.100
  libpostproc    55.  7.100 / 55.  7.100
{
Input #0, mov,mp4,m4a,3gp,3g2,mj2, from 'file.mov':
  Metadata:
    major_brand     : qt
    minor_version   : 537134592
    compatible_brands: qt
    creation_time   : 2021-05-05T11:30:45.000000Z
    com.apple.quicktime.software: Telestream Media Framework - Main 2018.5814.0x0b6fd52e
    timecode        : 01:00:00:00
  Duration: 00:00:30.03, start: 0.000000, bitrate: 31202 kb/s
    Stream #0:0: Video: h264 (Main) (avc1 / 0x31637661), yuv420p(tv, bt709, progressive), 1920x1080 [SAR 1:1 DAR 16:9], 31001 kb/s, 23.98 fps, 23.98 tbr, 24k tbn, 47.95 tbc (default)
    Metadata:
      encoder         : AVC
    Stream #0:1(eng): Audio: aac (mp4a / 0x6134706D), 48000 Hz, stereo, fltp, 189 kb/s (default)
    Stream #0:2(eng): Data: none (tmcd / 0x64636D74), 0 kb/s (default)
    Metadata:
      handler_name    : Time Code Media Handler
      timecode        : 01:00:00:00
Unsupported codec with id 0 for input stream 2
    "streams": [
        {
            "index": 0,
            "codec_name": "h264",
            "codec_long_name": "H.264 / AVC / MPEG-4 AVC / MPEG-4 part 10",
            "profile": "Main",
            "codec_type": "video",
            "codec_time_base": "1001/48000",
            "codec_tag_string": "avc1",
            "codec_tag": "0x31637661",
            "width": 1920,
            "height": 1080,
            "coded_width": 1920,
            "coded_height": 1088,
            "closed_captions": 0,
            "has_b_frames": 1,
            "sample_aspect_ratio": "1:1",
            "display_aspect_ratio": "16:9",
            "pix_fmt": "yuv420p",
            "level": 50,
            "color_range": "tv",
            "color_space": "bt709",
            "color_transfer": "bt709",
            "color_primaries": "bt709",
            "chroma_location": "left",
            "field_order": "progressive",
            "refs": 1,
            "is_avc": "true",
            "nal_length_size": "4",
            "r_frame_rate": "24000/1001",
            "avg_frame_rate": "24000/1001",
            "time_base": "1/24000",
            "start_pts": 0,
            "start_time": "0.000000",
            "duration_ts": 720720,
            "duration": "30.030000",
            "bit_rate": "31001075",
            "bits_per_raw_sample": "8",
            "nb_frames": "720",
            "disposition": {
                "default": 1,
                "dub": 0,
                "original": 0,
                "comment": 0,
                "lyrics": 0,
                "karaoke": 0,
                "forced": 0,
                "hearing_impaired": 0,
                "visual_impaired": 0,
                "clean_effects": 0,
                "attached_pic": 0,
                "timed_thumbnails": 0
            },
            "tags": {
                "encoder": "AVC"
            }
        }
    ]
}

saudet commented 3 years ago

If the file has a broken index, it can take a while to seek to the last frame. Please make sure that's not the case.

jainvijay commented 3 years ago

Thank @saudet for such a quick reply. The video runs fine, can you explain me what do you mean by broken index? Is there a way to identify if the file has a broken index before we seek?

saudet commented 3 years ago

Is there a way to identify if the file has a broken index before we seek?

I'm not sure. @anotherche Would you know?

anotherche commented 3 years ago

I'm not sure. @anotherche Would you know?

no ((

jainvijay commented 3 years ago

Hi @saudet and @anotherche,

Really appreciate you guys helping out on this one.

Ran this command to get more details around frames.

ffprobe -loglevel panic -select_streams v -show_entries "frame=pkt_pts,pkt_pts_time,pkt_duration,best_effort_timestamp,best_effort_timestamp_time" -read_intervals %+#900 file.mov

I can see 720 frames and their respective times https://gist.github.com/jainvijay/5262e1cd314e2f121f95922066d889a1

When we say the indexing is broken, can you point me in a direction what is so different about this video than a working one.

jainvijay commented 3 years ago

Also ran the ffmpeg command to capture frame which also seeks to that point. This seek works perfectly fine. FFMpeg doesn't have any problem seeking to that time.

The frames are always null using sdk, it keep retrying, but we can get frames by running command

ffmpeg -i file.mov -ss 00:00:29 -pix_fmt yuv420p -frames:v 1 out1.jpg -v trace ffmpeg -i file.mov -vf "select=not(mod(n\,10))" -vsync vfr img_%03d.jpg

jainvijay commented 3 years ago

Another interesting debugging information, when I download the entire file and run the code over it, It works fine.

But when I give the url, it doesn't work. When its trying seek and get partial response in bytes, its making a lot of http calls to seek.

saudet commented 3 years ago

That sounds like an issue with your HTTP server.

jainvijay commented 3 years ago

Also if I change it to from

FFmpegFrameGrabber frameGrabber = new FFmpegFrameGrabber(mediaFile) to

FFmpegFrameGrabber frameGrabber = new FFmpegFrameGrabber(new URL(mediaFile).openStream())

It works as well

saudet commented 3 years ago

With an InputStream, it downloads the whole file, which is probably not what you want to do.

jainvijay commented 3 years ago

Yeah, That won't be good for performance.

These are the headers on the problematic file. anything suspicious which will cause problems with seeking?

➜  Downloads   curl -i https://gcdn.2mdn.net/videoplayback/id/75efee055f336ba7/itag/15/source/doubleclick_dmm/ctier/L/ip/0.0.0.0/ipbits/0/expire/3764707174/sparams/id,itag,source,ctier,ip,ipbits,expire/signature/B6C6DC0D594223A684DA84FEB4235A503AFA5DD1.A007DA81FEF13E870A626B4F3967237959BF8561/key/ck2/file/file.mov
HTTP/1.1 302 Found
Date: Mon, 13 Sep 2021 19:04:57 GMT
Pragma: no-cache
Expires: Fri, 01 Jan 1990 00:00:00 GMT
Cache-Control: no-cache, must-revalidate
X-Content-Type-Options: nosniff
Location: https://r5---sn-p5qlsndd.c.2mdn.net/videoplayback/id/75efee055f336ba7/itag/15/source/doubleclick_dmm/ctier/L/ip/0.0.0.0/ipbits/0/expire/3764707174/sparams/ctier,expire,id,ip,ipbits,itag,mh,mip,mm,mn,ms,mv,mvi,pl,source/signature/4FD69BEE28DB47C9D940697DCC13FB30F1907E48.16C15385610B55536BDECFCCFCF3EA8F0F626463/key/cms1/cms_redirect/yes/mh/AT/mip/2a00:1288:ef6a:32::100d/mm/42/mn/sn-p5qlsndd/ms/onc/mt/1631557224/mv/u/mvi/5/pl/54/file/file.mov
Content-Type: text/html; charset=UTF-8
Server: ClientMapServer
Content-Length: 640
X-XSS-Protection: 0
X-Frame-Options: SAMEORIGIN
Alt-Svc: h3=":443"; ma=2592000,h3-29=":443"; ma=2592000,h3-T051=":443"; ma=2592000,h3-Q050=":443"; ma=2592000,h3-Q046=":443"; ma=2592000,h3-Q043=":443"; ma=2592000,quic=":443"; ma=2592000; v="46,43"

<HTML><HEAD><meta http-equiv="content-type" content="text/html;charset=utf-8">
<TITLE>302 Moved</TITLE></HEAD><BODY>
<H1>302 Moved</H1>
The document has moved
<A HREF="https://r5---sn-p5qlsndd.c.2mdn.net/videoplayback/id/75efee055f336ba7/itag/15/source/doubleclick_dmm/ctier/L/ip/0.0.0.0/ipbits/0/expire/3764707174/sparams/ctier,expire,id,ip,ipbits,itag,mh,mip,mm,mn,ms,mv,mvi,pl,source/signature/4FD69BEE28DB47C9D940697DCC13FB30F1907E48.16C15385610B55536BDECFCCFCF3EA8F0F626463/key/cms1/cms_redirect/yes/mh/AT/mip/2a00:1288:ef6a:32::100d/mm/42/mn/sn-p5qlsndd/ms/onc/mt/1631557224/mv/u/mvi/5/pl/54/file/file.mov">here</A>.
</BODY></HTML>

➜  Downloads   curl -i https://r5---sn-p5qlsndd.c.2mdn.net/videoplayback/id/75efee055f336ba7/itag/15/source/doubleclick_dmm/ctier/L/ip/0.0.0.0/ipbits/0/expire/3764707174/sparams/ctier,expire,id,ip,ipbits,itag,mh,mip,mm,mn,ms,mv,mvi,pl,source/signature/4FD69BEE28DB47C9D940697DCC13FB30F1907E48.16C15385610B55536BDECFCCFCF3EA8F0F626463/key/cms1/cms_redirect/yes/mh/AT/mip/2a00:1288:ef6a:32::100d/mm/42/mn/sn-p5qlsndd/ms/onc/mt/1631557224/mv/u/mvi/5/pl/54/file/file.mov
HTTP/1.1 200 OK
Last-Modified: Wed, 05 May 2021 23:59:29 GMT
Content-Type: application/octet-stream
Date: Mon, 13 Sep 2021 19:05:30 GMT
Expires: Mon, 13 Sep 2021 19:05:30 GMT
Cache-Control: private, max-age=86400
Accept-Ranges: bytes
Content-Length: 117124548
Connection: close
Alt-Svc: h3=":443"; ma=2592000,h3-29=":443"; ma=2592000,h3-T051=":443"; ma=2592000,h3-Q050=":443"; ma=2592000,h3-Q046=":443"; ma=2592000,h3-Q043=":443"; ma=2592000,quic=":443"; ma=2592000; v="46,43"
Vary: Origin
X-Content-Type-Options: nosniff
Server: gvs 1.0

jainvijay commented 3 years ago

Trace logs where we continuously sending 206: https://gist.github.com/jainvijay/850285f858ce9d0d440fd81d01d1d733

jainvijay commented 3 years ago

is there a way to have timeout around setVideoFrameNumber function?

saudet commented 3 years ago

FFmpeg supports a timeout for most protocols, yes: https://github.com/bytedeco/javacv/blob/master/samples/FFmpegStreamingTimeout.java

kshitijgarg2609 commented 3 years ago

I am also getting very high latency, I am sharing my code here pls do check and help me how to correctify the latency.

-----VideoStreamTest.java---------

import org.bytedeco.javacv.*;
import javax.sound.sampled.*;
import java.nio.*;
import java.util.concurrent.*;
class VideoStreamTest
{
static BufferedImageDisplayer bid = new BufferedImageDisplayer();
public static void main(String args[])throws Exception
{
AudioFormat audioFormat = new AudioFormat(44100, 16, 1, true, true);
DataLine.Info info = new DataLine.Info(SourceDataLine.class, audioFormat);
SourceDataLine soundLine = (SourceDataLine) AudioSystem.getLine(info);
soundLine.open(audioFormat);
soundLine.start();
FFmpegFrameGrabber ff = new FFmpegFrameGrabber("Dj Quads - Wonderful World.mp4");
//System.out.println("Length in frames : "+ff.getLengthInFrames());
ff.start(false);
System.out.println("Length in frames : "+ff.getLengthInFrames());
boolean x=true;
while(x)
{
Frame f = ff.grabFrame();
//Frame img = ff.grabImage();
//Frame ad = ff.grabSamples();
if(f==null)
{
break;
}
System.out.println(ff.getFrameNumber());
if(f.image!=null)
{
bid.updateFrame(Java2DFrameUtils.toBufferedImage(f));
}
if(f.samples!=null)
{
ShortBuffer bb_s = (ShortBuffer)f.samples[0];
bb_s.rewind();
ByteBuffer bb = ByteBuffer.allocate(2*bb_s.capacity());
for(int i=0;i<bb_s.capacity();i++)
{
bb.putShort(bb_s.get());
}
byte aud[] = bb.array();
soundLine.write(aud,0,aud.length);
}
}
}
}

--------BufferedImageDisplayer.java----------------

import java.awt.*;
import java.awt.image.*;
import javax.imageio.*;
import javax.swing.*;
class BufferedImageDisplayer
{
JFrame jf;
JLabel img;
Dimension dim;
int w,h;
BufferedImageDisplayer()
{
jf=new JFrame("BufferedImageTester");
dim=(Toolkit.getDefaultToolkit()).getScreenSize();
w=(int)(dim.getWidth());
h=(int)(dim.getHeight());
jf.setBounds((w-670)/2,(h-380)/2,800,420);
jf.setResizable(false);
jf.setLayout(null);
jf.setDefaultCloseOperation(jf.EXIT_ON_CLOSE);
jf.setVisible(true);
initializ();
jf.revalidate();
jf.repaint();
}
void initializ()
{
int jw=jf.getWidth();
int jh=jf.getHeight();
img=new JLabel();
img.setBounds((jw-640)/2,((jh-320)/2)-10,640,320);
img.setOpaque(true);
img.setBackground(Color.cyan);
jf.add(img);
}
BufferedImage scaledImage(Image a,int w,int h)
{
BufferedImage b;
Graphics2D g2;
b=new BufferedImage(w,h,BufferedImage.TYPE_INT_ARGB);
g2=b.createGraphics();
g2.setRenderingHint(RenderingHints.KEY_INTERPOLATION,RenderingHints.VALUE_INTERPOLATION_BILINEAR);
g2.drawImage(a,0,0,w,h,null);
g2.dispose();
return b;
}
void updateFrame(BufferedImage icon)
{
icon=scaledImage(icon,640,320);
img.setIcon(new ImageIcon(icon));
}
}

anotherche commented 3 years ago

Hi, @kshitijgarg2609! I'm currently working on various issues with frame grubbing/seeking functions. May be this will help, when the improvements will be ready.

saudet commented 3 years ago

@kshitijgarg2609 For performance, please do not use BufferedImage.

anotherche commented 3 years ago

BufferedImage

Which is better then?

saudet commented 3 years ago

WritableImage from JavaFX is good: https://github.com/rladstaetter/javacv-webcam

Skyterix commented 2 years ago

Hi, @kshitijgarg2609! I'm currently working on various issues with frame grubbing/seeking functions. May be this will help, when the improvements will be ready.

Any ETA?

anotherche commented 2 years ago

Perhaps next week, if I can find enough time.

anotherche commented 2 years ago

OK. I did a lot of research into the issue that started this discussion and realized that the changes to the time setting code that I am currently working on do not solve the original problem (the need for these changes was caused by another issue).

Also ran the ffmpeg command to capture frame which also seeks to that point. This seek works perfectly fine. FFMpeg doesn't have any problem seeking to that time.

The frames are always null using sdk, it keep retrying, but we can get frames by running command

ffmpeg -i file.mov -ss 00:00:29 -pix_fmt yuv420p -frames:v 1 out1.jpg -v trace

Actually, FFmpeg does have problems with seeking in this file. This command makes no seeking actually, it reads the file from the very beginning but discards everything up to the specified time. Here is the correct command doing the seek, for example (-ss should be before the input)

ffmpeg -ss 00:00:25.067 -i file.mov file.mp4

with this timestamp you will get “Too many packets buffered for output stream 0:1.577014:32:22.77”, while specifying 25.066 seconds will give you no error and will encode last 5 seconds of the file. So, what’s happen? There are 720 frames in this file, of which 6 are keyframes (I-frames: 0, 120, 206, 447, 529 and 589). It turned out that avformat_seek_file function (FFmpeg’s api function used for seeking in ffmpeg cli and in FFmpegFrameGrabber) works correctly only when we set the frame number <601 in this file (then avformat_seek_file sets the position to the nearest of the previous keyframes). However, starting at frame 601 (that 25.067 timestamp above), this function cannot select the correct keyframe. Instead, the read position is shifted to the end of the file for some reason and, as a result, no more frames can be read. But at the same time, if you continuously read frames from any position <601, the video is read without errors until the very end. Perhaps the file is corrupted; perhaps it is a bug in ffmpeg that appears on this file. Suspicious - the file contains a third track with data (QT timecode), which cannot be removed with a remux or transcode (there are posts of various problems with ffmpeg and the QT timecode track on the Internet). I believe that it makes no sense to solve this problem by changing the FFmpegFrameGrabber code, since this is a special case. At the level of user code, such cases can be solved by shifting the seeking point (100, 200 ... frames back, until the moment when the seek is successful) and then read frame by frame to the required timestamp. Actually, the code for setting timestamp in FFmpegFrameGrabber now already provides for an offset of 0.5 seconds backward. Basically, we could make this offset customizable. Then, for example, before calling setTimestamp method, user could increase this initial offset and check if this is enough. Them we could make this offset automatically reset to default (so that further seeking do not require redundant reading of frames). Not sure how rational this is.

This is not the end )), because the described problem is related to an even more general problem. The fact is that neither the functions that return the duration of the video (or audio) or the number of frames, by definition, do not guarantee exact values! At best, these are estimates (video may be slightly shorter or longer than audio, information about the duration of individual streams may not be available, frame rates may be variable, and so on). At worst, the information may be initially erroneous in files. Therefore, a straightforward attempt to seek to the last frame is likely to be unsuccessful. It is more correct to seek to a slightly earlier frame and then read frames to the end of the stream. Of course, this is a more time consuming operation than a simple seek, but in the case of the last frame, you always have to perform a little more operations. Perhaps the method used in ffprobe program can be used to solve this problem. It allows you to relatively quickly count packets in video and audio streams (without decoding). Then you can more confidently know the number (and possibly the time) of the last video / audio frame. For example:

ffprobe -v error -select_streams v:0 -count_packets -show_entries stream=nb_read_packets -of csv=p=0 file.mov

shows 720 packets for video stream (it is always 1 packet=1 frame for video stream). This can be used in different ways - we can use ffprobe directly, we can adapt the corresponding code from ffprobe into user code. Probably we can even add the appropriate methods to FFmpegFrameGrabber, although again the question arises how rational it is.

Skyterix commented 2 years ago

I want to add that this problem is with almost every other video file I have. I can get the first frame out of them without calling setFrameNumber but if I try to change the frame it starts hogging the CPU like crazy, even for files like 10MB, I even let it run for a while and nothing is changed. It is a huge problem, I can't use the first frame as some videos have a transition from black to the video itself and all I get is a black image.

The problem does not happen when I use ffmpeg command: ffmpeg -i 'file.mov' -vf "select=eq(n\,75)" -vframes 1 out.png I just get the image in less than a second.

Even the file in this issue works fine with this command.

anotherche commented 2 years ago

I want to add that this problem is with almost every other video file I have. I can get the first frame out of them without calling setFrameNumber but if I try to change the frame it starts hogging the CPU like crazy, even for files like 10MB,

Hmm ... maybe most of your files are corrupted? )) I ran the test on files in one folder (just a collection of 229 movies). I made 10 seek to random frames in each file (with grabImage() after each seek). One of the files appeared corrupted (seek took about 4 seconds on average, video players freeze on changing position in it). The rest of the files have an average transition time of 86 ms. Here are the histograms of the average seek time in a file (for all files, and for the normal ones only). Made using 1.5.6 (although, it is not the release, it is a snapshot with some seek precision improvements) There are 115 avi files, 57 mkv and 57 mp4.

anotherche commented 2 years ago

I just tried to count packets in a video without decoding (using grabPacket() in a loop), just like in ffprobe. Unfortunately, it seems that av_read_frame implementation through javacv is about 10 times slower than the native code in ffprobe. Packages in a test file with 68000 frames are counted about 10 seconds if I use grabPacket() in a loop. ffprobe counts them in about 1 second. What is the reason? JNI?

anotherche commented 2 years ago

Reduced these 10 seconds to 2 by using PointerScope ))

saudet commented 2 years ago

grabPacket() is still using deprecated API, so it's possible it's not the best way to do that, see #818.

Skyterix commented 2 years ago

I want to add that this problem is with almost every other video file I have. I can get the first frame out of them without calling setFrameNumber but if I try to change the frame it starts hogging the CPU like crazy, even for files like 10MB,

Hmm ... maybe most of your files are corrupted? )) I ran the test on files in one folder (just a collection of 229 movies). I made 10 seek to random frames in each file (with grabImage() after each seek). One of the files appeared corrupted (seek took about 4 seconds on average, video players freeze on changing position in it). The rest of the files have an average transition time of 86 ms. Here are the histograms of the average seek time in a file (for all files, and for the normal ones only). Made using 1.5.6 (although, it is not the release, it is a snapshot with some seek precision improvements) There are 115 avi files, 57 mkv and 57 mp4.

It may be that my files are corrupted but it still doesn't explain why I can't get other than the first frame without it hogging CPU but can do it with a terminal command in less than a second.

anotherche commented 2 years ago

The problem does not happen when I use ffmpeg command: ffmpeg -i 'file.mov' -vf "select=eq(n\,75)" -vframes 1 out.png I just get the image in less than a second.

Even the file in this issue works fine with this command.

I guess that the answer is that the command does no seek, but rather read packets sequentially from the beginning and process only specified (75). You could share an example file.

Skyterix commented 2 years ago

The problem does not happen when I use ffmpeg command: ffmpeg -i 'file.mov' -vf "select=eq(n\,75)" -vframes 1 out.png I just get the image in less than a second. Even the file in this issue works fine with this command.

I guess that the answer is that the command does no seek, but rather read packets sequentially from the beginning and process only specified (75). You could share an example file.

Okay, while I was preparing my code to send it here I realized what I did wrong. I called setFrameNumber instead of setVideoFrameNumber. It probably started seeking the whole file. I will still share it cause CPU hogging on setFrameNumber shouldn't happen. I am using Ryzen 5900x and let it run 5 min on 156 KB file and it still didn't finish...

It appears any file recorded from OBS is just breaking this.

https://github.com/Skyterix1991/frame_grabber_test

anotherche commented 2 years ago

Okay, while I was preparing my code to send it here I realized what I did wrong. I called setFrameNumber instead of setVideoFrameNumber.

Well, then I suspect that the reason can be data frame processing added in grabFrame since 1.5.3. I think so because setFrameNumber use old simple algorithm of setTimestamp method, whereas setVideoFrameNumber use the new one. But since 1.5.3 new variable is added to grabFrame method's signature (doData) and since that release the old algorithm of setTimestamp calls grabFrame with doData=true by default (while setVideoFrameNumber use the new code in setTimestamp where doData=false specified explicitly). To check it, please build your code with, say, 1.5 (and with setFrameNumber) and check if the problem remains. If the problem with high CPU load / high latency disappears, then we will need to carefully consider the part of grabFrame code responsible for processing data frames.

Skyterix commented 2 years ago

me processing added in grabFrame since 1.5.3. I think so

1.5 works fine with the same code.

anotherche commented 2 years ago

grabPacket() is still using deprecated API, so it's possible it's not the best way to do that, see #818.

As far as I can see, grabPacket() not using any deprecated API. recordPacket() uses various deprecated things, as well as FFmpegFrameRecorder's startUnsafe. So, I believe we can use grabPacket() to count frames (this could be used as optional possibility to refine length of streams)

anotherche commented 2 years ago

me processing added in grabFrame since 1.5.3. I think so

1.5 works fine with the same code.

OK. Then I am sure that the reason is data frames processing in grabeFrame called with doData=true while seeking.

anotherche commented 2 years ago

me processing added in grabFrame since 1.5.3. I think so

1.5 works fine with the same code.

OK. Then I am sure that the reason is data frames processing in grabeFrame called with doData=true while seeking.

nope, this is not because of the doData=true itself, but because of some different reasons in grabFrame code. At least, if more than one of doAudio, doVideo and doData are true in grabFrame called from setTimestamp then seeking in most files causes high CPU load and latency problem. The only setting not causing the problem at all is when only doVideo is true.

anotherche commented 2 years ago

So, it looks like I have localized the bug in the grabFrame method. The fact is that if, after the previous call of grabFrame, any packet (audio, video or data) was read (which always happens after a successful seek, for example), then if the next grabFrame asks for a specific frame (video or audio) that is different from the previous read frame, then grabFrame enters an infinite loop in which it waits for a new packet to arrive, but does not read it, since it always remains readPacket = false. Try to trace by the code what, for example, will happen if, after the audio frame has been read, you try to request a video frame. You will get an infinite loop waiting for the packet. Directive boolean readPacket = pkt.stream_index () == -1; in the beginning of grabFrame looks incorrect, since at this point in the code we can only expect to receive a new packet (all the old data could already be used above, given videoFrameGrabbed and audioFrameGrabbed). If I simply change the boolean readPacket = pkt.stream_index () == -1; to boolean readPacket = true; then the code starts working without problems. And then, given the fact that readPacket can only be set false in the single place here, I don't understand why this variable was introduced at all. However, it seems to me that the last part of grabFrame (that for handling doData) is not written correctly. The code is such that any packet, including video and audio, can be used as data in the current implementation.

saudet commented 2 years ago

pkt.stream_index == -1 is just a flag to indicate that the packet is empty: https://github.com/bytedeco/javacv/blob/master/src/main/java/org/bytedeco/javacv/FFmpegFrameGrabber.java#L716 If the packet is not empty, then it means we should try to use it. This happens with audio frames, where a packet can contain multiple frames. There are most likely logic errors in there when I had to remove the deprecated API in there and come up with something compatible though.

anotherche commented 2 years ago

OK. Then it should be boolean readPacket = !(doAudio && pkt.stream_index () == audio_st.index()); In this case the previous packet will only be used if it contains audio and we are requesting for audio frame. In other cases new packet will be read. But the part with doData is incorrect anyway. If we are requesting data frame only (only doData is true), grabFrame will simply return us a next read packet which may be of any type, video, audio (including possible other audio stream), data (and possible other data stream as well). So we should add stream type check to the doData conditional part.

saudet commented 2 years ago

No, that won't work, it will cause memory leaks. We need to deallocate packets that are not needed anymore.

anotherche commented 2 years ago

Isn't it already done by?

if (readPacket) {
                if (pkt.stream_index() != -1) {
                    // Free the packet that was allocated by av_read_frame
                    av_packet_unref(pkt);
                }
...

saudet commented 2 years ago

Hum, ok, sounds like it could work, yes.

And if you know how to fix doData as well, please do!

saudet commented 2 years ago

BTW, the new API of FFmpeg makes it sound like it's now possible to have more than one image per packet, so we should probably not assume that we'll always have a single image per video packet.

anotherche commented 2 years ago

I stopped for a while due to being busy. But I'll take care of it.

anotherche commented 2 years ago

I am almost done with the code fix, but we need to choose how to handle doData parameter. FFmpeg defines 6 types of streams:

AVMEDIA_TYPE_VIDEO = 0,
AVMEDIA_TYPE_AUDIO = 1,
AVMEDIA_TYPE_DATA = 2,
AVMEDIA_TYPE_SUBTITLE = 3,
AVMEDIA_TYPE_ATTACHMENT = 4,
AVMEDIA_TYPE_NB = 5;

So, we have two possibilities of using the doData flag: to output only streams with AVMEDIA_TYPE_DATA, or to output any streams other than video or audio. I tend to the latter option, since all these not video/audio streams are in some sense additional data that do not have standard decoders, and which can only be dealt with by the users themselves, who need such data for their own reasons. When receiving frames from such "data" streams, one can always determine the type of stream by its index and use it accordingly, or ignore it. How do you think?

saudet commented 2 years ago

@bytes-and-bits What do you think?

saudet commented 2 years ago

It feels to me like grabPacket() isn't too different from what grab(doData) currently does, that is, simply return the data from any packet. It would be nice if we could somehow harmonize those two APIs...

anotherche commented 2 years ago

I cannot agree with this)) First, grab(doData) is not for an arbitrary packet. Second, it works as part of the method that outputs decoded frames with different content (including the data type), while grabPacket outputs a packet as is. The latter can be (and is) used, for example, for remuxing without re-encoding streams, and which I am also working on to improve, replacing the use of obsolete methods in the FFmpegFrameRecorder. So I see these as two methods needed to achieve qualitatively different goals.

Additionally, after pondering the previous question, I realized that it is better for the time being to return exactly AVMEDIA_TYPE_DATA packets for doData requests. Because subtitles, in theory, should still be decoded by means of ffmpeg API, and what these attachments and NB are - I don't understand at all. Let us solve problems one by one - if someone needs processing of subtitles (or attachments, or NB) - we will add this.

saudet commented 2 years ago

We can leave grabPacket() and recordPacket(), that's fine, but it exposes the AVPacket data structure, which is specific to FFmpeg. For the parent FrameGrabber and FrameRecorder, we can easily return the data of the packet like it's already pretty much gets done in Frame.data, and we can put a reference to the original AVPacket in Frame.opaque too. So it sounds like we should have a "packet mode" or something, and then have a Frame.type field that tells us what kind of frame/packet this data came from. The "data" would only be set when the packet isn't decoded. When it's decoded, we'd set image, samples, subtitles (when that gets done), etc, but until decoding subtitles and what not gets done, at least a user could access the raw data of the packet by grabbing in "packet mode". What do you think?

anotherche commented 2 years ago

OK. At first glance, this sounds similar to the option I suggested at the beginning. That is, when calling the grabFrame(doData true)method, the method returns the packet data as a Frame.data field if this packet is not related to video or audio streams. That is, as I understand it, you are not proposing to change anything in grabPacket method, but you would like to add a Frame.type field into which we could record the type of frame (moreover, not just video, audio, etc., but ffmpeg_video, ffmpeg_audio, ffmpeg_data, DC1394_something, FlyCapture_something, etc.), and so that the grabFrame would return a frame with the type specified in this field, and if the data was not decoded, also set the Frame.date field (= packet data) and Frame.opaque (= AVPacket). Did I understand your idea correctly?

Or maybe you meant that we could add a completely packet-like mode to the grabFrame method, in which decoding will not be performed at all (as in the grapPacket), but which returns a Frame instance with the fields set as described above (however, if it is a packet from video or audio streams, then it may contain several real frames inside). But this option seems to me too redundant for its implementation through the grabFrame (won't there be too many parameters, considering that later it will probably be necessary to add doSubtitles as well). But if this is exactly what we would like to do, wouldn't it be easier to create a method like grabPacketAsFrame or grabRawFrame?

saudet commented 2 years ago

I'm honestly not sure what would be best, but one thing for sure, let's keep backward compatibility. That means, let's make sure we return all packet data in Frame.data like it's doing right now by default, then let's add a Frame.type to add that information there. That won't break backward compatibility, but still allow users to differentiate between different data frames. That field should be of type Frame.Type and should not contain backend specific information. Let's leave that in Frame.opaque, that's what that field is for. How does that sound for starters? If we need to do more, let's think about those other things at that point in time.

bytedeco / javacv

Very High Latency with FFmpegFrameGrabber.setVideoFrameNumber(lastFrame) #1689