Closed reinhrst closed 10 months ago
I am aware of this issue, but have no solution either. This would require parsing packets, and that's quite a heavyweight behavior. My "solution" for now has been "keep wandering through keyframes until you get to one VideoDecoder doesn't choke on". Personally, I'm with FFmpeg on this one; recovery frames are keyframes, that's what recovery is. But, my personal opinion on VideoDecoder being lame doesn't change it ;)
(And, luckily, VideoDecoder in both Chrome and Safari don't behave incorrectly if you overspecify keyframes except for the first frame. So, once you've got VideoDecoder into a steady state, this marking doesn't matter.)
I'll see if I can get this other ticket moving again a bit, by adding the information that libav is not able to distinguish between an IDR frame and an I + recovery frame. One of the "promises" of WebCodecs is, after all, that you could use something like libav to demux... It also seems to me that changing the behaviour (i.e. allowing I + recovery frames as start frames) is not much more than a spec change; the h264 decoders in browsers (at least the ones used for the HTMLVideoElement) can start just fine on an I + recovery frame...
Random idea I haven't tried, what if you:
* try catch around decoder.decode()
VideoDecoder doesn't throw exceptions, it just calls an error callback (and then stops working).
If this technique works, the practical way to use it would be to always send the "known extra-key-flavored" keyframe every time you seek and just ignore the first output frame.
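A minimal sketch of the "ignore the first output frame" part, assuming the decoder emits exactly one frame for the known-good keyframe. The names `makeSkippingOutput`, `skipNext` and `ClosableFrame` are made up for this illustration; a real VideoFrame satisfies the small interface used here.

```typescript
// Minimal shape of a decoded frame; a real VideoFrame satisfies it.
interface ClosableFrame {
  close(): void;
}

// Hypothetical helper: wraps an output callback so that frames produced by
// the known-good dummy keyframe are closed and dropped instead of delivered.
function makeSkippingOutput<F extends ClosableFrame>(onFrame: (frame: F) => void) {
  let toSkip = 0;
  return {
    // Call this right before feeding a dummy keyframe (e.g. after each seek).
    skipNext(count: number = 1): void {
      toSkip += count;
    },
    // Pass this as the decoder's output callback; it does not rely on `this`.
    output(frame: F): void {
      if (toSkip > 0) {
        toSkip -= 1;
        frame.close(); // release the dummy frame's resources
        return;
      }
      onFrame(frame);
    },
  };
}
```

In a browser this would be wired up roughly as `new VideoDecoder({output: skipper.output, error: ...})`, with `skipper.skipNext()` called before each dummy keyframe is decoded.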
In my experience Chrome actually throws this specific error synchronously. Which is weird given the async callback, but that's what happened in my experiments. Are you seeing differently? Maybe Chrome changed its behavior since I tried it.
The spec specifies that decoder.decode() should throw an error if a key frame is expected and not received (and it's my experience on Chrome that it does this). Note that it's only a SHOULD and not a MUST in the spec (i.e. if chunk.type !== "key" it MUST throw an error, else it SHOULD inspect the packet to see if the decoder thinks it's a key frame).
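A rough sketch of the "keep wandering through keyframes" workaround, assuming the rejection arrives as a synchronous DataError from decode() (which is what the spec describes and what was observed on Chrome above). If a browser reports the problem through the error callback instead, the decoder is closed and this loop will not help. `DecoderLike`, `ChunkLike` and `decodeFromFirstAcceptedKey` are illustrative names, not part of any library.

```typescript
// Minimal shapes; a real EncodedVideoChunk / VideoDecoder satisfy them.
interface ChunkLike {
  type: "key" | "delta";
}
interface DecoderLike<C extends ChunkLike> {
  decode(chunk: C): void;
}

function isDataError(e: unknown): boolean {
  return typeof e === "object" && e !== null &&
    (e as { name?: unknown }).name === "DataError";
}

// Feeds chunks in order, skipping everything until the decoder accepts a
// chunk marked "key". Returns the index of that chunk, or -1 if none was accepted.
function decodeFromFirstAcceptedKey<C extends ChunkLike>(
  decoder: DecoderLike<C>,
  chunks: readonly C[],
): number {
  let started = -1;
  let index = 0;
  for (const chunk of chunks) {
    const i = index++;
    if (started < 0 && chunk.type !== "key") continue; // cannot start on a delta chunk
    try {
      decoder.decode(chunk);
      if (started < 0) started = i;
    } catch (e) {
      // Once decoding has started, or for any other error, give up loudly.
      if (started >= 0 || !isDataError(e)) throw e;
      // Otherwise: "key" chunk the decoder does not consider a key frame; try the next one.
    }
  }
  return started;
}
```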
I'm quite sure that @marcello3d's idea works (just to wait until you have something that it's willing to decode, and then re-feed the first frame). Something I was playing with myself was to have VideoEncoder (with the same config) encode 1 KeyFrame from a green HTMLCanvas and feed this as first frame (so that the system would still work even in the rare case there is no IDR frame in the whole video file).
Didn't get this to work, since I was working on an annexB file, and the first frame is expected to contain SPS/PPS info, but I would expect this to work for an AVC stream...
Re throwing, OK, I'm just nuts, ignore me :) . Misremembering the various error modes.
Using an encoder to get a known-good frame is a great idea! ... assuming your system always has all the same encoders as it has decoders. I guarantee you that in 2037 when Firefox gets WebCodecs support, it'll only have decoders for everything but VP8 ;)
> Using an encoder to get a known-good frame is a great idea
I'm going to give this approach a try in the next couple of days (and should be easier to implement now that #2 is fixed), will report here!
As a practical matter to move this issue towards closing: Maybe just make a link "limitations" from the documentation for packetToEncodedVideoChunk to this issue, since it's not something that I think can be fixed anytime soon (unless we find out how to query ffmpeg if something is an IDR frame), at least not from this side...
Just as a point of why this is so difficult: this isn't really a disagreement between FFmpeg and WebCodecs, it's a disagreement between every file format and WebCodecs. The keyframe flag in a packet from FFmpeg/libav isn't coming from a decoder, and it doesn't even need a decoder or parser for the appropriate codec to set it. It comes from the file format. The reason recovery frames are marked as keyframes to FFmpeg is because they're marked as keyframes in .mp4 and all other file formats, and the reason they're marked as keyframes there is because they are.
I assume that the real bone of contention is that WebCodecs sees itself more as a realtime multimedia framework, and in the realtime space, you certainly wouldn't want B-frames sneaking in at the beginning of decoding. But when dealing with arbitrary files, we don't have the luxury of being so picky.
I've added the suggested warning to the API documentation.
@Yahweasel want to post that perspective on https://github.com/w3c/webcodecs/issues/650 ? The more feedback/perspectives the more likely something will happen
> Something I was playing with myself was to have VideoEncoder (with the same config) encode 1 KeyFrame from a green HTMLCanvas and feed this as first frame.
So this strategy works (at least in two of my example videos that start with I frames with recovery flag; one in annexb and one in avc format). It should be noted that I explicitly request the software decoder, which is supposedly more forgiving on a lot of issues (I use the software decoder since my video is interlaced and Chrome doesn't have hardware support for it, and has a bug that it doesn't auto-select the software decoder for interlaced stuff). I'm quite sure the encoded frame will not be interlaced, but still, it is accepted as the first (key) frame and afterwards the stream plays just fine.
async function createFakeKeyFrameChunk(
  decoderConfig: VideoDecoderConfig
): Promise<EncodedVideoChunk> {
  // next 6 lines could be made in one on platforms that support Promise.withResolvers()
  let resolve: (value: EncodedVideoChunk) => void
  let reject: (error: any) => void
  const promise = new Promise<EncodedVideoChunk>((res, rej) => {
    resolve = res
    reject = rej
  })
  const encoderConfig = {...decoderConfig} as VideoEncoderConfig
  // encoderConfig needs a width and height set; in my tests these dimensions
  // do not have to match the actual video dimensions, so I'm just using something
  // random for them
  // UPDATE: see below for new insights!!!!!
  encoderConfig.width = 640
  encoderConfig.height = 360
  encoderConfig.avc = {format: decoderConfig.description ? "avc" : "annexb"}
  const videoEncoder = new VideoEncoder({
    output: (chunk, _metadata) => resolve(chunk),
    error: e => reject(e)
  })
  try {
    videoEncoder.configure(encoderConfig)
    const oscanvas = new OffscreenCanvas(encoderConfig.width, encoderConfig.height)
    // getting context seems to be minimal needed before it can be used as VideoFrame source
    oscanvas.getContext("2d")
    const videoFrame = new VideoFrame(
      oscanvas, {timestamp: Number.MIN_SAFE_INTEGER})
    try {
      videoEncoder.encode(videoFrame)
      await videoEncoder.flush()
      const chunk = await promise
      return chunk
    } finally {
      videoFrame.close()
    }
  } finally {
    videoEncoder.close()
  }
}
(In case anyone is wondering: WebCodecs software decoder is still about 10x faster in decoding the image than libav.js on my Macbook M2.)
Update: the code above says that width and height don't matter. This was true when decoding an annexB stream, but was NOT true when decoding an avc stream. So best to make sure that width and height match!
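One way to follow that advice is to derive the encoder settings from the decoder config instead of hard-coding 640x360. This is only a sketch: it assumes the decoder config carries codedWidth and codedHeight (both are optional in VideoDecoderConfig, so they may be absent), and the helper name `fakeKeyFrameEncoderConfig` and the two `...Like` interfaces are invented here so the snippet stays self-contained.

```typescript
// Subset of VideoDecoderConfig that matters for this sketch.
interface DecoderConfigLike {
  codec: string;
  codedWidth?: number;
  codedHeight?: number;
  description?: unknown;
}

// Subset of VideoEncoderConfig produced by this sketch.
interface EncoderConfigLike {
  codec: string;
  width: number;
  height: number;
  avc: { format: "avc" | "annexb" };
}

// Hypothetical helper: builds an encoder config whose dimensions match the
// stream that will be decoded, and whose bitstream format (avc vs annexb)
// follows the presence of a description, as in createFakeKeyFrameChunk.
function fakeKeyFrameEncoderConfig(decoderConfig: DecoderConfigLike): EncoderConfigLike {
  const { codec, codedWidth, codedHeight, description } = decoderConfig;
  if (!codedWidth || !codedHeight) {
    throw new Error("decoder config has no codedWidth/codedHeight to match");
  }
  return {
    codec,
    width: codedWidth,
    height: codedHeight,
    avc: { format: description ? "avc" : "annexb" },
  };
}
```

When the decoder config has no dimensions, they would have to come from elsewhere (for example the demuxer's stream parameters), which is outside this sketch.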
For the moment I'm going to close this ticket as E_NOTMYPROBLEM. This discrepancy exists, but there's nothing that can be done about it in libavjs-webcodecs-bridge.
> So this strategy works (at least in two of my example videos that start with I frames with recovery flag; one in annexb and one in avc format) [...]
I have an update; not sure if there is some h264 wizard here who could help me, but at least I wanted to share it so that others running into this issue see that my solution above does not solve everything.
As a background: the video I'm working with is an MTS (mpegts) file from a JVC camcorder (annexb). The results below are the findings I get when I remux this file to mp4 (avcc). This is because random seeking in MTS files seems to be broken in libav (for the last 11 years). Indeed, when calling avformat_seek_file*() on the MTS file, the first packet is not a keyframe most of the time (and the pts found is not accurate).
A more in-depth description of the file I'm working with is in this StackOverflow answer; for here, the important thing is that there are IDR frames once every 300 frames, I frames every 12 frames. There are P frames every 3 frames, with 2 B frames in between: IBBPBBPBBPBBIBBPBB...

Since the stream starts with two B frames (in presentation order; unless otherwise stated, everything in here is in presentation order, first frame has framenr=0), IDR frames are found when frameNr % 300 == 2.
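To make the layout concrete, here is a small sketch that maps a frame number (presentation order) to its frame type. It assumes the structure is perfectly regular for the whole file, which is an idealisation of the numbers given above; `frameKind` is a name made up for this illustration.

```typescript
type FrameKind = "IDR" | "I" | "P" | "B";

// Presentation order, first frame is 0: B B I B B P B B P B B P B B I ...
// Assumes a perfectly regular stream (IDR every 300, I every 12, P every 3).
function frameKind(frameNr: number): FrameKind {
  if (frameNr % 300 === 2) return "IDR";
  if (frameNr % 12 === 2) return "I";
  if (frameNr % 3 === 2) return "P";
  return "B";
}
```

With this layout there are 24 non-IDR I frames between two consecutive IDR frames (frame numbers 14, 26, ..., 290 in the first period).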
In order to enable random access in the stream, I use the trick above; after flush()ing the VideoDecoder, I feed it one fake video frame, then do an avformat_seek_file_max() with the pts of the frame I want[^1] (which seeks to the keyframe that is before (in decoding order!!) the frame with the requested pts), and start feeding packets from there to the VideoDecoder.decode() method. Then I feed enough packets until the VideoFrame that I want pops out (how many packets you have to feed exactly is a bit wishy-washy, especially if you don't want to call flush() on the decoder, but not important for this discussion).
Whether this is successful depends on what framenr % 300 you ask for (remember, there are IDR frames every 300 frames):
* framenr % 300 <= 216, so one of the first 18 I frames after the IDR frame
* framenr % 300 >= 228 (so not one of the first 18, out of 24 inter-IDR I-frames), VideoDecoder will only return the I and P frames (not the B frames) -- it will continue to do so until the next IDR frame, after which B frames are returned again.

If this looks familiar to anyone, I would love to hear it. In the meantime I'll look for a solution or root cause (keeping in mind that the seen behaviour may very well be an implementation artifact of the codec in Chrome).
[^1]: Actually I look for "requested pts minus 2 frames", since looking for the requested pts will not work when it's one of the B-frames directly preceding (in presentation order; directly following in decoding order) the non-IDR I-frame, since it's dependent on a P frame that was before the I-frame. This is exactly the difference between an IDR and a non-IDR I frame as described in the StackOverflow answer mentioned above.
I have no conclusion, but I would suggest that you may not be looking for an H.264 wizard. To me, this sniffs of avformat's seeking (either the general part or the MOV-specific part) being too clever by negative half.
> I have no conclusion, but I would suggest that you may not be looking for an H.264 wizard. To me, this sniffs of avformat's seeking (either the general part or the MOV-specific part) being too clever by negative half.
Interesting idea. Do you mean that seeking for e.g. frame 245 does not result in starting at frame 240 (the keyframe) but something else? I guess I should be able to debug that quickly enough, by printing the pts of every packet that I receive after the seek... Or do you mean something else?
> Interesting idea. Do you mean that seeking for e.g. frame 245 does not result in starting at frame 240 (the keyframe) but something else? I guess I should be able to debug that quick enough, by printing the pts's of all packages that I receive after the seek.... Or do you mean something else?
Well, it certainly never seeks to the exact frame you ask for unless that frame happens to be a keyframe, but what I'm suggesting is that if you seek to 245, it will certainly seek to the keyframe 240, but might "intelligently" drop the B-frames that it thinks you don't need: the pts it seeked to is 240, so it'll drop anything with pts<240. Or, equivalently, it might just drop anything with a pts lower than where it ended up seeking to, which may be different from where you asked it to seek to. The fact that the frames are vanishing to me suggests the possibility of avformat shenanigans rather than avcodec or WebCodecs shenanigans. But, just guessing and grasping at straws here :)
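The "print the pts of the packets after the seek" check could look roughly like the sketch below. It is a debugging aid under assumptions, not a diagnosis: `PacketLike` and `summarizeAfterSeek` are hypothetical names, and the packet is assumed to expose its pts as a single number (with libav.js the timestamp may need to be combined from separate low and high parts first). The value of AV_PKT_FLAG_KEY is the one from FFmpeg's headers.

```typescript
const AV_PKT_FLAG_KEY = 0x0001; // as defined by FFmpeg

// Hypothetical minimal packet shape for this sketch.
interface PacketLike {
  pts: number;
  flags: number;
}

interface SeekSummary {
  firstKeyPts: number | null; // pts of the first packet flagged as key
  leadingNonKey: number;      // packets read before that key packet
  ptsBelowKey: number[];      // later packets presented before the key frame
}

// Summarizes the packets read right after a seek, in the order received.
function summarizeAfterSeek(packets: readonly PacketLike[]): SeekSummary {
  const key = packets.find((p) => (p.flags & AV_PKT_FLAG_KEY) !== 0);
  if (key === undefined) {
    return { firstKeyPts: null, leadingNonKey: packets.length, ptsBelowKey: [] };
  }
  const keyIndex = packets.indexOf(key);
  const ptsBelowKey = packets
    .slice(keyIndex + 1)
    .filter((p) => p.pts < key.pts)
    .map((p) => p.pts);
  return { firstKeyPts: key.pts, leadingNonKey: keyIndex, ptsBelowKey };
}
```

If the demuxer really drops packets with a pts below the seek point, `ptsBelowKey` would come back empty after seeking to a non-IDR I frame, even though that frame is followed in decoding order by B frames that are presented before it.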
In packetToEncodedVideoChunk the type is set to "key" if packet.flags & AV_PKT_FLAG_KEY.

There is however an annoying thing that ffmpeg sets this flag for any I-frame with a recovery message, whereas WebCodecs demands an IDR frame to start decoding from (at least in h264). I previously made a request to allow decoding to start at a recovery message, but this has not been implemented (yet). Current thinking (from the linked issue) seems to be to add an extra type (recover rather than key) for these frames.

Now I don't have a good solution for how to fix this for now (since I'm not sure ffmpeg internally keeps track of the difference between I and IDR frames), but I just wanted to make sure this issue is logged somewhere.