Matroska-Org / libebml

a C++ library to parse EBML files
GNU Lesser General Public License v2.1

EbmlElement::FindNextID doesn't support global elements #139

Open mbunkus opened 1 year ago

mbunkus commented 1 year ago

The programs in the MKVToolNix package use the FindNextID function to find the first two top-level elements: the EBML Head & the Matroska Segment.

However, there may be other elements present: EBML Void & EBML CRC32 [^1]. EBML Void is something that mkvpropedit used to create as a top-level element in certain situations. Unfortunately, there are quite a lot of programs that simply don't support EBML Void elements in the top-level position in the EBML Body, including libEBML & ffmpeg.

In the case of libEBML the following happens:

Note that the related function FindNextElement does support global elements. Unfortunately I cannot use FindNextElement for the KaxSegment as FindNextElement attempts to read the whole element into memory, which is obviously not going to work with big Matroska files.
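To illustrate the requested behaviour, here is a minimal, self-contained sketch (C++17) that looks for a wanted top-level ID while skipping the global elements EBML Void (0xEC) and EBML CRC-32 (0xBF). It reads element headers only, never the payload, so it would also work for a huge Segment. This is not libEBML code: `ElementHeader`, `read_vint` and `find_next_id` are hypothetical names, the input is assumed to be an in-memory byte buffer, and unknown-size elements are not handled.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <vector>

// Element IDs as defined by the EBML and Matroska specifications.
constexpr std::uint32_t kIdEbmlHead = 0x1A45DFA3;
constexpr std::uint32_t kIdSegment  = 0x18538067;
constexpr std::uint32_t kIdVoid     = 0xEC;  // global element
constexpr std::uint32_t kIdCrc32    = 0xBF;  // global element

// Hypothetical result type: describes an element without loading its payload.
struct ElementHeader {
  std::uint32_t id;         // element ID, length marker included
  std::uint64_t data_size;  // payload size in bytes
  std::size_t data_offset;  // offset of the first payload byte
};

// Reads one variable-length integer at `pos` and advances `pos` past it.
// keep_marker == true keeps the length marker (IDs); false masks it out (sizes).
static std::optional<std::uint64_t> read_vint(const std::vector<std::uint8_t>& buf,
                                              std::size_t& pos, bool keep_marker,
                                              int max_length) {
  assert(max_length >= 1 && max_length <= 8);
  if (pos >= buf.size()) return std::nullopt;
  const std::uint8_t first = buf[pos];
  int length = 1;
  std::uint8_t mask = 0x80;
  // The position of the first set bit encodes the total length in bytes.
  while (length <= max_length && !(first & mask)) {
    mask >>= 1;
    ++length;
  }
  if (length > max_length || pos + length > buf.size()) return std::nullopt;
  std::uint64_t value = keep_marker ? first : (first & (mask - 1));
  for (int i = 1; i < length; ++i) value = (value << 8) | buf[pos + i];
  pos += length;
  return value;
}

// Scans forward from `pos` for an element with ID `wanted`, skipping Void and
// CRC-32 elements. On success `pos` points at the payload of the found element.
// Any other element, or truncated data, ends the search with std::nullopt.
std::optional<ElementHeader> find_next_id(const std::vector<std::uint8_t>& buf,
                                          std::size_t& pos, std::uint32_t wanted) {
  while (pos < buf.size()) {
    std::size_t cursor = pos;
    const auto id = read_vint(buf, cursor, true, 4);
    if (!id) return std::nullopt;
    const auto size = read_vint(buf, cursor, false, 8);
    if (!size) return std::nullopt;
    if (*id == wanted) {
      pos = cursor;
      return ElementHeader{static_cast<std::uint32_t>(*id), *size, cursor};
    }
    if (*id != kIdVoid && *id != kIdCrc32) return std::nullopt;  // not a global element
    if (*size > buf.size() - cursor) return std::nullopt;        // truncated payload
    pos = cursor + static_cast<std::size_t>(*size);              // skip the global element
  }
  return std::nullopt;
}
```

With a buffer holding an empty EBML Head, a 6-byte EBML Void and a Segment header, calling `find_next_id` first with `kIdEbmlHead` and then with `kIdSegment` steps over the Void element instead of giving up on it. The caller is responsible for advancing `pos` past the payload of a found element before searching again.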

Here is a test file created with a slightly modified mkvmerge: there's a small EBML Void element between the EBML Head & the Matroska Segment. The current mkvinfo output shows how this fails:

```
[0 mosu@sweet-chili ~/prog/video/data] mkvinfo v.mkv
+ EBML head
|+ EBML version: 1
|+ EBML read version: 1
|+ Maximum EBML ID length: 4
|+ Maximum EBML size length: 8
|+ Document type: matroska
|+ Document type version: 4
|+ Document type read version: 1
+ (Known element, but invalid at this position: EBML void; ID: 0xec size: 6)
+ Segment: size 5760
|+ Seek head (subentries will be skipped)
…
```

Currently mkvmerge doesn't support such files, but I'll fix that soonish. ffmpeg also fails, but that isn't our problem, of course.

[^1]: The EBML RFC says: "EBML allows some special Elements to be found within more than one parent in an EBML Document or optionally at the Root Level of an EBML Body." (emphasis mine)

mbunkus commented 1 year ago

Please note that the output "(Known element, but invalid at this position…" is only generated for instances of EbmlDummy. mkvinfo itself then tries to map the ID contained in the EbmlDummy element to known element IDs & their names.
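As a rough illustration of that mapping step, the sketch below turns an ID and size taken from a dummy element into a line shaped like the one in the mkvinfo output above. It is not mkvinfo's actual code: `describe_dummy`, the lookup table, the "EBML CRC-32" display name and the "Unknown element" fallback wording are all assumptions made for the example.

```cpp
#include <cstdint>
#include <map>
#include <sstream>
#include <string>

// Hypothetical helper: formats an element that the parser returned as a dummy.
std::string describe_dummy(std::uint32_t id, std::uint64_t size) {
  // Illustrative table of known IDs; a real tool would know many more.
  static const std::map<std::uint32_t, std::string> known_names = {
      {0x1A45DFA3, "EBML head"},
      {0x18538067, "Segment"},
      {0xEC, "EBML void"},
      {0xBF, "EBML CRC-32"},
  };
  std::ostringstream out;
  const auto it = known_names.find(id);
  if (it != known_names.end())
    out << "(Known element, but invalid at this position: " << it->second;
  else
    out << "(Unknown element";  // assumed wording for IDs missing from the table
  out << "; ID: 0x" << std::hex << id << std::dec << " size: " << size << ")";
  return out.str();
}
```

For the test file above, `describe_dummy(0xEC, 6)` yields the same text as the EBML void line in the mkvinfo output.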