Matroska-Org / libmatroska

a C++ library to parse Matroska files (.mkv and .mka)
GNU Lesser General Public License v2.1

mkclean porting issues #189

Open robUx4 opened 7 months ago

robUx4 commented 7 months ago

This thread is to discuss issues porting mkclean from libebml2/libmatroska2 (Core-C versions) to libebml/libmatroska 2.0.

The Core-C version was originally a port of the C++ code with some extra work on strengthening the coherence of data for mkvalidator and mkclean. So the internal design and philosophy are very similar. However, there are a few extra things that are tricky to do with our C++ versions, at least in their current form.

Block timestamps

A Block timestamp depends on the Cluster timestamp, the Segment timestamp scale (and the track timestamp scale). So when you read or write the values you need to have all of these elements around. In libmatroska2 this is handled internally. When a Block is created it can be given the Segment Info and Track Entry it belongs to. It is also linked to its parent element (which itself has a parent element, etc).
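To make the dependency concrete, here is a minimal sketch of the computation. `TimestampContext` and `BlockTimestampNs` are hypothetical names, not libmatroska API, and the sketch assumes a track timestamp scale of 1.0 so it only uses the Segment scale and the Cluster timestamp.

```cpp
#include <cstdint>

// Hypothetical context, not a libmatroska type: the values a Block needs
// from the elements around it to resolve an absolute timestamp.
struct TimestampContext {
    std::uint64_t segment_timestamp_scale_ns; // Segment Info TimestampScale, ns per tick
    std::uint64_t cluster_timestamp;          // Cluster Timestamp, in Segment ticks
};

// A Block stores a signed 16-bit offset relative to its Cluster.
// Assumption: the track timestamp scale is 1.0, so it is left out here.
inline std::int64_t BlockTimestampNs(const TimestampContext& ctx,
                                     std::int16_t block_relative)
{
    const std::int64_t ticks =
        static_cast<std::int64_t>(ctx.cluster_timestamp) + block_relative;
    return ticks * static_cast<std::int64_t>(ctx.segment_timestamp_scale_ns);
}
```

Without the context struct, a Block on its own only knows `block_relative`, which is why the surrounding elements have to be reachable from it.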

Element reuse for writing

A Block read by mkclean is also reused to write the output. It has a separate Segment Info/Track Entry for reading and for writing in case they are modified (a different timestamp scale, for example).
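As a rough illustration of why the two contexts matter, the sketch below re-derives a Block's relative timestamp for the output side. `ScaleContext` and `RebaseBlockTimestamp` are made-up names for this example. It assumes a track timestamp scale of 1.0 and truncates when the output scale does not divide evenly.

```cpp
#include <cstdint>
#include <limits>
#include <optional>

// Hypothetical: the scale and Cluster timestamp seen on one side (read or write).
struct ScaleContext {
    std::int64_t timestamp_scale_ns; // ns per tick on this side
    std::int64_t cluster_timestamp;  // Cluster Timestamp in this side's ticks
};

// Convert a Block's relative timestamp from the read context to the write
// context. Returns std::nullopt when the result no longer fits the signed
// 16-bit field of a Block, i.e. the Block cannot stay in that output Cluster.
inline std::optional<std::int16_t> RebaseBlockTimestamp(const ScaleContext& read_ctx,
                                                        std::int16_t read_relative,
                                                        const ScaleContext& write_ctx)
{
    const std::int64_t absolute_ns =
        (read_ctx.cluster_timestamp + read_relative) * read_ctx.timestamp_scale_ns;
    const std::int64_t write_ticks = absolute_ns / write_ctx.timestamp_scale_ns; // truncates
    const std::int64_t relative = write_ticks - write_ctx.cluster_timestamp;
    if (relative < std::numeric_limits<std::int16_t>::min() ||
        relative > std::numeric_limits<std::int16_t>::max())
        return std::nullopt;
    return static_cast<std::int16_t>(relative);
}
```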

Getter/Setter of keyframe flags

The SimpleBlock vs Block distinction is partly hidden from the user. It's possible to read/write the keyframe info on either of them. While it's straightforward for a SimpleBlock, for a Block that means manipulating its parent BlockGroup (for example adding a dummy ReferenceBlock).
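A minimal sketch of such a unified getter/setter could look like this. The types are hypothetical stand-ins rather than the libmatroska classes, and the BlockGroup is simplified to hold at most one ReferenceBlock.

```cpp
#include <cstdint>
#include <optional>

// Hypothetical stand-ins, not the libmatroska classes.
struct SimpleBlockModel { bool keyframe_flag = false; };
struct BlockGroupModel {
    std::optional<std::int64_t> reference_block; // ReferenceBlock child, if any
};

// One handle type that hides whether the frame lives in a SimpleBlock
// or in a Block inside a BlockGroup.
class FrameHandle {
public:
    explicit FrameHandle(SimpleBlockModel& sb) : simple_(&sb) {}
    explicit FrameHandle(BlockGroupModel& bg) : group_(&bg) {}

    bool IsKeyframe() const {
        if (simple_) return simple_->keyframe_flag;
        // A Block with no ReferenceBlock in its BlockGroup is a keyframe.
        return !group_->reference_block.has_value();
    }

    void SetKeyframe(bool keyframe) {
        if (simple_) { simple_->keyframe_flag = keyframe; return; }
        if (keyframe)
            group_->reference_block.reset();  // drop the reference
        else if (!group_->reference_block)
            group_->reference_block = 0;      // dummy ReferenceBlock (value is a placeholder)
    }

private:
    SimpleBlockModel* simple_ = nullptr;
    BlockGroupModel* group_ = nullptr;
};
```

The point of the sketch is that the Block case has to reach into its parent, which is what makes it awkward without a parent link.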

Frame duration

Each frame in a Block has a possibly known duration. This allows splitting blocks without losing the duration when it's known. In some cases it involves parsing the frame headers of some known codecs.

Cues linked to the Segment/Block

This is to get the position in the file just by using the Cues elements.


Some of these could be done by adding a weak reference to the parent in each element (except for the EbmlHead and Segment elements which have no parent).
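For illustration, a weak parent link could be sketched as below. `Node` is a hypothetical class, and the sketch assumes elements are owned through `std::shared_ptr` (C++17 for `weak_from_this`), which is not necessarily how libebml owns its elements.

```cpp
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical element tree, not the libebml EbmlElement/EbmlMaster classes.
class Node : public std::enable_shared_from_this<Node> {
public:
    explicit Node(std::string name) : name_(std::move(name)) {}

    // The parent owns the child; the child only keeps a weak link back.
    // Requires that *this is itself owned by a shared_ptr.
    std::shared_ptr<Node> AddChild(std::string name) {
        auto child = std::make_shared<Node>(std::move(name));
        child->parent_ = weak_from_this();
        children_.push_back(child);
        return child;
    }

    // Null for top-level elements (EbmlHead, Segment) or if the parent is gone.
    std::shared_ptr<Node> Parent() const { return parent_.lock(); }

    // Walk up the tree, e.g. from a Block to its Cluster.
    std::shared_ptr<Node> FindAncestor(const std::string& name) const {
        for (auto p = Parent(); p; p = p->Parent())
            if (p->name_ == name) return p;
        return nullptr;
    }

    const std::string& Name() const { return name_; }

private:
    std::string name_;
    std::weak_ptr<Node> parent_;
    std::vector<std::shared_ptr<Node>> children_;
};
```

Using a weak link avoids an ownership cycle between parent and child, so destroying the Segment still frees the whole tree.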

Some others could be done by using a "shadow" version of some elements, Block in particular, so that when they are read a shadow version carrying extra data can be created and the extra data set. Or we could do what libmatroska2 does: for example the Block objects have a few (hardcoded) extra fields that can be read and written.

The "shadow" versions might be done externally to libmatroska, but for that we should be able to add hooks when an EbmlElement is created so we can create a different/extended one. This is currently not possible with the hardcoded tables of elements and Create callbacks. Also it means a lot of casting between the base and the extended versions of an element.
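To show the kind of hook meant here, below is a rough sketch of a factory with per-ID overrides. `Element`, `ShadowBlock` and `ElementFactory` are invented for this example and do not exist in libebml. The only real values are the Block (0xA1) and SimpleBlock (0xA3) element IDs used in the usage note.

```cpp
#include <cstdint>
#include <functional>
#include <map>
#include <memory>
#include <utility>

// Hypothetical base element, standing in for EbmlElement.
struct Element {
    virtual ~Element() = default;
    std::uint32_t id = 0;
};

// Hypothetical extended Block carrying data resolved at read time.
struct ShadowBlock : Element {
    std::int64_t absolute_timestamp_ns = 0;
};

// Factory where the application can override creation for a given EBML ID.
class ElementFactory {
public:
    using Creator = std::function<std::unique_ptr<Element>()>;

    void SetHook(std::uint32_t id, Creator creator) {
        hooks_[id] = std::move(creator);
    }

    // Uses the hook if one is registered, otherwise the default element.
    std::unique_ptr<Element> Create(std::uint32_t id) const {
        auto it = hooks_.find(id);
        std::unique_ptr<Element> element =
            (it != hooks_.end()) ? it->second() : std::make_unique<Element>();
        element->id = id;
        return element;
    }

private:
    std::map<std::uint32_t, Creator> hooks_;
};
```

An application would register `SetHook(0xA1, [] { return std::make_unique<ShadowBlock>(); })` and then `dynamic_cast<ShadowBlock*>` what `Create(0xA1)` returns, which is exactly the casting overhead mentioned above.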

Adding this hook to libebml could certainly be useful. However, that means all the extra work to add stronger ties between elements would only be found in mkclean and not be available to all libmatroska users. As we're doing the 2.0 versions, this could be a rare opportunity to add this possibility, which should probably be optional so as not to break existing code.