MonoS / SupMover

SupMover - Shift timings and Screen Area of PGS/Sup subtitle
GNU Affero General Public License v3.0

`cut_merge` broken #16

Open YourMJK opened 5 months ago

YourMJK commented 5 months ago

You mentioned that the cut_merge mode is broken and shouldn't be used. What is broken? I want to use it.

MonoS commented 5 months ago

I had the wrong assumption that every subtitle would end with an empty PCS/WDS/END sequence. However, if a subtitle starts right after the previous one, temporally speaking, no empty sequence is needed: the previous picture simply ends at the beginning of the subsequent one.

Right now in the code you'll see variables like foundBegin and foundEnd, but those are wrong because they're based on this incorrect assumption. I never took the time to think of a proper solution, as I have another way to achieve the same result, namely this one, which I generate using another application I've developed in private that takes a sequence of frame numbers and generates the cut parameters for different kinds of track types (sup, ass, dts, ac3, flac).

YourMJK commented 5 months ago

Okay, thanks for the information. I might look into it and create a PR if I find a solution.

MonoS commented 5 months ago

Selur from Doom9 sent me the source code of SeCut; you can take a look at it if you want: SECut_source.zip

MonoS commented 4 months ago

Using your --trace I think I'm starting to understand better how SUP files work. You can find attached the subtitle that made me realize this functionality was broken: test cut_merge.zip. Using the command cut_merge format secut timemode timestamp fixmode cut list 0:0:0.000-0:0:20.000, you would expect this functionality to preserve only the first two subpics; instead, it saves the first three, with the wrong end time.

If we trace the original file, we can see this:

+ DS
  + PTS: 0:00:17.351
  + PCS Segment: offset 0
    + Video size: 1920x1080
    + Composition number: 0
    + Composition state: Epoch Start
    + Composition object
      + Object ID: 0
      + Window ID: 0
      + Position: 535,944
  + WDS Segment: offset 0x20
    + Window
      + Window ID: 0
      + Window frame: 535,944,854,69
  + PDS Segment: offset 0x37
  + ODS Segment: offset 0x181
  + END Segment: offset 0x52d0
+ DS
  + PTS: 0:00:19.144
  + PCS Segment: offset 0x52dd
    + Video size: 1920x1080
    + Composition number: 1
    + Composition state: Normal
  + WDS Segment: offset 0x52f5
    + Window
      + Window ID: 0
      + Window frame: 535,944,854,69
  + END Segment: offset 0x530c
+ DS
  + PTS: 0:00:19.228
  + PCS Segment: offset 0x5319
    + Video size: 1920x1080
    + Composition number: 2
    + Composition state: Epoch Start
    + Composition object
      + Object ID: 0
      + Window ID: 0
      + Position: 800,944
  + WDS Segment: offset 0x5339
    + Window
      + Window ID: 0
      + Window frame: 516,867,886,146
    + Window
      + Window ID: 1
      + Window frame: 426,67,1068,133
  + PDS Segment: offset 0x5359
  + ODS Segment: offset 0x54a3
  + END Segment: offset 0x749f
+ DS
  + PTS: 0:00:20.521
  + PCS Segment: offset 0x74ac
    + Video size: 1920x1080
    + Composition number: 3
    + Composition state: Aquisition Point
    + Composition object
      + Object ID: 1
      + Window ID: 1
      + Position: 724,147
    + Composition object
      + Object ID: 2
      + Window ID: 1
      + Position: 426,67
  + WDS Segment: offset 0x74d4
    + Window
      + Window ID: 0
      + Window frame: 516,867,886,146
    + Window
      + Window ID: 1
      + Window frame: 426,67,1068,133
  + PDS Segment: offset 0x74f4
  + ODS Segment: offset 0x763e
  + ODS Segment: offset 0xa1de
  + END Segment: offset 0x11068
+ DS
  + PTS: 0:00:20.646
  + PCS Segment: offset 0x11075
    + Video size: 1920x1080
    + Composition number: 4
    + Composition state: Aquisition Point
    + Composition object
      + Object ID: 3
      + Window ID: 0
      + Position: 516,867
    + Composition object
      + Object ID: 4
      + Window ID: 1
      + Position: 426,67
  + WDS Segment: offset 0x1109d
    + Window
      + Window ID: 0
      + Window frame: 516,867,886,146
    + Window
      + Window ID: 1
      + Window frame: 426,67,1068,133
  + PDS Segment: offset 0x110bd
  + ODS Segment: offset 0x11207
  + ODS Segment: offset 0x18169
  + END Segment: offset 0x21c94
+ DS
  + PTS: 0:00:24.525
  + PCS Segment: offset 0x21ca1
    + Video size: 1920x1080
    + Composition number: 5
    + Composition state: Normal
  + WDS Segment: offset 0x21cb9
    + Window
      + Window ID: 0
      + Window frame: 516,867,886,146
    + Window
      + Window ID: 1
      + Window frame: 426,67,1068,133
  + END Segment: offset 0x21cd9

The first subpic is by itself, with a sequence of Epoch Start -> Normal composition states. Afterwards, however, we see a sequence of Epoch Start -> Acquisition Point -> Acquisition Point -> Normal, because there are three subpics one after the other. My initial implementation always assumed the first kind of sequence; as a matter of fact, if we trace the output we see this:

+ DS
  + PTS: 0:00:17.351
  + PCS Segment: offset 0
    + Video size: 1920x1080
    + Composition number: 0
    + Composition state: Epoch Start
    + Composition object
      + Object ID: 0
      + Window ID: 0
      + Position: 535,944
  + WDS Segment: offset 0x20
    + Window
      + Window ID: 0
      + Window frame: 535,944,854,69
  + PDS Segment: offset 0x37
  + ODS Segment: offset 0x181
  + END Segment: offset 0x52d0
+ DS
  + PTS: 0:00:19.144
  + PCS Segment: offset 0x52dd
    + Video size: 1920x1080
    + Composition number: 0
    + Composition state: Normal
  + WDS Segment: offset 0x52f5
    + Window
      + Window ID: 0
      + Window frame: 535,944,854,69
  + END Segment: offset 0x530c
+ DS
  + PTS: 0:00:19.228
  + PCS Segment: offset 0x5319
    + Video size: 1920x1080
    + Composition number: 1
    + Composition state: Epoch Start
    + Composition object
      + Object ID: 0
      + Window ID: 0
      + Position: 800,944
  + WDS Segment: offset 0x5339
    + Window
      + Window ID: 0
      + Window frame: 516,867,886,146
    + Window
      + Window ID: 1
      + Window frame: 426,67,1068,133
  + PDS Segment: offset 0x5359
  + ODS Segment: offset 0x54a3
  + END Segment: offset 0x749f
+ DS
  + PTS: 0:00:20.000
  + PCS Segment: offset 0x74ac
    + Video size: 1920x1080
    + Composition number: 1
    + Composition state: Aquisition Point
    + Composition object
      + Object ID: 1
      + Window ID: 1
      + Position: 724,147
    + Composition object
      + Object ID: 2
      + Window ID: 1
      + Position: 426,67
  + WDS Segment: offset 0x74d4
    + Window
      + Window ID: 0
      + Window frame: 516,867,886,146
    + Window
      + Window ID: 1
      + Window frame: 426,67,1068,133
  + PDS Segment: offset 0x74f4
  + ODS Segment: offset 0x763e
  + ODS Segment: offset 0xa1de
  + END Segment: offset 0x11068

Here we have an additional Acquisition Point segment and are missing the Normal segment, because the code assumed the former was the Normal segment (in the code I mark this segment with the cutMerge_foundEnd variable).
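The broken assumption can be made concrete with a small sketch (the DisplaySet model and constants here are illustrative, not SupMover's actual code): epochs must be delimited by the Epoch Start composition state, not by an assumed Epoch Start -> Normal pairing.

```python
# Illustrative sketch: group display sets into epochs by composition state
# instead of assuming every subtitle is an Epoch Start -> Normal pair.
from dataclasses import dataclass

STATE_NORMAL = 0x00
STATE_ACQUISITION_POINT = 0x40
STATE_EPOCH_START = 0x80

@dataclass
class DisplaySet:
    pts: float   # presentation time in seconds
    state: int   # composition state from the PCS segment

def group_epochs(display_sets):
    """An epoch is an Epoch Start DS plus every following DS up to
    (but excluding) the next Epoch Start."""
    epochs = []
    for ds in display_sets:
        if ds.state == STATE_EPOCH_START or not epochs:
            epochs.append([])
        epochs[-1].append(ds)
    return epochs

# The trace above: ES, NC, ES, AP, AP, NC -> two epochs of 2 and 4 DSs.
stream = [DisplaySet(17.351, STATE_EPOCH_START),
          DisplaySet(19.144, STATE_NORMAL),
          DisplaySet(19.228, STATE_EPOCH_START),
          DisplaySet(20.521, STATE_ACQUISITION_POINT),
          DisplaySet(20.646, STATE_ACQUISITION_POINT),
          DisplaySet(24.525, STATE_NORMAL)]
assert [len(e) for e in group_epochs(stream)] == [2, 4]
```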

Hope this will be of help :)

cubicibo commented 4 months ago

If you slice & merge the stream you must ensure that:

Whenever you cut in the middle of an epoch, you need to track object IDs and object redefinitions: [DS1(ES, o0_0), DS2(AP, o0_1), DS3(NC, palette update), DS4(NC, no composition)]. If you drop DS2, DS3 becomes invalid, as it operates on version 1 of object 0. Furthermore, object 0 version 0 isn't meant to be undisplayed at DS4, but at the time of DS2.

Another painful case: [DS1(ES, o0_0), DS2(AP, o0_1, o1), DS3(NC, o1), DS4(NC, no composition)] (DS3 redraws the screen, keeping only object 1). If you drop DS2 alone, DS3 will reference an object that was never defined.

The sanest approach is to drop every Normal Case (NC) that follows a dropped AP or ES, up to the next AP or ES. Then, you must append new NCs that undisplay a kept composition when:

And that will cover most existing streams.
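The drop rule above can be sketched as follows (a hedged illustration; the DisplaySet model is made up for the example, not SupMover's code): once an AP or ES is dropped, every Normal Case after it is dropped too, until the next AP or ES re-anchors the decoder state.

```python
from dataclasses import dataclass

@dataclass
class DisplaySet:
    state: str   # "ES", "AP" or "NC"
    keep: bool   # survives the cut (only meaningful for ES/AP)

def apply_drop_rule(stream):
    out, anchor_kept = [], False
    for ds in stream:
        if ds.state in ("ES", "AP"):
            anchor_kept = ds.keep
            if anchor_kept:
                out.append(ds)
        elif anchor_kept:   # NC: only valid if its anchor survived
            out.append(ds)
    return out

# ES kept, AP dropped -> the NC after the AP goes too.
stream = [DisplaySet("ES", True), DisplaySet("AP", False),
          DisplaySet("NC", True), DisplaySet("ES", True)]
assert [ds.state for ds in apply_drop_rule(stream)] == ["ES", "ES"]
```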

YourMJK commented 4 months ago

@cubicibo Thanks for that detailed rundown! Helped me better understand how this format is supposed to work …

The sanest approach is to drop every Normal Case (NC) that follows a dropped AP or ES, up to the next AP or ES.

Is that really necessary? My idea would be the following:

  1. Drop every DS with a timestamp between the in- and out-point of the section to remove.
  2. If the in-point is inside an epoch: insert a new NC at the in-point to undisplay the preceding composition.
  3. If the out-point is inside an epoch: insert a new ES at the out-point with contents obtained by the following procedure:
     3.1. Find the ES before the out-point.
     3.2. Copy all ODS and WDS segments, composition objects and windows from that ES and all following APs up to the out-point (replacing objects with the same ID with their latest version), essentially merging the ES and APs into a single big new ES.
     3.3. "Replay" composition updates: remove all composition objects which were undisplayed by a NC (and not redisplayed by a later NC) since that ES, and use the latest positions.
  4. Renumber compositions to ensure they're strictly increasing.
  5. If we also want to merge (I will make this optional): set the timestamp of the new ES (from 3.) to the same as the new NC (from 2.) and shift all subsequent timestamps.

Now we don't have to worry about following NCs referencing unknown objects, since we copied all the ODS and WDS they could reference. Does this logic work out? Or am I missing/simplifying something?

For 2. and 3.: A timestamp is not inside an epoch if the DS immediately before it is a NC and the one immediately after is an ES. And of course, if the ES we find in 3.1. is not inside the section we want to remove, this method may unnecessarily duplicate some ODS, which is wasted disk space … but so what.
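The epoch-boundary test above can be written as a one-line helper (a minimal sketch; the state strings are illustrative): a cut point lies cleanly between epochs exactly when the DS before it is a Normal Case and the DS after it an Epoch Start.

```python
def inside_epoch(prev_state, next_state):
    """True if a cut point falls inside an epoch (steps 2./3. apply),
    False when it lands in the clean gap between two epochs."""
    return not (prev_state == "NC" and next_state == "ES")

assert inside_epoch("ES", "AP")       # mid-epoch cut
assert not inside_epoch("NC", "ES")   # clean gap between epochs
```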

What I don't fully understand is this:

cubicibo commented 4 months ago

Is that really necessary?

Depends on the desired cut-merge code complexity ;)

  1. [...]

This is too conservative. Just find the last dropped AP (or ES otherwise) and accumulate data from there. Or, continuously accumulate data from the ES and reset every time you find an AP or ES. Both the AP and ES states mean that any preceding data shall not be used. Windows, palettes and objects are everything you need to track. You may additionally store the latest composition list, but you do not need to track composition objects per se. Point 3.3 is unnecessary: just use the last list of compositions.
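The accumulation strategy described above can be sketched like this (a hedged illustration; the `feed` signature and state strings are invented for the example): keep a running snapshot of windows, palettes and objects, resetting whenever an AP or ES arrives, since both states invalidate all preceding data.

```python
class DecoderSnapshot:
    def __init__(self):
        self.reset()

    def reset(self):
        self.windows = {}       # window_id -> window definition
        self.palettes = {}      # palette_id -> {entry: value}
        self.objects = {}       # object_id -> latest object data
        self.compositions = []  # composition list of the last PCS

    def feed(self, state, windows=None, palettes=None, objects=None,
             compositions=None):
        if state in ("ES", "AP"):
            self.reset()        # AP/ES supersede all earlier data
        self.windows.update(windows or {})
        for pid, entries in (palettes or {}).items():
            self.palettes.setdefault(pid, {}).update(entries)
        self.objects.update(objects or {})
        if compositions is not None:
            self.compositions = compositions  # keep only the last list

snap = DecoderSnapshot()
snap.feed("ES", objects={0: "o0_v0"}, compositions=[0])
snap.feed("NC", compositions=[])          # undisplay; objects stay defined
snap.feed("ES", objects={0: "o0_v1"})
assert snap.objects == {0: "o0_v1"}       # the new ES wiped earlier state
```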

What to do with palettes and palette updates?

Palettes need to be tracked in an additive fashion. Each PDS content is ORed into the specified palette (the palette ID in the PDS). Your new ES DS must have a PDS that includes all palette entries of that palette ID defined so far in the decoder. Furthermore, any NC following your new ES may access any of the 8 palettes. For every first access to a given palette, you will need to update the NC with a PDS that provides all the defined palette entries you collected from the dropped segments.

Once you have your new ES DS inserted and you take proper care of the PDS data, palette updates will work naturally.
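The "additive" palette tracking can be shown in a few lines (an illustrative sketch; the entry tuple layout follows common PGS documentation and is an assumption): each PDS updates only the entries it carries, so the decoder's view of a palette is the union of all PDS seen so far for that palette ID, and a synthesized ES must emit that union.

```python
def apply_pds(palettes, palette_id, entries):
    """entries: {entry_number: (Y, Cr, Cb, alpha)}; merged into the
    existing palette, never replacing it wholesale."""
    palettes.setdefault(palette_id, {}).update(entries)

palettes = {}
apply_pds(palettes, 0, {0: (16, 128, 128, 0), 1: (235, 128, 128, 255)})
apply_pds(palettes, 0, {1: (200, 100, 100, 255)})  # redefines entry 1 only
assert palettes[0][0] == (16, 128, 128, 0)         # entry 0 survives
assert palettes[0][1] == (200, 100, 100, 255)
```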

How do version numbers for ODS and PDS work? If you replace an existing ID within an epoch, does it have to increase?

All version numbers start from zero within an epoch; any redefinition (ODS) or change (PDS) should increase the version number.

Can there be multiple compositions (with different composition numbers) within an epoch?

Yes, there can be multiple compositions. Furthermore, the composition number is strictly increasing in the data stream, regardless of the composition state. It only goes back to zero when the field overflows after 0xFFFF.
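The wrap-around behaviour is a one-liner (an illustrative sketch of the 16-bit field arithmetic described above):

```python
def next_comp_number(n):
    """Composition numbers are strictly increasing in the data stream,
    stored in a 16-bit field that wraps to zero after 0xFFFF."""
    return (n + 1) & 0xFFFF

assert next_comp_number(41) == 42
assert next_comp_number(0xFFFF) == 0   # field overflow wraps to zero
```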

  1. [...]

No idea what you mean by merge. But that seems correct.

And of course, if the ES we find in 3.1. is not inside the section we want to remove, this method may unnecessarily duplicate some ODS, which is wasted disk space … but so what

Disc space is not the issue; decoding time is. PGS bitmap decoders are slow, so do not throw objects at them if you don't need them! And that is the last and hardest point of the cut process: correctly computing the DTS of the new DisplaySets and verifying all decoding constraints.

Now, let's be honest, you DON'T want to go in this rabbit hole. These conditions are sufficient:

If you indeed want to go down the rabbit hole and inflict endless pain on yourself, here are the key constraints in 90 kHz ticks:

  1. $DTS(DS_{n}(END)) \le DTS(DS_{n+1}(PCS))$
  2. $PTS(DS_{n}(PCS)) + access\ time \le PTS(DS_{n+1}(PCS))$
  3. Only if $DS_{n+1}$ is ES: $PTS(DS_{n}(PCS)) \le DTS(DS_{n+1}(PCS))$
  4. $DTS(DS_{n}(PCS)) = PTS(DS_{n}(PCS)) - decode\ duration$
  5. DTS values are monotonic regardless of the segment type. PTS values are only monotonic within a given segment type.
  6. $DTS(PCS) = DTS(WDS) = DTS(PDS) = DTS(ODS_{0}) \lt DTS(ODS_{j}) \lt PTS(ODS_{j}) = DTS(END)$

$access\ time$ depends on the palette update flag in $DS_{n+1}$:

Two DS with the same PTS timestamp?

Not permitted. See constraint 2.
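A hedged sketch of a checker for a subset of these constraints (1, 2, and the DTS monotonicity from 5), in 90 kHz ticks. The `DS` model and field names are invented for the example, not SupMover's actual structures, and `access_time` is simplified to a fixed value.

```python
from dataclasses import dataclass

@dataclass
class DS:
    pts: int      # PTS of the PCS, in 90 kHz ticks
    dts_pcs: int  # DTS of the PCS
    dts_end: int  # DTS of the END segment

def check_timing(stream, access_time=1):
    """True when consecutive display sets satisfy constraints 1, 2
    and the DTS part of 5; access_time > 0 also forbids equal PTS."""
    for prev, cur in zip(stream, stream[1:]):
        if prev.dts_end > cur.dts_pcs:        # constraint 1
            return False
        if prev.pts + access_time > cur.pts:  # constraint 2
            return False
        if prev.dts_pcs > cur.dts_pcs:        # constraint 5 (DTS monotonic)
            return False
    return True

ok = [DS(pts=900, dts_pcs=0, dts_end=450),
      DS(pts=1800, dts_pcs=900, dts_end=1700)]
assert check_timing(ok)
assert not check_timing([DS(900, 0, 450), DS(900, 900, 1700)])  # same PTS
```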

cubicibo commented 4 months ago

Here's an exotic sample for your tests. It contains palette updates, long epochs, assigns different palettes and object IDs, overlapping display sets in the decoder, as well as NC with ODS: nc2.zip

The datastream complies with the PGS decoder model for a 59.94p video (possible on UHD BD).

YourMJK commented 4 months ago

Thanks again for that valuable information. And the sample file!

Both AP and ES states mean that any preceding data shall not be used.

Ah, that clears things up!

No idea what you mean by merge.

The current cut&merge mode cuts out a section and then also closes the gap ("merge") by subtracting the duration of the section from subsequent PTS. I want to make this optional in my new implementation, but since you mentioned that

Two DS with the same PTS timestamp?

Not permitted. See constraint 2.

my method (5.) for undisplaying a previous composition at the same time as starting a new epoch doesn't seem valid, so I will have to find something different. I guess I'll either leave a 1/90 s ∆t in between or do it correctly (?) with an AP.

Now, let's be honest, you DON'T want to go in this rabbit hole.

I ignored DTS so far because of this comment in thescorpius' documentation:

DTS is always zero in practice (at least from what I have found so far), so you can freely ignore this value.

I guess that makes sense: if you set them all to zero, you just tell the player to decode them all ASAP and keep the whole stream in memory. Not efficient, but that would let me not have to care about that problem … However, it seems like that's not actually valid as per your constraint 5.

YourMJK commented 4 months ago

@MonoS In addition to re-implementing this cut&merge mode more rigorously, I want to make it a bit more flexible.

Currently, you can only specify sections to remove (blacklist), and the gaps are always closed. For my use cases, I wish I could just delete some subtitles by either specifying what to keep (whitelist) or what to drop (blacklist) and leave the PTS of the rest alone (i.e. keep the same duration).

So I'm thinking about making that behaviour more customizable, something like --include <list of sections> and --exclude <list of sections> (probably only one at a time allowed) and --gapmode (keep | close). There are probably more intuitive wordings.
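The proposed interface could be sketched with argparse (purely illustrative; none of these flags exist in SupMover, and SupMover itself is not written in Python):

```python
import argparse

def build_parser():
    p = argparse.ArgumentParser(prog="supmover-cut")
    group = p.add_mutually_exclusive_group()   # only one list at a time
    group.add_argument("--include", nargs="+", metavar="SECTION",
                       help="keep only these sections (whitelist)")
    group.add_argument("--exclude", nargs="+", metavar="SECTION",
                       help="drop these sections (blacklist)")
    p.add_argument("--gapmode", choices=("keep", "close"), default="close",
                   help="keep original PTS or close the gaps")
    return p

args = build_parser().parse_args(
    ["--exclude", "0:0:0-0:0:20", "--gapmode", "keep"])
assert args.exclude == ["0:0:0-0:0:20"] and args.gapmode == "keep"
```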

Of course, I could also find a solution that's backwards compatible with the current syntax.

What do you think?

Maybe I'll have some time to take a crack at this this week. If not, then in two weeks.

cubicibo commented 4 months ago

DTS = 0 is seen in the wild occasionally. Basic streams, where a single subtitle is displayed and undisplayed every other second, do not need a DTS. Complex streams need it, as we must ensure that every packet enters the decoder at the right time. Without DTS, packets could be stored way before they are needed, and this would overflow the decoder or corrupt screen updates. Consistency is key: if SupMover cuts a stream where DTS is set, your new DS must have a proper DTS too, to pass the fifth constraint. Similarly, a stream where DTS is zero shouldn't suddenly have a DTS != 0.

TheScorpius' doc is excellent, but there are a few errors. E.g. the Palette ID in the PCS: the field specifies the palette to use in the display process, regardless of the palette update flag.

my method (5.) for undisplaying a previous composition at the same time as starting a new epoch seems not valid and I will have to find something different. I guess either leave a 1/90sec ∆t in-between or do it correctly (?) with an AP.

No need. The Epoch Start process itself will remove any graphic visible on the screen, that undisplay DS is redundant.

YourMJK commented 4 months ago

Consistency is key

Agreed, that's what I was thinking as well: look at the DTS around the new DS and adapt accordingly.

E.g Palette ID in PCS: the field specifies the palette to use in the display process, regardless of the palette update flag.

Good to know! I will fix my code for the --trace option then ;) Any more known errors?

The Epoch Start process itself will remove any graphic visible on the screen, that undisplay DS is redundant.

Right 🤦

cubicibo commented 4 months ago

Any more known errors?

As for some omissions:

YourMJK commented 4 months ago

  • Object Data Length is RLE length + 4, as it includes the width and height fields.

Coincidentally, I also figured that out myself yesterday when testing my ODS reading implementation.

  • There's confusion with the cropping and forced flags. Cropping is 0x80, forced 0x40.

Thanks, another thing to fix!
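The two corrections above can be captured in a tiny sketch (field handling per common PGS documentation; the helper names are invented, and it's worth double-checking against a real stream):

```python
CROPPED_FLAG = 0x80   # composition object uses the cropping fields
FORCED_FLAG = 0x40    # forced-display subtitle

def ods_rle_length(object_data_length):
    """The ODS Object Data Length field counts the RLE payload plus the
    4 bytes of the width and height fields."""
    return object_data_length - 4

def parse_object_flags(flags):
    """Return (cropped, forced) from the composition object flags byte."""
    return bool(flags & CROPPED_FLAG), bool(flags & FORCED_FLAG)

assert ods_rle_length(0x52 + 4) == 0x52
assert parse_object_flags(0x40) == (False, True)   # forced only
assert parse_object_flags(0x80) == (True, False)   # cropped only
```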

YourMJK commented 4 months ago

@cubicibo I found another interesting divergence: according to this source file from BDSup2Sub, there is also a fourth composition state, "Epoch Continue" (0xC0). Basically 0x80 | 0x40, so both "Epoch Start" and "Acquisition Point" …?

Do you know if this is really a thing? If yes, how does one have to handle it, and how is it different from "Acquisition Point"?

cubicibo commented 4 months ago

It is only found in seamless-branching streams with subtitles active during the branch. When both the Epoch Start and Acquisition Point flags are set, the decoder additionally checks for composition number equality with the last decoded display set. If equal, the decoder assumes the presentation is continuous and will not decode the DisplaySet. If not, the decoder performs an Epoch Start procedure.

Patent US7660516B2 describes the process fairly well. In any case, this is something the authoring software should handle, as it is tied to m2ts muxing.

Frankly, I have never seen a sample in the wild, demuxers are supposed to discard that display set, and I don't even know how it would work on complex streams akin to the sample posted above.
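The decision logic described above can be sketched as follows (an illustrative sketch; the return strings and function name are invented): on Epoch Continue the decoder compares composition numbers, skipping the DS on equality and otherwise falling back to an Epoch Start.

```python
# Composition state values; 0xC0 = Epoch Start | Acquisition Point.
NORMAL, ACQUISITION_POINT, EPOCH_START, EPOCH_CONTINUE = 0x00, 0x40, 0x80, 0xC0

def handle_state(state, comp_number, last_comp_number):
    if state == EPOCH_CONTINUE:
        if comp_number == last_comp_number:
            return "skip"          # seamless branch: presentation continues
        return "epoch_start"       # otherwise a fresh epoch start
    return {EPOCH_START: "epoch_start",
            ACQUISITION_POINT: "acquisition",
            NORMAL: "update"}[state]

assert handle_state(EPOCH_CONTINUE, 5, 5) == "skip"
assert handle_state(EPOCH_CONTINUE, 6, 5) == "epoch_start"
```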