YourMJK opened 5 months ago
I had wrongly assumed that every subtitle would end with an empty PCS/WDS/END sequence. However, when a subtitle comes right after the previous one, temporally speaking, no empty sequence is needed: the previous picture simply ends at the beginning of the subsequent one.
Right now in the code you'll see variables like foundBegin and foundEnd, but those are wrong, as they are based on this wrong assumption. I never took the time to think of a proper solution, as I have another way to achieve the same result: an application I've developed privately which takes a sequence of frame numbers and generates the cut parameters for different kinds of track types (SUP, ASS, DTS, AC3, FLAC).
Okay, thanks for the information. I might look into it and create a PR if I find a solution.
Selur from Doom9 sent me the source code of SeCut; you can take a look at it if you want: SECut_source.zip
Using your --trace option, I think I'm starting to understand better how SUP files work. You can find attached the subtitle that made me realize this functionality is broken:
test cut_merge.zip
Using this command:
`cut_merge format secut timemode timestamp fixmode cut list 0:0:0.000-0:0:20.000`
In this case you would expect this functionality to preserve only the first two subpics; instead it saves the first three, with the wrong end time.
If we trace the original file, we can see this:
+ DS
+ PTS: 0:00:17.351
+ PCS Segment: offset 0
+ Video size: 1920x1080
+ Composition number: 0
+ Composition state: Epoch Start
+ Composition object
+ Object ID: 0
+ Window ID: 0
+ Position: 535,944
+ WDS Segment: offset 0x20
+ Window
+ Window ID: 0
+ Window frame: 535,944,854,69
+ PDS Segment: offset 0x37
+ ODS Segment: offset 0x181
+ END Segment: offset 0x52d0
+ DS
+ PTS: 0:00:19.144
+ PCS Segment: offset 0x52dd
+ Video size: 1920x1080
+ Composition number: 1
+ Composition state: Normal
+ WDS Segment: offset 0x52f5
+ Window
+ Window ID: 0
+ Window frame: 535,944,854,69
+ END Segment: offset 0x530c
+ DS
+ PTS: 0:00:19.228
+ PCS Segment: offset 0x5319
+ Video size: 1920x1080
+ Composition number: 2
+ Composition state: Epoch Start
+ Composition object
+ Object ID: 0
+ Window ID: 0
+ Position: 800,944
+ WDS Segment: offset 0x5339
+ Window
+ Window ID: 0
+ Window frame: 516,867,886,146
+ Window
+ Window ID: 1
+ Window frame: 426,67,1068,133
+ PDS Segment: offset 0x5359
+ ODS Segment: offset 0x54a3
+ END Segment: offset 0x749f
+ DS
+ PTS: 0:00:20.521
+ PCS Segment: offset 0x74ac
+ Video size: 1920x1080
+ Composition number: 3
+ Composition state: Aquisition Point
+ Composition object
+ Object ID: 1
+ Window ID: 1
+ Position: 724,147
+ Composition object
+ Object ID: 2
+ Window ID: 1
+ Position: 426,67
+ WDS Segment: offset 0x74d4
+ Window
+ Window ID: 0
+ Window frame: 516,867,886,146
+ Window
+ Window ID: 1
+ Window frame: 426,67,1068,133
+ PDS Segment: offset 0x74f4
+ ODS Segment: offset 0x763e
+ ODS Segment: offset 0xa1de
+ END Segment: offset 0x11068
+ DS
+ PTS: 0:00:20.646
+ PCS Segment: offset 0x11075
+ Video size: 1920x1080
+ Composition number: 4
+ Composition state: Aquisition Point
+ Composition object
+ Object ID: 3
+ Window ID: 0
+ Position: 516,867
+ Composition object
+ Object ID: 4
+ Window ID: 1
+ Position: 426,67
+ WDS Segment: offset 0x1109d
+ Window
+ Window ID: 0
+ Window frame: 516,867,886,146
+ Window
+ Window ID: 1
+ Window frame: 426,67,1068,133
+ PDS Segment: offset 0x110bd
+ ODS Segment: offset 0x11207
+ ODS Segment: offset 0x18169
+ END Segment: offset 0x21c94
+ DS
+ PTS: 0:00:24.525
+ PCS Segment: offset 0x21ca1
+ Video size: 1920x1080
+ Composition number: 5
+ Composition state: Normal
+ WDS Segment: offset 0x21cb9
+ Window
+ Window ID: 0
+ Window frame: 516,867,886,146
+ Window
+ Window ID: 1
+ Window frame: 426,67,1068,133
+ END Segment: offset 0x21cd9
The first subpic is by itself, with a composition state sequence of Epoch Start → Normal, but afterwards we see a sequence of Epoch Start → Acquisition Point → Acquisition Point → Normal, because there are three subpics one after the other. My initial implementation always assumed the first kind of sequence; as a matter of fact, if we trace the output, we see this:
+ DS
+ PTS: 0:00:17.351
+ PCS Segment: offset 0
+ Video size: 1920x1080
+ Composition number: 0
+ Composition state: Epoch Start
+ Composition object
+ Object ID: 0
+ Window ID: 0
+ Position: 535,944
+ WDS Segment: offset 0x20
+ Window
+ Window ID: 0
+ Window frame: 535,944,854,69
+ PDS Segment: offset 0x37
+ ODS Segment: offset 0x181
+ END Segment: offset 0x52d0
+ DS
+ PTS: 0:00:19.144
+ PCS Segment: offset 0x52dd
+ Video size: 1920x1080
+ Composition number: 0
+ Composition state: Normal
+ WDS Segment: offset 0x52f5
+ Window
+ Window ID: 0
+ Window frame: 535,944,854,69
+ END Segment: offset 0x530c
+ DS
+ PTS: 0:00:19.228
+ PCS Segment: offset 0x5319
+ Video size: 1920x1080
+ Composition number: 1
+ Composition state: Epoch Start
+ Composition object
+ Object ID: 0
+ Window ID: 0
+ Position: 800,944
+ WDS Segment: offset 0x5339
+ Window
+ Window ID: 0
+ Window frame: 516,867,886,146
+ Window
+ Window ID: 1
+ Window frame: 426,67,1068,133
+ PDS Segment: offset 0x5359
+ ODS Segment: offset 0x54a3
+ END Segment: offset 0x749f
+ DS
+ PTS: 0:00:20.000
+ PCS Segment: offset 0x74ac
+ Video size: 1920x1080
+ Composition number: 1
+ Composition state: Aquisition Point
+ Composition object
+ Object ID: 1
+ Window ID: 1
+ Position: 724,147
+ Composition object
+ Object ID: 2
+ Window ID: 1
+ Position: 426,67
+ WDS Segment: offset 0x74d4
+ Window
+ Window ID: 0
+ Window frame: 516,867,886,146
+ Window
+ Window ID: 1
+ Window frame: 426,67,1068,133
+ PDS Segment: offset 0x74f4
+ ODS Segment: offset 0x763e
+ ODS Segment: offset 0xa1de
+ END Segment: offset 0x11068
Here we have an additional Acquisition Point segment and we are missing the Normal segment, because the code assumed the former was the Normal segment (in the code I mark this segment with the cutMerge_foundEnd variable).
Hope this will be of help :)
If you slice & merge the stream, you must ensure that the first display set after a cut is an Acquisition Point (AP) or Epoch Start (ES). Whenever you cut in the middle of an epoch, you also need to track object IDs and object redefinitions:
[DS1(ES, o0_0), DS2(AP, o0_1), DS3(NC, palette update), DS4(NC, no composition)]
If you drop DS2, DS3 becomes invalid, as it operates on version 1 of object 0. Furthermore, object 0 version 0 isn't meant to be undisplayed at DS4, but at the time of DS2.
Another painful case:
[DS1(ES, o0_0), DS2(AP, o0_1, o1), DS3(NC, o1), DS4(NC, no composition)]
(DS3 redraws the screen, keeping only object1)
If you drop DS2 alone, DS3 will reference an object that was never defined.
The sanest approach is to drop every Normal Case (NC) that follows a dropped AP or ES, up to the next AP or ES. Then, you must append new NCs that undisplay a kept composition when:
And that will cover most existing streams.
@cubicibo Thanks for that detailed rundown! Helped me better understand how this format is supposed to work …
The sanest approach is to drop every Normal Case (NC) that follows a dropped AP or ES, up to the next AP or ES.
Is that really necessary? My idea would be the following:
Now we don't have to worry about following NCs referencing unknown objects since we copied all the ODS and WDS they could reference. Does this logic work out? Or am I missing/simplifying something?
For 2. and 3.: A timestamp is not inside an epoch if the DS immediately before is a NC and immediately after is an ES. And of course, if the ES we find in 3.1. is not inside the section we want to remove, this method may unnecessarily duplicate some ODS, which is wasted disk space … but so what.
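The "is this timestamp inside an epoch?" test from points 2 and 3 could be sketched like this (names are hypothetical; `display_sets` is assumed to be a PTS-sorted list of (pts_seconds, composition_state) tuples, with state names as in the --trace output):

```python
from bisect import bisect_right

def timestamp_inside_epoch(display_sets, ts):
    """A timestamp is outside an epoch when the DS immediately before it
    is a Normal Case and the DS immediately after it is an Epoch Start."""
    pts_list = [pts for pts, _ in display_sets]
    i = bisect_right(pts_list, ts)
    if i == 0 or i == len(display_sets):
        return False  # before the first DS or after the last one
    prev_state = display_sets[i - 1][1]
    next_state = display_sets[i][1]
    return not (prev_state == "Normal" and next_state == "Epoch Start")
```

With the trace from the test file above, a cut at 0:00:20.000 lands inside the second epoch, while a cut at 0:00:19.2 would be safe.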
What I don't fully understand is this:
Is that really necessary?
Depends on the desired cut-merge code complexity ;)
- [...]
This is too conservative. Just find the last dropped AP (or, failing that, ES) and accumulate data from there. Or continuously accumulate data from the ES and reset every time you find an AP or ES. Both the AP and ES states mean that any preceding data shall not be used. Windows, palettes and objects are everything you need to track. You may additionally store the latest composition list, but you do not need to track composition objects per se. Point 3.3 is unnecessary; just use the last list of compositions.
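The accumulate-and-reset strategy described above can be sketched roughly like this. This is a hypothetical illustration, not SupMover's code: each display set is assumed to be a dict with a "state" key plus "windows"/"palettes"/"objects" maps keyed by ID, and palette merging is simplified to per-ID replacement for brevity.

```python
def accumulate_state(display_sets):
    """Track windows, palettes and objects, resetting at every AP or ES."""
    state = {"windows": {}, "palettes": {}, "objects": {}}
    for ds in display_sets:
        # Both AP and ES mean any preceding data shall not be used.
        if ds["state"] in ("Epoch Start", "Acquisition Point"):
            state = {"windows": {}, "palettes": {}, "objects": {}}
        # Fold in whatever this display set (re)defines.
        for key in ("windows", "palettes", "objects"):
            state[key].update(ds.get(key, {}))
    return state
```

The returned state is what a newly synthesized ES after a cut would need to carry.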
What to do with palettes and palette updates?
Palettes need to be tracked in an additive fashion. Each PDS's content is ORed into the specified palette (the palette ID in the PDS). Your new ES DS must have a PDS that includes all palette entries defined so far in the decoder for that palette ID. Furthermore, any NC following your new ES may access one of the 8 palettes. For every first access to a given palette, you will need to update the NC with a PDS that provides all the defined palette entries you collected from the dropped segments.
Once you have your new ES DS inserted and you take proper care of the PDS data, palette updates will work naturally.
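The additive palette tracking described above might look like the following sketch. The class and method names are made up for illustration; entries are (Y, Cr, Cb, alpha) tuples as in a PDS.

```python
class PaletteTracker:
    """Accumulate PDS updates per palette ID, as each PDS only touches some entries."""

    def __init__(self):
        # palette_id (0-7) -> {entry_id: (y, cr, cb, alpha)}
        self.palettes = {pid: {} for pid in range(8)}

    def apply_pds(self, palette_id, entries):
        # entries: iterable of (entry_id, y, cr, cb, alpha); later updates win.
        for entry_id, y, cr, cb, alpha in entries:
            self.palettes[palette_id][entry_id] = (y, cr, cb, alpha)

    def full_pds_entries(self, palette_id):
        # All entries defined so far: what the PDS of a new ES should contain.
        return sorted(self.palettes[palette_id].items())
```

A new ES inserted after a cut would emit `full_pds_entries()` for the palette its PCS references.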
How do version numbers for ODS and PDS work? If you replace an existing ID within an epoch it has to increase?
All version numbers start from zero within an epoch; any redefinition (ODS) or change (PDS) should increase the version number.
Can there be multiple compositions (with different composition numbers) within an epoch?
Yes, there can be multiple compositions. Furthermore, the composition number is strictly increasing in the data stream, regardless of the composition state. It only goes back to zero when the field overflows after 0xFFFF.
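For a tool that renumbers the kept display sets after a cut, the wrap-around described above amounts to a one-liner (the helper name is made up):

```python
def next_composition_number(n):
    """Composition numbers increase strictly and wrap after 0xFFFF."""
    return (n + 1) & 0xFFFF
```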
- [...]
No idea what you mean by merge. But that seems correct.
And of course, if the ES we find in 3.1. is not inside the section we want to remove, this method may unnecessarily duplicate some ODS, which is wasted disk space … but so what
Disc space is not the issue, decoding time is. PGS bitmap decoders are slow; do not throw objects at them if you don't need them! And that is the last and hardest part of the cut process: correctly computing the DTS of the new DisplaySets and verifying all decoding constraints.
Now, let's be honest, you DON'T want to go down this rabbit hole. These conditions are sufficient:
If you do want to go down the rabbit hole and inflict endless pain on yourself, here are the key constraints in 90 kHz ticks:
$access\ time$ depends on the palette update flag in $DS_{n+1}$:
Two DS with the same PTS timestamp?
Not permitted. See constraint 2.
Here's an exotic sample for your tests. It contains palette updates, long epochs, different palette and object ID assignments, display sets that overlap in the decoder, as well as NCs with ODS. nc2.zip
The datastream complies with the PGS decoder model for a 59.94p video (possible on UHD BD).
Thanks again for that valuable information. And the sample file!
Both AP and ES states mean that any preceding data shall not be used.
Ah, that clears things up!
No idea what you mean by merge.
The current cut&merge mode cuts out a section and then also closes that gap ("merge") by subtracting the section's duration from all subsequent PTS. I want to make this optional in my new implementation, but since you mentioned that
Two DS with the same PTS timestamp?
Not permitted. See constraint 2.
my method (5.) for undisplaying a previous composition at the same time as starting a new epoch seems invalid, and I will have to find something different. I guess I'll either leave a 1/90 s ∆t in between or do it correctly (?) with an AP.
Now, let's be honest, you DON'T want to go in this rabbit hole.
I ignored DTS so far because of this comment in TheScorpius' documentation:
DTS is always zero in practice (at least from what I have found so far), so you can freely ignore this value.
I guess that makes sense: if you set them all to zero, you just tell the player to decode everything ASAP and keep the whole stream in memory. Not efficient, but it would let me not have to care about that problem … However, it seems like that's not actually valid, per your constraint 5.
@MonoS In addition to re-implementing this cut&merge mode more rigorously, I want to make it a bit more flexible.
Currently, you can only specify sections to remove (blacklist), and the gaps are always closed. For my use cases, I wish I could just delete some subtitles, either by specifying what to keep (whitelist) or what to drop (blacklist), and leave the PTS of the rest alone (i.e. keep the same duration).
So I'm thinking about making that behaviour more customizable, something like
--include <list of sections>
and
--exclude <list of sections>
(probably only one at a time allowed)
and
--gapmode (keep | close)
(there are probably more intuitive wordings)
Of course, I could also find a solution that's backwards compatible with the current syntax.
What do you think?
Maybe I'll have some time to take a crack at this this week. If not, then in two weeks.
DTS = 0 is seen in the wild occasionally. Basic streams where a single subtitle is displayed and undisplayed every other second do not need a DTS. Complex streams need it, as we must ensure that every packet enters the decoder at the right time. Without DTS, packets could be stored way before they are needed, which would overflow the decoder or corrupt screen updates. Consistency is key: if SupMover cuts a stream where DTS is set, your new DS must have a proper DTS too to pass the fifth constraint. Similarly, a stream where DTS is zero shouldn't suddenly have a DTS != 0.
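The consistency rule above can be sketched as a tiny guard (names hypothetical, DTS values in 90 kHz ticks):

```python
def dts_convention_matches(kept_dts, new_dts):
    """True if a new DS follows the DTS convention of the kept stream:
    nonzero DTS only when the surrounding stream actually uses DTS."""
    stream_uses_dts = any(d != 0 for d in kept_dts)
    return (new_dts != 0) == stream_uses_dts
```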
TheScorpius' doc is excellent, but there are a few errors. E.g. the Palette ID in the PCS: the field specifies the palette to use in the display process, regardless of the palette update flag.
my method (5.) for undisplaying a previous composition at the same time as starting a new epoch seems not valid and I will have to find something different. I guess either leave a 1/90sec ∆t in-between or do it correctly (?) with an AP.
No need. The Epoch Start process itself will remove any graphic visible on the screen, that undisplay DS is redundant.
Consistency is key
Agreed, that's what I was thinking as well: look at the DTS around the new DS and adapt accordingly.
E.g Palette ID in PCS: the field specifies the palette to use in the display process, regardless of the palette update flag.
Good to know! I will fix my code for the --trace option then ;)
Any more known errors?
The Epoch Start process itself will remove any graphic visible on the screen, that undisplay DS is redundant.
Right 🤦
Any more known errors?
As for some omissions:
- Object Data Length is RLE length + 4, as it includes width and height fields.
Coincidentally, I also figured that out myself yesterday when testing my ODS reading implementation
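In terms of the ODS body, that omission looks like the following sketch (field layout as in TheScorpius' document, assuming a single-fragment object so that width and height are present; the function name is made up):

```python
import struct

def parse_ods_body(body: bytes):
    """Parse an ODS segment body: object_id (2 B), version (1 B),
    sequence flag (1 B), 3-byte Object Data Length, width (2 B), height (2 B),
    then RLE data. The length field covers width + height + RLE, hence the -4."""
    object_id, version, seq_flag = struct.unpack_from(">HBB", body, 0)
    data_length = int.from_bytes(body[4:7], "big")
    width, height = struct.unpack_from(">HH", body, 7)
    rle = body[11:11 + (data_length - 4)]  # RLE length = Object Data Length - 4
    return object_id, version, width, height, rle
```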
- There's confusion between the cropping and forced flags: cropping is 0x80, forced is 0x40.
Thanks, another thing to fix!
@cubicibo I found another interesting diversion: according to this source file from BDSup2Sub, there is also a fourth composition state, "Epoch Continue" (0xC0).
Basically 0x80 | 0x40, so both "Epoch Start" and "Acquisition Point" …?
Do you know if this is really a thing? If yes, how does one have to handle it, and how is it different from "Acquisition Point"?
It is only found in seamless branching streams with subtitles active during the branch. When both the epoch start and acquisition point flags are set, the decoder additionally checks for composition number equality with the last decoded display set. If equal, the decoder assumes the presentation is continuous and does not decode the DisplaySet. If not, the decoder performs an epoch start procedure.
Patent US7660516B2 describes the process fairly well. In any case, this is something the authoring software should handle, as it is tied to m2ts muxing.
Frankly, I have never seen a sample in the wild. Demuxers are supposed to discard that display set, and I don't even know how that would work on complex streams akin to the sample posted above.
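The decoder behaviour described above could be dispatched like this (a sketch only; the function and return labels are hypothetical):

```python
def handle_composition_state(state_byte, comp_number, last_comp_number):
    """Map a PCS composition state byte to a decoder action."""
    if state_byte == 0xC0:  # Epoch Continue = 0x80 (ES) | 0x40 (AP)
        if comp_number == last_comp_number:
            return "skip"         # continuous presentation: do not decode this DS
        return "epoch_start"      # otherwise perform an epoch start procedure
    return {0x00: "normal", 0x40: "acquisition_point", 0x80: "epoch_start"}[state_byte]
```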
You mentioned that the cut_merge mode is broken and shouldn't be used. What exactly is broken? I want to use it.