EricBerendsen / dvbinspector

DVB Inspector is an open-source DVB analyzer, written in java
http://www.digitalekabeltelevisie.nl/dvb_inspector/
GNU General Public License v3.0

SCTE-35 messages inside the DSM-CC stream (was Buffer overflow in TransportStream's packet_pid) #60

Open teaalltr opened 1 year ago

teaalltr commented 1 year ago

Hi, just got this exception:

Warning: error parsing transport stream
java.lang.ArrayIndexOutOfBoundsException: Index 33345520 out of bounds for length 33345520
    at nl.digitalekabeltelevisie.data.mpeg.TransportStream.processPacket(TransportStream.java:360)
    at nl.digitalekabeltelevisie.data.mpeg.TransportStream.parsePSITables(TransportStream.java:329)
    at nl.digitalekabeltelevisie.gui.TSLoader.doInBackground(TSLoader.java:91)
    at nl.digitalekabeltelevisie.gui.TSLoader.doInBackground(TSLoader.java:42)
    at java.desktop/javax.swing.SwingWorker$1.call(SwingWorker.java:304)
    at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
    at java.desktop/javax.swing.SwingWorker.run(SwingWorker.java:343)
    at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136)
    at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
    at java.base/java.lang.Thread.run(Thread.java:833)

Probably the problem is the index increment. Dunno what the DVB/TS standard says on this

EricBerendsen commented 1 year ago

Thank you for reporting this. It looks like there is something strange in your file: it has more TS packets than DVB Inspector expects. So there is probably something wrong in the file, but DVB Inspector should be able to handle it (and report it).

I would like to ask you to provide the file for testing, but as it has 33,345,520 packets of at least 188 bytes, it will be over 6 GB in size. So sharing it may not be trivial.
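
The size estimate follows directly from the numbers in the exception: the array had 33,345,520 slots and one more packet arrived. As a rough sketch only (the class, method, and array names below are hypothetical and are not taken from DVB Inspector's source), this shows the arithmetic and one way a per-packet array could grow instead of overflowing:

```java
import java.util.Arrays;

final class PacketCountEstimate {
    static final int TS_PACKET_LENGTH = 188; // bytes per TS packet without extra timestamp or FEC bytes

    /** Minimum file size in bytes for the given number of 188-byte packets. */
    static long minimumFileSize(long packetCount) {
        return packetCount * TS_PACKET_LENGTH;
    }

    /**
     * Stores a PID at index packetNo, growing the array by roughly 50% when the
     * stream holds more packets than initially expected (hypothetical helper).
     */
    static short[] storePid(short[] packetPid, int packetNo, short pid) {
        if (packetNo >= packetPid.length) {
            int newLength = Math.max(packetNo + 1, packetPid.length + (packetPid.length >> 1));
            packetPid = Arrays.copyOf(packetPid, newLength);
        }
        packetPid[packetNo] = pid;
        return packetPid;
    }

    public static void main(String[] args) {
        System.out.println(minimumFileSize(33_345_520L) + " bytes"); // 6268957760 bytes

        short[] pids = new short[2];
        pids = storePid(pids, 2, (short) 0x1FFF); // index 2 would overflow a 2-element array
        System.out.println("array length after growing: " + pids.length);
    }
}
```

Whether growing the array or reporting the surplus packets is the better fix depends on why the file contains more packets than its length suggests.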

teaalltr commented 1 year ago

Yes it was a big file, unfortunately I had to delete it to save space. Btw if we find some way to share them, I have 2 TS files from Italian muxes, Rai (Italian public broadcasting company) and Mediaset (a private one). I wanted to analyze how the DVB TA (targeted advertising) is implemented. Mediaset seems to be wrapping SCTE-35 messages in DSM-CC objects, tag id 70. For Rai something like that but can't figure that out. I found some more details in the reverse-engineered HbbTV app for Mediaset. If you are interested (to add DVB TA parsing and stuff and to figure stuff out for those broadcasters), I can share the files with you with some kind of link in Google Drive maybe

EricBerendsen commented 1 year ago

Yes, I am interested in these files. Sharing via Google Drive should work (or https://wetransfer.com/ or any other file sharing service you prefer).

teaalltr commented 1 year ago

@EricBerendsen here they are https://drive.google.com/drive/folders/12Oyz6ZVUtDhd2suqzvX7ouwObpeSqmyc?usp=sharing I also included the HbbTV js for both Rai and Mediaset. For Mediaset, check from line 8735 on (that's the function to parse timeline/SCTE-35 messages inside the DSM-CC stream). Mediaset uses the old APIs (adding event listeners to the DVB stream). Rai uses the new Media Replacement API (see https://www.hbbtv.org/wp-content/uploads/2020/02/HbbTV-SPEC-00478-000-targeted-advertising-specification-part-1-no-etsi-logos.pdf, "switchMediaPresentation"). I think Rai's approach is more standard, as it lets the TV find the timeline and switch. They should be using a TEMI timeline. DVB Inspector lacks a timeline viewer/analyzer that shows events and splices graphically; btw, that would be very cool. This paper about TS timelines could be of interest: https://ir.cwi.nl/pub/23650/23650B.pdf. I'm still trying to figure out how to read Rai events in the stream.

EricBerendsen commented 1 year ago

@jxxp9 Thank you very much for the streams, and the extra information. A lot of new information (at least for me).

teaalltr commented 1 year ago

Found some more stuff. In https://dvb.org/wp-content/uploads/2021/03/dvbscene-57.pdf (search for "DSMCC do-it-now stream event") there are some more details on the Mediaset implementation, which should be 100% per standard. In the stream I found these do-it-now events: [image] which indeed hold an SCTE-35 message: [image] As you can see, the stuff after "AQEA" is base64 encoded, exactly as I found in line 8735 of mediaset_hbbtv.js. It should all be as per the DVB-TA part 1 specification, probably chapter 6.3 onwards (a DSM-CC object wrapping an SCTE-35 one): https://dvb.org/wp-content/uploads/2020/12/A178-1r1_Dynamic-substitution-of-content-in-linear-broadcast_Part1_Signalling_Draft-TS-103-752-1v121_Feb-2021.pdf

Moreover, I found this in the PID section: [image] which seems to hold info about breaks: [image] I don't know how to link that to the SCTE-35 stuff, though. The channel name for the channel code can be found in mediaset_channels.json in the shared folder; in this case it is "Canale 20 HD": [image]

teaalltr commented 1 year ago

@EricBerendsen I've added mediaset_hbbtv_adv_notObfuscated.js to the shared folder, which is the same as the other mediaset file but with original comments and so on (one of the streams referenced a dev hbbtv app with all these files not obfuscated or minified), please see line 1014, it should be much clearer.

I've also found the actual DSM-CC stuff that contains the SCTE data; it's this PID stream here: [image]

The data after the AQEA string (up to the = symbol) are in fact base64 encoded and are a valid SCTE-35 message (you can use this decoder to check: https://comcast.github.io/scte35-js/; the descriptor data you see in the form of an array are in fact an ASCII string, a upid URL like this: [image])

So, taking a further look at the standard (https://dvb.org/wp-content/uploads/2020/12/A178-1r1_Dynamic-substitution-of-content-in-linear-broadcast_Part1_Signalling_Draft-TS-103-752-1v121_Feb-2021.pdf), the AQEA string too is base64 encoded (see 6.3.1), because the whole AQEA + the rest is in fact a DSM-CC_stream_event_payload_binary() (AQEA is its fields before the SCTE-35 section). I've mapped the bits to the struct's fields (AQEA decoded from base64 is 0x01 0x01 0x00); fields that are not present are marked with an X, and the mapped bits are shown in the No. of Bits column, next to the field size, here: [image]
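
The bit mapping described above can be reproduced with a few lines of Java. This is only a sketch under the assumptions stated in this thread: the three decoded bytes are DVB_data_length, one byte holding reserved_zero_future_use (3 bits), event_type (1 bit) and timeline_type (4 bits), and then private_data_length. The class name `StreamEventPrefix` is made up for illustration, and the layout only applies when timeline_type is not 0x2.

```java
import java.util.Base64;

final class StreamEventPrefix {
    final int dvbDataLength;
    final int reservedBits;
    final int eventType;
    final int timelineType;
    final int privateDataLength;

    StreamEventPrefix(byte[] b) {
        if (b.length < 3) {
            throw new IllegalArgumentException("need at least 3 bytes");
        }
        dvbDataLength = b[0] & 0xFF;
        reservedBits = (b[1] & 0xE0) >> 5;   // top 3 bits
        eventType = (b[1] & 0x10) >> 4;      // next bit
        timelineType = b[1] & 0x0F;          // low 4 bits
        // Only valid without the two TEMI bytes, i.e. when DVB_data_length is 1.
        privateDataLength = b[2] & 0xFF;
    }

    static StreamEventPrefix fromBase64(String text) {
        return new StreamEventPrefix(Base64.getDecoder().decode(text));
    }

    public static void main(String[] args) {
        StreamEventPrefix p = fromBase64("AQEA"); // decodes to 0x01 0x01 0x00
        System.out.printf("DVB_data_length=%d reserved=%d event_type=%d timeline_type=%d private_data_length=%d%n",
                p.dvbDataLength, p.reservedBits, p.eventType, p.timelineType, p.privateDataLength);
        // prints: DVB_data_length=1 reserved=0 event_type=0 timeline_type=1 private_data_length=0
    }
}
```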

So, if my analysis is correct, mediaset should be using:

The calculations with the ad duration and so on are in the unobfuscated js file: [image]

The standard says (chapter 6.3.1):

The SCTE 35 message section may be carried either directly in a DSM-CC stream event, or in a DSM-CC object carousel file, referenced by the DSM-CC stream event

Here the latter is used. The stream event is TA_EVENT, subscribed from urlStreamEventObj dvb://onid.tsid.sid.112/MDS_ENABLER_ADV (see js file above): [image]


Other signalling method used

They also seem to use a second signalling method (when scteEventsMode is false in the js code). In that case they use these events: [image] in a stream URL like this (see the !scteEventsMode): [image] with constants as per https://enabler.msf.cdn.mediaset.net/SYS/base.json: [image]

scteEventsMode is set by: [image] (same js file as before, line 66)

In this second signalling method (the one with scteEventsMode false), the signalling is not in an SCTE-35 message; the upid is directly in the payload. This is where I found it: [image] and here is what is in the selected field: [image]

The former implementation (the one with data wrapped in SCTE-35) seems to be 100% standard, the other one is not I think. Both are available in the analyzed stream.

(In case you are interested: the unobfuscated js files can be found by heading to http://hbbtv.mediaset.net/app/mplayhbbtvgoldzoo/dev/index.html; in the HTML source you can find these commented-out references: [image] Then you just replace "index.html" in http://hbbtv.mediaset.net/app/mplayhbbtvgoldzoo/dev/index.html with, for instance, "js/manager/advManager.js".)

teaalltr commented 1 year ago

@EricBerendsen added some more stuff to the previous post, that should be enough detail for now 😄

EricBerendsen commented 1 year ago

Yes, that is more than enough detail for now! Thank you very much, interesting stuff. Very nice and detailed analysis!

Thinking about if and how it is possible to use this in DVB Inspector, but there does not appear to be a way to determine from the .ts alone what stream_events are used for and in what format. Just looking for the string "AQEA" is not enough, when TEMI is used the first bytes will be different.

So parsing as DSM-CC_stream_event_payload_binary has to be optional, or configurable. So maybe a right-click menu option "Interpret as DSM-CC_stream_event_payload_binary" on a DSM-CC Stream Descriptor List? I have to think about it some more.

teaalltr commented 1 year ago

@EricBerendsen I thought we might try to parse the packets and guess whether each one is a valid packet. I wrote this (pseudo)(C | Java) function to parse/check this packet type. Offsets/indexes etc. might be wrong (as one might expect, I didn't try it). You could add some kind of generic HeuristicPidHandler / packet matcher / pid handler / whatever (yeah OOP 😄) that matches the packet/stream as a DSM_CC_stream_event_payload_binary() and then builds/calls a DVBTAPidHandler that properly constructs references to other DSM-CC objects/stuff, as other pid handlers do. (Maybe PidHandler is not the right name here, but I guess you get the point.) Or, you could give the PidHandler classes a HeuristicTest/match method (maybe declared in a different interface?) that recognizes the stream if no other means are available.

All this because other types of streams/packet types might need a heuristic test to be identified, since they might not have a defined PID number in the standard. That would generalize the stream identification.
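
To make the idea a bit more concrete, here is a minimal sketch of such a matcher interface plus a registry that returns the first matcher accepting a payload. All names (`HeuristicMatcher`, `HeuristicRegistry`, `HeuristicDemo`) are hypothetical and do not exist in DVB Inspector; the single example matcher only checks for the SCTE-35 table_id 0xFC in the first byte.

```java
import java.util.List;
import java.util.Optional;
import java.util.function.Predicate;

final class HeuristicDemo {

    /** A named test that guesses whether a payload has a given format. */
    interface HeuristicMatcher {
        String name();
        boolean matches(byte[] payload);

        static HeuristicMatcher of(String name, Predicate<byte[]> test) {
            return new HeuristicMatcher() {
                @Override public String name() { return name; }
                @Override public boolean matches(byte[] payload) { return test.test(payload); }
            };
        }
    }

    /** Tries the matchers in order and returns the first one that accepts the payload. */
    static final class HeuristicRegistry {
        private final List<HeuristicMatcher> matchers;

        HeuristicRegistry(List<HeuristicMatcher> matchers) {
            this.matchers = List.copyOf(matchers);
        }

        Optional<HeuristicMatcher> identify(byte[] payload) {
            return matchers.stream().filter(m -> m.matches(payload)).findFirst();
        }
    }

    /** Example matcher: a section that starts with the SCTE-35 table_id 0xFC. */
    static HeuristicMatcher scte35Matcher() {
        return HeuristicMatcher.of("scte35", p -> p.length > 0 && (p[0] & 0xFF) == 0xFC);
    }

    public static void main(String[] args) {
        HeuristicRegistry registry = new HeuristicRegistry(List.of(scte35Matcher()));
        byte[] payload = {(byte) 0xFC, 0x30, 0x11};
        System.out.println(registry.identify(payload).map(HeuristicMatcher::name).orElse("unknown"));
    }
}
```

Keeping the test in its own interface means existing pid handlers would not have to change; a handler could be constructed only after a matcher accepts the data.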

// Untested sketch (Java). Returns true if the base64-decoded stream event payload looks like
// a DSM-CC_stream_event_payload_binary(). Offsets should be double-checked against chapter 6.3.1.
static boolean isValid_DSM_CC_stream_event_payload_binary(byte[] payload) {
        if (payload == null || payload.length < 3) {
                return false;
        }

        int field_DVB_data_length = payload[0] & 0xFF;

        // Byte 1 holds three bit fields, most significant bit first
        // (bit fields inside a single byte have no endianness issue).
        int flags = payload[1] & 0xFF;
        int field_reserved_zero_future_use = (flags & 0b11100000) >> 5;
        int field_event_type = (flags & 0b00010000) >> 4;
        int field_timeline_type = flags & 0b00001111;
        if (field_reserved_zero_future_use != 0 || field_timeline_type > 2) {
                return false;
        }

        // timeline_type 0x2 adds two bytes: temi_component_tag and temi_timeline_id
        int offset_timeline_type = (field_timeline_type == 0x2) ? 2 : 0;

        // DVB_data_length := "This 8-bit number gives the length, in bytes, of the fields following
        // the DVB_data_length field prior to the private_data_length field."
        // The observed prefix 0x01 0x01 0x00 has no extra byte between timeline_type and
        // private_data_length, so only the flags byte (plus the TEMI bytes) is expected here.
        if (field_DVB_data_length != 1 + offset_timeline_type) {
                return false;
        }

        // index of the private_data_length field
        int pos = 1 + field_DVB_data_length;
        if (pos >= payload.length) {
                return false;
        }
        int field_private_data_length = payload[pos++] & 0xFF;

        // ASSUMPTION: private_data_length covers the 4-byte private_data_specifier
        // plus the private_data_bytes that follow it.
        if (field_private_data_length > 0) {
                if (field_private_data_length < 4 || pos + field_private_data_length > payload.length) {
                        return false;
                }
                pos += field_private_data_length;
        }

        if (field_event_type == 1) {
                // the rest of the payload is the carousel object name
                // (unnamed in the standard, only marked as a byte array)
                return pos < payload.length;
        }

        // event_type 0: an SCTE 35 splice_info_section follows, starting with table_id 0xFC
        if (payload.length - pos < 3 || (payload[pos] & 0xFF) != 0xFC) {
                return false;
        }
        int section_length = ((payload[pos + 1] & 0x0F) << 8) | (payload[pos + 2] & 0xFF);

        // the section must fill the remainder of the payload exactly
        return payload.length == pos + 3 + section_length;
}

Moreover, we could cross-check the SCTE35 message content (or the carousel_object_name) to find the referenced stuff in there and show the info to the user in a more user-friendly way. The related Event could be found by looking at event objects (from all event streams) with PTS near or equal to that of the SCTE35 message (or using the other time-signaling methods defined in the standard).
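
Matching by PTS could look roughly like the sketch below. It assumes both values are 33-bit PTS ticks at 90 kHz and takes wrap-around into account; the class and method names are invented for illustration, and events are represented by their PTS values only.

```java
import java.util.Comparator;
import java.util.List;
import java.util.Optional;

final class PtsMatcher {
    static final long PTS_MODULO = 1L << 33; // PTS is a 33-bit counter

    /** Shortest distance between two PTS values, taking wrap-around into account. */
    static long ptsDistance(long a, long b) {
        long d = Math.floorMod(a - b, PTS_MODULO);
        return Math.min(d, PTS_MODULO - d);
    }

    /** Returns the event PTS closest to splicePts, if any lies within maxDistance ticks. */
    static Optional<Long> nearest(List<Long> eventPts, long splicePts, long maxDistance) {
        return eventPts.stream()
                .filter(p -> ptsDistance(p, splicePts) <= maxDistance)
                .min(Comparator.comparingLong(p -> ptsDistance(p, splicePts)));
    }

    public static void main(String[] args) {
        List<Long> events = List.of(0L, 90_000L, 180_000L); // 0 s, 1 s, 2 s at 90 kHz
        // Look for an event within half a second (45000 ticks) of the splice time.
        System.out.println(nearest(events, 91_000L, 45_000L)); // Optional[90000]
    }
}
```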

Of course before parsing the packet using my function, one will need to check if it's valid base64 content and if yes decode it, then pass the result to the function.
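
A possible way to do that pre-check in Java is to attempt the decode and treat a failure as "not base64". The sketch below uses `java.util.Base64`; the helper name `tryDecode` and the assumption that the event data arrives as ASCII bytes are mine.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;
import java.util.Optional;

final class Base64Payload {

    /** Returns the decoded bytes, or empty if the data is not valid base64 text. */
    static Optional<byte[]> tryDecode(byte[] eventData) {
        String text = new String(eventData, StandardCharsets.US_ASCII).trim();
        if (text.isEmpty()) {
            return Optional.empty();
        }
        try {
            return Optional.of(Base64.getDecoder().decode(text));
        } catch (IllegalArgumentException e) {
            // thrown by the decoder when the input is not in the base64 alphabet
            return Optional.empty();
        }
    }

    public static void main(String[] args) {
        byte[] good = "AQEA".getBytes(StandardCharsets.US_ASCII);
        byte[] bad = "not base64!".getBytes(StandardCharsets.US_ASCII);
        System.out.println(tryDecode(good).map(b -> b.length).orElse(-1)); // 3
        System.out.println(tryDecode(bad).isPresent());                    // false
    }
}
```

Short binary payloads can happen to look like valid base64, so the structural check on the decoded bytes is still needed afterwards.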

EricBerendsen commented 1 year ago

I have made a first attempt at just trying to parse the stream events, and if it happens to be a valid DSM-CC_stream_event_payload_binary then show it.

[image]

That works, but I am not sure how to proceed. I cannot connect that event to one of the services, as they all use the same event stream. Inside the Splice_info_section there are two descriptors, the second a normal segmentation_descriptor. The first is strange: it has tag 0x70, but it certainly is not an adaptation_field_data_descriptor. From the .js I can see it is used to check whether the event is for the selected channel:

var o = confManager.getCurrentChannel();
if (a[0].private_bytes && a[0].private_bytes !== o.code)
    logManager.logWithEvidence("onFiredTA_Event: SCTE 35 is related to a different channel - current channel: " + o.code);

I cannot find any specification for that descriptor.

So the Splice sections cannot be matched to a service, and do not fit into the normal SCTE-35 structure:

[image]

teaalltr commented 1 year ago

The 0x70 (112) descriptor is a custom one from Mediaset as far as I can understand it, and using the tag 0x70 is not standard, as far as I can see. The DVB-TA specification seems not to define a link to the associated program PID (couldn't find it in the standard, please double check if you can), that may be the reason why Mediaset decided to use a custom descriptor with a channel code in it (it contains private data with a spot UUID and a code with the channel name, and maybe other stuff)

teaalltr commented 1 year ago

Btw great work! Would like to give it a try

EricBerendsen commented 1 year ago

@jxxp9 I merged my updates into master. You can find the build result at the bottom left (snapshot) here; https://github.com/EricBerendsen/dvbinspector/actions/runs/5637062199 Let me know what you think.