Dash-Industry-Forum / dash.js

A reference client implementation for the playback of MPEG DASH via Javascript and compliant browsers.
http://reference.dashif.org/dash.js/nightly/samples/dash-if-reference-player/index.html

Make DashParser more efficient. #104

Closed nweber closed 8 years ago

nweber commented 10 years ago

There are two key pieces to parsing: 1) xml2json.js and 2) objectiron.js.

xml2json converts the XML to a JSON format that is easier to interact with. objectiron 'irons' and 'flattens' the various hierarchical properties as defined by the DASH specification. This ends up pushing all of the inherited values down to the lowest possible level so that we don't have to go back up the chain later.

Both of these operations are slow and need to be refactored / modified / changed to be more efficient.
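
For readers unfamiliar with the 'ironing' step, here is a minimal sketch of the idea, not the actual objectiron.js code. The helper name `pushDown`, its parameters, and the sample manifest object are all hypothetical; only the `_asArray` naming is borrowed from the xml2json output discussed in this thread.

```javascript
// Hypothetical helper (not the objectiron.js API): copy inheritable
// properties from a parent node to every child that does not define
// them itself, then recurse so values reach the lowest level.
function pushDown(node, inheritable, childKeys) {
  for (const key of childKeys) {
    const children = node[key] || [];
    for (const child of children) {
      for (const name of inheritable) {
        // A value set on the child always wins over the parent's value.
        if (child[name] === undefined && node[name] !== undefined) {
          child[name] = node[name];
        }
      }
      pushDown(child, inheritable, childKeys);
    }
  }
  return node;
}

// Made-up, already-converted manifest fragment for illustration.
const manifest = {
  AdaptationSet_asArray: [{
    mimeType: "video/mp4",
    codecs: "avc1.4d401f",
    Representation_asArray: [
      { id: "v1", bandwidth: 1000000 },
      { id: "v2", bandwidth: 2000000, codecs: "avc1.64001f" }
    ]
  }]
};

pushDown(
  manifest,
  ["mimeType", "codecs"],
  ["AdaptationSet_asArray", "Representation_asArray"]
);
```

After the call, each Representation carries its own `mimeType` and `codecs`, so later code never has to look back up at the AdaptationSet. The cost is a full walk of the tree, which is presumably part of why this step is slow on large manifests.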

dsparacio commented 9 years ago

being tracked in issue #418

chris-heathwood-piksel commented 9 years ago

I've not used it in anger but https://github.com/incrediblesound/xml-to-js/blob/master/xml-parse.js is a lot smaller and minifies down to less than 2k?

kirkshoop commented 9 years ago

@chris-heathwood-piksel xml-parse.js is not my first choice for dash.js, since it uses regular expressions to parse the XML. @AkamaiDASH is this issue to speed up parsing or to reduce the size of the player?

dsparacio commented 9 years ago

@kirkshoop I merged the two tickets because I thought the scope of reducing size would cover this, but now that I look at it again I think I will reopen this one to track it and cross-reference the two tickets!

SoleneChiche commented 9 years ago

If I may, I would like to propose this one, which is 9.6 KB but is commented, much easier to use, and less complex than the current one, and which allows the TTML parser to correctly parse inline span elements: https://github.com/henrikingo/xml2json

Let me explain. Currently in the TTML Parser, the subtitle can only be parsed completely if it is a simple

<p> .. </p>

or a simple

<p><span> .. </span></p>

and even then it works only for the following JSON structure:

p_asArray{
__text: "hello world"
other attributes
}

Or:

p_asArray{
__text: undefined
other attributes
span_asArray{
       __text: "hello world"
       other attributes
       }            
}

However, if you have a span plus some text outside of it in the paragraph, or several spans, it is impossible to know how to interleave the texts:

[Screenshot, 2015-03-11 08:46:33]

For the following subtitle:

<p style="defaultStyle" end="00:00:10.000" begin="00:00:00.000" region="defaultRegion" xml:id="sub1" >Hello, <span style="defaultStyle" > I am a </span> EBU-TT-D <span style="defaultStyle" >subtitle </span> 1</p>

So as you can see, there is no way to correctly interleave the parts together.

The parser I propose gives the following result:

"body":    {
      "p@style":"defaultStyle",
      "p@end":"00:00:10.000",
      "p@begin":"00:00:00.000",
      "p@region":"defaultRegion",
      "p@xml:id":"sub1",
      "p":[
        "Hello, ",
        {
          "span@style":"defaultStyle",
          "span":" I am a "
        },
        " EBU-TT-D ",
        {
          "span@style":"defaultStyle",
          "span":"subtitle "
        },
        " 1"
      ]
    }
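
To show why this shape helps, here is a rough sketch of how a TTML parser could consume it. The helper `toRuns` is hypothetical and is not part of dash.js or of the proposed library; it also assumes `p` is always an array shaped like the sample above, which may not hold for a paragraph containing only plain text.

```javascript
// Sample copied from the output above.
const body = {
  "p@style": "defaultStyle",
  "p@end": "00:00:10.000",
  "p@begin": "00:00:00.000",
  "p@region": "defaultRegion",
  "p@xml:id": "sub1",
  "p": [
    "Hello, ",
    { "span@style": "defaultStyle", "span": " I am a " },
    " EBU-TT-D ",
    { "span@style": "defaultStyle", "span": "subtitle " },
    " 1"
  ]
};

// Hypothetical helper: walk the mixed array in document order and
// return one { text, style } run per piece. Bare strings take the
// paragraph's style; spans take their own style when they have one.
function toRuns(paragraph) {
  const parentStyle = paragraph["p@style"];
  return paragraph.p.map(function (part) {
    if (typeof part === "string") {
      return { text: part, style: parentStyle };
    }
    return { text: part.span, style: part["span@style"] || parentStyle };
  });
}

const runs = toRuns(body);

// Joining the runs gives back the subtitle text in its original order.
const fullText = runs.map(function (run) { return run.text; }).join("");
```

Because the array keeps text and spans in document order, the interleaving problem described above goes away: each run can be rendered with its own style without guessing where it sits in the paragraph.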

dsparacio commented 8 years ago

Closing; a lot has changed since this issue was created.