Closed nweber closed 8 years ago
being tracked in issue #418
I've not used it in anger but https://github.com/incrediblesound/xml-to-js/blob/master/xml-parse.js is a lot smaller and minifies down to less than 2k?
@chris-heathwood-piksel xml-parse.js is not my first choice for dash.js because it uses regexp to parse the XML. @AkamaiDASH is this issue meant to speed up parsing or to reduce the size of the player?
@kirkshoop I merged the two tickets because I thought the scope of reducing size would cover this, but now that I look at it again I think I will reopen this one to track it and cross-reference the two tickets!
If I may, I would like to propose this one, which is 9.6Kb but is commented, much easier to use, less complex than the current one, and allows the TTML parser to correctly parse inline span elements: https://github.com/henrikingo/xml2json
Let me explain. Currently in the TTML Parser, the subtitle can only be parsed completely if it is a simple
<p> .. </p>
or a simple
<p><span> .. </span></p>
and even then it works only for the following JSON structure:
p_asArray{
    __text: "hello world"
    other attributes
}
Or:
p_asArray{
    __text: undefined
    other attributes
    span_asArray{
        __text: "hello world"
        other attributes
    }
}
However, if a paragraph contains a span plus some text outside of it, or several spans, it is impossible to know how to interleave the texts.
For the following subtitle:
<p style="defaultStyle" end="00:00:10.000" begin="00:00:00.000" region="defaultRegion" xml:id="sub1" >Hello, <span style="defaultStyle" > I am a </span> EBU-TT-D <span style="defaultStyle" >subtitle </span> 1</p>
So, as you can see, there is no way to correctly interleave the parts.
The parser I propose gives the following result:
"body": {
    "p@style":"defaultStyle",
    "p@end":"00:00:10.000",
    "p@begin":"00:00:00.000",
    "p@region":"defaultRegion",
    "p@xml:id":"sub1",
    "p":[
        "Hello, ",
        {
            "span@style":"defaultStyle",
            "span":" I am a "
        },
        " EBU-TT-D ",
        {
            "span@style":"defaultStyle",
            "span":"subtitle "
        },
        " 1"
    ]
}
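To illustrate why this shape solves the interleaving problem, here is a minimal sketch of how a TTML parser could walk it. It assumes the structure is exactly as shown above (a "p" array mixing plain strings and objects keyed by "span" and "span@style"). The helper name collectRuns is hypothetical and is not part of xml2json or dash.js.

```javascript
// Hypothetical helper: returns the text runs of an element in document order.
// Assumes the proposed output shape: node[tag] is a string or an array mixing
// strings and child objects, and node[tag + '@style'] holds the style attribute.
function collectRuns(node, tag, inheritedStyle) {
  const style = node[tag + '@style'] || inheritedStyle;
  const content = node[tag];
  const children = Array.isArray(content) ? content : [content];
  const runs = [];
  for (const child of children) {
    if (typeof child === 'string') {
      // Plain text directly inside the element.
      runs.push({ text: child, style: style });
    } else if (child && typeof child === 'object') {
      // Assumption: the only nested element we care about is <span>.
      runs.push(...collectRuns(child, 'span', style));
    }
  }
  return runs;
}

// The structure from the comment above.
const body = {
  'p@style': 'defaultStyle',
  'p@end': '00:00:10.000',
  'p@begin': '00:00:00.000',
  'p@region': 'defaultRegion',
  'p@xml:id': 'sub1',
  'p': [
    'Hello, ',
    { 'span@style': 'defaultStyle', 'span': ' I am a ' },
    ' EBU-TT-D ',
    { 'span@style': 'defaultStyle', 'span': 'subtitle ' },
    ' 1'
  ]
};

const runs = collectRuns(body, 'p');
const cueText = runs.map(function (run) { return run.text; }).join('');
```

Because the array preserves document order, the five runs come back in the same sequence as in the source XML, each with the style that applies to it, so no guessing about interleaving is needed.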
Closing, as a lot has changed since this issue was created.
There are two key pieces to parsing: 1) xml2json.js 2) objectiron.js
xml2json converts the XML to a JSON format that is easier to interact with. objectiron 'irons' and 'flattens' the various hierarchical properties as defined by the DASH specification. This ends up pushing all of the inherited values down to the lowest possible level so that we don't have to go back up the chain later.
Both of these operations are slow and need to be refactored / modified / changed to be more efficient.
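As a rough illustration of the 'ironing' idea described above, the sketch below copies inherited properties from a parent onto each child that does not define them. It is not the objectiron.js code: the function name ironDown, its parameters, and the sample values are all hypothetical, and only the AdaptationSet/Representation relationship and the mimeType and codecs attributes are taken from DASH.

```javascript
// Hypothetical sketch of pushing inherited values down one level.
// parent: the parsed parent object; childKey: the property holding the
// array of children; inheritedProps: names of properties children inherit.
function ironDown(parent, childKey, inheritedProps) {
  const children = parent[childKey] || [];
  for (const child of children) {
    for (const prop of inheritedProps) {
      // A value set on the child wins; otherwise copy the parent's value.
      if (child[prop] === undefined && parent[prop] !== undefined) {
        child[prop] = parent[prop];
      }
    }
  }
  return parent;
}

// Made-up sample data for illustration only.
const adaptationSet = {
  mimeType: 'video/mp4',
  codecs: 'avc1.4d401e',
  Representation: [
    { id: 'v1', bandwidth: 500000 },
    { id: 'v2', bandwidth: 1000000, codecs: 'avc1.640028' }
  ]
};

ironDown(adaptationSet, 'Representation', ['mimeType', 'codecs']);
```

After the call, each Representation carries its own mimeType and codecs, so later code can read them directly without walking back up to the AdaptationSet. Doing this for every level of a large manifest touches many objects, which is one plausible reason the step is costly.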