IRT-Open-Source / scf

Subtitling Conversion Framework
Apache License 2.0

STLXML2EBU-TT: handling of content out of boxing #26

Closed spoeschel closed 7 years ago

spoeschel commented 7 years ago

When a subtitle's TF (Text Field) contains content that is not enclosed by StartBox/EndBox element pairs, spaces (in the form of space elements) are currently discarded, but any text is copied. For example, "This is a test." in STL (without enclosing boxing) results in "Thisisatest." in EBU-TT.

For consistency, space and text content outside of boxing should be treated equally, i.e. text should be discarded as well in that case.

An alternative is to discard neither spaces nor text if the TF does not use boxing at all (which is sometimes seen in STL files).

braincoded commented 7 years ago

I've experienced this issue with EBU STL files containing open subtitles. SCF doesn't evaluate whether a file contains closed or open subtitles, which can lead to the result described above.

spoeschel commented 7 years ago

This will be fixed following the "alternative way" described above, i.e. text is discarded only if it lies outside of boxing that is actually present in the TF.
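
For illustration only, the agreed rule could be sketched roughly as follows. This is not SCF's actual implementation; the function name tf_to_text and the simplified TF markup are assumptions made up for this example, and only the StartBox, EndBox and space element names come from the discussion above. The sketch keeps all content when the TF uses no boxing at all, and otherwise keeps only what lies between StartBox and EndBox.

```python
import xml.etree.ElementTree as ET


def tf_to_text(tf_xml: str) -> str:
    """Hypothetical helper: flatten a simplified TF element to plain text."""
    tf = ET.fromstring(tf_xml)

    # Boxing counts as "present" if the TF has at least one StartBox child.
    uses_boxing = tf.find("StartBox") is not None

    # Without any boxing, everything is kept; with boxing, content is
    # dropped until the first StartBox is seen.
    inside = not uses_boxing
    out = []

    # Text before the first child element.
    if inside and tf.text:
        out.append(tf.text)

    for child in tf:
        if child.tag == "StartBox":
            inside = True
        elif child.tag == "EndBox":
            inside = False
        elif child.tag == "space" and inside:
            out.append(" ")
        # Text following this child (ElementTree stores it as the tail).
        if inside and child.tail:
            out.append(child.tail)

    return "".join(out)


# No boxing at all: spaces and text are both kept.
print(tf_to_text("<TF>This<space/>is<space/>a<space/>test.</TF>"))

# Boxing present: spaces and text outside the box are both discarded.
print(tf_to_text(
    "<TF>out<space/><StartBox/>in<space/>box<EndBox/><space/>tail</TF>"
))
```

The point of the sketch is that spaces and text go through the same inside check, so they can no longer be treated differently.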