IRT-Open-Source / scf

Subtitling Conversion Framework
Apache License 2.0

STLXML2EBU-TT: chars before first space/control code #47

Closed: spoeschel closed this issue 7 years ago

spoeschel commented 7 years ago

In certain cases, characters at the beginning of a text field are not processed by the transformation from STLXML to EBU-TT. This seems to affect characters that occur before the first space or control code of a subtitle.

This problem should only affect files containing Open Subtitles. Teletext subtitles nowadays usually contain a Double Height control code before any subtitle text, so such files are not affected.

See also #46.
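A quick way to find subtitles that may be affected is to look for TF elements whose content starts with characters rather than with a child element. The Python sketch below assumes, hypothetically, that spaces and control codes appear in STLXML as empty child elements (such as <space/> or <DoubleHeight/>) and that TF elements are not namespaced. The StlXml and TTI wrapper elements in the sample are a made-up minimal fragment, not a complete STLXML file.

```python
import xml.etree.ElementTree as ET


def find_leading_text(stlxml: str) -> list[str]:
    """Return the leading characters of every TF element whose content starts
    with text instead of a child element (space or control code)."""
    root = ET.fromstring(stlxml)
    hits = []
    # Assumption: TF elements are not namespaced.
    for tf in root.iter("TF"):
        # ElementTree keeps characters that precede the first child in .text.
        if tf.text and tf.text.strip():
            hits.append(tf.text)
    return hits


# Hypothetical minimal fragment, not a complete STLXML document.
sample = (
    "<StlXml>"
    "<TTI><TF>Hello<space/>world</TF></TTI>"
    "<TTI><TF><DoubleHeight/>Hi</TF></TTI>"
    "</StlXml>"
)
print(find_leading_text(sample))  # ['Hello']
```

In ElementTree's model of mixed content, characters before the first child element live in the parent's text attribute, while characters after a child live in that child's tail, so checking tf.text is enough to spot a TF that starts with text.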

andreastai commented 7 years ago

@cazino The file you provided as an example in #46 has the DSC field set to 0, which indicates an STL file for Open Subtitles.

SCF's current requirements only cover STL files intended for Teletext subtitles (DSC set to 1 or 2). Coverage of Open Subtitles may be a feature for a future major revision such as SCF 2.x. In the meantime, a workaround could be to post-process the STLXML and make it compatible with EBU-STL for Teletext presentation (e.g. by adding a Start Box control code at the beginning of a TF field). Note that this is not currently an SCF requirement either.

> Teletext subtitles nowadays usually contain a Double Height control code before any subtitle text, so such files are not affected.

More importantly, EBU-STL for Teletext needs to have at least one Start Box control code in the TF before any text content. Therefore this issue should never apply to Teletext EBU-STL.
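The DSC distinction described in this comment can be expressed as a small lookup. This is a hypothetical Python helper, not part of SCF; it only encodes the values mentioned in this thread (0 for Open Subtitles, 1 or 2 for Teletext) and reports anything else, including a blank field, as unknown.

```python
def classify_dsc(dsc: str) -> str:
    """Map an STL DSC (Display Standard Code) value to a subtitle type.

    Only the values discussed in this thread are handled; everything else
    is reported as "unknown".
    """
    value = dsc.strip()
    if value == "0":
        return "open"  # Open Subtitles: outside SCF's current requirements
    if value in ("1", "2"):
        return "teletext"  # Teletext subtitles: covered by SCF
    return "unknown"


for code in ("0", "1", "2", " "):
    print(repr(code), classify_dsc(code))
```

Only files classified as "open" are candidates for the missing-characters problem, since Teletext files always start their text with a control code.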

spoeschel commented 7 years ago

> More importantly, EBU-STL for Teletext needs to have at least one Start Box control code in the TF before any text content. Therefore this issue should never apply to Teletext EBU-STL.

You're right. I was thinking of Double Height because it is usually the first character on any Teletext subtitle line, so no characters can precede it. But the Start Box is a better example.

cazino commented 7 years ago

Inserting a <StartBox> at the beginning of the <TF> does not work: the XSLT transformation complains about 'out of the box characters' and crashes, but inserting a <DoubleHeight> does the trick. Thank you again.
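The workaround that worked here, prepending a Double Height control code to every TF, could be scripted roughly as follows. This is a Python sketch under hypothetical assumptions: control codes are empty child elements (the element name DoubleHeight is taken from this comment), TF elements are not namespaced, and the StlXml and TTI wrappers in the usage line are made up. Check the output against real STLXML files before relying on it.

```python
import xml.etree.ElementTree as ET


def prepend_double_height(stlxml: str) -> str:
    """Insert a DoubleHeight element at the start of every TF element.

    TF elements that already begin with DoubleHeight are left unchanged,
    so running the function twice gives the same result.
    """
    root = ET.fromstring(stlxml)
    # Collect the TF elements first: the tree should not be modified
    # while ElementTree is still iterating over it.
    for tf in list(root.iter("TF")):
        leading = tf.text or ""
        if not leading.strip() and len(tf) > 0 and tf[0].tag == "DoubleHeight":
            continue
        dh = ET.Element("DoubleHeight")
        # Characters that preceded the first child now follow the new element.
        dh.tail = tf.text
        tf.text = None
        tf.insert(0, dh)
    return ET.tostring(root, encoding="unicode")


fixed = prepend_double_height("<StlXml><TTI><TF>Hello<space/>world</TF></TTI></StlXml>")
print(fixed)
```

Moving tf.text into the new element's tail is what keeps the leading characters: they now sit after a control code, which is the situation the transformation handles. Note that ElementTree does not preserve the XML declaration or the original formatting, so a production script might prefer lxml or an XSLT step.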

spoeschel commented 7 years ago

Terminating the transformation in the former case is intended: the presence of at least one Start Box or End Box implies that boxing (as defined in the Teletext specification) is used.

However, any text outside a box is neither displayed by a Teletext decoder nor converted by SCF, so the user is informed about this issue.
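The rule described here (once any Start Box or End Box is present, text outside a box is reported) can be sketched in Python as below. The thread mentions an XSLT transformation, so this is only an illustration of the rule, not SCF's code. The element names StartBox, EndBox and newline are assumptions, and resetting the box state at each newline reflects the assumption that Teletext boxing does not carry over to the next row.

```python
import xml.etree.ElementTree as ET


def text_outside_boxes(tf: ET.Element) -> list[str]:
    """Return the text runs of a TF element that lie outside any box.

    Returns an empty list when the TF contains no StartBox or EndBox
    element, because boxing then does not apply.
    """
    if tf.find("StartBox") is None and tf.find("EndBox") is None:
        return []
    outside = []
    boxed = False
    # Text before the first child can never be inside a box.
    if tf.text and tf.text.strip():
        outside.append(tf.text)
    for child in tf:
        if child.tag == "StartBox":
            boxed = True
        elif child.tag == "EndBox":
            boxed = False
        elif child.tag == "newline":
            boxed = False  # assumption: box state resets at each new row
        # child.tail holds the text that follows this control code.
        if child.tail and child.tail.strip() and not boxed:
            outside.append(child.tail)
    return outside


tf = ET.fromstring("<TF>Lost<StartBox/><StartBox/>Shown<EndBox/>Also lost</TF>")
print(text_outside_boxes(tf))  # ['Lost', 'Also lost']
```

A non-empty result corresponds to the case where the user is informed instead of silently losing text.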

spoeschel commented 7 years ago

Addressed in the README of v0.9.4.