add-ons / plugin.video.vrt.nu

Kodi add-on to watch content from VRT MAX
https://www.facebook.com/groups/kodivlaanderen
GNU General Public License v3.0
110 stars 20 forks source link

Subtitles show an extra empty line #45

Closed dagwieers closed 4 years ago

dagwieers commented 5 years ago

Describe the bug The subtitles show an extra empty line between subtitles. ~Most likely this is an issue with DOS (\r\n) vs Unix text-files (\n).~ (And this issue is likely in Kodi's subtitle handling, not in the addon)

(please complete the following information):

dagwieers commented 5 years ago

Looking into Kodi and inputstream.adaptive, I think the problem is with the conversion to TTML (https://github.com/peak3d/inputstream.adaptive/blob/master/src/parser/TTML.cpp#L87) and SRT (https://github.com/peak3d/inputstream.adaptive/blob/master/src/parser/TTML.cpp#L262).

Without actual debugging (can't find an easy way to dump the files) I think it is adding both \r\n and additionally an \n.

mediaminister commented 5 years ago

I noticed that this problem only occurs with the on-demand streams, the livestreams don't show the extra empty line. So, I thought there might something wrong with the TTML-subtitles in the on-demand streams from VRT.

To investigate, I extracted the TTML-stream from the last episode of "De Ideale Wereld" with ffmpeg:

ffmpeg -i "https://remix-vrt.akamaized.net/remix/f1d863ef-4a58-464a-ab80-c6ab2c044ac1/remix.ism/.mpd" -map 0:7 -c:d copy -copy_unknown -f data deidealewereld.ttml

I did the same for the livestream of channel "Eén" while "FC De Kampioenen" was on air.

ffmpeg -i "https://live-vrt.akamaized.net/groupc/live/8edf3bdf-7db3-41c3-a318-72cb7f82de66/live.isml/.mpd" -map 0:8 -c:d copy -copy_unknown -f data live.ttml

Then I compared the TTML-subtitles from "De Ideale Wereld" and the "Eén"-livestream.

De Ideale Wereld:

<p begin="00:06:55.268" end="00:06:58.148" region="speaker" xml:id="s108"><span style="textStyle">Daar heb je niks aan,<br></br>
aan valse Hollanders.</span></p>

Eén Live:

<p begin="429307:51:09.720" end="429307:51:11.120" region="speaker">Niet overdrijven. Er zijn <br></br>geen cowboys en indianen meer.</p>

I determined that in "De Ideale Wereld"-stream there is an (invisible) newline character after every <br></br> line break. In the livestream, there is no newline character.

The inputstream.adaptive TTML-parser treats the newline character inside a <span></span> tag as a newline in the subtitle text and that causes the extra empty line between the subtitles. Since the <span></span> tag is an explicit inline-element in all HTML-standards, I think the inputstream.adaptive TTML-parser should definitely ignore these newline characters inside <span></span> tags.

mediaminister commented 5 years ago

I fixed this bug already: https://github.com/mediaminister/inputstream.adaptive/commit/be10ef17b9c44c99452d3052098e55c63a0eccf5

And I fixed two other bugs in the inputstream.adaptive TTML parser: 1) No subtitles on Ketnet livestream cause fractional seconds with 3 digits were not supported correctly 2) Fixed typo to support font colors.

pietje666 commented 5 years ago

fixed in inputstream.adaptive by @mediaminister

dagwieers commented 5 years ago

@pietje666 I understand the wish to close issues (especially if they are not in this project), but closing them hides them from existing users. And the issue is not fixed upstream, nor is there an easy fix available from anyone so people will experience this issue in the field.

So maybe it's better to keep it open until a fix is available to users from default repositories, and maybe label these issues as fix_upstream or waiting_on_upstream.

pietje666 commented 5 years ago

The problem i see in this approach is that there will be alot of open issues, and we might loose track of the real issues. So personally i do not like it :), and people can still view the closed issues and comments.

dagwieers commented 5 years ago

So it appears that the VRT is following the official standard very closely: https://www.w3.org/TR/ttml1/#ttml-example-body

Strange they insist on doing <br></br> instead of simply using <br/>.

Let us hope this is closed upstream real soon now.

dagwieers commented 5 years ago

Fixed, wh00p wh00p ! I think we can close this one now ;-)

dagwieers commented 5 years ago

The fix is still not released :-(

mediaminister commented 5 years ago

It is! https://mirrors.kodi.tv/addons/leia/inputstream.adaptive+windows-x86_64/ https://aur.archlinux.org/packages/kodi-addon-inputstream-adaptive/ https://www.ubuntuupdates.org/package_metas?exact_match=1&q=kodi-inputstream-adaptive

dagwieers commented 5 years ago

Not for ARM. I noticed the newer 18.1 builds also shipped with newer inputstream.adaptive, so I guess we need to wait for newer LibeELEC snapshot releases...

dagwieers commented 5 years ago

Yesterday, 2019-05-02, inpustream.adaptive-2.3.17.1 was released for Kodi Leia (on 18.1) which fixes this issue finally for me. Thanks everyone, especially @mediaminister :-)

mediaminister commented 5 years ago

I think VRT NU changed something to MPEG-DASH TTML subtitling last week and for some programs the extra empty line is back again. Can someone comfirm this?

dagwieers commented 5 years ago

I noticed this as well yesterday. I assumed recent inputstream.adaptive update had a regression, but first wanted to investigate the cause before reporting.

mediaminister commented 5 years ago

Okay, I found the cause already. VRT NU is now using multiple span tags and separate subtitle colours for different characters in a soap like Thuis.

OLD VRT NU TTML (Thuis, June 20)

<p begin="00:04:37.674" end="00:04:40.954" region="speaker" xml:id="s79"><span style="textStyle">Amai. Mooie zwembroek.<br></br>
- Ah, ja. Ja, merci.</span></p><p begin="00:04:41.034" end="00:04:43.994" region="speaker" xml:id="s80"><span style="textStyle">Mannekes, gaan wij zien?<br></br>
Bob is aan het wachten.</span></p>

NEW VRT NU TTML (Thuis, June 21)

<p begin="00:07:33.333" end="00:07:35.773" region="region-11" tts:textAlign="center">
<span style="singleHeightStyle" xml:space="preserve" tts:color="cyan" tts:backgroundColor="black">Eddy. Kom eens.</span>
</p><p begin="00:07:40.733" end="00:07:42.213" region="region-11" tts:textAlign="center">
<span style="singleHeightStyle" xml:space="preserve" tts:backgroundColor="black">Wat heb ik nu weer misdaan?</span>
</p>

InputStream Adaptive doesn't process these <span> tags right. Multiple span tags inside a <p> tag should be treated as a single line subtitle, only a <br> tag can break a subtitle line. I think InputStream Adaptive breaks these <span> tags into multiple lines.

I will fix this.

dagwieers commented 5 years ago

I guess we are blessed with VRT being on the forefront of subtitle technology ;-)

mediaminister commented 5 years ago

The bar can always be set higher! No subtitles today for Thuis on VRT NU: https://www.vrt.be/vrtnu/a-z/thuis/24/thuis-s24a4663/

mediaminister commented 5 years ago

I have a fix. Can you test this? Zips: (install Kodi add-on from zip file) Linux x64: inputstream.adaptive-2.3.22_linux_x64.zip Windows x64: inputstream.adaptive-2.3.22_winx64.zip

Source: https://github.com/mediaminister/inputstream.adaptive/tree/multispan

Compiling instructions for Linux:

git clone https://github.com/xbmc/xbmc.git
git clone https://github.com/mediaminister/inputstream.adaptive/tree/multispan
cd ~/inputstream.adaptive && mkdir build && cd build
cmake -DADDONS_TO_BUILD=inputstream.adaptive -DADDON_SRC_PREFIX=../.. -DCMAKE_BUILD_TYPE=Release -DCMAKE_INSTALL_PREFIX=../../xbmc/addons -DPACKAGE_ZIP=1 ../../xbmc/cmake/addons
make
cd ~/xbmc/addons/
zip -r inputstream.adaptive-2.3.22.zip inputstream.adaptive
dagwieers commented 5 years ago

I'm afraid I won't be able to test this easily. I would need to cross-compile to ARM to make that work, and that's not something I would like to spend my time on. And I can't easily run things on my dated laptop either. That's why it would have been nice if I could run LibreELEC on VirtualBox or KVM. I don't understand why they don't invest in making that possible, rather than supporting only VMware.

(I may be able to test this evening on Windows though)

mediaminister commented 5 years ago

No problem, I tested this already on Windows and Linux and it seems to work completely as expected. I just have to test if it doesn't break other TTML streams like Netflix or HBO. Then I will create a pull request on the official inputstream.adaptive repo.

mediaminister commented 5 years ago

Added a pull request to the inputstream.adaptive repo: https://github.com/peak3d/inputstream.adaptive/pull/284

dagwieers commented 5 years ago

This is now fixed in the just released inputstream.adaptive v2.4.0. Thanks to @mediaminister and @peak3d !

dagwieers commented 5 years ago

It appears inputstream.adaptive v2.4.0 is not available for Raspberry Pi (or ARM in general).

dagwieers commented 5 years ago

With LibreELEC v9.1.501 (v9.2 Beta 1) on Raspberry Pi using Kodi v18.4 and inputstream.adaptive v2.4.2.1 this issue is still present :-(

dagwieers commented 5 years ago

When watching Terzake live on Canvas, I also noticed that the subtitles appeared later, and there was a large gap between the next subtitle. I then compared to the online player, and there this doesn't happen. Subtitles quickly follow up, with less pause, and they appear longer on screen. Much more in sync with the spoken words.

And I also noticed that the interview of Bart De Wever appeared in blue, whereas the interviewer appears as white. I was surprised that this wasn't supported, as I have seen Kodi do this for other subtitles as well, but never saw this on VRT NU.

mediaminister commented 5 years ago

With LibreELEC v9.1.501 (v9.2 Beta 1) on Raspberry Pi using Kodi v18.4 and inputstream.adaptive v2.4.2.1 this issue is still present :-(

That's right, unfortunately, my multispan commit is not merged in the Leia branch, it's only merged in the Matrix branch: https://github.com/peak3d/inputstream.adaptive/commit/f90cab170e32476f8b70c284e2134b18b89bcdbb

When watching Terzake live on Canvas, I also noticed that the subtitles appeared later, and there was a large gap between the next subtitle. I then compared to the online player, and there this doesn't happen.

I'll investigate this, but for current affair programs, only subtitles that are prepared in advance of the live broadcast will be in sync. Subtitles for live interviews or live reports are created live and will appear a couple of seconds later.

Differences between the VRT NU web player and Kodi should not occur provided that you are approximately simultaneously watching in Kodi and the VRT NU web player. If you watch a current affair program at a later time than it's possible you get a different on demand stream. I'll check the subtitle timings for the live streams.

And I also noticed that the interview of Bart De Wever appeared in blue, whereas the interviewer appears as white. I was surprised that this wasn't supported, as I have seen Kodi do this for other subtitles as well, but never saw this on VRT NU.

Subtitle colors are not yet implemented in InputStream Adaptive's TTML parser. I can make a pull request for this.

peak3d commented 5 years ago

@mediaminister colors are implemented in TTML, but maybe not in the format used in @dagwieers sample. https://github.com/peak3d/inputstream.adaptive/blob/master/src/parser/TTML.cpp#L47

mediaminister commented 5 years ago

@peak3d You're right, it's implemented in the <style> tag, but not yet in the <span> tag: https://www.w3.org/TR/2018/REC-ttml1-20181108/#style-attribute-color

I'm currently testing an experimental fix for this: https://github.com/mediaminister/inputstream.adaptive/tree/ttmlcolor

dagwieers commented 5 years ago

I'll investigate this, but for current affair programs, only subtitles that are prepared in advance of the live broadcast will be in sync. Subtitles for live interviews or live reports are created live and will appear a couple of seconds later.

Well, in this case it appeared longer and with shorter delays between subtitles when watching online on vrtnu.be. So there must be something wrong with the timings as subtitles seem to have a delayed start in Kodi.

Differences between the VRT NU web player and Kodi should not occur provided that you are approximately simultaneously watching in Kodi and the VRT NU web player. If you watch a current affair program at a later time than it's possible you get a different on demand stream. I'll check the subtitle timings for the live streams.

I was watching simultaneously. And to ensure I was seeing this correctly, paused one of the streams so that both streams would be synchronous.

mediaminister commented 5 years ago

Well, in this case it appeared longer and with shorter delays between subtitles when watching online on vrtnu.be. So there must be something wrong with the timings as subtitles seem to have a delayed start in Kodi.

I did a test with the livestreams of Eén en Canvas at 8 p.m. I suspect there is something wrong with the timings of the livestreams from VRT NU, but I didn't see a difference between VRT NU live webplayer(https://www.vrt.be/vrtnu/livestream/) and Kodi VRT NU Live TV on a Linux pc. So, I think it's a problem with the source.

I didn't see this problem with on demand streams like Thuis, subtitling is perfectly in sync.

You can easily distinguish live and on demand subtitle streams in VRT NU webplayer: On demand subtitles have colors and a black background, livestream subtitles are white with a thin black border.

dagwieers commented 5 years ago

@mediaminister I will check next time I see this happening.

Also, In case you need more examples using colours, I noticed that De Ideale Wereld (met Stef Kamil Carlens) on-demand also has coloured subtitles online.

mediaminister commented 5 years ago

Colors is implemented in https://github.com/mediaminister/inputstream.adaptive/commit/fca16d9f1d9c63d6df05b0de202f74029e2f7cfe and works flawlessly now, but I still need to improve the C++ code and test if this doesn't break other TTML subtitle streams.

dagwieers commented 5 years ago

The Raspberry Pi build of inputstream.adaptive from @peno64 (at https://github.com/michaelarnauts/plugin.video.vtm.go/issues/1#issuecomment-536220725) seems to have fixed the subtitles-newlines.

mediaminister commented 5 years ago

Obviously @peno64 compiled the source from master branch. Still not fixed in the Leia branch.

peno64 commented 5 years ago

@mediaminister I took a branch as follows: git clone https://github.com/peak3d/inputstream.adaptive Did this a day or two ago

mediaminister commented 4 years ago

@dagwieers I think this can be closed because InputStream Adaptive 2.4.3 is released.

dagwieers commented 4 years ago

Indeed, thanks everyone, especially @mediaminister !