aajanki / yle-dl

Download videos from Yle servers
https://aajanki.github.io/yle-dl/index-en.html
GNU General Public License v3.0

Feature request: Subtitles-only (#254) #340

Open dimetime opened 1 year ago

dimetime commented 1 year ago

#254 TL;DR: apparently, downloads from Areena fetch everything rather than cherry-picking individual streams?

Thus, if downloading everything is the only option, could the video/audio streams be directed to /dev/null or something similar while downloading?

Or could a --subtitles-only option use --backend wget and then remove any downloaded video/audio data?

aajanki commented 1 year ago

Yle-dl indeed always downloads all streams.

If you don't mind temporarily downloading the video file, you can create a postprocessing script that deletes the video file after downloading it (keeping just the subtitles file).

Create a new file called keep-only-subtitles:

#!/bin/sh

# Remove the video file
# $1 is the name of the downloaded video file
# ($2 is the subtitles file)
rm "$1"

Make the script executable (chmod +x keep-only-subtitles), then tell yle-dl to run it as a postprocessing step after the download is complete:

yle-dl --backend wget --postprocess ./keep-only-subtitles https://areena.yle.fi/1-61825068

This assumes you run the command from the directory that contains keep-only-subtitles.
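A slightly more defensive variant of keep-only-subtitles could look like the sketch below. It relies only on the argument convention described in the script's comments ($1 is the video, $2 the subtitles file) and deletes the video only when a non-empty subtitles file was actually passed. Otherwise it keeps the video, since the subtitles may exist only embedded inside it. The function name keep_only_subtitles is hypothetical, chosen for illustration.

```sh
#!/bin/sh
# keep-only-subtitles: defensive variant (a sketch, not part of yle-dl).
# Assumes yle-dl passes the video file as $1 and, when one exists,
# a separate subtitles file as $2.

keep_only_subtitles() {
    video="$1"
    subs="${2-}"

    if [ -n "$subs" ] && [ -s "$subs" ]; then
        # A separate, non-empty subtitles file exists: drop the video.
        rm -f -- "$video"
    else
        # No usable subtitles file: keep the video, because the
        # subtitles may only be available embedded inside it.
        echo "keep-only-subtitles: no subtitles file, keeping $video" >&2
    fi
}

keep_only_subtitles "$@"
```

Save it as keep-only-subtitles, make it executable and pass it to --postprocess in the same way as the original script.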

ghost commented 1 year ago

To save a large amount of bandwidth (over 99%), would it be viable to implement downloading only the subtitles? For reference, youtube-dl and yt-dlp support this feature.

IlmariKu commented 3 months ago

Can yle-dl save the subtitles as a separate file at all, rather than embedding them in the .mkv file? At least I didn't see such an option when I quickly glanced at the docs.

aajanki commented 3 months ago

It's not possible to download just the subtitles with yle-dl.

However, you can download the .mkv file and then extract the subtitles yourself using a tool such as ffmpeg.
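Extracting the subtitles with ffmpeg might look roughly like the sketch below. It assumes ffmpeg and ffprobe are installed and that the subtitle track is text-based, so it can be converted to SubRip. The function names list_subs and extract_first_sub are hypothetical.

```sh
#!/bin/sh
# Sketch: get the subtitles out of an .mkv downloaded by yle-dl.
# Assumes ffmpeg/ffprobe are on PATH and the subtitles are text-based.

# Print "index,language" for every subtitle stream in file $1.
list_subs() {
    ffprobe -v error -select_streams s \
        -show_entries stream=index:stream_tags=language \
        -of csv=p=0 "$1"
}

# Convert the first subtitle stream (0:s:0) of $1 into a SubRip file $2.
# -n makes ffmpeg refuse to overwrite an existing output file.
extract_first_sub() {
    ffmpeg -v error -n -i "$1" -map 0:s:0 -c:s srt "$2"
}
```

For example, list_subs episode.mkv shows which subtitle tracks exist, and extract_first_sub episode.mkv episode.fin.srt writes the first one out (the file names here are placeholders). If the file has several tracks, change 0:s:0 to 0:s:1 and so on to pick another one.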

IlmariKu commented 3 months ago

But I have a question, @aajanki: when I ran the command with --showmetadata, it listed the Finnish subtitles as one of the URLs. Is this an exception in how the subtitles are made available? I haven't checked how the code works, but I'm guessing the metadata (and the CDN link) is available without downloading the video?

Reproduce: yle-dl https://areena.yle.fi/1-1414632 --with-metadata

    "duration_seconds": 1709,
    "subtitles": [
      {
        "language": "fin",
        "url": "https://cdnapi-legacy.kaltura.com/api_v3/index.php/service/caption_captionAsset/action/serve/captionAssetId/1_zknqo0g1/ks/MDc3NWI3MzcxOWE2ZWU4NDgyMzA1MWQ3NDhlMzlkNzEyZjhjYTBiNnwxOTU1MDMxOzE5NTUwMzE7MTcxODQ3NjU4OTswOzMwNTIzO292cEB5bGUuZmk7ZG93bmxvYWQ6MV85YjFrMmEzYw==",
        "category": "ohjelmatekstitys"
      }
    ],

aajanki commented 3 months ago

Even though --showmetadata sometimes lists subtitles, yle-dl never downloads these external subtitle files. The subtitle section in the metadata is left over from an earlier yle-dl version. Of course, you can download the subtitle files manually.
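For the older videos whose metadata still lists external subtitles, a manual download could be scripted roughly as follows. This sketch assumes jq and curl are installed and that --showmetadata prints JSON to standard output shaped like the excerpt above. The function names and the subtitles-N output file names are hypothetical, and the file format served by the caption URL is not stated in this thread, so no extension is added.

```sh
#!/bin/sh
# Sketch: manually fetch external subtitle files listed by --showmetadata.
# Assumes jq and curl are installed; this is not a yle-dl feature.

# Read metadata JSON on stdin and print every subtitles URL in it,
# wherever a "subtitles" list appears in the structure.
subtitle_urls() {
    jq -r '.. | objects | select(has("subtitles")) | .subtitles[]? | .url // empty'
}

# Download each listed subtitle file for the Areena page $1
# as subtitles-1, subtitles-2, ... in the current directory.
download_subtitles() {
    yle-dl --showmetadata "$1" | subtitle_urls | {
        n=0
        while read -r url; do
            n=$((n + 1))
            curl -fsSL -o "subtitles-$n" "$url"
        done
    }
}
```

For example, download_subtitles https://areena.yle.fi/1-1414632 would fetch the Finnish subtitles from the metadata shown above, if the URL is still valid. The signed ks token in such URLs suggests they may expire.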

Areena provides these external subtitle files only for some videos. Recently published videos no longer seem to include them.