Tzahi12345 / YoutubeDL-Material

Self-hosted YouTube downloader built on Material Design
MIT License
2.59k stars, 266 forks

[Reg: International Content] Translated Titles and Original Language Titles/Descriptions #138

Open: GlassedSilver opened this issue 4 years ago

GlassedSilver commented 4 years ago

I often want to download international content, for example videos from Japanese VTubers like Kizuna AI.

The issue is that, while the videos do download with subs when available, the title in the filename and in YoutubeDL-Material's UI is in Japanese.

This can make it difficult to find content in your own library, even though understanding the video itself isn't a problem thanks to the downloaded subs (if available).

I would like to propose the following:

1) Always store the original title, the English title, and the title in any other languages the user selects.
2) If possible, let the user choose whether the filename uses the original or a translated title when {title} is selected as the filename token. This might be tricky and less needed? I'm still not sure about this myself.
3) Allow the user to select which language(s) to use in the YTDL-M UI. E.g.: I'm German, speak English fluently, and don't speak Japanese. I would love to see Japanese videos with their English translated title, German and English videos with their original titles, and never ever do I want to see what YT sometimes does: translate an English video title into broken German for me. :D GRRR

A search feature should match against any of the stored titles and return results regardless of the display choice.
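To make the proposal concrete, here is a minimal TypeScript sketch of how per-video titles could be stored, which one gets displayed, which one fills the {title} filename token, and how search could match any of them. Everything in it is hypothetical: `VideoTitles`, `DisplayPrefs`, `displayTitle`, `filenameTitle`, and `searchTitles` are illustrative names, not existing YoutubeDL-Material code, and it assumes the translated titles have already been obtained somehow (which is the open question in this thread).

```typescript
// Hypothetical storage shape: nothing here exists in YoutubeDL-Material today.
interface VideoTitles {
  original: { lang: string; title: string }; // title as uploaded, e.g. lang "ja"
  translations: Record<string, string>;       // language code -> translated title
}

// Hypothetical per-user settings.
interface DisplayPrefs {
  understood: string[]; // languages shown untranslated, e.g. ["de", "en"]
  preferred: string[];  // translations to try, in order, e.g. ["en"]
}

// Show the original title if the user reads its language; otherwise the first
// available preferred translation; otherwise fall back to the original.
function displayTitle(video: VideoTitles, prefs: DisplayPrefs): string {
  if (prefs.understood.includes(video.original.lang)) {
    return video.original.title;
  }
  for (const lang of prefs.preferred) {
    const translated = video.translations[lang];
    if (translated !== undefined) {
      return translated;
    }
  }
  return video.original.title;
}

// Value substituted for the {title} filename token, controlled by a single
// checkbox-style flag.
function filenameTitle(video: VideoTitles, prefs: DisplayPrefs, useTranslated: boolean): string {
  return useTranslated ? displayTitle(video, prefs) : video.original.title;
}

// Case-insensitive substring search across every stored title, independent
// of which title is currently displayed.
function searchTitles(videos: VideoTitles[], query: string): VideoTitles[] {
  const q = query.toLocaleLowerCase();
  return videos.filter((video) =>
    [video.original.title, ...Object.values(video.translations)].some((title) =>
      title.toLocaleLowerCase().includes(q),
    ),
  );
}
```

With `understood: ["de", "en"]` and `preferred: ["en"]`, a Japanese video that has an English translation displays the English title, while an English video keeps its original title even if a German translation is stored, which covers the "no broken German" case.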

Tzahi12345 commented 4 years ago

> Always store the original title, the English title and the title in languages selectable by the user.

> If possible: let the user choose whether the filename should have the original or a translated title in its name, if {title} is selected as token for the filename. This might be tricky and less needed? Still not sure about this myself.

Is it possible to get the title in various languages? I'm looking at the VTT subs info and it doesn't seem to include the title. If the information isn't easily available, this might be a hard feature to implement. If it is available, renaming the file should be easy enough; a simple checkbox in the language select area of the settings could suffice for choosing between the translated and non-translated filename.

> Allow the user to select which language(s) to use in the YTDL-M UI. E.g.: I'm German, can speak English fluently and don't speak Japanese. I would love to be able to display Japanese videos with their English translation title, see German and English videos with their original titles and never ever do I want to see what YT sometimes does: translate an English video title to broken German for me. :D GRRR

Should be possible too! This can be included in the language select option in the settings (and if app-wide translations exist for that language, say de, it will show the translated YTDL-Material). Right now only English and Spanish are available, as those are the only languages I'm fluent in. If you're interested in doing a German translation, you can see the guide here. It may take a couple of hours, as there are quite a few strings to translate, so I understand if you wouldn't want to do it.

But regardless, I'd open it up to a bunch of languages, so even if app-wide translations don't exist, video title translations will still appear.

GlassedSilver commented 4 years ago

Personally, I prefer most of my stuff to run in English. I wouldn't use the German interface myself, so I guess I'd be a bad long-term maintainer, but I can give it a whirl!

As for the translated titles, I honestly have no idea how YouTube stores and serves them. I don't think youtube-dl exposes them?

I guess we should look into youtube-dl's issue tracker and see if that's something they're working on.

GlassedSilver commented 4 years ago

I just remembered that I once looked into this on the youtube-dl side already, and the issue was closed as a duplicate, although the alleged duplicate doesn't quite ask for the same thing.

Support for translated titles: https://github.com/ytdl-org/youtube-dl/issues/13811
Pulling original titles (the desired behavior): https://github.com/ytdl-org/youtube-dl/issues/10758

I think those are worth looking into, and possibly helping the youtube-dl team at the code level. Unfortunately, as I mentioned before, my coding skills are rusty at best, and I've certainly never dealt with what they're using, so... As much as I hate to say it, my drive to fix this can't be converted into coding energy directly. :/

Tzahi12345 commented 4 years ago

> As much as I hate to say it, my drive to fix this can't be converted into coding energy directly. :/

That's pretty much how I feel about the youtube-dl library; there's just so much going on under the hood.

Some guy in the second link you included had a suggestion that got me thinking: what if we don't rely on YouTube for translated titles? Not sure if this is a good idea, but we could use a third-party tool to handle the translation (we're already milking Google's API with YT downloads, so maybe do the same for translations?). The issue is, these wouldn't be the official translations (I didn't even know those existed?!), so you might still get broken German/English.

I want to keep "it's youtube-dl's problem" as a last resort, because at that point it would be out of the scope of this project and probably never get done. If you think of any workarounds, let me know!

GlassedSilver commented 4 years ago

Oh, machine translation is definitely something we should avoid as a first-stop solution. It could be nice as an accessibility feature for the oddball videos that have community subtitles but no English (or other translated) title, where you don't need a title to rely on so much as a quick reference to the rough content. But I don't think it can be the replacement. :/

The issue is that, even though Google's machine translation has improved a lot over the years, video titles in particular are often very context-specific, tied either to the video itself or to the channel's own lingo, memes, and community terms, which may mean different things in different language communities.

That's honestly something that even a "mature" AI will likely never properly catch up to.

> I want to keep "it's youtube-dl's problem" as a last resort, because at that point it would be out of the scope of this project and probably never get done. If you think of any workarounds, let me know!

I totally hear you, and I somewhat agree. I guess if this project could pitch solutions, a savvy youtube-dl maintainer might look into them and patch upstream youtube-dl. Until then, I'm most definitely for a custom approach. Heck, it'd even be a selling point for this project over others. Then again, the entire project is already much more than your average GUI wrapper. :D I like the direction we're heading here, and I will definitely keep thinking about this, since mixed-language content is a big part of what I consume.

One idea I had is that a headless browser could scrape the title. You'd have to set the browser's locale for each language and then see whether that reliably catches the titles of videos that have multi-language titles.

It's not really a lightweight solution, and probably not one that would get ported into youtube-dl, but it's the only idea I've had so far.

Then again, maybe there are really, really small headless browsers? Heck, I guess a text-only browser would do the trick, unless there's an API (documented or not) that exposes that title.
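If the localized title is present in the raw HTML, the text-only variant of this idea could be tried without a full headless browser. Below is a hedged TypeScript sketch that requests the same page once per language with a different `Accept-Language` header and reads the `og:title` meta tag. It rests on two big assumptions, neither verified here: that YouTube localizes that tag based on `Accept-Language`, and that the title is in the server-rendered HTML at all (if it only appears after client-side rendering, a real headless browser would still be needed). The fetcher is injected so the parsing can be tested offline; `FetchPage`, `extractOgTitle`, and `fetchTitlesByLocale` are hypothetical names.

```typescript
// Hypothetical fetcher: returns the page HTML for a given Accept-Language
// value, or undefined if the request failed.
type FetchPage = (url: string, acceptLanguage: string) => Promise<string | undefined>;

// Minimal decoding of the entities most likely to appear in an attribute
// value; &amp; is decoded last so "&amp;quot;" becomes "&quot;", not '"'.
function decodeEntities(s: string): string {
  return s
    .replace(/&quot;/g, '"')
    .replace(/&#39;/g, "'")
    .replace(/&lt;/g, "<")
    .replace(/&gt;/g, ">")
    .replace(/&amp;/g, "&");
}

// Pull the og:title meta tag out of raw HTML. A regex is fragile (it assumes
// this exact attribute order), but a text-only fetch gives us no DOM to query.
function extractOgTitle(html: string): string | undefined {
  const m = html.match(/<meta\s+property="og:title"\s+content="([^"]*)"/i);
  return m ? decodeEntities(m[1]) : undefined;
}

// ASSUMPTION: the server varies og:title by Accept-Language. Fetch the page
// once per language and keep whatever titles come back.
async function fetchTitlesByLocale(
  videoUrl: string,
  langs: string[],
  fetchPage: FetchPage,
): Promise<Record<string, string>> {
  const titles: Record<string, string> = {};
  for (const lang of langs) {
    const html = await fetchPage(videoUrl, lang);
    if (html === undefined) continue; // request failed; skip this language
    const title = extractOgTitle(html);
    if (title !== undefined) titles[lang] = title;
  }
  return titles;
}
```

On Node 18+ the injected fetcher could wrap the global `fetch`, e.g. `(url, lang) => fetch(url, { headers: { "Accept-Language": lang } }).then((r) => (r.ok ? r.text() : undefined))`. Any language whose title differs from the original would then be a candidate translation.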

Maybe this project's issue could shed some light: https://github.com/TeamNewPipe/NewPipe/issues/3089

It's an alternative YT client that doesn't rely on YT's API.

GlassedSilver commented 4 years ago

[Possible idea for a second issue incoming that would make a case for the headless-browser approach. Even if a better way is found for titles and descriptions, reusing the code from the idea below for both things shouldn't hurt accuracy or efficiency. :)]

Another thought I had was that with a headless browser like headless Chrome, we could archive auxiliary data even better.

Think channel pages. In a really fancy scenario, a user could create aliases that link a content creator's profile pages on different platforms together. E.g., you follow a YouTuber who also has a Twitch channel where they sometimes stream, and you have some of your favorite streams as downloaded VODs.

Now imagine having their channel page graphics, descriptions, and so on mirrored on your own youtubedl-material instance for both platforms, and you'd link them together (manually, I would suggest: it shouldn't be too common a task, and automating it could easily produce false positives. Maybe automated suggestions... I digress.)

Then, on the local channel page, you could flip a switch to toggle between their YT and Twitch channel pages and videos.

This probably deserves its own issue if you fancy the idea. I think the need for some kind of channel page system would come up anyway if we want to take youtubedl-material toward the goal of being a first home for your content consumption.

Bonus: we could version channel pages, so we could eventually see how a channel's design and presentation changed over the months and years. (This would also mean adapting to new YT/Twitch UIs that we mirror: at some point YT channels suddenly could display external links, and Twitch added channel header graphics at some point. So we'd want versions both of the layout style and of the regularly pulled mirror copies of the channel pages themselves.)
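The linked-profile and versioning idea could be modeled roughly as below. This is a minimal TypeScript sketch of the data side only, with hypothetical types (`CreatorAlias`, `PlatformProfile`, `ChannelSnapshot`) and helpers (`addSnapshot`, `snapshotAsOf`, `profileFor`) that do not exist in youtubedl-material: profiles are linked manually through an alias, the platform switch is a lookup, and each profile keeps a date-sorted list of mirrored snapshots. `layoutVersion` stands in for the "versions of the layout style".

```typescript
// Hypothetical data model; none of these types exist in youtubedl-material.
type Platform = "youtube" | "twitch";

interface ChannelSnapshot {
  capturedAt: Date;
  layoutVersion: string; // which mirrored layout/scraper schema produced it
  description: string;
  bannerPath?: string;   // locally mirrored header graphic, if any
}

interface PlatformProfile {
  platform: Platform;
  url: string;
  snapshots: ChannelSnapshot[]; // kept sorted by capturedAt, oldest first
}

// A manually created alias grouping one creator's profiles across platforms.
interface CreatorAlias {
  name: string;
  profiles: PlatformProfile[];
}

// Insert a snapshot while keeping the list sorted by capture time.
function addSnapshot(profile: PlatformProfile, snap: ChannelSnapshot): void {
  const i = profile.snapshots.findIndex(
    (s) => s.capturedAt.getTime() > snap.capturedAt.getTime(),
  );
  if (i === -1) profile.snapshots.push(snap);
  else profile.snapshots.splice(i, 0, snap);
}

// Latest snapshot taken at or before `at`: how the page looked at that date.
function snapshotAsOf(profile: PlatformProfile, at: Date): ChannelSnapshot | undefined {
  let found: ChannelSnapshot | undefined;
  for (const s of profile.snapshots) {
    if (s.capturedAt.getTime() <= at.getTime()) found = s;
    else break; // sorted, so no later snapshot can qualify
  }
  return found;
}

// The "flip a switch" between platforms is just a lookup on the alias.
function profileFor(alias: CreatorAlias, platform: Platform): PlatformProfile | undefined {
  return alias.profiles.find((p) => p.platform === platform);
}
```

Keeping snapshots sorted on insert means the "as of" lookup can stop at the first newer entry; a real implementation would likely store these in the app's database rather than in memory.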