LuteOrg / lute-v3

LUTE = Learning Using Texts: learn languages through reading.
https://luteorg.github.io/lute-manual/
MIT License
485 stars 46 forks

Support some kind of YouTube video import #73

Open jzohrab opened 10 months ago

jzohrab commented 10 months ago

Suggestion from Discord, roughly.

It would be nice/fun to be able to "import" YouTube videos, or have the videos play, in Lute, and somehow have the subtitles available and processed like other texts. The video could play, and the user could move through the pages of text as it progresses. Lute currently doesn't support any kind of audio-to-text syncing with timestamps, so the pages couldn't follow the video automatically.

The page would need to be laid out differently, with the video stickied or something at the top, and a reading pane below it. The current hardcoded frames for the terms and the dicts might get messy.

jonathonmh commented 10 months ago

I agree - this is certainly something I'd like in the lute application.

As a thought, yt-dlp.exe can download subtitle files from YouTube (where they exist), typically in VTT format. These could be uploaded and parsed in Lute and 'somehow' fed into its internal player, in which case the actual sentences of the book text could be highlighted as the audio plays.
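A rough sketch of what parsing a VTT file into timed cues could look like in pure Python. This is not Lute code: the `Cue` class and `parse_vtt` function are hypothetical names, and the parser only handles the basic cue layout (optional cue identifier, timing line, text lines), ignoring styles, regions, and cue settings.

```python
import re
from dataclasses import dataclass

# Matches a cue timing line such as "00:00:01.000 --> 00:00:04.000".
# The hour field is optional in WebVTT timestamps.
TIMING = re.compile(
    r"(?:(\d+):)?(\d{2}):(\d{2})\.(\d{3})\s+-->\s+"
    r"(?:(\d+):)?(\d{2}):(\d{2})\.(\d{3})"
)


@dataclass
class Cue:
    """One subtitle cue (hypothetical structure, not part of Lute)."""
    start: float  # seconds from the start of the video
    end: float    # seconds from the start of the video
    text: str


def _seconds(hours, minutes, seconds, millis):
    """Convert the captured timestamp fields to seconds."""
    return int(hours or 0) * 3600 + int(minutes) * 60 + int(seconds) + int(millis) / 1000


def parse_vtt(content: str) -> list[Cue]:
    """Return the cues found in a WebVTT document, in file order."""
    cues = []
    normalized = content.replace("\r\n", "\n").strip()
    # Blocks (header, notes, cues) are separated by blank lines.
    for block in re.split(r"\n\s*\n", normalized):
        lines = block.split("\n")
        for i, line in enumerate(lines):
            match = TIMING.match(line.strip())
            if not match:
                continue  # header, note, or cue identifier line
            fields = match.groups()
            text = " ".join(l.strip() for l in lines[i + 1:] if l.strip())
            # Drop inline tags such as <c> or <00:00:01.500>.
            text = re.sub(r"<[^>]+>", "", text)
            if text:
                cues.append(Cue(_seconds(*fields[:4]), _seconds(*fields[4:]), text))
            break
    return cues
```

The cue texts could then be joined and imported like any other text, with the start and end times kept aside for any later highlighting.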

I imagine the use of yt-dlp (or an equivalent python library) could be combined with lute in a multi-image docker setup, automating the subtitle download.

As for the embedding of youtube video I imagine there is a standard way to approach this, and for this functionality the ability to slow the video (and audio) is highly desirable for my own study.

I'd like to investigate the above...

jzohrab commented 10 months ago

This one is a challenge. It should support a pure python approach, because I'm hoping to keep features available for as wide an audience as possible. Relying on Docker creates problems for many Windows users, and for some mac users that hit performance problems. I personally also prefer to use pip :-)

yt-dlp and others rely on ffmpeg, but if we're just getting the subtitles, and embedding the video, then there's no real need for all of the tools of yt-dlp. YT videos can be embedded within an iframe. It becomes more of a question of just getting the proper subtitles file, parsing it, and showing the pages.
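As an illustration of the iframe part, something like the following could turn a pasted YouTube URL into embed markup. The function names are made up for this sketch, and it only recognizes the common `watch?v=`, `youtu.be/`, and `/embed/` URL shapes.

```python
from html import escape
from urllib.parse import parse_qs, urlparse


def youtube_video_id(url: str):
    """Return the video ID from a common YouTube URL form, or None."""
    parts = urlparse(url)
    host = (parts.hostname or "").lower()
    if host == "youtu.be":
        # Short links carry the ID as the path: https://youtu.be/<id>
        return parts.path.lstrip("/") or None
    if host in ("youtube.com", "www.youtube.com", "m.youtube.com"):
        if parts.path == "/watch":
            # Standard links carry the ID in the "v" query parameter.
            return parse_qs(parts.query).get("v", [None])[0]
        if parts.path.startswith("/embed/"):
            return parts.path.split("/")[2] or None
    return None


def embed_iframe(url: str, width: int = 560, height: int = 315) -> str:
    """Build iframe markup for the video (hypothetical helper, not Lute code)."""
    video_id = youtube_video_id(url)
    if video_id is None:
        raise ValueError(f"Not a recognized YouTube URL: {url}")
    src = f"https://www.youtube.com/embed/{escape(video_id, quote=True)}"
    return (
        f'<iframe width="{width}" height="{height}" '
        f'src="{src}" allowfullscreen></iframe>'
    )
```

This needs nothing beyond the standard library, so it would fit the pip-only constraint.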

Lute doesn't do any audio-to-timing sync. For a first pass implementation of this, just getting the subtitles and showing the video would be a great start.

jzohrab commented 10 months ago

fyi - this is a non-trivial piece of work, I think! It affects the reading page, potentially page parsing, and various actions. I haven't thought through all of the design/architecture implications.

jzohrab commented 10 months ago

Actually, what might make more sense for this would be some kind of browser add-in that Lute users can use that accesses their Lute data while watching YouTube. I have no idea how to do this, but it seems like a better direction to take -- rather than forcing everything to be pulled into Lute, with all of the code changes that entails, it would be nice if Lute somehow could be used in a more open fashion.

jonathonmh commented 10 months ago

I agree that a browser extension might work, i.e. leveraging other open source projects. This functionality seems similar to "language reactor" (though I haven't really played with this; I'm looking for FOSS solutions).

A quick Google search also returns https://easysubs.cc/, which seems quite promising as a separate solution to this (though a quick 5-minute test didn't prove much).

Noting my current lack of knowledge of the Lute architecture, I would say that I see the lute<>extension approach as more appropriate for the "ruby" topic (https://github.com/jzohrab/lute-v3/issues/71), and its follow-on usage with Anki (i.e. how does one generate study materials).

I will create a branch for SRT/VTT import, and see if there's any easy way to show videos within Lute.

fanyingfx commented 10 months ago

I think it would help if Lute could expose some APIs for its features and data, such as parsing text and some db operations for terms, etc. That could be useful for developing browser add-ons or other ways of learning with videos. I have a very rough idea for a parse_text interface. For example, if Japanese is enabled, then for the sentence '私は元気です。' the request would be:

{
  "text": "私は元気です。",
  "language": "japanese"
}

Then the response should look like:

[
  {
    "is_word": true,
    "reading": "わたくし",
    "status": "1",
    "text": "私"
  },
  {
    "is_word": true,
    "reading": "",
    "status": "0",
    "text": "は"
  },
  {
    "is_word": true,
    "reading": "げんき",
    "status": "0",
    "text": "元気"
  },
  {
    "is_word": true,
    "reading": "",
    "status": "0",
    "text": "です"
  },
  {
    "is_word": false,
    "reading": "",
    "status": "0",
    "text": "。"
  }
]
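To make the shape of that response concrete, here is a small sketch of a function that produces the same fields. It is not Lute's parser: `parse_text` and the `known_terms` lookup are hypothetical, and the regex tokenizer only works for space-delimited languages. Japanese, as in the example above, would need a real morphological parser.

```python
import re


def parse_text(text, known_terms):
    """Split text into tokens shaped like the proposed response.

    known_terms is a hypothetical lookup: lowercase term ->
    {"status": "<status>", "reading": "<reading>"}.
    Whitespace is dropped; words and punctuation become tokens.
    """
    tokens = []
    for piece in re.findall(r"\w+|[^\w\s]", text):
        is_word = re.fullmatch(r"\w+", piece) is not None
        # Only words are looked up; unknown words get status "0".
        term = known_terms.get(piece.lower(), {}) if is_word else {}
        tokens.append(
            {
                "is_word": is_word,
                "reading": term.get("reading", ""),
                "status": term.get("status", "0"),
                "text": piece,
            }
        )
    return tokens
```

An HTTP endpoint could wrap a function like this and return the list as JSON, which is roughly what a browser add-on would consume.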