aajanki / yle-dl

Download videos from Yle servers
https://aajanki.github.io/yle-dl/index-en.html
GNU General Public License v3.0

Possible development ideas #374

Open DarrenPIngram opened 1 week ago

DarrenPIngram commented 1 week ago

I don't know how open you are to further development ideas, or how far you would want to expand the application, but if you never ask, you never know, and the developer can't be a mind reader either.

  1. Maintain a log of successfully downloaded URLs. Add an optional flag (and .conf option) to NOT download a requested file IF it already exists in the history file (Areena URLs appear to be unique per program; I have not found any recycled ones). Optionally return an error/response code or flag for any automation that may exist. Use case: I might download all MOT episodes when I remember to. It is easier to send one link rather than several, or fight with Yle to identify the "new" episodes, or delete 190 of the 200 episodes that I have already seen (which wastes everyone's time and resources too). This is particularly useful if they back-fill an archive several years back and you don't know to look at 2013 to see if anything has changed.
  2. PVR-type functionality. This may be more difficult, depending on any API support that Yle may have. I am thinking of some basic PVR such as the Get_Iplayer project has. It need not be as flashy as theirs, with a web view (as they have now) and control/program search, but dream big. If there is a web view, even a simple keyword search on cached programme data, basic filtering, or an option to enter a name or a direct Yle address would do. Otherwise, some parseable material in a text file could drive a scheduling element (get all new programmes that match keyword X in the description, or Y). I suppose if one is ambitious and the metadata is there, I could (made-up example) get ALL programmes with SAUNA in the title but NOT those in the category Sport or Children's (so I would not get a match from Sauna FC versus FF Jaro, or Pikku Kakkonen visits a sauna). Even on a small scale, if you know there are regular programs you want, e.g. MOT, you could put in their landing page URL (if the API does not otherwise give program-level control in advance, or afterwards) and then let it auto-download when executed (--pvrrun or something as a flag). Even then, the feature in point 1 might overcome the lack of an API, so you don't download hundreds of "not new" MOT episodes when you have run the --pvrrun option, say, weekly. It might then just be a text file of landing pages if API support was not otherwise there.
  3. Linked to 1 and 2: even without a PVR function, a landing page such as documentaries (for example https://areena.yle.fi/tv/ohjelmat/57-P3dgOa9BO) could be used (same theory as point 2), so that it would download any "new" documentaries and ignore those already downloaded per point 1.
  4. Option to rewrite filenames by formula. I don't know if this is even possible or practical, but as I archive stuff, even IF it took the user a few minutes to set up, in the long run it would save me a job and time! E.g. a file is downloaded with the name Verimarjat_ Kuka kiristää ja ketä__ E03-2023-10-13T06_00. No doubt you can parse where Exx appears (if it appears, or whether the name goes straight to the next "field") and then the date YYYY-MM-DD and the time attribute. For me, even transforming that to TITLE - EPISODE (optional, IF THERE in the original file) - DATE would do (obviously a guide; even if it leaves a little rubbish, e.g. the trailing time, if it has changed 90% and re-ordered, that's nice...).
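
The keyword matching described in point 2 (SAUNA in the title, but not in Sport or Children's) could be sketched roughly as below. This is only an illustration: the `Programme` record and its `title`, `url` and `categories` fields are hypothetical, since nothing in this thread establishes what metadata Yle actually exposes.

```python
from dataclasses import dataclass


@dataclass
class Programme:
    # Hypothetical metadata record; real fields depend on what Yle exposes.
    title: str
    url: str
    categories: frozenset = frozenset()


def matches(programme, keyword, excluded_categories):
    """True if the title contains keyword and no excluded category applies."""
    if keyword.casefold() not in programme.title.casefold():
        return False
    excluded = {c.casefold() for c in excluded_categories}
    return not any(c.casefold() in excluded for c in programme.categories)


def select(programmes, keyword, excluded_categories=()):
    """Return the programmes that pass the keyword and category filter."""
    return [p for p in programmes if matches(p, keyword, excluded_categories)]
```

Matching is case-insensitive on both the keyword and the category names, so `select(programmes, "SAUNA", ["Sport", "Children's"])` would keep a documentary titled "Suomalainen sauna" and drop a Sport entry for Sauna FC.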

Anyway, that was a brief brain dump of a few things. Not that I am a programmer who could contribute code (well, I really don't think you'd want my attempts ... :) :) ). Whether it leads to anything or not, I don't know.

ekari commented 1 week ago

Regarding the first point: I think that gallery-dl with the -I flag is a nice approach; the downloaded URL is commented out, but it is preserved in the file. With that you could write, say, a bash function that only adds a new URL if it's not already found in the file.
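
The "only add the URL if it's not already in the file" helper could look roughly like this. It is a sketch in Python rather than bash, and it assumes that finished entries are marked by a leading `#` on the line; the function name `add_url_if_new` is made up for illustration.

```python
from pathlib import Path


def add_url_if_new(list_file, url):
    """Append url to list_file unless it is already listed, commented out or not.

    Returns True if the URL was added, False if it was already present.
    """
    path = Path(list_file)
    text = path.read_text(encoding="utf-8") if path.exists() else ""
    known = set()
    for line in text.splitlines():
        # A leading "#" is assumed to mark an already downloaded URL; strip
        # it so that both pending and finished entries count as known.
        known.add(line.strip().lstrip("#").strip())
    url = url.strip()
    if url in known:
        return False
    # Start on a fresh line even if the file lacks a trailing newline.
    prefix = "" if text == "" or text.endswith("\n") else "\n"
    with path.open("a", encoding="utf-8") as f:
        f.write(prefix + url + "\n")
    return True
```

Because commented-out lines still count as known, a URL that was downloaded earlier is not queued a second time.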

I wrote, and even used for a few years, some Python that used MariaDB to keep track and then called yle-dl to do the downloading. When called with a URL as an argument, it would add the URL if it wasn't a duplicate, and without arguments it would download all URLs not yet marked as downloaded. But it was a somewhat complicated, error-prone and over-engineered solution.
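
For a sense of what such database bookkeeping involves, here is a minimal sketch. It is not the script described above: it swaps MariaDB for SQLite from the Python standard library, and the table layout and function names are assumptions made for illustration.

```python
import sqlite3


def open_db(path=":memory:"):
    """Open (and if needed create) the tracking database."""
    db = sqlite3.connect(path)
    db.execute(
        "CREATE TABLE IF NOT EXISTS urls ("
        " url TEXT PRIMARY KEY,"
        " downloaded INTEGER NOT NULL DEFAULT 0)"
    )
    return db


def add_url(db, url):
    """Queue url; return False if it was already known."""
    cur = db.execute("INSERT OR IGNORE INTO urls (url) VALUES (?)", (url,))
    db.commit()
    return cur.rowcount == 1


def pending(db):
    """URLs not yet marked as downloaded, in insertion order."""
    rows = db.execute("SELECT url FROM urls WHERE downloaded = 0 ORDER BY rowid")
    return [row[0] for row in rows]


def mark_downloaded(db, url):
    """Flag url as downloaded so that pending() no longer returns it."""
    db.execute("UPDATE urls SET downloaded = 1 WHERE url = ?", (url,))
    db.commit()
```

The primary key on `url` does the duplicate check, so the calling code only has to loop over `pending(db)`, run the downloader, and call `mark_downloaded` for each success.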

Anyway, thank you @aajanki for yle-dl. Your efforts have been an integral part of my media consumption for years :)

aajanki commented 1 week ago

Points 1-3 are great ideas but not in the scope of the yle-dl project. Yle-dl focuses just on downloading streams. I think that it would be best to implement those kinds of features as a separate download manager application that calls yle-dl to do the actual downloading, a bit like what @ekari describes. Hopefully, somebody becomes inspired to write such an application.

Rewriting the output filename (point 4) is already possible using the --output-template switch. The following generates file names along the lines you describe: yle-dl --output-template '${series}_${episode}_${date}' ...
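
For files that were already downloaded under the old naming, a post-hoc rename along the lines of point 4 could be sketched as follows. The pattern is an assumption derived only from the single example name quoted in point 4, and `archive_name` is a hypothetical helper, not part of yle-dl.

```python
import re

# Assumed shape, taken from the example in point 4:
#   <title>[ E<number>]-<YYYY-MM-DD>T<HH>_<MM>
PATTERN = re.compile(
    r"^(?P<title>.+?)"
    r"(?:\s+(?P<episode>E\d+))?"
    r"-(?P<date>\d{4}-\d{2}-\d{2})T\d{2}_\d{2}$"
)


def archive_name(stem):
    """Reorder a file name (without extension) to 'TITLE - EPISODE - DATE'.

    Returns None if the name does not have the assumed shape.
    """
    m = PATTERN.match(stem)
    if m is None:
        return None
    # Drop the underscores and spaces left dangling at the end of the title.
    title = m.group("title").rstrip("_ ")
    parts = [title, m.group("episode"), m.group("date")]
    # The episode part is None when the name has no Exx field; skip it then.
    return " - ".join(p for p in parts if p)
```

With the example from point 4, `archive_name("Verimarjat_ Kuka kiristää ja ketä__ E03-2023-10-13T06_00")` gives `"Verimarjat_ Kuka kiristää ja ketä - E03 - 2023-10-13"`. Names that do not fit the pattern are left for manual handling rather than guessed at.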

DarrenPIngram commented 6 days ago

> The points 1-3 are great ideas but not in the scope of the yle-dl project

Yes, as I wrote, if you don't ask. Get_Iplayer developed from a command line downloader to what it is today, but obviously one can never demand or expect a one-man developer to do something. Well you can, but you then should be disappointed with the answer and feel ashamed for your bad behaviour.

Even so, I could squint and see that point 1 could be an easier thing to implement and still be in the scope of a CLI downloader, since your app would directly know the status of a download (is there an exit code or similar?) and could just write the downloaded URL to a file. A corresponding flag would then opt in to reading the history, or opt out (depending on the implementation). If I was a programmer I'd have offered that up. Otherwise, maybe a programmer who lurks here could knock it up and send a PR for you to consider?
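
As an external wrapper rather than a built-in feature, that history idea could be sketched like this. It assumes the downloader exits with status 0 only on success and accepts the URL as its last argument; `download_new`, `history_file` and `command` are hypothetical names, not existing yle-dl options.

```python
import subprocess
from pathlib import Path


def download_new(urls, history_file, command=("yle-dl",)):
    """Download each URL not yet in history_file; record the successful ones.

    Returns the list of URLs that were downloaded in this run.
    """
    history = Path(history_file)
    seen = set()
    if history.exists():
        seen = set(history.read_text(encoding="utf-8").split())
    done = []
    for url in urls:
        if url in seen:
            continue  # already downloaded in an earlier run
        # Assumption: the downloader exits with status 0 only on success.
        result = subprocess.run([*command, url])
        if result.returncode == 0:
            # Record immediately so an interrupted run keeps its progress.
            with history.open("a", encoding="utf-8") as f:
                f.write(url + "\n")
            seen.add(url)
            done.append(url)
    return done
```

Failed downloads are not written to the history, so they are retried the next time the same list of URLs is passed in.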

You already have other elements too (taking a list of URLs, outputting to a given directory and the like) and I already use them in a simple Bash script.

For "Elonet" I have a script that manages its own history but it is not as elegant.

> Rewriting the output filename (point 4) is already possible using the --output-template switch. The following generates file names along the lines you describe: yle-dl --output-template '${series}_${episode}_${date}' ...

Ah, I overlooked that. I even double-checked the GitHub page and read through the commands, and promptly ignored the text at the bottom about getting a full list of commands... :(