Tzahi12345 / YoutubeDL-Material

Self-hosted YouTube downloader built on Material Design
MIT License
2.66k stars 276 forks

Processing Re-Uploads or third-party sources in general (e.g. a video saved earlier on your own disk) #157

Open GlassedSilver opened 4 years ago

GlassedSilver commented 4 years ago

Going forward, YTDL-M gives me much more peace of mind about storing the content I come across online and want to keep. That is absolute bliss, and I love that the project is so approachable and heading in the direction it is.

However, there is a bunch of older content I never got the chance (and never will!) to save with YTDL-M. At least not properly.

So there are a bunch of videos out there that got re-uploaded, or that I stored on disk using archaic means like JDownloader 2, where I just tossed the YouTube URL in and called it a day. Likewise, archive.org has archived some Twitch VODs which by now aren't accessible anymore.

I would love to suggest we look into adding a "Submit from non-original source" option.

This would mean we can drop a file to upload to the YTDL-M server and then manually edit in all the details like original URL, thumbnail, etc... basically trying to re-create as much of the info json and auxiliary files yt-dl creates as possible.
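To make the "re-create the info json" part concrete, here is a minimal sketch of what a manually filled record could look like. The key names (`title`, `ext`, `webpage_url`, `uploader`, `upload_date`) mirror fields youtube-dl commonly writes into its info json; the helper names `build_info` and `info_json_path` are hypothetical and not part of YTDL-M.

```python
import json
from pathlib import Path
from typing import Optional


def build_info(video_path: str, title: str, webpage_url: Optional[str] = None,
               uploader: Optional[str] = None, upload_date: Optional[str] = None) -> dict:
    """Build a small info-json-like dict from manually entered details.

    upload_date is expected as YYYYMMDD, the way youtube-dl stores it.
    Fields the user left empty are simply left out.
    """
    info = {
        "title": title,
        "ext": Path(video_path).suffix.lstrip("."),
        "webpage_url": webpage_url,
        "uploader": uploader,
        "upload_date": upload_date,
    }
    return {key: value for key, value in info.items() if value is not None}


def info_json_path(video_path: str) -> Path:
    """Return the '<name>.info.json' path that sits next to the video file."""
    path = Path(video_path)
    return path.with_name(path.stem + ".info.json")


if __name__ == "__main__":
    info = build_info("vods/old_stream.mkv", "Old stream",
                      webpage_url="https://example.com/v/1")
    print(info_json_path("vods/old_stream.mkv"))
    print(json.dumps(info, indent=2))
```

Leaving out unknown fields (instead of storing empty strings) would make it easy to tell later which details still need filling in.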

And secondly I could maybe point it to a Youtube URL that I know is a re-up and "overwrite" the info, so the channel, possibly description, upload date etc all fit again. (sometimes you can find the original upload and hence a lot of metadata in something like Google's website cache or other caches, but the video may not be there)

And there's also possible sources like archive.org as I mentioned, VODs from Twitch for example do come with a lot of key metadata there like original URL, channel name, date, ...

Example: https://archive.org/details/twitch-vod-v590360117 (in this case apparently all I can do is try to get the torrent from there, this would be a good example of needing a manual submission form to upload video files manually to YTDL-M, but other VODs on archive.org are direct-downloadable!)


All of this effort would ensure that YTDL-M could truly become the ONE place where we can store web video content, no matter how we were/are able to acquire it.

Tzahi12345 commented 4 years ago

I think this is a great idea; it's a more general form of #97, which just focuses on playlists.

Here's how I imagine the system:

There's a new /upload route for the UI where you can drag and drop files

The backend gets these files and begins processing them

That's about as much as I can flesh out this idea without dipping my beak into the code. I think 4.2 is full of features at this point, so I'll queue it up for 4.3. As always, thanks for the great suggestions!

GlassedSilver commented 4 years ago

Sorry that it took me a few days to respond. Oops!

Yes I see this is kinda similar and I fancy that this idea seems to have broader support then.

1)

For each file, you need to input required fields (probably not much)

Yup, ask for a lot, demand little. :) So you're quickly ready to go and maybe if you know you have to follow up with more metadata later on you could do something like danbooru does: a meta tag (those are the orange-color tags on there) for "tagme".
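A rough sketch of "ask for a lot, demand little" with a danbooru-style "tagme" marker. Everything here is an assumption for illustration: the field names, the idea that only `title` is required, and the helper names.

```python
from typing import List

# Assumed split between required and optional fields (not from YTDL-M).
REQUIRED = ("title",)
OPTIONAL = ("original_url", "uploader", "upload_date", "description")


def missing_required(meta: dict) -> List[str]:
    """Return the required fields that are missing or empty."""
    return [field for field in REQUIRED if not meta.get(field)]


def with_tagme(meta: dict) -> dict:
    """Add a 'tagme' tag while optional metadata is incomplete, drop it once complete."""
    tags = set(meta.get("tags", []))
    if any(not meta.get(field) for field in OPTIONAL):
        tags.add("tagme")
    else:
        tags.discard("tagme")
    return {**meta, "tags": sorted(tags)}
```

An upload would only be rejected when `missing_required` returns something; everything else just leaves the video flagged for a later follow-up.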

Overall, and I've wanted to mention this in other scenarios as well, we could learn a lot from danbooru. But that's just an aside; I'll bring it up again in the future and explain further what I mean. Danbooru is imho the pinnacle of crowd-sourced special-interest content curation, organization and UX. (Once you get familiar with the way danbooru works, its tagging logic and many other features make categorizing anime pictures easy as pie. I've been using it to store my personal anime picture collection for months now and you'll have to pry it from my cold dead hands!)

2)

An optional field is video URL, this may be used later

Yes I agree, maybe this should be a section for two fields or even more. (add as many lines as needed)

e.g.:

Original Source: [multi-line text box here, one source per line]

Re-Uploads/Mirrors: [multi-line text box again]

Original source would be something like youtuber uploads a video to Youtube and then some other site, officially. (as in, they link to it, they run the channel on that other outlet as well, etc...)

Re-Ups and mirrors would be a fan grabbing the file and re-upping it to bilibili or their own YT channel even.
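As a sketch of how the two text boxes could be turned into stored data: one URL per line, blank lines ignored, duplicates dropped. The key names `original_sources` and `mirrors` and both helpers are made up for illustration.

```python
from typing import Dict, List


def parse_sources(text: str) -> List[str]:
    """Split a multi-line text box into a list of URLs, one per line.

    Surrounding whitespace and blank lines are ignored, and repeated
    URLs are kept only once, in the order they were first entered.
    """
    seen = set()
    urls = []
    for line in text.splitlines():
        url = line.strip()
        if url and url not in seen:
            seen.add(url)
            urls.append(url)
    return urls


def build_source_fields(original_text: str, mirror_text: str) -> Dict[str, List[str]]:
    """Combine both text boxes into one metadata fragment."""
    return {
        "original_sources": parse_sources(original_text),
        "mirrors": parse_sources(mirror_text),
    }
```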

Danbooru would even have a meta tag for content that's sourced from pages other than the artist's own outlets: https://danbooru.donmai.us/wiki_pages/third-party_source

(is it obvious that I really love danbooru and the way it works? Oh mind you, this isn't to say we should copy everything from them, but there's a lot of approaches in methodology that could inspire us with this project)

3)

At the end of the data input, you click upload and the files along with their metadata get pushed to the server.

Yup. Also it'd be good to store manually-set metadata in a side-car file, so that when rebuilding a library we would a) get the user overwrites back for files with info.jsons and b) for files without info.jsons, well, get back the only metadata we have to begin with!

So yes, editing a video's metadata should probably store a sidecar info file (.ytdlm?) along with the video file. Those files would work as "overwrites" and maybe even carry other metadata that is youtubedl-material internal, like "scrape date and time", play count, play position and stuff like that.

Pro: lose/wipe your database file and it can be rebuilt entirely from scratch and you don't lose anything.
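Roughly what I mean by rebuilding from the files alone, as a sketch. The sidecar layout (an `overrides` object plus a `state` object) and the function names are assumptions; only the `.info.json` naming comes from youtube-dl, and `.ytdlm` is just the extension floated above.

```python
import json
from pathlib import Path


def read_json(path: Path) -> dict:
    """Load a JSON file, or return an empty dict when it does not exist."""
    return json.loads(path.read_text(encoding="utf-8")) if path.exists() else {}


def rebuild_entry(video_path: Path) -> dict:
    """Rebuild one library entry from the files next to the video.

    Scraped metadata comes from '<name>.info.json' (if present); anything
    the user set by hand lives in '<name>.ytdlm' and wins on conflict.
    """
    info = read_json(video_path.with_name(video_path.stem + ".info.json"))
    sidecar = read_json(video_path.with_name(video_path.stem + ".ytdlm"))
    metadata = {**info, **sidecar.get("overrides", {})}
    return {
        "file": video_path.name,
        "metadata": metadata,
        "state": sidecar.get("state", {}),  # e.g. play count, play position
    }
```

A file with no info.json at all still yields a usable entry, because the sidecar is then the only source of metadata.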

Maybe should make this a separate issue for easy tracking and structure, but wanna hear what you think first. :)

4)

If an image is included with the same name as the video (minus ext of course), it will be used as the thumbnail.

Good catch! YES!
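The matching rule could be as small as this sketch. The list of accepted image extensions and the name `find_thumbnail` are assumptions, not something the project defines.

```python
from pathlib import Path
from typing import Optional

# Assumed set of image types that count as thumbnails.
IMAGE_EXTS = (".jpg", ".jpeg", ".png", ".webp")


def find_thumbnail(video_path: Path) -> Optional[Path]:
    """Return an image in the same folder that shares the video's name, if any.

    'clip.mp4' matches 'clip.png' or 'clip.JPG', but not 'clip-2.png'.
    """
    for candidate in sorted(video_path.parent.iterdir()):
        same_name = candidate.stem == video_path.stem
        is_image = candidate.suffix.lower() in IMAGE_EXTS
        if candidate != video_path and same_name and is_image:
            return candidate
    return None
```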

5)

Should include a display of the videos being processed, which persists with page refreshes

You know what, you just prevented me hitting you with another issue here, because this leaves nothing to be guessed at any moment. This gives incredible peace of mind!

6)

Will require new permission for multi-user mode, as well as a global enable/disable in the settings

Good that you think of the multi-user scenario here, good catch again. I'd have forgotten about this myself.

7)

Videos with missing metadata will use the URL (if exists) at this stage to attempt to get more info

On-demand metadata scraping? Dude.... I love you.
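The important bit for me would be that scraped data only fills gaps and never replaces what was typed in by hand. A sketch of that rule, assuming the scraped dict comes from something like youtube-dl's `--dump-json` output; the function name is hypothetical.

```python
def fill_missing(manual: dict, scraped: dict) -> dict:
    """Fill empty or absent fields in the manual metadata from scraped data.

    Values the user already entered are never overwritten.
    """
    merged = dict(manual)
    for key, value in scraped.items():
        if merged.get(key) in (None, "") and value not in (None, ""):
            merged[key] = value
    return merged
```

This matters for re-ups in particular, where the scraped channel and upload date describe the mirror and not the original.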

8)

Video file conversion will happen here with ffmpeg if the uploaded videos are not mp3/mp4. This will make the processing stage much slower, especially if the input files are big

Hmmm... can we make this optional maybe? Conversion always causes loss of detail, and I'd rather opt out and wait for a new video player that may handle the format than incur loss of detail and possible corruption due to conversion.

On that note btw: wasn't there some motivation to support more formats anyhow, if not simply because YT locks some resolutions and framerates behind VP9/AV1? I'd much much rather not be able to play a bunch of my files through ytdl-m for a while than incur conversion time, loss of detail or limiting my choice of resolutions I download...
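What the opt-out could look like, as a sketch: with conversion switched off, or for files that are already mp3/mp4, nothing is touched. The setting `convert_enabled` and the function name are made up; the command is just the plain `ffmpeg -i <input> <output>` form, which lets ffmpeg pick default codecs for the target container.

```python
from pathlib import Path
from typing import List, Optional

# Formats the current player handles, per the comment quoted above.
NATIVE_EXTS = {".mp3", ".mp4"}


def conversion_command(src: Path, convert_enabled: bool,
                       audio_only: bool = False) -> Optional[List[str]]:
    """Return the ffmpeg command to run for an upload, or None to keep the file as-is."""
    if not convert_enabled or src.suffix.lower() in NATIVE_EXTS:
        return None
    target = src.with_suffix(".mp3" if audio_only else ".mp4")
    return ["ffmpeg", "-i", str(src), str(target)]
```

With `convert_enabled=False`, a VP9 `.webm` upload would simply be stored untouched until the player can handle it.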

9)

Processed videos get added to the user who added them (if in multi-user mode)

Makes sense.

And yeah, good call on adding this to 4.3. This feature is one that can wait a little longer than features that deal with subscriptions and downloads that go through ytdl-m. Processing (and saving!) should come before features that deal with presentation only or that touch on data you already have, just not nice and tidily organized in ytdl-m (yet). :) (if that makes sense)