bibanon / tubeup

Use yt-dlp to download video/metadata and upload to the Internet Archive.
https://pypi.python.org/pypi/tubeup/
GNU General Public License v3.0

Prefer .webm format; still needs testing. #300

Closed by Windows81 1 year ago

Windows81 commented 1 year ago

Wayback Archive doesn't have native support for MKV files. However, I found that yt-dlp prefers to download them by default.
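For context, a preference like this could be expressed through yt-dlp's options. The sketch below is not taken from this PR: it only builds an options dictionary using yt-dlp's `format`, `merge_output_format` and `outtmpl` keys, and the helper name `webm_preferring_options` is hypothetical.

```python
def webm_preferring_options(output_template="%(id)s.%(ext)s"):
    """Build a yt-dlp options dict that prefers WebM streams (hypothetical helper)."""
    return {
        # Try WebM video plus WebM audio first; otherwise fall back to the
        # best single pre-merged file, which needs no merge step.
        "format": "bestvideo[ext=webm]+bestaudio[ext=webm]/best",
        # Container to use when two streams have to be merged.
        "merge_output_format": "webm",
        # Output filename template (assumed value, for illustration only).
        "outtmpl": output_template,
    }


if __name__ == "__main__":
    print(webm_preferring_options())
```

Passing the returned dictionary to `yt_dlp.YoutubeDL(...)` would apply it. How this interacts with the options tubeup already sets is untested here.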

vxbinaca commented 1 year ago

> Wayback Archive

Wayback is for webpages only. You're just uploading to the Internet Archive. There's a difference.

> doesn't have native support for MKV files

It derives them into MP4 once the upload is completed, to make the MKV video streamable. If you wait a few minutes, you'll notice this.
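One way to see whether the derive step has finished is to look at the item's file list. The sketch below assumes the JSON returned by `https://archive.org/metadata/<identifier>` contains a `files` list whose entries carry `name`, `source` and `format` fields. The sample data and the helper name `derivative_files` are made up for illustration.

```python
def derivative_files(item_metadata):
    """Return the names of files the Internet Archive derived from an upload.

    `item_metadata` is assumed to be the parsed JSON of an item's metadata,
    where each entry in "files" has a "source" of "original", "derivative"
    or "metadata".
    """
    return [
        entry["name"]
        for entry in item_metadata.get("files", [])
        if entry.get("source") == "derivative"
    ]


# Hypothetical item: one uploaded MKV plus the MP4 derived from it.
sample_item = {
    "files": [
        {"name": "video.mkv", "source": "original", "format": "Matroska"},
        {"name": "video.mp4", "source": "derivative", "format": "h.264"},
        {"name": "video_meta.xml", "source": "metadata", "format": "Metadata"},
    ]
}

if __name__ == "__main__":
    print(derivative_files(sample_item))
```

An empty result shortly after upload would be consistent with the derive task still being queued, rather than with the format being unsupported.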

Your PR isn't passing automated tests. Why disturb what works?

brandongalbraith commented 1 year ago

reddit:datahoarder: MKV, MP4 or WebM: which format will ensure the best video/audio quality when downloading long youtube videos?

Simple answer: WebM/MKV (they will both contain the same VP9 video and Opus audio).

Better answer: Just pick the highest resolution available. You will notice that this will often be the VP9 video version. YouTube doesn't use H.264 video for resolutions above 1080p (2K), so you will never find 4K content in this format, because it was never designed for it.

Technically VP9 video (and Opus audio) can be stored in MKV, MP4, and WebM containers alike, but the Matroska-based WebM container is the 'native' container for this web video standard.

Be aware that WebM is also a Matroska (MKV) container type. But unlike its 'big brother' it is restricted to a few video codecs (VP8/VP9/AV1) and audio codecs (Opus/Vorbis) invented in 2010. Those restrictions mean that any web browser can play WebM files out of the box. That is not the case for MKV.
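The codec restriction described above can be written down as a small check. This is only a sketch: it assumes codecs are given by their plain names (for example `vp9`, `opus`), not by the longer identifiers a downloader may report, and the function name `fits_in_webm` is hypothetical.

```python
# Codecs the quoted answer lists as allowed in a WebM container.
WEBM_VIDEO_CODECS = {"vp8", "vp9", "av1"}
WEBM_AUDIO_CODECS = {"opus", "vorbis"}


def fits_in_webm(video_codec, audio_codec):
    """Return True if both codecs are in the WebM-allowed sets above."""
    return (
        video_codec.lower() in WEBM_VIDEO_CODECS
        and audio_codec.lower() in WEBM_AUDIO_CODECS
    )


if __name__ == "__main__":
    print(fits_in_webm("VP9", "Opus"))   # VP9 + Opus, the pairing mentioned above
    print(fits_in_webm("h264", "aac"))   # H.264 + AAC is outside the WebM sets
```

A pairing that fails this check would need a more general container such as MKV or MP4.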

TLDR: We should continue as is and allow the Internet Archive to derive and present as they choose.

Windows81 commented 1 year ago

Thanks for the information! I falsely assumed that Internet Archive parses each video instantaneously.

brandongalbraith commented 1 year ago

No worries whatsoever! You can find more information on their derive process at ~https://help.archive.org/hc/en-us/articles/360014487651-Files-Formats-and-Derivatives-A-Basic-Guide~ and https://archive.org/help/derivatives.php

vxbinaca commented 1 year ago

> https://help.archive.org/hc/en-us/articles/360014487651-Files-Formats-and-Derivatives-A-Basic-Guide

Dead link brodie. I was gonna keep it open but okay, you're right. Thanks.

brandongalbraith commented 1 year ago

That's what I get for cribbing off the first ArchiveTeam wiki page Google turned up. Let me check with some IA peeps and I'll get back with a canonical reference wrt derive ops.