vxbinaca closed this issue 5 years ago.
In response to the call for PRs, I propose merging https://github.com/bibanon/tubeup/pull/81 before shipping this.
I've already WARC'd a few annotation XMLs. I'm not entirely sure how it could be implemented, but maybe, after the deletion date, tubeup could be modified to draw on a collective dataset of video annotations and attach each annotation file to its individual video during uploads?
(What I've uploaded so far:) https://archive.org/details/data-YouTube-Annotations-yt_anot_urls_nodupcheck.txt-2018-12-02-a354fb31 https://archive.org/details/data-YouTube-Annotations-ola_norsk_yt_anot_urls.txt-2018-12-02-439edf01 https://archive.org/details/data-anotids-yt_anot_links_continue00.txt-2018-12-11-5657e75f
(I have more underway locally, and probably, hopefully, others from ArchiveTeam do as well.)
They're deleting the annotations on the 19th. There's no reason to keep the flag around, since it's YouTube-specific and offers no functionality on any other site youtube-dl supports.
@DuckHP
I'm not sure entirely how it could be implemented, but maybe, after the deletion date, tubeup could be modified at a later date to pick at a collective dataset of video annotations belonging to the individual videos, during uploads?
This is best done by the Internet Archive when it derives the item, not by TubeUp picking through an item's file collection, retrieving the file, and re-uploading it with the rest of the files that make up the item. As long as the annotation collection files are properly named and the Internet Archive has them, the problem of matching annotation files in a collective bundle to future ripped video files can be deferred.
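As a rough illustration of why "properly named" matters, here is a minimal Python sketch of the matching step. It assumes a hypothetical naming convention in which each annotation file in the bundle is named `<video_id>.xml` and each ripped video file ends in `-<video_id>.<ext>`; neither convention is confirmed anywhere in this thread.

```python
import re
from pathlib import PurePosixPath

# YouTube video IDs are 11 characters drawn from A-Z, a-z, 0-9, '-' and '_'.
VIDEO_ID_RE = re.compile(r"[A-Za-z0-9_-]{11}")


def annotation_index(bundle_names):
    """Map video ID -> annotation file name.

    Assumes (hypothetically) that bundle files are named '<video_id>.xml';
    anything whose stem is not a valid-looking ID is ignored.
    """
    index = {}
    for name in bundle_names:
        stem = PurePosixPath(name).stem
        if VIDEO_ID_RE.fullmatch(stem):
            index[stem] = name
    return index


def video_id_from_rip(filename):
    """Extract the ID from a rip named '<title>-<video_id>.<ext>' (assumed layout).

    Takes the last 11 characters of the stem rather than splitting on '-',
    because IDs themselves may contain '-'.
    """
    candidate = PurePosixPath(filename).stem[-11:]
    return candidate if VIDEO_ID_RE.fullmatch(candidate) else None


def match(bundle_names, rip_names):
    """Pair each ripped file with its annotation file, or None if absent."""
    index = annotation_index(bundle_names)
    return {rip: index.get(video_id_from_rip(rip)) for rip in rip_names}
```

Because the matching key is only the video ID, the bundle and the rips can be produced years apart, which is the point of deferring the problem.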
I also want to add that even before I touched Tubeup 3 years ago (it's like a lifeform that evolved), annotations were collected. So as far as I can tell, annotations have been collected throughout its entire existence, across every iteration, and anyone who used it got every annotation available, notwithstanding weird upload bugs like the ones we have that aren't the S3 issue.
This is best performed by the Internet Archive during deriving of the item, not TubeUp attempting to pick through an item's file collection, retrieving the file, and re-uploading with the rest of the files that make up an item.
Perhaps that is best. Either way, as far as I know, IA's player does not play back annotations YET. But hey, that's a future thing, as long as the annotations are there.
Thinking about it, are you sure the annotations flag and format aren't used by youtube-dl for other services such as niconico or bilibili? Or do those use rich SRT subtitles instead of YouTube's old XML? Someone needs to try it out, but if nothing else uses it, it has probably served its purpose.
A fair point, Antonizoon. NicoNico is walled off, so testing is difficult. Bilibili works, however. Get me a link with annotations and I'll test it.
End cards are now collected as annotations. Even videos without annotations or title cards have metadata collected. Closing, because removing the flag would mean no longer collecting that valuable metadata.
Merry Christmas.
@vxbinaca I think this could be revisited. The annotations that were on YouTube have been archived to the Internet Archive, and the annotations API is returning an empty response for videos that used to have legacy annotations.
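One way to check the "empty response" claim for a given video is to parse the response body and count annotation elements. This is a minimal sketch that works on a string (no network access), and it assumes the legacy XML layout of a `<document>` root containing an `<annotations>` element with `<annotation>` children; that layout is an assumption, not something stated in this thread.

```python
import xml.etree.ElementTree as ET


def count_annotations(xml_text):
    """Count <annotation> elements in a legacy annotations XML response.

    Assumed layout: <document><annotations><annotation .../>...</annotations></document>.
    An empty or whitespace-only body counts as zero annotations.
    """
    if not xml_text or not xml_text.strip():
        return 0
    root = ET.fromstring(xml_text)
    # iter() walks the whole tree, so the exact nesting depth does not matter.
    return sum(1 for _ in root.iter("annotation"))
```

A result of 0 for a video known to have had annotations is consistent with what is described above.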
What's your source that the annotations were archived? The ENTIRE site's annotations were captured?
I updated my comment with citations, and am obtaining independent verification.
I would estimate there are about 10-15 billion videos on YouTube.
Tom Scott did a great oner video on YouTube video IDs. It shows why they're hard to enumerate, unlike Vimeo IDs (or FetLife profile IDs), which are sequential numbers. They probably got a lot, but not all. Requesting annotations gets some metadata, though only a tiny amount. Cards can be implemented later, or maybe they already are.
I honestly doubt they got all the annotations on the site. It's just too large, and the ID system is too complicated. Maybe if you had a ton of Warrior bots chugging away at giant blocks of IDs for months on end (and I mean thousands of machines doing nothing but looking up video IDs), you'd get a lot of it. I just don't see, with how that site is laid out, how you get even most of them.
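Legacy annotations are no longer available from YouTube. There is currently a temporary API to pull them from our archive until everything has been uploaded to IA. Side-by-side: YouTube version https://www.youtube.com/annotations_invideo?video_id=eIIV6a2Pdh4 vs. archived version https://archive.omar.yt/api/v1/annotations/eIIV6a2Pdh4.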
Once everything has been uploaded to IA, I'm planning on adding /api/v1/annotations/:id to Invidious (https://github.com/omarroth/invidious) to better support playback in omarroth/invidious#303 (https://github.com/omarroth/invidious/pull/303). I would expect it to redirect to the IA archive, or fall back on YouTube if the video wasn't archived.
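The lookup order described here (serve from the archive, fall back on YouTube if a video wasn't archived) can be sketched as below. The two URL patterns are the ones linked in this thread; the archive.omar.yt API was described as temporary, and the eventual IA location isn't given, so it stands in as a placeholder. The `fetch` callable is a hypothetical stand-in for an HTTP client, which keeps the fallback logic testable without the network.

```python
from typing import Callable, Optional

# URL patterns taken from the thread; the archive endpoint was described as temporary.
ARCHIVE_URL = "https://archive.omar.yt/api/v1/annotations/{video_id}"
YOUTUBE_URL = "https://www.youtube.com/annotations_invideo?video_id={video_id}"


def candidate_urls(video_id: str) -> list[str]:
    """Archive first, then YouTube as the fallback."""
    return [ARCHIVE_URL.format(video_id=video_id), YOUTUBE_URL.format(video_id=video_id)]


def fetch_annotations(video_id: str, fetch: Callable[[str], Optional[str]]) -> Optional[str]:
    """Return the first non-empty response body, or None if every source is empty.

    `fetch` is a hypothetical client: it takes a URL and returns the body text,
    or None on failure.
    """
    for url in candidate_urls(video_id):
        body = fetch(url)
        if body and body.strip():
            return body
    return None
```

Treating an empty body as "not found" matters here, since the YouTube endpoint now answers with an empty response rather than an error.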
To my knowledge, end cards are served from a separate endpoint. End cards are not the same as cards.
Cards are still provided at the same endpoint as legacy annotations (/annotations_invideo?video_id=), so to my knowledge it would still be possible to pull valuable metadata.
It's incredibly unlikely that you'll randomly stumble upon a valid ID (1 in 64^11). We instead pulled videos from the "recommended" bar, videos from any discovered channels, videos from any discovered playlists, and searched already archived annotation data. We archived annotations from around 1.4 billion videos.
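To put the random-guessing figure in perspective: an 11-character ID over a 64-symbol alphabet gives 64^11 = 2^66 possible strings. The sketch below combines that with the 10 to 15 billion estimate given earlier in the thread (a rough guess, not a measured count) to get the chance that a single uniformly random ID is a real video.

```python
ALPHABET_SIZE = 64  # A-Z, a-z, 0-9, '-', '_'
ID_LENGTH = 11

keyspace = ALPHABET_SIZE ** ID_LENGTH
assert keyspace == 2 ** 66


def hit_probability(num_videos: int) -> float:
    """Chance that one uniformly random 11-character ID is a live video,
    given an assumed total number of videos."""
    return num_videos / keyspace


for estimate in (10_000_000_000, 15_000_000_000):
    print(f"{estimate:,} videos -> {hit_probability(estimate):.1e}")
```

Either way the odds come out around 1 to 2 in 10 billion per guess, which is why crawling recommendations, channels, and playlists was the practical route.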
Hopefully that is helpful, sorry if I wasn't able to respond to everything but please feel free to ask questions or for clarification.
Thank you Omar!
(Sent by email in reply to Omar Roth's comment of Mon, Feb 11, 2019, 10:52 PM.)
I need to implement card/end card ingestion, then, if it's available in youtube-dl.
Brandon, /r/datahoarder is fine sometimes. Then there are times like this when it's not, and it's rank amateurs with big storage (but little else) telling me what's what.
youtube-dl doesn't appear to be interested in annotation support for other sites, and it's not currently breaking anything, so I'm going to let it be for now.
youtube-dl isn't necessarily against annotation support for sites such as niconico, it's just not a priority. End card support would be appreciated (but I have no idea what endpoint the end cards are served over).
I'm killing off the annotations-collection flag on January 18th, 2019, one day before YouTube kills annotations. This is to prevent errors in bots or projects that might be affected by the failure to get annotations (even though, if I recall correctly, Antonizoon and Refeed put checks in place to keep rips from failing in that case). All I'll be doing is zapping the deprecated annotations flag.
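In youtube-dl's Python API, the option behind `--write-annotations` is the `writeannotations` parameter. A minimal sketch of "zapping" it, assuming tubeup passes a plain options dict to `YoutubeDL` (the base options below are illustrative, not tubeup's actual configuration), is to stop setting that key and also strip it from any caller-supplied overrides so older configs can't re-enable it.

```python
# Per the thread, YouTube was removing legacy annotations on January 19, 2019.
DEPRECATED_OPTIONS = frozenset({"writeannotations"})


def build_ydl_opts(overrides=None):
    """Return a youtube-dl options dict with deprecated keys removed.

    The base options are an illustrative assumption, not tubeup's real set.
    """
    opts = {
        "writedescription": True,
        "writeinfojson": True,
        "writethumbnail": True,
    }
    opts.update(overrides or {})
    for key in DEPRECATED_OPTIONS:
        opts.pop(key, None)  # silently drop; absent keys are fine
    return opts
```

Dropping the key rather than setting it to False keeps the change to a single place and leaves nothing annotation-related in the options at all.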
If you're sitting on PRs or want to make a fix, any time between now and 1/18/19 would be a dandy time to submit PRs for testing so it can be done in one shot.