TheTechRobo / youtubevideofinder

Searches for lost YouTube videos in archives
https://findyoutubevideo.thetechrobo.ca/
Apache License 2.0
82 stars, 10 forks

Add sjl-static domains for thumbnails #72

Open actuallyasmartname opened 4 months ago

actuallyasmartname commented 4 months ago

In 2006-2007, YouTube used sjl-static{number}.sjl.youtube.com to host thumbnails. Example CDX query for one of these hosts: https://web.archive.org/cdx/search/cdx?url=sjl-static1.sjl.youtube.com/*&output=json&fl=original&collapse=urlkey The only issue is that the number ranges from 1 to 16, so 16 domains would need to be checked, which is pretty unrealistic for every query.
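For illustration, the brute-force version of this (one CDX query per host) could be sketched as below. This is only a sketch, not code from the project: the function name `sjl_static_cdx_urls` is made up, and it only builds the 16 URLs without sending any requests.

```python
from urllib.parse import urlencode

CDX_ENDPOINT = "https://web.archive.org/cdx/search/cdx"


def sjl_static_cdx_urls(first=1, last=16):
    """Build one CDX query URL per sjl-static host (first..last, inclusive)."""
    urls = []
    for n in range(first, last + 1):
        params = {
            "url": f"sjl-static{n}.sjl.youtube.com/*",
            "output": "json",
            "fl": "original",
            "collapse": "urlkey",
        }
        # safe="/*" keeps the path wildcard readable instead of percent-encoding it.
        urls.append(f"{CDX_ENDPOINT}?{urlencode(params, safe='/*')}")
    return urls
```

Calling `sjl_static_cdx_urls()` returns 16 URLs, which makes the cost concrete: every lookup would mean 16 separate CDX requests.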

TheTechRobo commented 4 months ago

Does the CDX API support wildcards in the hostname?

TheTechRobo commented 4 months ago

Looks like not directly, but it does support regex. I wonder if that can be used.
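A rough, untested sketch of what a regex-based query might look like. It assumes the `matchType=domain` and `filter=field:regex` parameters described in the CDX server documentation can be combined this way on the live service, which has not been verified here. The function name and the pattern are illustrative only.

```python
import re
from urllib.parse import urlencode

CDX_ENDPOINT = "https://web.archive.org/cdx/search/cdx"

# Intended to match the whole `original` field for any sjl-static<N> host.
# [:/] allows for an explicit port such as ":80" after the hostname.
HOST_PATTERN = r"https?://sjl-static[0-9]+\.sjl\.youtube\.com[:/].*"


def regex_filtered_cdx_url():
    """Build a single CDX query covering sjl.youtube.com and its subdomains,
    narrowed by a regex filter on the `original` field (assumed behaviour)."""
    params = [
        ("url", "sjl.youtube.com"),
        ("matchType", "domain"),
        ("filter", f"original:{HOST_PATTERN}"),
        ("output", "json"),
        ("fl", "original"),
        ("collapse", "urlkey"),
    ]
    # urlencode percent-encodes the regex, so "+" is not read as a space.
    return f"{CDX_ENDPOINT}?{urlencode(params)}"


# Local sanity check of the pattern using Python's regex engine.
assert re.fullmatch(HOST_PATTERN, "http://sjl-static7.sjl.youtube.com/vi/x/2.jpg")
```

The CDX server uses Java regexes, so the pattern sticks to syntax that Java and Python share.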

actuallyasmartname commented 4 months ago

I think a good approach would be to grab all the links once and search that list instead, since YouTube doesn't use these domains anymore.

TheTechRobo commented 4 months ago

I don't think that's a good idea, as new WARCs may be added to the Wayback Machine at any time. We'd be missing those.

actuallyasmartname commented 4 months ago

True....

TheTechRobo commented 4 months ago

What do you think about filtering for all subdomains of sjl.youtube.com, i.e. https://web.archive.org/cdx/search/cdx?url=*.sjl.youtube.com/*&output=json&fl=original&collapse=urlkey ?

Edit: Ah, I see, you can't filter for all subdomains and a specific prefix simultaneously. :/
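One possible workaround, sketched under assumptions: run the broad `*.sjl.youtube.com/*` query and apply the `sjl-static` prefix check on the client side. The helper below works on already-parsed CDX JSON (a list of rows whose first row is the header). The function name is hypothetical, and the result set for the whole subdomain may be large.

```python
import re
from urllib.parse import urlsplit

# Hostname pattern for the thumbnail hosts; the number is captured for range checking.
SJL_STATIC_HOST = re.compile(r"sjl-static([0-9]+)\.sjl\.youtube\.com")


def keep_sjl_static(rows, lowest=1, highest=16):
    """Return the `original` URLs whose host is sjl-static<lowest..highest>.

    `rows` is the parsed output of a CDX query made with output=json&fl=original,
    so each row is a one-element list and rows[0] is the header.
    """
    kept = []
    for row in rows[1:]:  # skip the header row
        original = row[0]
        # .hostname drops any port and lowercases the host; None if unparsable.
        host = urlsplit(original).hostname or ""
        match = SJL_STATIC_HOST.fullmatch(host)
        if match and lowest <= int(match.group(1)) <= highest:
            kept.append(original)
    return kept
```

Example with made-up rows (the URLs are placeholders, not real captures):

```python
sample = [
    ["original"],
    ["http://sjl-static3.sjl.youtube.com/vi/AAAAAAAAAAA/2.jpg"],
    ["http://other-host.sjl.youtube.com/something"],
    ["http://sjl-static16.sjl.youtube.com:80/vi/BBBBBBBBBBB/1.jpg"],
]
thumbnails = keep_sjl_static(sample)  # keeps the first and last URL
```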