Open actuallyasmartname opened 9 months ago
Does the CDX API support wildcards in the hostname?
Looks like not directly, but it does support regex. I wonder if that can be used.
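A rough sketch of how the regex idea might look, assuming the CDX server's `matchType=domain` and `filter=field:regex` parameters behave as documented. The function name and the exact pattern are made up for illustration, and nothing here has been tested against the live API; it only builds the query URL.

```python
from urllib.parse import urlencode

CDX_ENDPOINT = "https://web.archive.org/cdx/search/cdx"


def build_regex_query(
    domain="sjl.youtube.com",
    host_regex=r"sjl-static[0-9]+\.sjl\.youtube\.com",
):
    """Build a CDX query for every subdomain of `domain`, narrowed by a regex.

    Assumption: the filter regex has to match the whole `original` field,
    so the pattern covers the scheme, an optional port, and the path.
    """
    params = [
        ("url", domain),
        ("matchType", "domain"),  # all hosts under the domain
        ("filter", f"original:https?://{host_regex}(:[0-9]+)?/.*"),
        ("output", "json"),
        ("fl", "original"),
        ("collapse", "urlkey"),
    ]
    return f"{CDX_ENDPOINT}?{urlencode(params)}"


if __name__ == "__main__":
    print(build_regex_query())
```

If the filter works this way, one request would cover all the numbered hosts, but the server would still have to scan every capture under the domain, so it may be slow.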
I think a good approach would be to grab all the links once and search that list instead, since YouTube doesn't use the domain anymore.
I don't think that's a good idea, since new WARCs can still be added to the Wayback Machine at any time. We'd be missing those.
True....
What do you think about filtering for all subdomains of sjl.youtube.com, i.e. https://web.archive.org/cdx/search/cdx?url=*.sjl.youtube.com/*&output=json&fl=original&collapse=urlkey ?
Edit: Ah, I see, you can't filter for all subdomains and a specific prefix simultaneously. :/
In 2006-2007, YouTube used sjl-static{number}.sjl.youtube.com to host thumbnails. https://web.archive.org/cdx/search/cdx?url=sjl-static1.sjl.youtube.com/*&output=json&fl=original&collapse=urlkey The only issue is that the number ranges from 1 to 16, so 16 domains would need to be checked, which is pretty unrealistic for every query.
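For reference, enumerating the 16 hosts is simple enough to sketch. This only generates the query URLs in the same shape as the one above; the function name is hypothetical, and it doesn't address the cost of actually sending 16 requests per query.

```python
from urllib.parse import urlencode

CDX_ENDPOINT = "https://web.archive.org/cdx/search/cdx"


def thumbnail_host_queries(first=1, last=16):
    """Return one CDX prefix query URL per sjl-static{n} host."""
    urls = []
    for n in range(first, last + 1):
        params = {
            "url": f"sjl-static{n}.sjl.youtube.com/*",
            "output": "json",
            "fl": "original",
            "collapse": "urlkey",
        }
        # Keep '*' and '/' unescaped so the URL reads like the one in the thread.
        urls.append(f"{CDX_ENDPOINT}?{urlencode(params, safe='*/')}")
    return urls


if __name__ == "__main__":
    for url in thumbnail_host_queries():
        print(url)
```

The results could be cached per host so the 16 lookups don't have to be repeated on every query, though that runs into the same concern about newly added WARCs.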