Open sgehrman opened 4 years ago
It won't work because all of that is rendered through JavaScript, which this library does not run.
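A minimal sketch of that behaviour, assuming the `html` package's `parse` function and two made-up page strings (`staticPage` and `scriptedPage` are hypothetical, as is the helper `titlesIn`). The parser only builds a tree from the markup it is given, so a title that a script would set at runtime never appears:

```dart
import 'package:html/parser.dart' show parse;

/// Returns the text of every <title> element found in the document head.
List<String> titlesIn(String html) {
  final document = parse(html);
  final head = document.head;
  if (head == null) return [];
  return head.getElementsByTagName('title').map((e) => e.text).toList();
}

void main() {
  // Hypothetical page whose title is present in the raw markup.
  const staticPage =
      '<html><head><title>Apple</title></head><body></body></html>';
  // Hypothetical page whose title would only be set by a script at runtime.
  const scriptedPage =
      '<html><head><script>document.title = "Video";</script></head>'
      '<body></body></html>';

  print(titlesIn(staticPage)); // [Apple]
  print(titlesIn(scriptedPage)); // [] because the script is never executed
}
```

The second call returns an empty list for the same reason the YouTube page does: the element is simply not in the HTML that was downloaded.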
Disable JavaScript before loading a page and you can see what can and cannot be scraped.
I installed a Chrome extension to do this (https://chrome.google.com/webstore/detail/toggle-javascript/cidlcjdalomndpeagkjpnefhljffbnlo), but you can also do it by pressing F12
to open the console and then pressing `Cntr
If you NEED JavaScript, I recommend running a library like Puppeteer first and then parsing the post-rendered HTML.
YouTube also has an API you can tap into instead of scraping their site. See if that fits your need.
If you set the User-Agent to a bot when retrieving the document, then it will return all of the tags.
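As a rough sketch of that suggestion, assuming the `http` package for the request and the `html` package for parsing. The Googlebot-style string is only one possible choice (the comment above just says "a bot"), and the helper names `extractMeta` and `fetchMeta` are made up for illustration. Whether a given site actually serves fuller markup to this User-Agent is up to the site:

```dart
import 'package:html/parser.dart' show parse;
import 'package:http/http.dart' as http;

// Assumption: a Googlebot-style User-Agent; any crawler string could be tried.
const botUserAgent =
    'Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)';

/// Collects the <title> text and every <meta> name/property -> content pair.
Map<String, String> extractMeta(String html) {
  final document = parse(html);
  final result = <String, String>{};
  final title = document.querySelector('title');
  if (title != null) result['title'] = title.text.trim();
  for (final meta in document.querySelectorAll('meta')) {
    final key = meta.attributes['name'] ?? meta.attributes['property'];
    final content = meta.attributes['content'];
    if (key != null && content != null) result[key] = content;
  }
  return result;
}

/// Downloads [url] while identifying as a bot, then extracts the metadata.
Future<Map<String, String>> fetchMeta(String url) async {
  final response = await http.get(
    Uri.parse(url),
    headers: {'User-Agent': botUserAgent},
  );
  return extractMeta(response.body);
}

Future<void> main() async {
  final meta = await fetchMeta('https://www.youtube.com/watch?v=3AIZAGwMRg8');
  meta.forEach((key, value) => print('$key: $value'));
}
```

Keeping the parsing in `extractMeta` separate from the request makes it easy to compare what the same page returns with and without the bot User-Agent.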
I'm scraping websites to get the title, description, and other metadata, but it's not working on all sites.
For example: https://www.youtube.com/watch?v=3AIZAGwMRg8
final List elements = document.head.getElementsByTagName('title');
elements returns []
But other sites work just fine, like https://apple.com
I'm also using:
And on that site, I'm not seeing all the meta tags