Open jaypinho opened 5 years ago
@jaypinho Looking at the code, you're correct. newspaper only looks for the following tags and providers:
VIDEOS_TAGS = ['iframe', 'embed', 'object', 'video']
VIDEO_PROVIDERS = ['youtube', 'vimeo', 'dailymotion', 'kewego']
If @codelucas thinks it's a worthwhile change, we could potentially use something like the following RegEx to try and extract the relevant links:
http(?:s?):\/\/(?:www\.)?youtu(?:be\.com\/watch\?v=|\.be\/)([\w\-\_]*)(&(amp;)?[\w\?=]*)?
In the meantime, you could extract the HTML using newspaper and then use the RegEx yourself?
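A minimal sketch of that workaround, assuming the page HTML is already available as a string (for example from newspaper's article.html after the article has been downloaded). The names extract_youtube_ids and sample_html are hypothetical, and the pattern is a simplified form of the RegEx above that drops the trailing query-parameter group and only captures the video ID:

```python
import re

# Simplified form of the pattern suggested above: matches youtube.com/watch?v=...
# and youtu.be/... links and captures only the video ID.
YOUTUBE_LINK = re.compile(
    r"https?://(?:www\.)?youtu(?:be\.com/watch\?v=|\.be/)([\w-]+)"
)


def extract_youtube_ids(html):
    """Return the unique YouTube video IDs linked in html, in order of appearance."""
    seen = []
    for match in YOUTUBE_LINK.finditer(html):
        video_id = match.group(1)
        if video_id not in seen:
            seen.append(video_id)
    return seen


# Stand-in for the HTML you would get from newspaper (assumption: article.html).
sample_html = """
<p><a href="https://www.youtube.com/watch?v=dQw4w9WgXcQ&amp;t=10">clip</a>
<a href="http://youtu.be/abc-123_XY">short link</a>
<a href="https://www.youtube.com/watch?v=dQw4w9WgXcQ">same clip again</a></p>
"""

print(extract_youtube_ids(sample_html))
```

This only finds plain links in those two URL shapes; embeds or other URL formats (such as /embed/ paths) would need their own patterns.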
Thanks! I'll try that.
I'm trying the following:
And it's returning an empty array, even though there are multiple YouTube videos linked to in the article.
Is the movies function intentionally only for embeds and not for links? If so, is there another way to obtain the list of videos that the article links to?