alexdeleon / lupa

Web scraping java library
MIT License
5 stars 0 forks

Add support for oEmbed extractor #1

Open abahgat opened 11 years ago

abahgat commented 11 years ago

Adding oEmbed support would make it a lot easier to extract quality previews for URLs that support it.

oEmbed is a format for allowing an embedded representation of a URL on third party sites. The simple API allows a website to display embedded content (such as photos or videos) when a user posts a link to that resource, without having to parse the resource directly.
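For illustration, here is a minimal sketch of how a consumer might build an oEmbed request. The `url` and `format` query parameters come from the oEmbed specification; the class name `OEmbedRequest`, the method `buildRequestUrl`, and the `provider.example` endpoint are hypothetical and are not part of Lupa's API.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of Lupa: builds the GET URL a consumer
// would call on a provider's oEmbed endpoint.
final class OEmbedRequest {

    // endpoint:  the provider's oEmbed API endpoint (assumed to have no query string)
    // targetUrl: the page the consumer wants an embedded representation of
    // format:    "json" or "xml", the two response formats defined by oEmbed
    static String buildRequestUrl(String endpoint, String targetUrl, String format) {
        // The target URL must be URL-encoded before it is passed as a parameter.
        String encodedTarget = URLEncoder.encode(targetUrl, StandardCharsets.UTF_8);
        return endpoint + "?url=" + encodedTarget + "&format=" + format;
    }

    public static void main(String[] args) {
        // Placeholder endpoint and resource, for demonstration only.
        System.out.println(buildRequestUrl(
                "https://provider.example/oembed",
                "https://provider.example/photos/1",
                "json"));
    }
}
```

The provider answers such a request with a small JSON or XML document describing the resource (for example its `type`, `title`, and, for videos or rich content, an `html` snippet), which an extractor could map to a preview.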

alexdeleon commented 11 years ago

Thanks for the pointer, Alessandro. An extractor for oEmbed and other metadata structures such as RDFa would definitely make a lot of sense. Now that I know about oEmbed, I'm also thinking about putting up an oEmbed endpoint built with Lupa that can provide metadata about any web resource.

abahgat commented 11 years ago

Sounds like a good idea :+1: I should have some old code lying around somewhere with a draft implementation of the extractor, but I need to get it from my old laptop.