dougnukem / livecoding-projects

Projects building while streaming on https://www.livecoding.tv/dougnukem/
Apache License 2.0

Webpipe idea #10

Open dougnukem opened 8 years ago

dougnukem commented 8 years ago

Webpipes

Webpage API Pipes. It'd be cool to be able to cobble together APIs by scraping the content of webpages.

For example, scraping the chat channel on twitch.tv or livecoding.tv.

JavaScript running in the page would scrape the DOM for new messages and fire events to whatever is listening for them.
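A minimal TypeScript sketch of that scrape-and-notify loop, assuming each scraped message exposes a stable id. The `ChatMessage` shape and the `ChatScraper` class are hypothetical, not taken from any site's actual markup. In a browser, something like a `MutationObserver` on the chat container could call `ingestSnapshot` whenever the DOM changes; here the snapshots are passed in by hand so the de-duplication logic is easy to follow.

```typescript
// Hypothetical shape of one scraped chat message. Assumes the page exposes
// a stable per-message id (real chat markup may not).
interface ChatMessage {
  id: string;
  user: string;
  text: string;
}

type Listener = (msg: ChatMessage) => void;

class ChatScraper {
  private seen = new Set<string>();
  private listeners: Listener[] = [];

  // Register something that wants to hear about new messages.
  onMessage(listener: Listener): void {
    this.listeners.push(listener);
  }

  // Feed in every message currently visible in the chat pane. Messages whose
  // id was already seen in an earlier snapshot are skipped; new ones are
  // forwarded to all listeners. Returns how many new messages were fired.
  ingestSnapshot(snapshot: ChatMessage[]): number {
    let fired = 0;
    for (const msg of snapshot) {
      if (this.seen.has(msg.id)) continue;
      this.seen.add(msg.id);
      for (const listener of this.listeners) listener(msg);
      fired++;
    }
    return fired;
  }
}

// Usage: two overlapping snapshots of the chat pane.
const scraper = new ChatScraper();
const received: string[] = [];
scraper.onMessage((m) => received.push(`${m.user}: ${m.text}`));
scraper.ingestSnapshot([{ id: "1", user: "alice", text: "hi" }]);
scraper.ingestSnapshot([
  { id: "1", user: "alice", text: "hi" },
  { id: "2", user: "bob", text: "hello" },
]);
// received → ["alice: hi", "bob: hello"]
```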

The more "correct" way would be to integrate with the IRC or XMPP services that twitch.tv and livecoding.tv use, and maybe that's the right way to go.
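For comparison, the IRC route mostly comes down to parsing lines of text. This is a small sketch, assuming plain RFC 1459-style lines of the form `:nick!user@host PRIVMSG #channel :text`; IRCv3 message tags and XMPP are not handled, and `parsePrivMsg` and `pongFor` are hypothetical helper names. Answering the server's `PING` lines with `PONG` is what keeps an IRC connection alive.

```typescript
// Minimal parser for one IRC chat line of the form
//   :nick!user@host PRIVMSG #channel :message text
// Sketch only: IRCv3 tags (lines starting with "@") are not handled.
interface PrivMsg {
  nick: string;
  channel: string;
  text: string;
}

const PRIVMSG_RE = /^:([^!\s]+)(?:!\S*)?\s+PRIVMSG\s+(\S+)\s+:(.*)$/;

function parsePrivMsg(line: string): PrivMsg | null {
  // Strip the trailing CRLF that IRC uses as its line terminator.
  const match = PRIVMSG_RE.exec(line.replace(/\r?\n$/, ""));
  if (match === null) return null;
  return { nick: match[1], channel: match[2], text: match[3] };
}

// Servers periodically send "PING :token"; replying "PONG :token" keeps
// the connection open. Returns null for any other line.
function pongFor(line: string): string | null {
  return line.startsWith("PING ") ? "PONG " + line.slice(5) : null;
}

// Usage (hypothetical channel and host names)
const msg = parsePrivMsg(":alice!alice@host PRIVMSG #dougnukem :hello: world\r\n");
// msg → { nick: "alice", channel: "#dougnukem", text: "hello: world" }
const reply = pongFor("PING :server.example");
// reply → "PONG :server.example"
```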

I feel like people could come up with interesting "one-liner", bash-style "commands" for scraping the web services they use.

This probably wouldn't be useful at scale because it's a bit too ham-fisted, but it might be "scalable" when run on a single user's machine (and it'd be easier to authenticate and automate access since we'd be using a "real" browser and real credentials).

I think something like PhantomJS (a headless, scriptable WebKit browser) would be a good start.

A dashboard could be built around PhantomJS where users select "recipes", which are essentially PhantomJS automations for scraping data from a site. They could then "pipe" that data to another service (webhooks, IPC, a file, a datastore, or Amazon SQS).
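One way the recipe/sink split could look, as a sketch only: `Recipe`, `Sink`, `MemorySink`, `WebhookSink`, and `runRecipe` are all hypothetical names, and `Recipe.run` stands in for whatever PhantomJS automation would actually produce the rows. `WebhookSink` assumes a runtime with a global `fetch` (modern browsers, Node 18+).

```typescript
// Hypothetical types: a "recipe" scrapes rows from a site, a "sink" is
// wherever those rows get piped.
type Row = Record<string, string>;

interface Recipe {
  name: string;
  // Stand-in for a PhantomJS automation; here it is just an async function.
  run(): Promise<Row[]>;
}

interface Sink {
  write(rows: Row[]): Promise<void>;
}

// Keeps rows in memory; a stand-in for a file or datastore sink.
class MemorySink implements Sink {
  readonly rows: Row[] = [];
  async write(rows: Row[]): Promise<void> {
    this.rows.push(...rows);
  }
}

// POSTs rows as JSON to a webhook URL. Assumes a global fetch
// (modern browsers, Node 18+).
class WebhookSink implements Sink {
  private readonly url: string;
  constructor(url: string) {
    this.url = url;
  }
  async write(rows: Row[]): Promise<void> {
    const res = await fetch(this.url, {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify(rows),
    });
    if (!res.ok) throw new Error(`webhook returned HTTP ${res.status}`);
  }
}

// Run one recipe and fan its output out to every selected sink.
async function runRecipe(recipe: Recipe, sinks: Sink[]): Promise<number> {
  const rows = await recipe.run();
  await Promise.all(sinks.map((sink) => sink.write(rows)));
  return rows.length;
}
```

The other destinations mentioned (IPC, a file, a datastore, SQS) would each be one more `Sink` implementation; the dashboard would only need to pair a recipe with a set of sinks.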

This would probably be pretty brittle, because a website's DOM content can change whenever the site is updated or its markup is obfuscated.

It could be fun to cobble together mashups or widgets: basically, make your webpages and services scriptable and composable like unix pipeline commands.
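To make the unix-pipeline analogy concrete, here is a sketch of composable stages built on generators. `filter`, `map`, `take`, and `pipe` are hypothetical helpers loosely mirroring `grep`, `sed`, `head`, and the shell's `|`; `pipe` is loosely typed (`any`) to keep the example short.

```typescript
// A stage turns one stream of items into another, like a unix filter
// reading stdin and writing stdout.
type Stage<A, B> = (input: Iterable<A>) => Iterable<B>;

// Like `grep`: keep only items matching the predicate.
function filter<T>(pred: (x: T) => boolean): Stage<T, T> {
  return function* (input) {
    for (const x of input) if (pred(x)) yield x;
  };
}

// Like `sed`: transform each item.
function map<A, B>(fn: (x: A) => B): Stage<A, B> {
  return function* (input) {
    for (const x of input) yield fn(x);
  };
}

// Like `head -n`: stop after n items without pulling any more upstream.
function take<T>(n: number): Stage<T, T> {
  return function* (input) {
    if (n <= 0) return;
    let i = 0;
    for (const x of input) {
      yield x;
      if (++i >= n) return;
    }
  };
}

// Like the shell's `|`: feed each stage's output into the next.
function pipe<T>(source: Iterable<T>, ...stages: Stage<any, any>[]): Iterable<any> {
  return stages.reduce<Iterable<any>>((acc, stage) => stage(acc), source);
}

// Usage: pull the first chat "command" out of a list of scraped messages.
const messages = ["alice: hi", "bob: !uptime", "carol: !song", "dave: lol"];
const commands = Array.from(
  pipe(
    messages,
    filter((m: string) => m.includes(": !")),
    map((m: string) => m.split(": ")[1]),
    take(1),
  ),
);
// commands → ["!uptime"]
```

Because generators are lazy, `take` stops pulling from the upstream stages once it has enough items, much like `head` ending a shell pipeline early.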

For more stable or sophisticated use cases, the scraping approach could be migrated to official APIs (like Facebook, Twitter, or Twitch), but it'd be nice to have a quick and dirty way to just suck data out. When new features launch, it might also be quicker to get something up and running with PhantomJS, or with some kind of visual recorder that generates PhantomJS commands whose output can be massaged into data.
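A sketch of how that migration might stay cheap: put the scraper and the official-API client behind one interface so downstream code doesn't care which one is in use. `MessageSource`, `ScrapedSource`, `ApiSource`, `countMessages`, and the `<li data-user=...>` markup are all made up for illustration; a real scraper would run inside the browser automation rather than regex-matching an HTML string.

```typescript
interface Message {
  user: string;
  text: string;
}

// The one thing downstream code depends on.
interface MessageSource {
  readonly kind: string;
  fetchRecent(): Promise<Message[]>;
}

// Quick and dirty: pull messages out of captured HTML. The regex assumes
// hypothetical markup like <li data-user="alice">hi</li>.
class ScrapedSource implements MessageSource {
  readonly kind = "scrape";
  private html: string;
  constructor(html: string) {
    this.html = html;
  }
  async fetchRecent(): Promise<Message[]> {
    const out: Message[] = [];
    const re = /<li data-user="([^"]*)">([^<]*)<\/li>/g;
    let m: RegExpExecArray | null;
    while ((m = re.exec(this.html)) !== null) out.push({ user: m[1], text: m[2] });
    return out;
  }
}

// Later: the same interface backed by an official API. The fetcher is
// injected because the real endpoint and auth depend on the service.
class ApiSource implements MessageSource {
  readonly kind = "api";
  private fetcher: () => Promise<Message[]>;
  constructor(fetcher: () => Promise<Message[]>) {
    this.fetcher = fetcher;
  }
  fetchRecent(): Promise<Message[]> {
    return this.fetcher();
  }
}

// Downstream code stays the same whichever source is plugged in.
async function countMessages(source: MessageSource): Promise<number> {
  return (await source.fetchRecent()).length;
}
```

Swapping `new ScrapedSource(html)` for `new ApiSource(...)` later leaves `countMessages`, and anything else written against `MessageSource`, untouched.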