Data4Democracy / media-crawler

Web scraper for generating a graph of media connections via articles, twitter, reddit, and more

Handle Tweets in Articles #12

Open josephpd3 opened 7 years ago

josephpd3 commented 7 years ago

Hackathon Note: This is a bit more challenging than the other parsers and can be considered a stretch goal.

So tweets come in a few shapes and sizes when referenced. Sometimes articles link to tweets, other times they embed them. This issue is for tracking both kinds of tweet reference.

No matter which kind you are tackling, I recommend using the Twitter API for this.

Links to Tweets

This can be handled very similarly to all other link references. A parser for Twitter URLs will just have to be defined and imported into the parser_map under a suitable regex pattern to match against. From there, the Twitter API will likely be your best friend.
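
As a rough illustration of the link case, here is a minimal sketch of a regex-keyed parser. It assumes tweet links look like `https://twitter.com/<user>/status/<id>` and that `parser_map` is a plain dict from compiled patterns to parser functions; the actual structure in this repo may differ, and `parse_tweet_url` and `dispatch` are hypothetical names used only for this example. It stops at extracting the tweet ID, which is what you would then pass to the Twitter API.

```python
import re

# Assumed URL shape: https://twitter.com/<user>/status/<tweet_id>
# (optionally with a www. or mobile. prefix, or the older /statuses/ path).
TWEET_URL_PATTERN = re.compile(
    r"^https?://(?:www\.|mobile\.)?twitter\.com/"
    r"(?P<user>\w+)/status(?:es)?/(?P<tweet_id>\d+)"
)


def parse_tweet_url(url):
    """Return the user and tweet ID found in a tweet URL, or None if no match."""
    match = TWEET_URL_PATTERN.match(url)
    if match is None:
        return None
    return {"user": match.group("user"), "tweet_id": match.group("tweet_id")}


# Hypothetical stand-in for the project's parser_map: pattern -> parser function.
parser_map = {TWEET_URL_PATTERN: parse_tweet_url}


def dispatch(url):
    """Run the first parser whose pattern matches the URL."""
    for pattern, parser in parser_map.items():
        if pattern.match(url):
            return parser(url)
    return None
```

The tweet ID is kept as a string here so it can be passed along unchanged to whatever API client ends up being used.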

Embedded Tweets

Aside from videos, this will likely be the first kind of embedded reference we handle, so we will discuss it in a separate issue.