Data4Democracy / media-crawler

Web scraper for generating a graph of media connections via articles, twitter, reddit, and more
31 stars 9 forks

Write Item Class for MediaItems #2

Closed: josephpd3 closed this issue 6 years ago

josephpd3 commented 6 years ago

The item pipeline for this project will handle MediaItems. These can be articles, tweets, videos, or even posts on reddit. The item class that encompasses them can be very generic, and not all of its fields need to be required.

Metadata such as tags is often shared between articles and videos, and the concept overlaps somewhat with #tags on twitter.
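A generic item with mostly optional fields could be sketched roughly as below. This is only an illustration, not the implementation that closed this issue: the field names (`url`, `media_type`, `title`, `author`, `body`, `tags`, `references`) are assumptions, and a plain dataclass is used here in place of whatever item base class the crawler actually relies on.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class MediaItem:
    """Hypothetical generic container for an article, tweet, video, or reddit post."""

    # The only fields assumed to be required: where the item lives and what it is.
    url: str
    media_type: str  # e.g. "article", "tweet", "video", "reddit_post"

    # Everything else is optional, since not every media type has every field.
    title: Optional[str] = None
    author: Optional[str] = None
    body: Optional[str] = None

    # Tags cover article/video tags as well as #tags on twitter.
    tags: List[str] = field(default_factory=list)

    # URLs of other media items that this item refers to.
    references: List[str] = field(default_factory=list)


# A tweet has no title, so that field is simply left unset.
tweet = MediaItem(
    url="https://twitter.com/example/status/1",  # placeholder URL
    media_type="tweet",
    body="Worth a read #media",
    tags=["media"],
)
```

Using `default_factory=list` gives each item its own `tags` and `references` lists rather than one list shared between instances.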

The one large commonality between all of these is the references they make. Articles link to other media items in their body, as do tweets and posts on reddit and other message boards. Videos often have links in their descriptions, and videos embedded from YouTube can be resolved to an ID, which can then be used to grab further data through YouTube, such as these descriptions.
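As a rough sketch of the reference handling described above, the snippet below collects links and embedded frames from an HTML body and resolves YouTube URLs to a video ID, using only the Python standard library. The helper names `extract_references` and `youtube_video_id` are hypothetical, and the URL shapes handled (`/embed/<id>`, `/watch?v=<id>`, `youtu.be/<id>`) are assumptions about common formats rather than an exhaustive list. Fetching the description for a resolved ID is not shown.

```python
from html.parser import HTMLParser
from typing import List, Optional
from urllib.parse import parse_qs, urlparse


class _LinkCollector(HTMLParser):
    """Collects href values from <a> tags and src values from <iframe> tags."""

    def __init__(self) -> None:
        super().__init__()
        self.links: List[str] = []

    def handle_starttag(self, tag, attrs):
        attr_map = dict(attrs)
        if tag == "a" and attr_map.get("href"):
            self.links.append(attr_map["href"])
        elif tag == "iframe" and attr_map.get("src"):
            self.links.append(attr_map["src"])


def extract_references(html: str) -> List[str]:
    """Return every link or embedded-frame URL found in an HTML fragment."""
    collector = _LinkCollector()
    collector.feed(html)
    return collector.links


def youtube_video_id(url: str) -> Optional[str]:
    """Return the video ID for a recognised YouTube URL shape, else None."""
    parsed = urlparse(url)
    host = (parsed.hostname or "").lower()
    if host.startswith("www."):
        host = host[4:]
    if host == "youtu.be":
        return parsed.path.lstrip("/") or None
    if host in ("youtube.com", "m.youtube.com", "youtube-nocookie.com"):
        if parsed.path.startswith("/embed/"):
            return parsed.path[len("/embed/"):].split("/")[0] or None
        if parsed.path == "/watch":
            return parse_qs(parsed.query).get("v", [None])[0]
    return None


# Placeholder article body with one ordinary link and one embedded video.
body = (
    '<p>See <a href="https://example.com/story">this story</a>.</p>'
    '<iframe src="https://www.youtube.com/embed/abc123?rel=0"></iframe>'
)
references = extract_references(body)
video_ids = [vid for vid in map(youtube_video_id, references) if vid]
```

Here `references` holds both URLs in document order, and `video_ids` holds only the ID resolved from the embedded frame.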

josephpd3 commented 6 years ago

Simple implementation done