Closed billautomata closed 7 years ago
OC = original content. I can answer any questions about my collection method, but it boils down to wget on a cron job on a few Linux boxes I have in the cloud, plus a script I run manually that asserts the data is formatted the way I expect and fills a Mongo database with the parent data elements of what I export as flat TSV files.
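For readers curious what the "assert the format, then export flat TSV" step might look like, here is a minimal sketch. It is not the actual script from this pull request: the field names (`timestamp`, `headline`, `url`) and the functions `validate` and `to_tsv` are hypothetical, and the Mongo step is left out so the example runs with only the standard library.

```python
import csv
import io
from datetime import datetime

# Hypothetical column names; the real dataset's schema may differ.
REQUIRED_FIELDS = ("timestamp", "headline", "url")


def validate(record):
    """Raise AssertionError (or ValueError) if a record is not shaped as expected."""
    for field in REQUIRED_FIELDS:
        assert field in record, f"missing field: {field}"
        value = record[field]
        assert isinstance(value, str) and value.strip(), f"empty field: {field}"
    # fromisoformat raises ValueError on a malformed timestamp.
    datetime.fromisoformat(record["timestamp"])
    assert record["url"].startswith(("http://", "https://")), "unexpected url"
    return record


def to_tsv(records):
    """Validate every record, then return them as one flat TSV string."""
    buf = io.StringIO()
    writer = csv.writer(buf, delimiter="\t", lineterminator="\n")
    writer.writerow(REQUIRED_FIELDS)
    for record in records:
        validate(record)
        # Flatten embedded tabs and newlines so each record stays on one line.
        writer.writerow(
            [record[f].replace("\t", " ").replace("\n", " ") for f in REQUIRED_FIELDS]
        )
    return buf.getvalue()
```

Failing loudly on the first malformed record, rather than skipping it, matches the "asserts the data is formatted like I expect" description: a scraper that silently drops rows can hide a change in the page layout.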
https://billautomata.github.io/drudgereport_report/ - here is the tool I built w/ the data.
Thanks for your note. I'm going to close this pull request, however, since this is a repo of the datasets behind our stories.
I'm working on a personal project that uses this data, but y'all should get access to it in the meantime. I can make JSON versions of this. I'm also scraping all the images and pairing them with stories, but that dataset is for another time.
I love the podcast. Please do more of them.
Harambe / Habish 4 president.