Imagine you are a new reporter just assigned to a beat, or a community activist researching a certain political figure or government agency. News archives hold lots of information that can help bring you up to speed. But reading the thousands (or millions) of news articles returned from a search engine takes a great deal of time. Rookie is designed to help.
Abe Handler did most of the research and coding for Rookie. He got lots of conceptual help from Steve Myers and Brendan O'Connor.
This project began at The Lens with support from the Knight Foundation.
You will need a copy of Brendan O'Connor's wrapper for Stanford CoreNLP to index Rookie documents.
facets
The facet engine
rookie_ui
React UI for Rookie. Use gulp b to push JS to /webapp.
webapp
The Rookie webapp
To import a new corpus, you need user=rookie and database=rookie on a local Postgres install with default ports.
You will need a file, corpora/[corpus]/raw/all.extract, which is a TSV, where [corpus] is a corpus name like hhaiti.
The format of the tsv is indexed at 0.
YYYYMMDD_0000000
where the 0s don't matter. Then run getting_and_processing_corpora/load_corpus.sh [corpus]. The import process requires python2 and a bunch of old dependencies specified in requirements.txt. At some point this might be updated, maybe.
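Before running the import, it can help to sanity-check the input. The following is a minimal sketch, not part of Rookie itself: extract_path and parse_doc_id are hypothetical helper names, and the sketch assumes the part after the underscore is any run of digits (the README only says the 0s don't matter). It builds the expected all.extract path for a corpus and pulls the date out of a YYYYMMDD_0000000 value. Which TSV column holds that value is not specified here, so the helper works on the string alone.

```python
import os
import re
from datetime import date, datetime

# Assumed pattern: eight date digits, an underscore, then any digits.
# The suffix is ignored because "the 0s don't matter".
ID_PATTERN = re.compile(r"^(\d{8})_\d+$")


def extract_path(corpus, root="corpora"):
    """Return the expected location of a corpus's all.extract file."""
    return os.path.join(root, corpus, "raw", "all.extract")


def parse_doc_id(doc_id):
    """Return the date encoded in a YYYYMMDD_0000000 style value.

    Raises ValueError if the value does not match the assumed pattern
    or if the eight digits are not a real calendar date.
    """
    match = ID_PATTERN.match(doc_id)
    if match is None:
        raise ValueError("expected YYYYMMDD_<digits>, got %r" % (doc_id,))
    return datetime.strptime(match.group(1), "%Y%m%d").date()


if __name__ == "__main__":
    print(extract_path("hhaiti"))
    print(parse_doc_id("20100112_0000000"))
```

Checking a few values this way before calling load_corpus.sh should catch malformed dates early, rather than partway through the import.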