bencooper222 opened this issue 8 years ago
Adding some further thoughts on this issue and how it can be tackled:
The advanced query consists of a few basic parameters: the target dataset, a location (a community area or coordinates), a keyword, and a date or date range.
Natural Language Processing (e.g., OpenNLP) can identify these principal components of the query.
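To make that concrete, here is a minimal sketch of what component extraction with OpenNLP could look like. It assumes the standard pre-trained tokenizer and location NER models (`en-token.bin`, `en-ner-location.bin`) are available on disk, and the sample sentence is just an illustration; mapping extracted spans onto dataset names and community areas would still need a project-specific gazetteer.

```java
import java.io.FileInputStream;
import java.io.InputStream;

import opennlp.tools.namefind.NameFinderME;
import opennlp.tools.namefind.TokenNameFinderModel;
import opennlp.tools.tokenize.TokenizerME;
import opennlp.tools.tokenize.TokenizerModel;
import opennlp.tools.util.Span;

public class QueryComponentExtractor {

    public static void main(String[] args) throws Exception {
        // Pre-trained OpenNLP models; these file names are the standard downloadable
        // models and are an assumption about the deployment, not project files.
        try (InputStream tokStream = new FileInputStream("en-token.bin");
             InputStream locStream = new FileInputStream("en-ner-location.bin")) {

            TokenizerME tokenizer = new TokenizerME(new TokenizerModel(tokStream));
            NameFinderME locationFinder = new NameFinderME(new TokenNameFinderModel(locStream));

            String query = "Show 911 calls in Rogers Park";   // illustrative input only
            String[] tokens = tokenizer.tokenize(query);

            // Location spans become candidate "Community Area" values; the remaining
            // tokens would be matched against dataset names and keyword lists.
            Span[] locations = locationFinder.find(tokens);
            for (String loc : Span.spansToStrings(locations, tokens)) {
                System.out.println("Candidate location: " + loc);
            }
        }
    }
}
```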
Resulting query: Dataset == 911p AND Community Area == Rogers Park
Though similar to the previous example, this one can be more complex. "Burglaries" could correspond to burglaries filed in the Crimes dataset, or to 911 calls received about burglaries. We should over-identify and query both datasets.
In the absence of specific dates, the application could fall back on our current protocol of displaying a fixed number (e.g., 6,000) of the most recent data points.
Resulting query: (Dataset == Crimes OR Dataset == 911p) WHERE Primary Description == Burglaries AND Community Area == Rogers Park
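One way to handle the over-identification is a keyword-to-dataset map, so an ambiguous term fans out to every dataset that might satisfy it. A rough sketch, reusing the pseudo-query syntax from this thread; the map contents and class names here are placeholders, not anything in the codebase:

```java
import java.util.List;
import java.util.Map;

public class DatasetResolver {

    // Ambiguous terms fan out to every dataset that might hold matching records.
    // The contents of this map are illustrative placeholders.
    private static final Map<String, List<String>> KEYWORD_TO_DATASETS =
            Map.of("burglaries", List.of("Crimes", "911p"));

    public static String buildQuery(String keyword, String communityArea) {
        List<String> datasets = KEYWORD_TO_DATASETS.getOrDefault(keyword, List.of());

        // Emit the same pseudo-query syntax used in the examples above.
        StringBuilder datasetClause = new StringBuilder();
        for (int i = 0; i < datasets.size(); i++) {
            if (i > 0) {
                datasetClause.append(" OR ");
            }
            datasetClause.append("Dataset == ").append(datasets.get(i));
        }
        return "(" + datasetClause + ") WHERE Primary Description == " + keyword
                + " AND Community Area == " + communityArea;
    }

    public static void main(String[] args) {
        System.out.println(buildQuery("burglaries", "Rogers Park"));
    }
}
```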
Resulting query: Dataset == Twitter AND geoWithin: {center, ([current location])}
Resulting query: Dataset == Twitter WHERE Twitter.text == "Chicago Bulls" AND Date == "2015-05-20"
Resulting query: Dataset == CTA AND geoWithin: {center, (41.8657, -87.7611)}
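The geoWithin: {center, ...} syntax above reads like a MongoDB $geoWithin/$center filter, so assuming a MongoDB-backed store (an assumption on my part), the CTA example could be expressed with the Java driver roughly like this. The database, collection, and field names are guesses:

```java
import org.bson.Document;
import org.bson.conversions.Bson;

import com.mongodb.client.MongoClient;
import com.mongodb.client.MongoClients;
import com.mongodb.client.MongoCollection;
import com.mongodb.client.MongoDatabase;
import com.mongodb.client.model.Filters;

public class GeoQueryExample {

    public static void main(String[] args) {
        try (MongoClient client = MongoClients.create("mongodb://localhost:27017")) {
            MongoDatabase db = client.getDatabase("opendata");       // placeholder name
            MongoCollection<Document> cta = db.getCollection("cta"); // placeholder name

            // $geoWithin with a $center circle: (x, y) = (longitude, latitude),
            // radius in degrees, so the coordinates from the example are swapped.
            Bson nearPoint = Filters.geoWithinCenter("location", -87.7611, 41.8657, 0.01);

            for (Document doc : cta.find(nearPoint).limit(10)) {
                System.out.println(doc.toJson());
            }
        }
    }
}
```

The keyword and date examples above would map onto the same pattern with Filters.regex and Filters.gte/Filters.lte instead of the geo filter.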
Testing the NLP feature can be done against the developer API, with the corresponding API docs as a reference.
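Even a small smoke test that sends sample sentences to the developer API and checks the parsed query would catch regressions. The endpoint below is purely hypothetical; the real path and response shape would come from the API docs:

```java
import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;

public class NlpSmokeTest {

    public static void main(String[] args) throws Exception {
        // Hypothetical endpoint; substitute the real one from the API docs.
        String base = "https://example.org/api/v1/nlq?q=";
        String sentence = "burglaries in Rogers Park";

        HttpRequest request = HttpRequest.newBuilder()
                .uri(URI.create(base + URLEncoder.encode(sentence, StandardCharsets.UTF_8)))
                .GET()
                .build();

        HttpResponse<String> response = HttpClient.newHttpClient()
                .send(request, HttpResponse.BodyHandlers.ofString());

        // A real test would assert on the parsed query structure; here we just
        // check that the call succeeds and both candidate datasets show up.
        System.out.println("HTTP " + response.statusCode());
        System.out.println(response.body().contains("Crimes") && response.body().contains("911p")
                ? "over-identification looks right"
                : "unexpected parse");
    }
}
```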
While it is certainly possible for Chicago to use OpenNLP to build its own full natural language processing system, I'm not sure that's wise. With the advent of chatbots, NLP is about to make huge strides, and it looks like progress will mostly be concentrated among Google, Microsoft, Amazon, IBM Watson, and Apple. At the moment, Microsoft Cognitive Services and IBM Watson are the most mature, and it seems wisest to use those so the project can benefit from the progress they will undoubtedly make. That's not to say Chicago couldn't build its own NLP, but it almost certainly couldn't improve it at the same rate as the big cloud providers can.
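If the project does go with a hosted service, it might still be worth keeping the NLP behind a thin interface so the provider can be swapped as Watson, Microsoft Cognitive Services, or an OpenNLP fallback mature at different rates. A rough sketch; all names here are made up for illustration:

```java
import java.util.Map;

// A thin abstraction so the query pipeline never depends on a specific NLP vendor.
// Every name below is illustrative, not from any existing codebase.
interface QueryParser {
    /** Extracts components such as dataset, location, keyword, and date from free text. */
    Map<String, String> parse(String naturalLanguageQuery);
}

// One implementation could wrap OpenNLP running locally...
class OpenNlpQueryParser implements QueryParser {
    @Override
    public Map<String, String> parse(String naturalLanguageQuery) {
        // Tokenize + NER as in the earlier sketch, then map spans onto components.
        throw new UnsupportedOperationException("sketch only");
    }
}

// ...while another wraps a hosted service (Watson, Microsoft Cognitive Services, etc.).
class HostedNlpQueryParser implements QueryParser {
    @Override
    public Map<String, String> parse(String naturalLanguageQuery) {
        // Call the vendor's REST API and normalize its response to the same map.
        throw new UnsupportedOperationException("sketch only");
    }
}
```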