Chicago / opengrid

A user-friendly, map-based tool to combine and explore real-time or historical data.
http://opengrid.io

Natural Language Processing Suggestions #227

Open: bencooper222 opened this issue 7 years ago

bencooper222 commented 7 years ago

While it is certainly possible for Chicago to use OpenNLP to build out its own full natural language processing system, I'm not sure that's wise. With the advent of chatbots, NLP is about to make huge strides, and progress looks likely to be concentrated among Google, Microsoft, Amazon, IBM Watson, and Apple. At the moment, Microsoft Cognitive Services and IBM Watson are the most mature, and it seems wisest to build on one of those so the project benefits from the progress they will undoubtedly make. I'm not saying Chicago couldn't build its own NLP, but it almost certainly couldn't improve it at the same rate as the big cloud providers.

tomschenkjr commented 7 years ago

Adding some further thoughts on this issue and how it can be tackled:

The advanced query consists of some basic parameters:

Natural Language Processing (e.g., OpenNLP) can identify these principal components of the query.
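A minimal sketch of extracting those components, in Python. It uses plain keyword and gazetteer matching as a stand-in for OpenNLP (it does not call OpenNLP at all). The vocabularies `DATASET_KEYWORDS` and `COMMUNITY_AREAS`, the `QueryComponents` class, and `extract_components` are hypothetical; a real version would draw its vocabularies from the OpenGrid dataset catalog and the full list of Chicago community areas. Note that the ambiguous term "burglar" maps to two datasets on purpose.

```python
import re
from dataclasses import dataclass, field
from typing import Optional

# Hypothetical vocabularies for illustration only.
# Each keyword maps to one or more dataset names; ambiguous terms list several.
DATASET_KEYWORDS = {
    "911": ["911p"],
    "burglar": ["Crimes", "911p"],  # ambiguous: over-identify both datasets
    "tweet": ["Twitter"],
    "twitter": ["Twitter"],
    "cta": ["CTA"],
}
COMMUNITY_AREAS = ["Rogers Park", "West Ridge", "Uptown", "Lincoln Square"]
ISO_DATE = re.compile(r"\b(\d{4}-\d{2}-\d{2})\b")


@dataclass
class QueryComponents:
    datasets: list[str] = field(default_factory=list)
    community_area: Optional[str] = None
    date: Optional[str] = None


def extract_components(text: str) -> QueryComponents:
    """Pull dataset, community area, and ISO date out of a free-text request."""
    lowered = text.lower()
    result = QueryComponents()
    for keyword, datasets in DATASET_KEYWORDS.items():
        # Match the keyword at the start of a word, so "burglar" hits "burglaries".
        if re.search(rf"\b{re.escape(keyword)}", lowered):
            for name in datasets:
                if name not in result.datasets:
                    result.datasets.append(name)
    for area in COMMUNITY_AREAS:
        if area.lower() in lowered:
            result.community_area = area
            break
    match = ISO_DATE.search(text)
    if match:
        result.date = match.group(1)
    return result
```

For example, `extract_components("burglaries in Rogers Park")` yields both `Crimes` and `911p` plus the community area, with no date.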

Example syntax and the resulting queries.

Resulting query: Dataset == 911p AND Community Area == Rogers Park

Though similar to the previous example, this one can be more complex. "Burglaries" could correspond to burglaries recorded in the Crimes dataset, or to 911 calls received about burglaries. We should over-identify and query both datasets.

In the absence of specific dates, the application could rely on our current protocol of displaying a fixed number (e.g., 6,000) of the most recent data points.

Resulting query: Dataset == Crimes AND Dataset == 911p WHERE Primary Description == Burglaries AND Community Area == Rogers Park
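The burglary example combines two rules: include every dataset an ambiguous term could refer to, and fall back to the most recent records when no date is given. A minimal sketch of turning parsed components into the query notation used in this thread; `build_query`, `render`, the dict layout, and the `order` key are hypothetical, `DEFAULT_LIMIT` simply mirrors the 6,000-point example, and values are rendered unquoted.

```python
from typing import Optional

# Mirrors the "fixed number (e.g., 6,000) of the most recent data points" default.
DEFAULT_LIMIT = 6000


def build_query(datasets: list[str], filters: dict[str, str],
                date: Optional[str] = None) -> dict:
    """Assemble a hypothetical query plan from extracted components."""
    query = {"datasets": list(datasets), "filters": dict(filters)}
    if date is None:
        # No date in the request: show only the N most recent records.
        query["limit"] = DEFAULT_LIMIT
        query["order"] = "most_recent_first"
    else:
        query["filters"]["Date"] = date
    return query


def render(query: dict) -> str:
    """Render a plan as 'Dataset == A AND Dataset == B WHERE k == v AND ...'."""
    head = " AND ".join(f"Dataset == {name}" for name in query["datasets"])
    if not query["filters"]:
        return head
    conditions = " AND ".join(f"{k} == {v}" for k, v in query["filters"].items())
    return f"{head} WHERE {conditions}"
```

Rendering `build_query(["Crimes", "911p"], {"Primary Description": "Burglaries", "Community Area": "Rogers Park"})` reproduces the resulting query above, and because no date was supplied the plan also carries `limit == 6000`.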

Resulting query: Dataset == Twitter AND geoWithin: {center, ([current location])}

Resulting query: Dataset == Twitter WHERE Twitter.text == "Chicago Bulls" AND Date == "2015-05-20"

Resulting query: Dataset == CTA AND geoWithin: {center, (41.8657, -87.7611)}
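The `geoWithin: {center, ...}` conditions resemble MongoDB's `$geoWithin` operator, but the thread does not say what radius "near" should mean. A minimal in-memory sketch of the same filter using the haversine great-circle distance, assuming a hypothetical 500 m default radius and records that carry `lat`/`lon` keys; `haversine_m` and `geo_within` are illustrative names.

```python
import math

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in metres


def haversine_m(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in metres between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def geo_within(points: list[dict], center: tuple[float, float],
               radius_m: float = 500.0) -> list[dict]:
    """Keep records whose lat/lon lie within radius_m of center (assumed default 500 m)."""
    lat0, lon0 = center
    return [p for p in points
            if haversine_m(lat0, lon0, p["lat"], p["lon"]) <= radius_m]
```

With the center from the CTA example, a record 0.001° of latitude away (about 111 m) is kept and one 0.01° away (about 1.1 km) is dropped. In production this filter would more likely be pushed down to a spatially indexed datastore than run as a linear scan.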

Developing and testing

The NLP feature can be tested against the developer API, using the corresponding API docs as a reference.