bencooper222 opened this issue 8 years ago
Adding some further thoughts on this issue and how it can be tackled:
The advanced query consists of a few basic parameters: the target dataset, a location (a community area or coordinates), a keyword, and a date or date range.
Natural Language Processing (e.g., OpenNLP) can identify these principal components of the query.
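To make that concrete, here is a minimal sketch of what component extraction with OpenNLP could look like. It assumes the standard pre-trained tokenizer and location NER models (`en-token.bin`, `en-ner-location.bin`) are available on disk, and the sample sentence is just an illustration; mapping extracted spans onto dataset names and community areas would still need a project-specific gazetteer.

```java
import java.io.FileInputStream;
import java.io.InputStream;

import opennlp.tools.namefind.NameFinderME;
import opennlp.tools.namefind.TokenNameFinderModel;
import opennlp.tools.tokenize.TokenizerME;
import opennlp.tools.tokenize.TokenizerModel;
import opennlp.tools.util.Span;

public class QueryComponentExtractor {

    public static void main(String[] args) throws Exception {
        // Pre-trained OpenNLP models; these file names are the standard downloadable
        // models and are an assumption about the deployment, not project files.
        try (InputStream tokStream = new FileInputStream("en-token.bin");
             InputStream locStream = new FileInputStream("en-ner-location.bin")) {

            TokenizerME tokenizer = new TokenizerME(new TokenizerModel(tokStream));
            NameFinderME locationFinder = new NameFinderME(new TokenNameFinderModel(locStream));

            String query = "Show 911 calls in Rogers Park";   // illustrative input only
            String[] tokens = tokenizer.tokenize(query);

            // Location spans become candidate "Community Area" values; the remaining
            // tokens would be matched against dataset names and keyword lists.
            Span[] locations = locationFinder.find(tokens);
            for (String loc : Span.spansToStrings(locations, tokens)) {
                System.out.println("Candidate location: " + loc);
            }
        }
    }
}
```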
Resulting query: Dataset == 911p AND Community Area == Rogers Park
Though similar to the previous example, this one can be more complex. "Burglaries" could correspond to burglaries filed in the Crimes dataset, or to 911 calls received about burglaries. We should over-identify and query both datasets.
In the absence of specific dates, the application could fall back on our current protocol of displaying a fixed number (e.g., 6,000) of the most recent data points.
Resulting query: (Dataset == Crimes OR Dataset == 911p) WHERE Primary Description == Burglaries AND Community Area == Rogers Park
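One way to handle the over-identification is a keyword-to-dataset map, so an ambiguous term fans out to every dataset that might satisfy it. A rough sketch, reusing the pseudo-query syntax from this thread; the map contents and class names here are placeholders, not anything in the codebase:

```java
import java.util.List;
import java.util.Map;

public class DatasetResolver {

    // Ambiguous terms fan out to every dataset that might hold matching records.
    // The contents of this map are illustrative placeholders.
    private static final Map<String, List<String>> KEYWORD_TO_DATASETS =
            Map.of("burglaries", List.of("Crimes", "911p"));

    public static String buildQuery(String keyword, String communityArea) {
        List<String> datasets = KEYWORD_TO_DATASETS.getOrDefault(keyword, List.of());

        // Emit the same pseudo-query syntax used in the examples above.
        StringBuilder datasetClause = new StringBuilder();
        for (int i = 0; i < datasets.size(); i++) {
            if (i > 0) {
                datasetClause.append(" OR ");
            }
            datasetClause.append("Dataset == ").append(datasets.get(i));
        }
        return "(" + datasetClause + ") WHERE Primary Description == " + keyword
                + " AND Community Area == " + communityArea;
    }

    public static void main(String[] args) {
        System.out.println(buildQuery("burglaries", "Rogers Park"));
    }
}
```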
Resulting query: Dataset == Twitter AND geoWithin: {center, ([current location])}
Resulting query: Dataset == Twitter WHERE Twitter.text == "Chicago Bulls" AND Date == "2015-05-20"
Resulting query: Dataset == CTA AND geoWithin: {center, (41.8657, -87.7611)}
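The geoWithin: {center, ...} syntax above reads like a MongoDB $geoWithin/$center filter, so assuming a MongoDB-backed store (an assumption on my part), the CTA example could be expressed with the Java driver roughly like this. The database, collection, and field names are guesses:

```java
import org.bson.Document;
import org.bson.conversions.Bson;

import com.mongodb.client.MongoClient;
import com.mongodb.client.MongoClients;
import com.mongodb.client.MongoCollection;
import com.mongodb.client.MongoDatabase;
import com.mongodb.client.model.Filters;

public class GeoQueryExample {

    public static void main(String[] args) {
        try (MongoClient client = MongoClients.create("mongodb://localhost:27017")) {
            MongoDatabase db = client.getDatabase("opendata");       // placeholder name
            MongoCollection<Document> cta = db.getCollection("cta"); // placeholder name

            // $geoWithin with a $center circle: (x, y) = (longitude, latitude),
            // radius in degrees, so the coordinates from the example are swapped.
            Bson nearPoint = Filters.geoWithinCenter("location", -87.7611, 41.8657, 0.01);

            for (Document doc : cta.find(nearPoint).limit(10)) {
                System.out.println(doc.toJson());
            }
        }
    }
}
```

The keyword and date examples above would map onto the same pattern with Filters.regex and Filters.gte/Filters.lte instead of the geo filter.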
Testing the NLP feature can be done against the developer API, with the corresponding API docs as a reference.
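Even a small smoke test that sends sample sentences to the developer API and checks the parsed query would catch regressions. The endpoint below is purely hypothetical; the real path and response shape would come from the API docs:

```java
import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;

public class NlpSmokeTest {

    public static void main(String[] args) throws Exception {
        // Hypothetical endpoint; substitute the real one from the API docs.
        String base = "https://example.org/api/v1/nlq?q=";
        String sentence = "burglaries in Rogers Park";

        HttpRequest request = HttpRequest.newBuilder()
                .uri(URI.create(base + URLEncoder.encode(sentence, StandardCharsets.UTF_8)))
                .GET()
                .build();

        HttpResponse<String> response = HttpClient.newHttpClient()
                .send(request, HttpResponse.BodyHandlers.ofString());

        // A real test would assert on the parsed query structure; here we just
        // check that the call succeeds and both candidate datasets show up.
        System.out.println("HTTP " + response.statusCode());
        System.out.println(response.body().contains("Crimes") && response.body().contains("911p")
                ? "over-identification looks right"
                : "unexpected parse");
    }
}
```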
While it is certainly possible for Chicago to use OpenNLP to build its own full natural language processing system, I'm not sure that's wise. With the advent of chatbots, NLP is about to make huge strides, and it looks like progress will mostly be concentrated among Google, Microsoft, Amazon, IBM Watson, and Apple. At the moment, Microsoft Cognitive Services and IBM Watson are the most mature, and it seems wisest to use those so the project can benefit from the progress they will undoubtedly make. That's not to say Chicago couldn't build its own NLP, but it almost certainly couldn't improve it at the same rate as the big cloud providers can.
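If the project does go with a hosted service, it might still be worth keeping the NLP behind a thin interface so the provider can be swapped as Watson, Microsoft Cognitive Services, or an OpenNLP fallback mature at different rates. A rough sketch; all names here are made up for illustration:

```java
import java.util.Map;

// A thin abstraction so the query pipeline never depends on a specific NLP vendor.
// Every name below is illustrative, not from any existing codebase.
interface QueryParser {
    /** Extracts components such as dataset, location, keyword, and date from free text. */
    Map<String, String> parse(String naturalLanguageQuery);
}

// One implementation could wrap OpenNLP running locally...
class OpenNlpQueryParser implements QueryParser {
    @Override
    public Map<String, String> parse(String naturalLanguageQuery) {
        // Tokenize + NER as in the earlier sketch, then map spans onto components.
        throw new UnsupportedOperationException("sketch only");
    }
}

// ...while another wraps a hosted service (Watson, Microsoft Cognitive Services, etc.).
class HostedNlpQueryParser implements QueryParser {
    @Override
    public Map<String, String> parse(String naturalLanguageQuery) {
        // Call the vendor's REST API and normalize its response to the same map.
        throw new UnsupportedOperationException("sketch only");
    }
}
```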