codeforamerica / classifyr

A tool for aggregating and crowd-sourcing the classification of emergency call data
MIT License

As a system, I should be able to automatically classify call types when they match the code or description of a common incident type exactly. #59

Open T-Dnzt opened 2 years ago

T-Dnzt commented 2 years ago

The following is a suggestion / proposal based on the work done recently on classifications.


From doing some classification, I’ve noticed that some of the codes / names are the same in the imported 911 incidents CSV sheet (that I use for testing) and the common incident types table (generated from this file).

It's not true for all rows, but here are some values present in both for example:

I’m guessing some values like these will show up in every imported file because they use very simple, general terminology. That led me to wonder whether we could implement a simple auto-classification system that only does string matching (we could flag its matches for manual review if we want). It would instantly classify the data set rows whose code or description matches a common incident type, leaving only the more complex rows to classify by hand. Because it just compares strings, it should take little effort to implement. Note that this can (and probably should) run in a background job to minimize the impact on the web app.
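To make the idea concrete, here is a rough sketch of the matching step in Python (not the app's actual code). The field names `code` and `description`, the `needs_review` flag, and the function names are all hypothetical, and the whitespace and case normalization is my assumption; drop it if "exact" should mean byte-for-byte.

```python
def normalize(value):
    """Collapse whitespace and fold case (an assumption, not a requirement)."""
    return " ".join((value or "").split()).casefold()


def build_index(common_types):
    """Map each normalized code and description to its common incident type."""
    index = {}
    for common_type in common_types:
        for key in (common_type["code"], common_type["description"]):
            normalized = normalize(key)
            if normalized:
                # Keep the first common type seen for a given key.
                index.setdefault(normalized, common_type)
    return index


def auto_classify(rows, common_types):
    """Split rows into (matched, unmatched) using exact string matching."""
    index = build_index(common_types)
    matched, unmatched = [], []
    for row in rows:
        # Try the code first, then fall back to the description.
        hit = index.get(normalize(row["code"])) or index.get(
            normalize(row["description"])
        )
        if hit is None:
            unmatched.append(row)
        else:
            # Hypothetical flag: auto-matches still wait for manual review.
            matched.append({"row": row, "common_type": hit, "needs_review": True})
    return matched, unmatched
```

In the app, this would presumably run inside the background job, with the unmatched rows going through the existing manual classification flow.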

This could also be the very humble beginning of a more advanced auto-classifying algorithm. The next logical step would be to use past classifications to automatically "guess" how to classify new data (with some still very simple string matching using approved classifications as reference).
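As a sketch of that next step, approved classifications could be turned into a lookup that votes on the most common answer for a given call type. Everything here is hypothetical: the `(call_type, common_type)` pair shape, the function names, and the idea of returning a share of votes as a crude confidence value.

```python
from collections import Counter, defaultdict


def _key(text):
    """Collapse whitespace and fold case before comparing (an assumption)."""
    return " ".join(text.split()).casefold()


def build_history(approved):
    """Count how often each call type was approved as each common type.

    `approved` is an iterable of (call_type, common_type) pairs.
    """
    history = defaultdict(Counter)
    for call_type, common_type in approved:
        history[_key(call_type)][common_type] += 1
    return history


def guess(call_type, history):
    """Return (common_type, share_of_votes), or None if never seen before."""
    counts = history.get(_key(call_type))
    if not counts:
        return None
    common_type, votes = counts.most_common(1)[0]
    return common_type, votes / sum(counts.values())
```

A guess with a low share of votes could be sent to manual review rather than applied automatically.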

The steps after that would involve more advanced techniques like machine learning, but I don't think we need to worry about that for now.