codeforamerica / classifyr

A tool for aggregating and crowd-sourcing the classification of emergency call data
MIT License

As a data classifier, I would like the transition from one call type to another to be more performant #113

Open · jamesiarmes opened this issue 2 years ago

jamesiarmes commented 2 years ago

When a call type classification page is loaded, the data set file appears to be downloaded again each time, and some additional parsing likely happens at that point as well. We should look into this to determine whether this work needs to happen on every page load and, if it does, how we can reduce its performance impact.
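If the file really is fetched and parsed on every page load, one common mitigation is to cache the parsed data set so that moving from one call type to the next reuses it. The sketch below is illustrative Python, not classifyr's code. `FAKE_STORAGE`, `download`, and `load_data_set` are hypothetical stand-ins, and `functools.lru_cache` is used as a simple in-process cache.

```python
import csv
import io
from functools import lru_cache

# Hypothetical stand-in for remote file storage: file key -> raw CSV text.
FAKE_STORAGE = {
    "calls.csv": "call_type,priority\nFIRE,1\nMEDICAL,2\nFIRE,1\n",
}

download_count = 0  # counts simulated downloads so the caching effect is visible


def download(file_key: str) -> str:
    """Simulate fetching the raw data set file (the expensive step)."""
    global download_count
    download_count += 1
    return FAKE_STORAGE[file_key]


@lru_cache(maxsize=8)
def load_data_set(file_key: str) -> tuple:
    """Download and parse a data set once; later calls reuse the cached rows."""
    text = download(file_key)
    # Each row becomes a dict keyed by the CSV header.
    return tuple(csv.DictReader(io.StringIO(text)))


# Simulate visiting three call type pages in a row.
for _ in range(3):
    rows = load_data_set("calls.csv")
print(len(rows), download_count)  # prints: 3 1
```

A real cache would also need to be invalidated when the underlying file changes, for example by keying it on a file version or checksum. That keying scheme is an assumption, not something this issue specifies.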

This has presented itself in at least two data sets:

jamesiarmes commented 2 years ago

After looking into this a bit, I found that classifyr generates the examples for each unique value "lazily", meaning on demand as each value is viewed rather than ahead of time. This isn't a problem for small data sets, but generating these examples takes considerably longer for larger data sets.
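One plausible reason lazy generation scales poorly is that each request for a value's examples rescans the data set. With many unique values, that adds up to many full passes. The sketch below contrasts a per-value scan with a single pass that collects examples for every value at once. It is illustrative Python under stated assumptions. "Examples" is taken to mean the first few rows containing a value, and the function names and the `limit` parameter are hypothetical, not classifyr's actual logic.

```python
from collections import defaultdict


def examples_for_value_lazy(rows, column, value, limit=3):
    """Lazy approach: scan the rows every time one unique value is requested."""
    found = []
    for row in rows:
        if row[column] == value:
            found.append(row)
            if len(found) == limit:
                break  # stops early only once enough matches are found
    return found


def examples_for_all_values(rows, column, limit=3):
    """Eager approach: one pass collects up to `limit` examples per unique value."""
    examples = defaultdict(list)
    for row in rows:
        bucket = examples[row[column]]
        if len(bucket) < limit:
            bucket.append(row)
    return dict(examples)


rows = [{"call_type": t} for t in ["FIRE", "MEDICAL", "FIRE", "THEFT", "FIRE", "FIRE"]]
all_examples = examples_for_all_values(rows, "call_type", limit=2)
print({k: len(v) for k, v in all_examples.items()})  # {'FIRE': 2, 'MEDICAL': 1, 'THEFT': 1}
```

Both functions return the same examples. The lazy version can cost up to one full scan per unique value, while the eager version always costs exactly one scan for the whole data set.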

We can look at improving the performance of generating these examples, but I also recommend moving this process into a background job that generates the full set of examples for a data set before classification begins.
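Here is a rough sketch of the backgrounding idea, in illustrative Python. A single-worker `ThreadPoolExecutor` stands in for a real job queue. All function names and the data set ID are hypothetical, and a production app would more likely use its framework's own background job system. Generation starts when a data set is added, and classification pages read the precomputed results.

```python
from collections import defaultdict
from concurrent.futures import Future, ThreadPoolExecutor

# Hypothetical stand-in for a background job queue.
executor = ThreadPoolExecutor(max_workers=1)
jobs: dict[str, Future] = {}


def generate_examples(rows, column, limit=3):
    """Single pass over the data set collecting up to `limit` examples per value."""
    examples = defaultdict(list)
    for row in rows:
        bucket = examples[row[column]]
        if len(bucket) < limit:
            bucket.append(row)
    return dict(examples)


def start_example_generation(data_set_id, rows, column):
    """Queue generation when the data set is added, before classification begins."""
    jobs[data_set_id] = executor.submit(generate_examples, rows, column)


def examples_ready(data_set_id):
    """Lets the UI decide whether classification can start yet."""
    job = jobs.get(data_set_id)
    return job is not None and job.done()


def get_examples(data_set_id, value):
    """Return precomputed examples; blocks only if generation is still running."""
    return jobs[data_set_id].result().get(value, [])


rows = [{"call_type": t} for t in ["FIRE", "MEDICAL", "FIRE"]]
start_example_generation("ds-1", rows, "call_type")
print(len(get_examples("ds-1", "FIRE")))  # 2
```

Moving from one call type to the next then becomes a dictionary lookup instead of a scan, and `examples_ready` gives the UI a way to hold off on classification until the job has finished.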