UDST / synthpop

Synthetic populations from census data
BSD 3-Clause "New" or "Revised" License

use the category type for performance wins #30

Closed Eh2406 closed 7 years ago

Eh2406 commented 7 years ago

I just came across Categorical Data in pandas, and this blog post on how it dramatically improves performance on data with text categories.

This is not yet tested, but I think it makes sense. What do you think? Are there other places that need it?
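For readers unfamiliar with the dtype, here is a minimal sketch of what converting a text column to "category" does. The labels and the row count are made up for illustration and are not taken from synthpop. Any gain depends on the workload, so treat this as a demonstration of the mechanism rather than a benchmark.

```python
import numpy as np
import pandas as pd

# Hypothetical data: many rows drawn from a handful of text labels.
rng = np.random.default_rng(0)
labels = np.array(["owner", "renter", "group quarters", "vacant"])
as_object = pd.Series(rng.choice(labels, size=100_000)).astype(object)

# The category dtype stores each distinct label once, plus a small
# integer code per row, instead of one Python string per row.
as_category = as_object.astype("category")

object_bytes = as_object.memory_usage(deep=True)
category_bytes = as_category.memory_usage(deep=True)

# The values themselves are unchanged by the conversion.
values_match = bool((as_object == as_category.astype(object)).all())

print(f"object:   {object_bytes:,} bytes")
print(f"category: {category_bytes:,} bytes")
print("codes dtype:", as_category.cat.codes.dtype)
print("values match:", values_match)
```

With only a few distinct labels the integer codes are small, which is where the memory saving comes from. Operations such as groupby can also work on the codes rather than on the strings.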

Eh2406 commented 7 years ago

Travis is complaining because diff has been removed.

Our tests show that this works but does not make a huge difference.

coveralls commented 7 years ago

Coverage remained the same at 81.257% when pulling 5295cf039960e239d509cdc9e5b5f23c88cb1fb1 on SEMCOG:use_category into 6d2f3dc1b25a85404760a001c8a927ddf00199e1 on UDST:master.

janowicz commented 7 years ago

Good to know about the category data type. I can definitely see this being useful. Does this particular change still make sense even though there's not much performance difference?

Eh2406 commented 7 years ago

Sorry, just got back from a trip.

I don't think it matters much, but I think it makes sense: the point of the function is to make categories, so they may as well be "Categorical."
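As a rough illustration of that argument, here is a sketch of a labelling function whose result is already categorical. The function name, bins, and labels are hypothetical and are not the synthpop code under discussion. It relies on pandas' own pd.cut, which returns category data when given labels.

```python
import pandas as pd


def age_category(ages):
    """Hypothetical helper: bin ages into a few named groups."""
    bins = [0, 18, 65, 200]
    labels = ["under 18", "18 to 64", "65 and over"]
    # right=False makes each bin closed on the left: [0, 18), [18, 65), ...
    return pd.cut(ages, bins=bins, labels=labels, right=False)


ages = pd.Series([3, 17, 18, 40, 64, 65, 90])
result = age_category(ages)

print(result.dtype)
print(list(result))
```

Because the output can only take one of a fixed set of labels, storing it as a category costs nothing in meaning and keeps the set of valid values attached to the column.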