RSGInc / rFirm

firm synthesis
https://rsginc.github.io/rFirm

naics should be factors #14

Closed toliwaga closed 6 years ago

toliwaga commented 6 years ago

Currently all NAICS columns are string type. For very large datasets, performance (and perhaps memory use) suffers when string columns such as NAICS could instead be factors.

For performance, they should be factors. This requires some thought, as the pipeline store currently doesn't handle factors (just numeric and string types). This is really an activitysim defect, but afreight is currently hurting most from the missing factor support.
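One possible workaround, sketched here in plain Python rather than against the real pipeline store: split each string column into a list of distinct levels plus an integer code per row, so that only numeric and string data ever reach the store. The `store` dict, the key names, and the helper functions are all hypothetical stand-ins, not activitysim APIs, and the codes are 0-based (unlike R factors).

```python
from array import array


def encode_factor(values):
    """Split a string column into (levels, codes), similar in spirit to an R factor."""
    levels = sorted(set(values))
    index = {level: i for i, level in enumerate(levels)}
    # One small integer per row instead of one string per row.
    codes = array("i", (index[v] for v in values))
    return levels, codes


def decode_factor(levels, codes):
    """Rebuild the original string column from levels and integer codes."""
    return [levels[c] for c in codes]


# Hypothetical stand-in for a store that accepts only numeric and string columns.
store = {}

naics = ["311", "484", "311", "722", "484", "311"]  # illustrative values only
levels, codes = encode_factor(naics)

store["naics_codes"] = list(codes)  # numeric column
store["naics_levels"] = levels      # string column, one entry per distinct code

# Reading back: the factor is rebuilt from the two plain columns.
restored = decode_factor(store["naics_levels"], store["naics_codes"])
```

The cost is that the levels have to travel alongside the codes, so whatever reads the table needs to know which levels list belongs to which column.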

There are some issues in the R code where factors need to be rebuilt from strings because (I think) of differences in NAICS flavors (e.g. NAICSio vs NAICS2007). Perhaps we could create an omnibus factor at startup that includes the intersection of all flavors, so we don't need to convert?
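A rough sketch of the omnibus idea, again in plain Python with hypothetical helper names and made-up codes. It assumes the shared level set has to cover every code that appears in any flavor, so it takes the union of the codes seen; a narrower set would leave some values unencodable. Because every column is encoded against the same levels, a code shared between flavors gets the same integer everywhere and nothing needs to be rebuilt from strings.

```python
def build_omnibus_levels(*columns):
    """Collect every distinct code seen in any column into one sorted level list."""
    seen = set()
    for column in columns:
        seen.update(column)
    return sorted(seen)


def encode_with_levels(values, levels):
    """Encode a string column as integer positions in a shared level list."""
    index = {level: i for i, level in enumerate(levels)}
    return [index[v] for v in values]


# Made-up example columns; real NAICSio / NAICS2007 codes will differ.
naics_io_col = ["311", "4A0", "484"]
naics_2007_col = ["311111", "484110", "311"]

# Built once at startup, then reused for every NAICS column.
omnibus = build_omnibus_levels(naics_io_col, naics_2007_col)

io_codes = encode_with_levels(naics_io_col, omnibus)
n2007_codes = encode_with_levels(naics_2007_col, omnibus)

# "311" appears in both columns and maps to the same integer in each.
shared_code = omnibus.index("311")
```

This only avoids re-encoding; it does not map one flavor onto another, which would still need a crosswalk table.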

toliwaga commented 6 years ago

I created this issue because I didn't want to forget about it. I am not sure it makes sense to assign it to anyone at this time.

bstabler commented 6 years ago

I'll create an activitysim issue for this - https://github.com/ActivitySim/activitysim/issues/220