As the `odo` issue https://github.com/blaze/odo/issues/561 mentions, `datashape.Categorical` instantiation has become a bottleneck. It traces back to this line, where the constructor coerces the input categories into a tuple. I wonder whether we should relax this constraint for `Categorical` objects so that the underlying categories could be represented as NumPy arrays, i.e. `Series.cat.categories.values`, which would speed up `datashape` for pandas/dask categorical discovery. cc @jbednar @teoliphant
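
To make the cost concrete, here is a minimal sketch comparing tuple coercion with keeping the array as-is. `TupleCategorical` and `ArrayCategorical` are hypothetical stand-ins written for this illustration, not datashape's actual classes, and the integer array only stands in for `Series.cat.categories.values` on a high-cardinality column.

```python
import timeit

import numpy as np


# Hypothetical stand-ins for illustration only; not datashape's real classes.
class TupleCategorical:
    """Mimics the behavior described above: categories are coerced to a tuple."""

    def __init__(self, categories):
        # Iterates over every element, so the cost grows with the cardinality.
        self.categories = tuple(categories)


class ArrayCategorical:
    """Relaxed variant: a NumPy array of categories is kept as-is."""

    def __init__(self, categories):
        # np.asarray returns the same object when given an ndarray (no copy).
        self.categories = np.asarray(categories)


# Assumed stand-in for Series.cat.categories.values with many categories.
categories = np.arange(200_000).astype(object)

t_tuple = timeit.timeit(lambda: TupleCategorical(categories), number=5)
t_array = timeit.timeit(lambda: ArrayCategorical(categories), number=5)
print(f"tuple coercion: {t_tuple:.4f}s, array passthrough: {t_array:.4f}s")
```

One trade-off to keep in mind: tuples are hashable and NumPy arrays are not, so if `Categorical` relies on hashing or comparing its categories, a relaxed version would need another way to do that.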