RoyalHaskoningDHV / sam

Python package for time series analysis and machine learning
MIT License

Categorical columns added #79

Closed miguelpher closed 1 year ago

miguelpher commented 1 year ago

Make the 'ID' and 'TYPE' columns pd.Categorical instead of str, to reduce the memory spike when using pd.pivot_table in sam_format_to_wide.
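For illustration, here is a minimal sketch of the idea, not the code from this PR. It assumes a long-format frame with TIME, ID, TYPE and VALUE columns; the helper name `to_categorical` and the sample data are hypothetical, and `pd.pivot_table` is called directly rather than through sam_format_to_wide.

```python
import pandas as pd


def to_categorical(df, columns=("ID", "TYPE")):
    """Hypothetical helper: return a copy with the given columns cast to category."""
    out = df.copy()
    for col in columns:
        out[col] = out[col].astype("category")
    return out


# Tiny made-up long-format frame (assumed layout: TIME, ID, TYPE, VALUE).
long_df = pd.DataFrame(
    {
        "TIME": pd.to_datetime(
            ["2021-01-01", "2021-01-01", "2021-01-02", "2021-01-02"]
        ),
        "ID": ["a", "b", "a", "b"],
        "TYPE": ["temp", "temp", "temp", "temp"],
        "VALUE": [1.0, 2.0, 3.0, 4.0],
    }
)

cat_df = to_categorical(long_df)

# observed=True keeps only the ID/TYPE combinations that occur in the data,
# so unused category combinations do not become extra columns.
wide = pd.pivot_table(
    cat_df,
    index="TIME",
    columns=["ID", "TYPE"],
    values="VALUE",
    observed=True,
)
```

The cast happens once, before the pivot, so the grouping keys handed to pd.pivot_table are small integer codes rather than Python strings.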

In tests on a 534.9 MB dataframe in sam format, making ID and TYPE categorical reduced its size to 352.3 MB, and the memory spike when pivoting that table dropped by 17% on average. Please treat these figures as approximate, since memory spikes are difficult to monitor.
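A rough way to check the size reduction on your own data is to compare `memory_usage(deep=True)` before and after the cast. The sketch below uses synthetic data with made-up sensor names and is not the benchmark behind the figures above; the ratio depends on the number of distinct values, the string lengths, and the pandas version.

```python
import numpy as np
import pandas as pd

n = 100_000
rng = np.random.default_rng(0)

# Synthetic long-format data; the ID and TYPE values are invented for illustration.
df = pd.DataFrame(
    {
        "ID": rng.choice(["sensor_a", "sensor_b", "sensor_c"], size=n),
        "TYPE": rng.choice(["temperature", "pressure"], size=n),
        "VALUE": rng.normal(size=n),
    }
)

# deep=True counts the string payloads, not only the pointers to them.
str_bytes = int(df.memory_usage(deep=True).sum())

cat_df = df.astype({"ID": "category", "TYPE": "category"})
cat_bytes = int(cat_df.memory_usage(deep=True).sum())

print(f"str columns:         {str_bytes / 1e6:.1f} MB")
print(f"categorical columns: {cat_bytes / 1e6:.1f} MB")
```

This only measures the resting size of the frame. The transient peak during the pivot needs a process-level memory profiler, which is why the 17% figure is harder to pin down.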

The pandas implementation with categorical columns also seems to be more stable, in terms of memory spikes, than alternative implementations in dask or polars.