LAAC-LSCP / ChildProject

Python package for the management of day-long recordings of children.
https://childproject.readthedocs.io
MIT License

Investigate the opportunity of using Pandas categoricals #259

Open lucasgautheron opened 3 years ago

lucasgautheron commented 3 years ago

Is your feature request related to a problem? Please describe.

Our annotations usually contain a lot of text that is actually categorical (e.g. speaker type). We might reduce memory usage and save some CPU time by using the pandas Categorical data type.
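To make the memory argument concrete, here is a minimal sketch comparing the footprint of the same column stored as plain Python strings and as a Categorical. The column name `speaker_type`, the four labels, and the row count are illustrative assumptions, not taken from an actual dataset.

```python
import numpy as np
import pandas as pd

# Hypothetical labels and size, chosen only for illustration.
labels = ["CHI", "FEM", "MAL", "OCH"]
n_rows = 100_000

rng = np.random.default_rng(0)
values = rng.choice(labels, size=n_rows).tolist()

# Same data, stored once as Python strings and once as a Categorical.
as_object = pd.Series(values, dtype=object, name="speaker_type")
as_category = as_object.astype("category")

# deep=True counts the string payloads, not just the pointers.
object_bytes = as_object.memory_usage(deep=True)
category_bytes = as_category.memory_usage(deep=True)

print(f"object:   {object_bytes} bytes")
print(f"category: {category_bytes} bytes")
```

With only a handful of distinct values, the categorical column stores one small integer code per row plus a single copy of each label, which is where the saving would come from.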

There are, however, a number of pitfalls, so we should be very careful with this. It may be something we could offer as an option...
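As one example of the kind of pitfall I have in mind (an assumption on my part about which ones matter for us): concatenating two categorical columns whose category sets differ silently drops the categorical dtype. The labels below are hypothetical.

```python
import pandas as pd
from pandas.api.types import union_categoricals

# Two categorical columns with different category sets (hypothetical labels).
a = pd.Series(["CHI", "FEM"], dtype="category")
b = pd.Series(["MAL", "CHI"], dtype="category")

# The result is no longer categorical because the categories differ.
combined = pd.concat([a, b], ignore_index=True)
print(combined.dtype)

# union_categoricals merges the category sets and keeps the dtype.
merged = pd.Series(union_categoricals([a, b]))
print(merged.dtype)
```

Any code path that merges annotations from several sources would need this kind of care, which is part of why an opt-in setting may be safer than a default.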

Describe the solution you'd like

Assess the impact of Categorical on memory usage and CPU time using realistic data.
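A rough sketch of what such an assessment could look like is below. It uses synthetic data as a stand-in, so the real test would swap in actual annotations. The helper names `make_segments` and `benchmark`, the column names, and the choice of a groupby as the timed operation are all assumptions made for illustration.

```python
import timeit

import numpy as np
import pandas as pd


def make_segments(n_rows, seed=0):
    """Build a synthetic stand-in for an annotation table (hypothetical columns)."""
    rng = np.random.default_rng(seed)
    labels = rng.choice(["CHI", "FEM", "MAL", "OCH"], size=n_rows).tolist()
    return pd.DataFrame(
        {
            "speaker_type": pd.Series(labels, dtype=object),
            "duration": rng.integers(100, 5000, size=n_rows),
        }
    )


def benchmark(df, column="speaker_type", repeats=5):
    """Compare memory and groupby time for object vs. categorical storage."""
    as_category = df.assign(**{column: df[column].astype("category")})
    results = {}
    for label, frame in (("object", df), ("category", as_category)):
        # Best of several single runs, to reduce timing noise.
        seconds = min(
            timeit.repeat(
                lambda: frame.groupby(column, observed=True)["duration"].sum(),
                number=1,
                repeat=repeats,
            )
        )
        results[label] = {
            "bytes": int(frame.memory_usage(deep=True).sum()),
            "groupby_seconds": seconds,
        }
    return results


report = benchmark(make_segments(200_000))
for label, stats in report.items():
    print(label, stats)
```

Timings will vary by machine and pandas version, so the numbers are only meaningful when both variants are run side by side on the same data.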