Quantco / tabmat

Efficient matrix representations for working with tabular data
https://tabmat.readthedocs.io/
BSD 3-Clause "New" or "Revised" License

Create CatMatrix from codes and categories #389

Open MarcAntoineSchmidtQC opened 6 days ago

MarcAntoineSchmidtQC commented 6 days ago

See this issue on Glum to understand the reasoning.

MarcAntoineSchmidtQC commented 6 days ago

@lbittarello, if you have time I would love to get your feedback. It removes the _Categorical class that you added during the polars PR.

MarcAntoineSchmidtQC commented 5 days ago

If we deprecate the cat property, I don't think we should be thinking about expanding its scope. Polars support is not released yet, so in the stable version cat is a pandas.Categorical. I would be happy to simply return a pandas.Categorical when possible and otherwise raise an error saying that this method is not supported with a non-pandas backend. This is fully backward compatible.

MarcAntoineSchmidtQC commented 5 days ago

Of course, I can add a more explicit error message instead of relying on Python raising a NameError because pd is not defined.
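
For illustration, a minimal sketch of that behaviour, assuming pandas is an optional import. ToyCategoricalMatrix, its constructor signature, and the wording of the messages are hypothetical and are not tabmat's actual code:

```python
import warnings

try:
    import pandas as pd  # optional dependency in this sketch
except ImportError:
    pd = None


class ToyCategoricalMatrix:
    """Hypothetical stand-in for illustration; not tabmat's CategoricalMatrix."""

    def __init__(self, codes, categories):
        self.codes = list(codes)
        self.categories = list(categories)

    @property
    def cat(self):
        # Deprecated accessor: keep returning a pandas.Categorical when possible.
        warnings.warn(
            "`cat` is deprecated; use `codes` and `categories` instead.",
            DeprecationWarning,
            stacklevel=2,
        )
        if pd is None:
            # Explicit message instead of a bare NameError on `pd`.
            raise RuntimeError(
                "`cat` returns a pandas.Categorical and requires pandas; "
                "it is not supported with a non-pandas backend."
            )
        return pd.Categorical.from_codes(self.codes, categories=self.categories)
```

With pandas installed, `ToyCategoricalMatrix([0, 1, 0], ["a", "b"]).cat` returns a pandas.Categorical as before; without it, the caller gets a readable error.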

stanmart commented 5 days ago

Would adding an _input_dtype argument to CategoricalMatrix's constructor solve this? __getitem__ sees self and can pass this property to the newly created matrix.
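
A rough sketch of that idea, assuming the goal is for the matrix to remember what kind of input it was built from. ToyCatMatrix and the string stored in _input_dtype are hypothetical; only the argument name comes from the suggestion above:

```python
class ToyCatMatrix:
    """Hypothetical stand-in for illustration; not tabmat's CategoricalMatrix."""

    def __init__(self, codes, categories, _input_dtype=None):
        self.codes = list(codes)
        self.categories = list(categories)
        # Private marker describing the original input (assumed semantics).
        self._input_dtype = _input_dtype

    def __getitem__(self, rows):
        # Row selection by slice or by a list of integer positions.
        if isinstance(rows, slice):
            new_codes = self.codes[rows]
        else:
            new_codes = [self.codes[i] for i in rows]
        # __getitem__ sees self, so it can forward the marker to the new matrix.
        return type(self)(
            new_codes, self.categories, _input_dtype=self._input_dtype
        )


mat = ToyCatMatrix([0, 1, 0, 1], ["a", "b"], _input_dtype="pandas")
sub = mat[[0, 2]]
```

Here `sub` carries the same `_input_dtype` as `mat`, so a later call on the subset can still tell which backend the data came from.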

lbittarello commented 5 days ago

> I would be happy to simply return a pandas.Categorical when possible and otherwise raise an error saying that this method is not supported with a non-pandas backend.

That's fine by me. :)