Quantco / tabmat

Efficient matrix representations for working with tabular data
https://tabmat.readthedocs.io/
BSD 3-Clause "New" or "Revised" License
117 stars 6 forks source link

Have a list of basic operations that are guaranteed to be supported for all MatrixBase #105

Closed MarcAntoineSchmidtQC closed 7 months ago

MarcAntoineSchmidtQC commented 3 years ago

A cleanup would probably help us identify low-hanging fruits.

For instance, should we define __matmul__ for all the classes? Should we define the more basic __mul__ and __rmul__?

I would also like to standardize all our __repr__ methods so that a user knows quickly that these objects are coming from the same package.

tbenthompson commented 3 years ago

Yes, I strongly approve of having a clearly documented API that is consistent between the various MatrixBase classes! Thanks for pointing this out, Marc.

Part of the task here seems like a question of where to enumerate this. In the docstrings of the MatrixBase class? Somewhere else?

lbittarello commented 3 years ago

Should we define the more basic __mul__ and __rmul__?

Yes, please! With a consistent API across subtypes. At the moment, * performs element-wise multiplication for dense matrices (NumPy style), matrix multiplication for sparse matrices (SciPy) and failed multiplication for categorical matrices (my own).

tbenthompson commented 3 years ago

Marc and I talked a couple days ago about some indexing issues where the capabilities of DenseMatrix greatly exceeded that of other matrix types because the underlying numpy arrays have a lot of features. It seems like this will be the case for many operations. Would it make sense to explicitly forbid a lot of DenseMatrix operations? With an error message something like "That operation is not supported by quantcore.matrix. Please access the underlying numpy array if you want to perform advanced numpy operations." That would dramatically reduce our API surface area in some places where I'd prefer not to have to implement tons of stuff for e.g. CategoricalMatrix and SplitMatrix.

lbittarello commented 3 years ago

To my mind, it's okay if one subtype has a fuller API than the others. I only mentioned __mul__ (and indexing elsewhere) because it seems pretty elementary to me. It comes up naturally if you want to, say, create interactions between columns of a SplitMatrix.

tbenthompson commented 3 years ago

I think we should distinguish between a fuller API and an inconsistent API. Fuller is fine but inconsistent is not. Does that seem reasonable? And like the title of this issue is saying, we should also identify the minimal API.

Just wanted to summarize. I don't think this is disagreeing with any of the above posts. =)

MarcAntoineSchmidtQC commented 1 year ago

Going through old issues and found this one. This issue will be solved by #286.