Many feature matrices in practice have only a small number of non-zero entries per row. For example, one-hot encoded data has exactly one non-zero entry per row. `CategoricalMatrix` handles this case well, provided all the non-zero entries are one. That is not always the case, however, e.g. for data that comes from `sklearn.preprocessing.SplineTransformer`. Such data would be well supported by the ELLPACK format, which is a natural generalization of `CategoricalMatrix`.
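To make the idea concrete, here is a minimal NumPy sketch of an ELLPACK layout. It is an illustration only, not an existing API: the helper names `dense_to_ell`, `ell_matvec`, and `ell_rmatvec` are hypothetical. Each row stores a fixed number of (column index, value) pairs, padded with zeros, so a `CategoricalMatrix` roughly corresponds to the special case of width one with all values equal to one.

```python
import numpy as np


def dense_to_ell(X):
    """Convert a dense 2-D array to ELLPACK arrays (hypothetical helper).

    Returns `indices` and `values`, both of shape (n_rows, width), where
    `width` is the largest number of non-zeros found in any row.
    """
    X = np.asarray(X, dtype=float)
    n_rows = X.shape[0]
    width = int((X != 0).sum(axis=1).max()) if n_rows else 0
    # Padded slots keep column index 0 and value 0.0, so they add nothing.
    indices = np.zeros((n_rows, width), dtype=np.intp)
    values = np.zeros((n_rows, width), dtype=float)
    for i in range(n_rows):
        cols = np.flatnonzero(X[i])
        indices[i, : cols.size] = cols
        values[i, : cols.size] = X[i, cols]
    return indices, values


def ell_matvec(indices, values, v):
    """Compute X @ v by gathering v at the stored column indices."""
    return (values * v[indices]).sum(axis=1)


def ell_rmatvec(indices, values, w, n_cols):
    """Compute X.T @ w by scattering weighted values into the columns."""
    out = np.zeros(n_cols)
    # np.add.at accumulates correctly when a column index repeats.
    np.add.at(out, indices, values * w[:, None])
    return out


X = np.array(
    [
        [0.0, 2.0, 0.0, 0.0],
        [1.0, 0.0, 0.0, 3.0],
        [0.0, 0.0, 0.5, 0.0],
    ]
)
indices, values = dense_to_ell(X)
v = np.array([1.0, 2.0, 3.0, 4.0])
w = np.array([1.0, 2.0, 3.0])
assert np.allclose(ell_matvec(indices, values, v), X @ v)
assert np.allclose(ell_rmatvec(indices, values, w, X.shape[1]), X.T @ w)
```

Because every row has the same width, both products reduce to regular gather and scatter operations over two rectangular arrays, which is what makes the format attractive when the number of non-zeros per row is small and roughly constant.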
Another option is to support the Sliced ELLPACK (SELL) format, which handles general sparse matrices relatively well. `SplitMatrix` could then consist of just a dense matrix and a SELL matrix.
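As a rough illustration of why slicing helps with general sparse matrices, the sketch below groups rows into slices of a fixed height and pads each slice only to its own widest row. The names `dense_to_sell`, `sell_matvec`, and `sell_stored_entries` and the `slice_height` parameter are hypothetical, and the list of per-slice arrays is a simplification for readability: practical SELL implementations typically pack everything into flat arrays with slice offsets.

```python
import numpy as np


def dense_to_sell(X, slice_height=2):
    """Convert a dense 2-D array to a list of per-slice ELLPACK blocks.

    Each block is an (indices, values) pair padded to the widest row
    within that slice only (hypothetical helper, simplified layout).
    """
    X = np.asarray(X, dtype=float)
    slices = []
    for start in range(0, X.shape[0], slice_height):
        block = X[start : start + slice_height]
        width = int((block != 0).sum(axis=1).max())
        indices = np.zeros((block.shape[0], width), dtype=np.intp)
        values = np.zeros((block.shape[0], width), dtype=float)
        for i, row in enumerate(block):
            cols = np.flatnonzero(row)
            indices[i, : cols.size] = cols
            values[i, : cols.size] = row[cols]
        slices.append((indices, values))
    return slices


def sell_matvec(slices, v):
    """Compute X @ v slice by slice."""
    if not slices:
        return np.zeros(0)
    return np.concatenate([(vals * v[idx]).sum(axis=1) for idx, vals in slices])


def sell_stored_entries(slices):
    """Count stored value slots, including padding."""
    return sum(vals.size for _, vals in slices)


# One nearly dense row forces plain ELLPACK to pad every row to width 4.
X = np.array(
    [
        [1.0, 2.0, 3.0, 4.0],
        [0.0, 5.0, 0.0, 0.0],
        [0.0, 0.0, 6.0, 0.0],
        [7.0, 0.0, 0.0, 0.0],
    ]
)
slices = dense_to_sell(X, slice_height=2)
v = np.array([1.0, 2.0, 3.0, 4.0])
assert np.allclose(sell_matvec(slices, v), X @ v)
```

In this example the first slice is padded to width 4 and the second to width 1, so the sliced layout stores 10 value slots where plain ELLPACK would store 16. Sorting rows by their number of non-zeros before slicing would reduce the padding further, at the cost of tracking a row permutation.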