flennerhag / mlens

ML-Ensemble – high performance ensemble learning
http://ml-ensemble.com
MIT License
843 stars, 108 forks

Requirements on y dataset #127

Closed samyip123 closed 4 years ago

samyip123 commented 4 years ago

I am trying to fit the ensemble on data with the following structure, where each instance is itself an ndarray:

X_Dataset: ndarray([ndarray([3 columns], [3 columns], [3 columns]), ndarray([3 columns], [3 columns]), ...])
y_Dataset: ndarray([ndarray([8 columns], [8 columns], [8 columns]), ndarray([8 columns], [8 columns]), ...])

Yet when I fit the model, I get an error when it tries to sort the y dataset:

File "/mlens/parallel/base.py", line 181, in _setup_2_multiplier
    self.classes_ = y
File "/mlens/parallel/base.py", line 202, in classes_
    self._classes = np.unique(y).shape[0]

ValueError: operands could not be broadcast together with shapes (3,8) (2,8)

Is there a requirement that the y dataset be 1-D? Thanks.
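
For reference, a minimal sketch that should trigger the same kind of failure with NumPy alone. It assumes the targets end up as an object array holding 2-D blocks with different row counts; the shapes are hypothetical stand-ins taken from the error message, not the actual data.

```python
import numpy as np

# Hypothetical ragged target: an object array holding a (3, 8) and a (2, 8) block.
y = np.empty(2, dtype=object)
y[0] = np.zeros((3, 8))
y[1] = np.zeros((2, 8))

try:
    # np.unique sorts its input, which compares the two blocks with each other.
    np.unique(y)
    raised = False
except ValueError as exc:
    raised = True
    print(exc)
```

The comparison between blocks of different shapes is what fails, so the error comes from NumPy rather than from anything specific to the ensemble.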

flennerhag commented 4 years ago

If I understand correctly, you are trying to pass a nested set of arrays?

This is not a tested feature, so I can't say for sure what's going on without a minimal example to debug. If you can create a simple toy example that replicates the bug, I'll take a look.

One option in the meantime is to flatten the data structure into 2-D arrays and use column selection to get the right data. It's not harder than this:

from mlens.preprocessing import Subset

class CustomSubset:

    def __init__(self, x_cols, y_cols):
        self.x_trans = Subset(x_cols)
        self.y_trans = Subset(y_cols)

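    def fit(self, x, y):
        # Column selection is stateless, so there is nothing to learn.
        return self

    def transform(self, x, y):
        # Apply each Subset through its transform method.
        return self.x_trans.transform(x), self.y_trans.transform(y)

    def fit_transform(self, x, y):
        return self.transform(x, y)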

pipes = {"pipe-1": [CustomSubset([0, 3], [0, 8])], ...} 
ests = {"pipe-1": [est_1, est_2, ...], ...} 
ens.add(estimators=ests, preprocessing=pipes)
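
The flattening step itself could look roughly like the sketch below. The arrays and the names X_nested, y_nested, and groups are hypothetical, chosen only to mirror the shapes described in the question, and np.vstack is one possible way to stack the blocks, not something mlens requires.

```python
import numpy as np

# Hypothetical nested data: each instance is a 2-D block with its own row count.
X_nested = [np.zeros((3, 3)), np.ones((2, 3))]
y_nested = [np.zeros((3, 8)), np.ones((2, 8))]

# Stack the blocks row-wise into ordinary 2-D arrays.
X_flat = np.vstack(X_nested)
y_flat = np.vstack(y_nested)

# Record which instance each row came from so the grouping can be recovered.
groups = np.repeat(np.arange(len(X_nested)), [len(block) for block in X_nested])
```

Keeping the groups array around makes it possible to split predictions back into per-instance blocks afterwards.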

flennerhag commented 4 years ago

Closing due to inactivity.