Gscorreia89 / pyChemometrics

pyChemometrics - Objects for multivariate analysis of chemometric and metabonomic datasets
BSD 3-Clause "New" or "Revised" License

PLSDA seems to lose "X" at some point #9

Open · bpsut opened this issue 1 year ago

bpsut commented 1 year ago

Hello, I just found your repo and was trying to use it to run some cross-validation on multiclass PLS-DA, but was having some issues. I get the following error:

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
Cell In[100], line 24
     22 #clf = PLSRegression()
     23 clf = pyChemometrics.ChemometricsPLSDA(ncomps=2)
---> 24 clf.fit(X_train, y_train2)
     25 clf.predict(X_train)

File ~/mambaforge/envs/pirc/lib/python3.9/site-packages/pyChemometrics/ChemometricsPLSDA.py:180, in ChemometricsPLSDA.fit(self, x, y, **fit_params)
    178 if self.n_classes > 2:
    179     R2Y = ChemometricsPLS.score(self, x=x, y=dummy_mat, block_to_score='y')
--> 180     R2X = ChemometricsPLS.score(self, x=x, y=dummy_mat, block_to_score='x')
    181 else:
    182     R2Y = ChemometricsPLS.score(self, x=x, y=y, block_to_score='y')

File ~/mambaforge/envs/pirc/lib/python3.9/site-packages/pyChemometrics/ChemometricsPLS.py:386, in ChemometricsPLS.score(self, x, y, block_to_score, sample_weight)
    384 xscaled = deepcopy(self.x_scaler).fit_transform(x)
    385 # Calculate total sum of squares of X and Y for R2X and R2Y calculation
--> 386 xpred = self.x_scaler.transform(ChemometricsPLS.predict(self, x=None, y=y))
    387 tssx = np.sum(np.square(xscaled))
    388 rssx = np.sum(np.square(xscaled - xpred))

File ~/mambaforge/envs/pirc/lib/python3.9/site-packages/pyChemometrics/ChemometricsPLS.py:431, in ChemometricsPLS.predict(self, x, y)
    428 # Predict X from Y
    429 elif y is not None:
    430     # Going through calculation of U and then X = Ub_uW'
--> 431     u_scores = ChemometricsPLS.transform(self, x=None, y=y)
    432     predicted = np.dot(np.dot(u_scores, self.b_u), self.weights_w.T)
    433     if predicted.ndim == 1:

TypeError: wrapped() missing 1 required positional argument: 'X'

when I try to run the following dummy code:

import numpy as np
import pyChemometrics

# 8 samples x 6 features, two samples per class
X_train = np.array([[1,1,1,0,0,0],
                    [1,1,1,0,0,0],
                    [0,0,1,1,0,0],
                    [0,0,1,1,0,0],
                    [0,0,0,0,1,1],
                    [0,0,0,0,1,1],
                    [0,1,0,1,0,1],
                    [0,1,0,1,0,1]])
# One-hot (dummy) encoding of the four classes
y_train = np.array([[1,0,0,0],
                    [1,0,0,0],
                    [0,1,0,0],
                    [0,1,0,0],
                    [0,0,1,0],
                    [0,0,1,0],
                    [0,0,0,1],
                    [0,0,0,1]])
# The same four classes as integer labels
y_train2 = np.array([0,0,1,1,2,2,3,3])

#clf = PLSRegression()
clf = pyChemometrics.ChemometricsPLSDA(ncomps=2)
clf.fit(X_train, y_train2)  # raises the TypeError shown above
clf.predict(X_train)

I also noticed that passing a one-hot encoded array for y does not work: np.unique() (called without axis=0) flattens the array and returns unique values rather than unique rows. As it stands, your code keeps reporting that y_train has 2 unique classes (the values 0 and 1).
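The two-class count matches how `np.unique` treats 2-D input. The check below uses plain NumPy only. It also shows one possible workaround (an assumption, not a documented pyChemometrics requirement): convert the one-hot matrix back to integer labels with `argmax` before calling `fit`, which is the same form `y_train2` already uses.

```python
import numpy as np

# One-hot encoded labels from the report (4 classes, 2 samples each)
y_onehot = np.array([[1, 0, 0, 0],
                     [1, 0, 0, 0],
                     [0, 1, 0, 0],
                     [0, 1, 0, 0],
                     [0, 0, 1, 0],
                     [0, 0, 1, 0],
                     [0, 0, 0, 1],
                     [0, 0, 0, 1]])

# Without an axis, np.unique flattens the array and returns unique values
print(np.unique(y_onehot))                # [0 1] -> looks like 2 "classes"

# With axis=0 it returns the unique rows instead
print(np.unique(y_onehot, axis=0).shape)  # (4, 4) -> 4 distinct classes

# Convert back to integer labels: the index of the 1 in each row
labels = np.argmax(y_onehot, axis=1)
print(labels)                             # [0 0 1 1 2 2 3 3]
```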

gettingthestars commented 3 months ago

When I used ChemometricsPLS, I ran into the same issue: "ChemometricsPLS.transform() missing 1 required positional argument: 'X'".
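Both error messages point to a wrapper around `transform` whose first argument after `self` is an upper-case `X`. However, `ChemometricsPLS.predict` calls `ChemometricsPLS.transform(self, x=None, y=y)` with a lower-case keyword. A likely source of such a wrapper, which is not confirmed in this thread, is scikit-learn's `set_output` machinery. In recent versions it wraps `transform` on transformer subclasses with a function of the form `wrapped(self, X, *args, **kwargs)`. The toy below reproduces the mechanism without importing scikit-learn. `wrap_output` and `ToyPLS` are hypothetical names. Because `functools.wraps` copies the original name, newer Python versions may report the error as `...transform()` rather than `wrapped()`, which would explain the two wordings above.

```python
import functools


def wrap_output(f):
    """Hypothetical stand-in for a decorator whose wrapper requires a
    positional X, mirroring the signature named in the traceback."""
    @functools.wraps(f)
    def wrapped(self, X, *args, **kwargs):
        # X is a required positional parameter of the wrapper itself
        return f(self, X, *args, **kwargs)
    return wrapped


class ToyPLS:
    """Hypothetical toy class; transform uses lower-case x like pyChemometrics."""

    @wrap_output
    def transform(self, x=None, y=None):
        return "x-scores" if x is not None else "y-scores"


model = ToyPLS()

# Keyword call, as in ChemometricsPLS.predict: 'x' lands in **kwargs,
# so the wrapper's own required 'X' is never supplied and Python raises.
message = ""
try:
    ToyPLS.transform(model, x=None, y=[1, 2])
except TypeError as err:
    message = str(err)
print(message)

# Positional call: the first argument fills the wrapper's X and is
# forwarded to transform as x, so the call succeeds.
result = ToyPLS.transform(model, None, y=[1, 2])
print(result)  # y-scores
```

Passing `x` positionally is only one way around the mismatch. Whatever the actual fix pushed to the repository does, it is not described here.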

Gscorreia89 commented 3 months ago

@gettingthestars I pushed a fix for this; let me know if it's working for you now. I am also doing some patching for the multi-class setting, but that is not yet fully fixed.