You can rewrite `np.dot(X - lda.xbar, lda.scalings)`
as `np.dot(X, lda.scalings) - np.dot(lda.xbar, lda.scalings)`,
and the second term is a constant. So we only shift the results here, and this shift is compensated later when we compute the q-values.
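A quick way to convince yourself of this decomposition is the following minimal sketch; `X`, `xbar` and `scalings` here are made-up stand-ins for the fitted LDA attributes:

```python
import numpy as np

rng = np.random.RandomState(0)
X = rng.randn(100, 5)          # made-up feature matrix
xbar = X.mean(axis=0)          # stand-in for the overall mean stored by the fitted LDA
scalings = rng.randn(5, 1)     # stand-in for the LDA scalings

centered = np.dot(X - xbar, scalings)                    # "transform"-style score
shifted = np.dot(X, scalings) - np.dot(xbar, scalings)   # raw score minus a constant offset

assert np.allclose(centered, shifted)
```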
Dear @uweschmitt ,
Thanks for your reply, but I do not understand why this shift, which is only compensated later, is acceptable. Why not directly use the raw transform function as the score?
Thanks.
We did not use the transformation for historical reasons. The predecessor mProphet did not subtract the mean, and we wanted reproducible results when we rewrote mProphet in Python. Moreover, the q-value is invariant with regard to this difference. The compensation happens in the `pnorm` function.
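To illustrate the invariance: `pnorm` computes p-values of the target scores against a normal distribution fitted to the decoy scores. The sketch below captures that idea only; the function name and exact parameterization are illustrative, not pyprophet's actual `pnorm` signature. Because targets and decoys are scored with the same formula, a constant offset shifts both equally and cancels out:

```python
import numpy as np
from scipy.stats import norm

def decoy_normal_pvalues(target_scores, decoy_scores):
    # Fit a normal distribution to the decoy scores and compute
    # right-tailed p-values of the target scores against it.
    mu, sigma = np.mean(decoy_scores), np.std(decoy_scores, ddof=1)
    return 1.0 - norm.cdf(target_scores, loc=mu, scale=sigma)

rng = np.random.RandomState(1)
targets = rng.randn(1000) + 2.0   # made-up target scores
decoys = rng.randn(1000)          # made-up decoy scores

shift = 42.0                      # stands in for the constant np.dot(lda.xbar, lda.scalings)
p_raw = decoy_normal_pvalues(targets, decoys)
p_shifted = decoy_normal_pvalues(targets - shift, decoys - shift)

assert np.allclose(p_raw, p_shifted)  # the shift cancels out in the p-values
```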
@uweschmitt In the `pnorm` function, what pyprophet does is subtract the mean of the decoy peaks. But `lda.xbar` is the mean of all samples, so mathematically this does not seem equivalent.
Exactly, you subtract the mean. As I posted before, your modification introduces a constant shift for all scores. So what happens if you shift a set of scores by a constant value and later subtract the mean? You get the same values as without shifting.
You can also try this out: implement your modification and compare new vs old results.
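The mean-subtraction argument can be checked in a couple of lines (made-up scores, any constant offset):

```python
import numpy as np

rng = np.random.RandomState(2)
scores = rng.randn(500)           # made-up classifier scores
shift = 3.7                       # any constant offset

a = scores - np.mean(scores)
b = (scores + shift) - np.mean(scores + shift)

assert np.allclose(a, b)          # identical after mean subtraction
```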
Dear @uweschmitt , you are right. I tried the modification and the results are the same. Now I understand that the score's absolute value is not the key point, because we convert the scores to p-values based on the distribution of the scores. Therefore a shift of all scores makes no difference to the result.
Thanks again.
Dear @grosenberger , @uweschmitt , @hroest ,
After the LDA is fitted on the training data, we need to score the test data using the fitted model parameters. As far as I know, there are two ways to calculate the scores:
"LinearDiscriminantAnalysis().transform()". This function transforms the features to the new small subspace. In fact, it scores like this: np.dot(X - lda.xbar, lda.scalings)
LinearDiscriminantAnalysis().predict(). In detail, this function determines the classification based on: np.dot(X, lda.coef.T) + lda.intercept
reference can be found here
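For completeness, both formulas can be checked directly against scikit-learn. This is a minimal sketch on synthetic two-class data; it assumes the default `solver="svd"` (the only solver that stores `xbar_`) and uses scikit-learn's fitted attribute names `xbar_`, `scalings_`, `coef_` and `intercept_`:

```python
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

rng = np.random.RandomState(3)
X = np.vstack([rng.randn(50, 4), rng.randn(50, 4) + 1.0])  # synthetic two-class data
y = np.array([0] * 50 + [1] * 50)

lda = LinearDiscriminantAnalysis(solver="svd").fit(X, y)

# Method 1: transform() projects onto the discriminant axis.
assert np.allclose(lda.transform(X), np.dot(X - lda.xbar_, lda.scalings_))

# Method 2: predict() thresholds the decision function.
decision = np.dot(X, lda.coef_.T) + lda.intercept_
assert np.allclose(lda.decision_function(X), decision.ravel())
assert np.array_equal(lda.predict(X), (decision.ravel() > 0).astype(int))
```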
But in the file `classifiers.py`, the `score()` function is just `clf_scores = np.dot(X, lda.scalings)`. Confusingly, in `start_semi_supervised_learning` the scores are then centered with `clf_scores -= np.mean(clf_scores)` (the mean of `clf_scores` is not always zero, is it?), while in `iter_semi_supervised_learning` the mean is not subtracted.
In conclusion, I doubt the correctness of the score formula used by pyprophet. Should it be `np.dot(X - lda.xbar, lda.scalings)` instead of `np.dot(X, lda.scalings)`? Or maybe these two methods do not make much difference to the final result in the end.
Thanks.