Attention: Patch coverage is 0% with 2 lines in your changes missing coverage. Please review.
Project coverage is 81.94%. Comparing base (9b0b76b) to head (0bfe836). Report is 37 commits behind head on main.
| Files | Patch % | Lines |
|---|---|---|
| mesmer/mesmer_m/power_transformer.py | 0.00% | 2 Missing :warning: |
Can you define `eps = np.spacing(np.float64(1))`, or actually, since we don't do it for an array, `eps = np.finfo(np.float64).eps`? That would be clearer to me (not sure if here or in another PR, though).
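(Aside: the two expressions give the same number; a quick check, my illustration rather than code from the PR:)

```python
import numpy as np

# np.spacing(1.0) is the gap from 1.0 to the next larger float; for float64
# this is exactly the machine epsilon reported by np.finfo
assert np.spacing(np.float64(1)) == np.finfo(np.float64).eps  # 2.220446049250313e-16
```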
I think another PR would make sense.
Let's merge this then?
Can you merge main? I need to think about this.
The question is whether, when lambda is exactly eps, we consider that to be zero or not. I just wanted to make it consistent with the sklearn function. However, I think they are not consistent themselves, because they write `if abs(lmbda) < np.spacing(1.0)` for $\lambda = 0$ but then `if not abs(lmbda - 2) > np.spacing(1.0)` for $\lambda = 2$...
```python
# when x >= 0
if abs(lmbda) < np.spacing(1.0):
    out[pos] = np.log1p(x[pos])
else:  # lmbda != 0
    out[pos] = (np.power(x[pos] + 1, lmbda) - 1) / lmbda

# when x < 0
if abs(lmbda - 2) > np.spacing(1.0):
    out[~pos] = -(np.power(-x[~pos] + 1, 2 - lmbda) - 1) / (2 - lmbda)
else:  # lmbda == 2
    out[~pos] = -np.log1p(-x[~pos])
```
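To make the asymmetry concrete (my illustration, not code from the PR): a lambda exactly eps away from 0 is *not* treated as 0, while a lambda exactly eps away from 2 *is* treated as 2.

```python
import numpy as np

eps = np.spacing(1.0)  # ~2.22e-16

# lambda exactly eps away from 0: the strict "<" excludes it from the log case
print(abs(eps) < eps)                # False -> power branch

# lambda exactly eps away from 2: the strict ">" fails, so it falls into the
# lambda == 2 (log) case; 2.0 - eps is exactly representable in float64
print(abs((2.0 - eps) - 2.0) > eps)  # False -> log branch
```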
Maybe that's the reason why Shruti wrote it herself? Because for the inverse transform she actually just uses the one from sklearn...
Shruti wrote this herself so she could have variable (or dependent) $\lambda$ values.
I think the `>=` / `>` mess comes about because the original author also overlooked that $\lambda$ can be any real value and does not have to be in 0..2 (see also https://github.com/MESMER-group/mesmer/pull/430#discussion_r1590938808). The value problem was fixed, but this particular inconsistency remained - see scikit-learn/scikit-learn#12522. (Assuming 0 <= lambda <= 2, the operators make sense.)
scipy does the same, but I think it was written by the same author: https://github.com/scipy/scipy/blob/7dcd8c59933524986923cde8e9126f5fc2e6b30b/scipy/stats/_morestats.py#L1572
> Shruti wrote this herself so she could have variable (or dependent) $\lambda$ values.

Hm, I mean we could also pass every (value, lambda) pair to the sklearn power transform, no? Like we do for the inverse transform.
> $\lambda$ can be any real value and does not have to be in 0..2

In our case it is, because lambda is derived from a logistic function.
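For context, a minimal sketch of such a logistic lambda function (the exact form and coefficient names are an assumption for illustration, not MESMER's actual code); for positive coefficients it keeps $\lambda$ inside the open interval (0, 2):

```python
import numpy as np

def lambda_function(xi_0, xi_1, covariate):
    # logistic curve: for xi_0, xi_1 > 0, lambda tends to 2 as the covariate
    # goes to -inf and to 0 as it goes to +inf, so it never leaves (0, 2)
    return 2.0 / (1.0 + xi_0 * np.exp(xi_1 * covariate))
```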
I don't get how it makes sense that in one case a $\lambda$ exactly eps away from the special value is not contained in that case, while in the other it is?
> > Shruti wrote this herself so she could have variable (or dependent) $\lambda$ values.
>
> Hm, I mean we could also pass every (value, lambda) pair to the sklearn power transform, no? Like we do for the inverse transform.
Yes, but then we have to check if this is vectorized - otherwise it will be too slow.
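A rough sketch of what that per-pair call could look like (note this goes through sklearn's private `PowerTransformer._yeo_johnson_transform`, a non-public API, and the Python-level loop is exactly the vectorization worry):

```python
import numpy as np
from sklearn.preprocessing import PowerTransformer

pt = PowerTransformer(method="yeo-johnson")
values = np.array([0.3, -1.2, 2.5])
lambdas = np.array([0.9, 1.1, 1.5])  # e.g. from the logistic lambda function

# one call per (value, lambda) pair - a Python-level loop, not vectorized
out = np.array(
    [pt._yeo_johnson_transform(np.array([v]), lmbda)[0]
     for v, lmbda in zip(values, lambdas)]
)
```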
> > $\lambda$ can be any real value and does not have to be in 0..2
>
> In our case it is, because lambda is derived from a logistic function.
Ah ok, sorry - there are too many open PRs and comments. But then this is a property of the covariate function, and it's not optimal if this lives in the Yeo-Johnson transform. (Technically the user will not be able to easily replace the `lambda_function`, but it's misleading when the bounds are associated with the Yeo-Johnson transform and not with the covariate function. And it's almost impossible to find out whether the `lambda_function` should ever be changed...)
I agree, it is pretty hard to see through it all. The `get_yeo_johnson_lambdas` function is pretty horrible. I think we could get rid of it when rewriting the whole thing for xarrays. Maybe we should think about passing the lambda function as an argument...
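Something like this, perhaps (a purely hypothetical signature to illustrate the idea, not a proposal for the actual interface):

```python
import numpy as np
from scipy.stats import yeojohnson

def transform(data, covariate, lambda_function):
    # hypothetical API: the covariate -> lambda mapping is an explicit,
    # user-replaceable argument instead of being hard-wired internally
    lambdas = lambda_function(covariate)
    return np.array([yeojohnson(np.array([x]), lmbda)[0]
                     for x, lmbda in zip(data, lambdas)])
```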
But in this PR I actually just wanted to fix the operator, to be the same as in sklearn. Am I missing something?
> But in this PR I actually just wanted to fix the operator, to be the same as in sklearn. Am I missing something?
No - I was trying to understand why it's inconsistent in scikit-learn (and I think I do now).
Changing the comparison to eps in `_yeo_johnson_transform` to be consistent with sklearn's Yeo-Johnson power transform.
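For reference, a minimal sketch of what the sklearn-consistent conditions look like when applied elementwise for the variable-lambda case (my illustration of the idea, not the PR's actual diff):

```python
import numpy as np

eps = np.spacing(1.0)

def yeo_johnson(x, lmbda):
    x = np.asarray(x, dtype=float)
    lmbda = np.broadcast_to(np.asarray(lmbda, dtype=float), x.shape)
    out = np.empty_like(x)
    pos = x >= 0

    # x >= 0: log1p when lambda is numerically zero, power branch otherwise
    is0 = np.abs(lmbda) < eps
    sel = pos & is0
    out[sel] = np.log1p(x[sel])
    sel = pos & ~is0
    out[sel] = (np.power(x[sel] + 1, lmbda[sel]) - 1) / lmbda[sel]

    # x < 0: mirrored branch with the special value at lambda == 2
    # (written as sklearn has it: "not abs(lmbda - 2) > eps" means "treat as 2")
    is2 = ~(np.abs(lmbda - 2) > eps)
    sel = ~pos & ~is2
    out[sel] = -(np.power(-x[sel] + 1, 2 - lmbda[sel]) - 1) / (2 - lmbda[sel])
    sel = ~pos & is2
    out[sel] = -np.log1p(-x[sel])
    return out
```

(Checking lambda per element through boolean masks like this keeps the whole thing vectorized.)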