Infinite Values for norm2

ilyakorsunsky commented 6 years ago

Hi, for my larger datasets (250,000 x 2000) run with the R code (fastpath=FALSE), I run into the problem that some of the data structures (e.g. V) get so large that the L2 norm (norm2) gets infinite. Then I get errors comparing R and S and eps2, because R or S are infinite. I fixed this problem by scaling by the max of the vector before doing the L2 scaling (example below). Then the code runs to completion. However, I get really large results (e.g. max d is 1e150), which don't match the C implementation. I suspect these large values are themselves the result of a bug.

  V[, 1] <- max_scale(V[, 1])
  V[, 1] <- V[, 1] / norm2(V[, 1])
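The post does not show `max_scale`, so the following is only a guess at what such a helper could look like, together with an assumed plain 2-norm. It illustrates why dividing by the largest absolute entry first keeps the later normalization finite:

```r
# Hypothetical helper: max_scale() is not defined in the post, so this
# definition is an assumption made for illustration only.
max_scale <- function(v) {
  m <- max(abs(v))
  if (!is.finite(m) || m == 0) return(v)  # nothing sensible to scale by
  v / m
}

# Assumed definition of the plain 2-norm (square root of the sum of squares).
norm2 <- function(x) sqrt(drop(crossprod(x)))

# A badly scaled vector whose squared entries overflow double precision.
v <- rep(sqrt(.Machine$double.xmax) * 10, 4)
overflowed <- is.infinite(norm2(v))

# Two-step normalization as in the snippet above.
v <- max_scale(v)
v <- v / norm2(v)
unit_norm <- norm2(v)
```

After the two steps the vector has unit length even though the direct norm of the original overflowed.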

This may be a related issue: when I ran the C version (fastpath=TRUE) yesterday on the same data, I got the error message "BLAS/LAPACK routine 'DLASCL' gave error code -4". It seems that this error arises when there are NA or Inf values in the original matrix. I wonder whether it can also arise from Inf values in the L2 norm computation. Strangely, I ran the same thing today and did not get this error, so if this is not an issue others have, please ignore.

Thanks for looking into this!

bwlewis commented 6 years ago

Thanks for this... will investigate.

On Fri, Apr 27, 2018, 18:07 ilyakorsunsky notifications@github.com wrote:


bwlewis commented 5 years ago

Yes indeed, I can replicate these behaviors with badly scaled data due to floating point overflow. For example:

  x = rep(sqrt(.Machine$double.xmax) * 10, 2)
  # now its 2-norm:
  sqrt(drop(crossprod(x)))
  [1] Inf

However, I have not yet been able to cook up a toy example that illustrates significant differences between the R and C code paths.

In any case, I don't yet have a great solution. Am open to ideas!
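One idea, offered only as a sketch and not as what irlba does: compute the 2-norm on a copy scaled by its largest absolute entry, then multiply the scale back in. The name `safe_norm2` is hypothetical. This avoids overflow in the sum of squares, but it would not by itself explain or fix the very large `d` values reported above.

```r
# Hypothetical overflow-resistant 2-norm; not part of irlba.
safe_norm2 <- function(x) {
  if (length(x) == 0) return(0)
  s <- max(abs(x))
  # Propagate NA/NaN/Inf as-is and avoid 0/0 for an all-zero vector.
  if (!is.finite(s) || s == 0) return(s)
  # Every entry of x / s lies in [-1, 1], so the sum of squares is at
  # most length(x) and cannot overflow.
  s * sqrt(sum((x / s)^2))
}

x <- rep(sqrt(.Machine$double.xmax) * 10, 2)
naive <- sqrt(drop(crossprod(x)))  # overflows to Inf
safe  <- safe_norm2(x)             # stays finite
```

The trade-off is an extra pass over the vector to find the maximum. The returned value can still be `Inf` if the true norm itself exceeds `.Machine$double.xmax`.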

bwlewis / irlba

Infinite Values for norm2 #35