denizyuret / Knet.jl

Koç University deep learning framework.
https://denizyuret.github.io/Knet.jl/latest

Element-wise power of KnetArray gives NaN results #108

Closed ognkrmms closed 5 years ago

ognkrmms commented 7 years ago

Element-wise powers of KnetArrays produce NaN for most entries, although some entries are sometimes computed correctly.

kirnap commented 7 years ago

I've also encountered the same problem and here is how to replicate the error in the simplest manner:

```julia
julia> x = KnetArray{Float32}(randn(4,1))

julia> x.*x
4×1 Knet.KnetArray{Float32,2}:
 1.13435
 0.0826074
 0.018514
 0.00399359

julia> x.^2
4×1 Knet.KnetArray{Float32,2}:
   1.13435
 NaN
   0.018514
   0.00399359
```
denizyuret commented 7 years ago

There seems to be a problem with powers of Float32 negative numbers. There is no issue with Float64 or positive numbers. Is this also your experience?
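
One way to answer this question systematically is to check whether the NaN entries line up exactly with the negative inputs. The snippet below is only a sketch: `nan_report` is a hypothetical helper, not part of Knet, and the data is hand-written so it runs without a GPU. On a GPU machine the result `y` would presumably come from something like `Array(KnetArray(x) .^ 2)`.

```julia
# Hypothetical helper (not part of Knet): given the inputs and a power result
# copied back to the CPU, report whether NaNs occur only at negative inputs.
function nan_report(x::AbstractArray, y::AbstractArray)
    nan_idx = findall(isnan, y)   # positions where the result is NaN
    neg_idx = findall(<(0), x)    # positions where the input is negative
    return (n_nan = length(nan_idx),
            n_negative = length(neg_idx),
            nan_only_at_negatives = issubset(nan_idx, neg_idx))
end

# Stand-in data (an assumption for illustration, not output from the thread).
x = Float32[1.0, -0.5, 0.25, -2.0]
y = Float32[1.0, NaN, 0.0625, NaN]

report = nan_report(x, y)
println(report)
```

If `nan_only_at_negatives` is true and `n_nan` equals `n_negative`, every negative input failed; if `n_nan` is smaller, only some of them did, which would match the first report in this thread.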

ilkerkesen commented 7 years ago

I can reproduce it with Float64 arrays as well.

https://gist.github.com/ilkerkesen/6a6859781402fe4a38b8a0923ba64f64

denizyuret commented 7 years ago

I don't get this on the aitest machine. Is it an NVIDIA library version problem? I have:

julia> include(Knet.dir("test/gpu.jl"))
(8000,8000,8000,5105)

Or maybe it is a chip problem:

$ nvidia-smi
NVIDIA-SMI 375.39                 Driver Version: 375.39
Tesla K80
ognkrmms commented 7 years ago

I have no problem with Float64 or positive numbers on my laptop. I have

julia> include(Knet.dir("test/gpu.jl"))
(8000,8000,8000,5110)

and

$ nvidia-smi
NVIDIA-SMI 375.39                 Driver Version: 375.39
Geforce GT 650M
denizyuret commented 7 years ago

Fractional powers of negative numbers are not always defined in the real domain (e.g. (-1)^0.5 = i); the answers are complex. So I am guessing that some of the NVIDIA libraries just return NaN for negative inputs without checking whether a real-valued answer exists. We'll investigate which versions have this problem. In the meantime, it is safest not to use the power operator with negative inputs.
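
For comparison, here is how plain CPU Julia treats the same cases. This is a small sketch using only Base, with no Knet or GPU involved: integer powers of negative reals are real, a fractional power of a negative real raises a DomainError rather than returning NaN, and a complex base gives the complex answer.

```julia
# Integer powers of a negative real are real-valued.
@assert (-2.0f0)^2 == 4.0f0
@assert (-2.0f0)^3 == -8.0f0

# A fractional power of a negative real has no real answer; CPU Julia raises
# a DomainError instead of returning NaN.
threw_domain_error = try
    (-1.0)^0.5
    false
catch err
    err isa DomainError
end
@assert threw_domain_error

# Promoting the base to a complex number gives the complex result (here, i).
@assert complex(-1.0)^0.5 ≈ im

# Precedence pitfall: -1^0.5 parses as -(1^0.5), which is -1.0, not i.
@assert -1^0.5 == -1.0
```

So the NaN for `x.^2` with a negative `x` is not what the mathematics requires; a real-valued answer exists whenever the exponent is an integer.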

CarloLucibello commented 6 years ago

This could be a solution:

julia> k=KnetArray(randn(Float32,2,2))
2×2 Knet.KnetArray{Float32,2}:
 1.74326  -2.52771
 2.1774    0.47341

julia> k.^2
2×2 Knet.KnetArray{Float32,2}:
 3.03896  NaN32       
 4.74106      0.224117

julia> Base.power_by_squaring.(k,2)
2×2 Knet.KnetArray{Float32,2}:
 3.03896  6.3893  
 4.74106  0.224117

I have to figure out how to redirect broadcast though. Can we reopen the issue?
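
To show why this approach avoids the NaN, here is a minimal sketch of exponentiation by squaring for positive integer exponents. The name `pow_by_squaring` is hypothetical and this is not the code of `Base.power_by_squaring`; the point is only that the result is built from multiplications, so the sign of the base never matters. It runs on a plain `Array`; whether it broadcasts over a KnetArray is exactly the open question above.

```julia
# Illustrative exponentiation by squaring for p >= 1 (hypothetical name).
function pow_by_squaring(x, p::Integer)
    p >= 1 || throw(DomainError(p, "only positive integer powers are handled"))
    result = one(x)
    base = x
    while p > 0
        if isodd(p)
            result *= base   # fold in the current bit of the exponent
        end
        base *= base         # square for the next bit
        p >>= 1
    end
    return result
end

# Same values as the 2×2 example above, but on the CPU.
x = Float32[1.74326 -2.52771; 2.1774 0.47341]
y = pow_by_squaring.(x, 2)
println(y)
```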

denglerchr commented 5 years ago

I still have this issue in Knet 1.2.1. abs2.(M) works, but .^( M , 2) is still bugged for me. I think using power on a KnetArray{Float32} should at least produce a warning in the meantime.
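
Until there is a fix, a possible workaround for non-negative integer exponents is to raise only the absolute values to the power and restore the sign afterwards. This is a sketch under assumptions: `signed_pow` is a hypothetical helper, it is shown on a plain `Array`, and it presumes that `abs`, `sign` and element-wise multiplication are unaffected on the GPU, which this thread does not confirm.

```julia
# Hypothetical helper: integer power that only ever raises non-negative
# numbers, then restores the sign for odd exponents. Intended for p >= 0.
function signed_pow(x::AbstractArray, p::Integer)
    magnitude = abs.(x) .^ p
    return isodd(p) ? sign.(x) .* magnitude : magnitude
end

M = Float32[-2.0, 3.0, -0.5]
println(signed_pow(M, 2))
println(signed_pow(M, 3))
```

For the common case of squaring, `abs2.(M)` or `M .* M` is simpler and avoids the power operator entirely.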

ekinakyurek commented 5 years ago

@denizyuret this issue still exists in 1.2.6

julia> x = randn(Float32,30,30)
30×30 Array{Float32,2}:
  0.293437   -0.334544  -1.38307    …   0.545329  -0.67475
 -0.814309    0.382297   1.50798       -1.18123    0.560698
 -1.15237     1.49037   -0.77491        0.112766   1.29614
  3.31963    -0.580821   1.60538       -0.671189  -0.164358
  0.0430724   0.766548  -0.0157219      0.264342   0.883471
  ⋮                                 ⋱
  0.892913   -2.26454   -1.78504        0.377429  -0.697448
 -0.704658   -0.1836    -1.75147        0.181553  -0.069336
 -1.03845     0.576976  -0.786031       0.092297  -1.31554
 -2.0941      3.4429     1.44819        0.753965   0.516067

julia> ka = KnetArray(x)
30×30 KnetArray{Float32,2}:
  0.293437   -0.334544  -1.38307    …   0.545329  -0.67475
 -0.814309    0.382297   1.50798       -1.18123    0.560698
 -1.15237     1.49037   -0.77491        0.112766   1.29614
  3.31963    -0.580821   1.60538       -0.671189  -0.164358
  0.0430724   0.766548  -0.0157219      0.264342   0.883471
  ⋮                                 ⋱
  0.892913   -2.26454   -1.78504        0.377429  -0.697448
 -0.704658   -0.1836    -1.75147        0.181553  -0.069336
 -1.03845     0.576976  -0.786031       0.092297  -1.31554
 -2.0941      3.4429     1.44819        0.753965   0.516067

julia> x.^3
30×30 Array{Float32,2}:
  0.0252664    -0.0374419   …   0.162172     -0.307206
 -0.539967      0.055873       -1.64818       0.176274
 -1.5303        3.31039         0.00143395    2.17751
 36.5823       -0.195941       -0.302367     -0.00443993
  7.99092e-5    0.45042         0.0184714     0.689568
  ⋮                         ⋱
  0.711915    -11.6129          0.0537658    -0.339263
 -0.349892     -0.00618898      0.00598424   -0.000333331
 -1.11983       0.192076        0.000786254  -2.27674
 -9.18312      40.8107          0.428602      0.137441

julia> ka.^3
30×30 KnetArray{Float32,2}:
   0.0252664   NaN         …    0.162172     NaN
 NaN             0.055873     NaN              0.176274
 NaN             3.31039        0.00143395     2.17751
  36.5823      NaN            NaN            NaN
   7.99092e-5    0.45042        0.0184714      0.689568
   ⋮                       ⋱
   0.711915    NaN              0.0537658    NaN
 NaN           NaN              0.00598424   NaN
 NaN             0.192076       0.000786255  NaN
 NaN            40.8107         0.428602       0.137441
denizyuret commented 5 years ago

This is a cuDNN problem. They cannot handle powers of negative numbers. Do you want a solution only for positive integer powers? If so, we can use @CarloLucibello's power_by_squaring solution by either (1) writing a CUDA kernel for its broadcast application, or (2) checking whether it works out of the box for CuArrays and using that (like we did for permutedims).

ekinakyurek commented 5 years ago

I believe it works for CuArrays, since this is used in the gelu layer in Flux.

denizyuret commented 5 years ago

@maleadt there is a CUDA math bug where pow(x,i) works correctly for double values but incorrectly for negative float32 values (it gives NaN even when i is an integer). Does CuArrays also call the CUDA pow function, or do you use power by squaring or something similar?

denizyuret commented 5 years ago

498 fixes this using CuArrays, please test.

CarloLucibello commented 5 years ago

Knet v1.2.7 fixes the problem for me. I guess we can close this.