denizyuret / Knet.jl

Koç University deep learning framework.
https://denizyuret.github.io/Knet.jl/latest

cudnn error #25

Closed ppalmes closed 8 years ago

ppalmes commented 8 years ago

Running lenet.jl on Power machine:

in macro expansion at /localhome/paulito/.julia/v0.5/Knet/src/gpu.jl:10 [inlined]
in (::Knet.##call#117#119)(::Int64, ::Int64, ::Int64, ::Int64, ::Int64, ::Type{T}, ::Knet.KnetArray{Float32,4}) at /localhome/paulito/.julia/v0.5/Knet/src/cuda44.jl:110
in Knet.PD(::Knet.KnetArray{Float32,4}) at /localhome/paulito/.julia/v0.5/Knet/src/cuda44.jl:107
in macro expansion at /localhome/paulito/.julia/v0.5/Knet/src/gpu.jl:7 [inlined]
in #pool#92(::Ptr{Void}, ::Float32, ::Float32, ::Array{Any,1}, ::Function, ::Knet.KnetArray{Float32,4}) at /localhome/paulito/.julia/v0.5/Knet/src/cuda44.jl:40
in pool(::Knet.KnetArray{Float32,4}) at /localhome/paulito/.julia/v0.5/Knet/src/cuda44.jl:39
in predict(::Array{Any,1}, ::Knet.KnetArray{Float32,4}) at /localhome/paulito/.julia/v0.5/Knet/examples/lenet.jl:83
in #accuracy#8(::Int64, ::Function, ::Array{Any,1}, ::Array{Any,1}) at /localhome/paulito/.julia/v0.5/Knet/examples/lenet.jl:122
in main(::String) at /localhome/paulito/.julia/v0.5/Knet/examples/lenet.jl:53
in eval(::Module, ::Any) at ./boot.jl:234
in eval_user_input(::Any, ::Base.REPL.REPLBackend) at ./REPL.jl:64
in macro expansion at ./REPL.jl:95 [inlined]
in (::Base.REPL.##3#4{Base.REPL.REPLBackend})() at ./event.jl:68
WARNING: cudnn.cudnnPoolingForward error 3

in macro expansion at /localhome/paulito/.julia/v0.5/Knet/src/gpu.jl:10 [inlined]
in #conv4#66(::Ptr{Void}, ::Float32, ::Float32, ::Int64, ::Ptr{Void}, ::Int64, ::Array{Any,1}, ::Function, ::Knet.KnetArray{Float32,4}, ::Knet.KnetArray{Float32,4}) at /localhome/paulito/.julia/v0.5/Knet/src/cuda44.jl:7
in conv4(::Knet.KnetArray{Float32,4}, ::Knet.KnetArray{Float32,4}) at /localhome/paulito/.julia/v0.5/Knet/src/cuda44.jl:6
in predict(::Array{Any,1}, ::Knet.KnetArray{Float32,4}) at /localhome/paulito/.julia/v0.5/Knet/examples/lenet.jl:83
in #accuracy#8(::Int64, ::Function, ::Array{Any,1}, ::Array{Any,1}) at /localhome/paulito/.julia/v0.5/Knet/examples/lenet.jl:122
in main(::String) at /localhome/paulito/.julia/v0.5/Knet/examples/lenet.jl:53
in eval(::Module, ::Any) at ./boot.jl:234
in eval_user_input(::Any, ::Base.REPL.REPLBackend) at ./REPL.jl:64
in macro expansion at ./REPL.jl:95 [inlined]
in (::Base.REPL.##3#4{Base.REPL.REPLBackend})() at ./event.jl:68
WARNING: cudnn.cudnnSetPoolingNdDescriptor error 3
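
A note for readers decoding these warnings: assuming the number Knet prints is the raw `cudnnStatus_t` value returned by the library call, `error 3` would correspond to `CUDNN_STATUS_BAD_PARAM` in the `cudnn.h` headers of the cuDNN releases discussed in this thread (v3 to v5). The lookup below is only a minimal sketch of that mapping; `CUDNN_STATUS_NAMES` and `cudnn_status_name` are illustrative names and are not part of Knet or cuDNN.

```julia
# Status codes as listed in cudnn.h for cuDNN v3 to v5.
# Assumption: the integer in Knet's warning is this raw status value.
const CUDNN_STATUS_NAMES = Dict(
    0  => "CUDNN_STATUS_SUCCESS",
    1  => "CUDNN_STATUS_NOT_INITIALIZED",
    2  => "CUDNN_STATUS_ALLOC_FAILED",
    3  => "CUDNN_STATUS_BAD_PARAM",
    4  => "CUDNN_STATUS_INTERNAL_ERROR",
    5  => "CUDNN_STATUS_INVALID_VALUE",
    6  => "CUDNN_STATUS_ARCH_MISMATCH",
    7  => "CUDNN_STATUS_MAPPING_ERROR",
    8  => "CUDNN_STATUS_EXECUTION_FAILED",
    9  => "CUDNN_STATUS_NOT_SUPPORTED",
    10 => "CUDNN_STATUS_LICENSE_ERROR"
)

# Hypothetical helper: map a status integer to its symbolic name.
cudnn_status_name(code::Integer) =
    get(CUDNN_STATUS_NAMES, Int(code), "unknown status $code")

println(cudnn_status_name(3))
```

A bad-parameter status from a descriptor setter is consistent with calling a newer function signature against an older library, although the trace alone does not prove that.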

ppalmes commented 8 years ago

charlm.jl, housing.jl, linreg.jl, and mnist.jl all work. Only lenet.jl, which uses convolution and pooling, fails.

denizyuret commented 8 years ago

Can you find out the version of cudart and cudnn you are using:

julia> using Knet
INFO: Knet using GPU 0

julia> p=Cint[0]; ccall(("cudaRuntimeGetVersion","libcudart"),UInt32,(Ptr{Cint},),p); p[1]
8000

julia> Int(ccall(("cudnnGetVersion","libcudnn"),Csize_t,()))
5103

ppalmes commented 8 years ago

Hi,

julia> p=Cint[0]; ccall(("cudaRuntimeGetVersion","libcudart"),UInt32,(Ptr{Cint},),p); p[1]
7000

julia> Int(ccall(("cudnnGetVersion","libcudnn"),Csize_t,()))
3000

ppalmes commented 8 years ago

For another Power machine:

julia> p=Cint[0]; ccall(("cudaRuntimeGetVersion","libcudart"),UInt32,(Ptr{Cint},),p); p[1]
7050

julia> Int(ccall(("cudnnGetVersion","libcudnn"),Csize_t,()))
4007
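
The integers returned by these two calls are packed version numbers. As a rough sketch of how to read them: the CUDA runtime encodes its version as 1000*major + 10*minor, and cuDNN releases of this era encode theirs as 1000*major + 100*minor + patch. The helper names `cuda_version` and `cudnn_version` below are hypothetical and are not Knet functions.

```julia
# CUDA runtime: 1000*major + 10*minor, so 7050 reads as 7.5.
function cuda_version(v::Integer)
    major, rest = divrem(Int(v), 1000)
    return VersionNumber(major, div(rest, 10))
end

# cuDNN (releases before v9): 1000*major + 100*minor + patch, so 5103 reads as 5.1.3.
function cudnn_version(v::Integer)
    major, rest = divrem(Int(v), 1000)
    minor, patch = divrem(rest, 100)
    return VersionNumber(major, minor, patch)
end

# The (cudart, cudnn) pairs reported in this thread.
for (cuda, cudnn) in [(8000, 5103), (7000, 3000), (7050, 4007)]
    println("CUDA ", cuda_version(cuda), ", cuDNN ", cudnn_version(cudnn))
end
```

Read this way, the maintainer's machine has CUDA 8.0 with cuDNN 5.1.3, while the two Power machines have CUDA 7.0 with cuDNN 3.0.0 and CUDA 7.5 with cuDNN 4.0.7.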

denizyuret commented 8 years ago

Same error in both?

ppalmes commented 8 years ago

Yes, same error. Both are Power machines.

denizyuret commented 8 years ago

OK, I can replicate the problem with old versions of the cudnn library. I will try to add support for the old cudnn interface. In the meantime recent versions of cudnn (v5) should work without issues.

ppalmes commented 8 years ago

OK, I can ask the system admin to upgrade, if a later cudnn version is available for Power.

denizyuret commented 8 years ago

Added support for earlier cudnn library interfaces; please test.

ppalmes commented 8 years ago

Thanks. It would be great if you can support the older cudnn, because I'm not sure the latest cudnn is available for the Power machines.

denizyuret commented 8 years ago

Hi Paulito,

The latest master supports all of them.

Try Pkg.update() and Pkg.checkout("Knet").

ppalmes commented 8 years ago

Hi,

I just checked: Knet now works on one of the machines but still fails on the other. The machine where it works uses cuda-7.5, while the one where it fails uses cuda-7.0. I'll ask the system admin to update to the latest cuda, since cuda-7.0 is quite old.

Thanks!
