FluxML / model-zoo

Please do not feed the models
https://fluxml.ai/

Problem with broadcasting .== in MNIST Conv model from the zoo #102

Open kool7d opened 5 years ago

kool7d commented 5 years ago

I'm getting a GPU compilation error with the conv model from the zoo:

```
Argument 4 to your kernel function is of type [something long that I can't copy/paste]
That type is not isbits, and such arguments are only allowed when they are unused by the kernel
```

This happens at the last line, `Flux.train!(loss, params(m), train, opt, cb = evalcb)`. Everything works on the CPU.

Edit:
I've narrowed it down to the line `accuracy(x, y) = mean(onecold(m(x)) .== onecold(y))`. If I remove the callback `evalcb = throttle(() -> @show(accuracy(tX, tY)), 10)` and call `Flux.train!(loss, params(m), train, opt)` instead, it works.

Edit:
Further narrowed it down to the `.==` in `onecold(m(tX)) .== onecold(tY)`.
Moving both sides to the CPU first works: `mean(cpu(onecold(m(tX))) .== cpu(onecold(tY)))`.
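
For anyone hitting the same error, that CPU workaround could be written as a drop-in `accuracy` function roughly like this. It is only a minimal sketch, assuming a Flux 0.7-era setup like the zoo script. The tiny `Dense` model and the random data are hypothetical stand-ins for the conv model and `tX`/`tY`, and without CuArrays loaded `gpu` and `cpu` should leave the arrays where they are, so the snippet is expected to run on the CPU too.

```julia
using Flux
using Flux: onehotbatch, onecold
using Statistics: mean

# Hypothetical stand-ins for the zoo's conv model and test batch.
m  = Chain(Dense(4, 3), softmax) |> gpu
tX = rand(Float32, 4, 8) |> gpu
tY = onehotbatch(rand(1:3, 8), 1:3) |> gpu

# Move both label vectors to the CPU before comparing them,
# so the `.==` broadcast runs on ordinary CPU arrays.
accuracy(x, y) = mean(cpu(onecold(m(x))) .== cpu(onecold(y)))

@show accuracy(tX, tY)
```

The callback from the script, `evalcb = throttle(() -> @show(accuracy(tX, tY)), 10)`, can then stay as it is.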

DhairyaLGandhi commented 5 years ago

Hi, could you share the output of `pkg> st`?

Also, are you using the cuda environment in that folder?

DhairyaLGandhi commented 5 years ago

Thanks for pointing it out!

kool7d commented 5 years ago

Status C:\Users\kool7\.julia\environments\v1.1\Project.toml
[c52e3926] Atom v0.7.14
[de9282ab] BioStructures v0.4.0
[be33ccc6] CUDAnative v1.0.1
[3a865a2d] CuArrays v0.9.1
[b4f34e82] Distances v0.7.4
[587475ba] Flux v0.7.2
[e5e0dc1b] Juno v0.5.4
[929cbde3] LLVM v1.0.0
[50d2b5c4] Lazy v0.13.2
[1914dd2f] MacroTools v0.4.4
[438e738f] PyCall v1.18.5
[8ba89e20] Distributed
[10745b16] Statistics

What do you mean by using the cuda environment in that folder?

DhairyaLGandhi commented 5 years ago

The environment in /vision/mnist/cuda/ should work. It's a known issue in that specific version of Flux (0.7.2); dropping down to 0.7.1 should do the trick.

Moelf commented 5 years ago

Pinning CuArrays to 0.9.1 and Flux to 0.7.1 results in the following:

[ Info: Constructing model...
[ Info: Building the CUDAnative run-time library for your sm_61 device, this might take a while...
[ Info: Beginning training loop...
┌ Warning: `∇conv_data(dy::A, x::A, w::A; kw...) where A <: AbstractArray` is deprecated, use `∇conv_data(dy, w; size=size(x), kw...)` instead.
│   caller = ip:0x0
└ @ Core :-1
ERROR: LoadError: conversion to pointer not defined for CuArray{Float32,4}

And it seems like downgrading Flux alone is okay despite the warning:

┌ Warning: Flux is only supported with CuArrays v0.9.
│ Try running `] pin CuArrays@0.9`.
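
For reference, downgrading Flux alone might look like the following with the Pkg API. This is a sketch, assuming a Julia 1.1-era default environment like the one listed earlier in the thread; whether 0.7.1 still resolves depends on the other packages installed there.

```julia
using Pkg

# Install the older Flux release suggested above (0.7.1 instead of 0.7.2).
Pkg.add(Pkg.PackageSpec(name = "Flux", version = "0.7.1"))

# Pin it so a later `Pkg.update()` does not move it forward again.
Pkg.pin("Flux")

# Print the resolved versions to confirm what is actually installed.
Pkg.status()
```

The equivalent in the Pkg REPL would be `add Flux@0.7.1` followed by `pin Flux`.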

tbenst commented 5 years ago

I have the same issue; see additional info in #125.