AlexNet test passed, showing errors for ResNet. Will fix it.
┌ Warning: Slow fallback implementation invoked for conv! You probably don't want this; check your datatypes.
│ yT = Float64
│ T1 = Float64
│ T2 = Float32
└ @ NNlib ~/.julia/packages/NNlib/sSn9M/src/conv.jl:206
(the warning above is repeated 4 more times)
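This warning is NNlib reporting that the convolution's argument eltypes disagree (a Float64 array against Float32 weights), so it falls back to a generic, slow implementation. A quick way to check, as a sketch:

```julia
x = rand(256, 256, 3, 50)                # rand defaults to Float64
eltype(x)                                # Float64
eltype(rand(Float32, 256, 256, 3, 50))   # Float32, matching Flux's default weights
```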
ResNet: Error During Test at /home/nabanita07/FluxModels.jl/test/VisionModel/resnet.jl:5
Got exception outside of a @test
UndefVarError: inplanes not defined
Stacktrace:
[1] resnet(::typeof(FluxModels.VisionModel.basicblock), ::Symbol, ::Array{Int64,1}, ::Array{Int64,1}) at /home/nabanita07/FluxModels.jl/src/VisionModel/resnet.jl:37
[2] ResNet18() at /home/nabanita07/FluxModels.jl/src/VisionModel/resnet.jl:78
[3] macro expansion at /home/nabanita07/FluxModels.jl/test/VisionModel/resnet.jl:13 [inlined]
[4] macro expansion at /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.4/Test/src/Test.jl:1113 [inlined]
[5] top-level scope at /home/nabanita07/FluxModels.jl/test/VisionModel/resnet.jl:6
[6] include(::String) at ./client.jl:439
[7] top-level scope at /home/nabanita07/FluxModels.jl/test/runtests.jl:7
[8] top-level scope at /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.4/Test/src/Test.jl:1113
[9] top-level scope at /home/nabanita07/FluxModels.jl/test/runtests.jl:6
[10] include(::String) at ./client.jl:439
[11] top-level scope at none:6
[12] eval(::Module, ::Any) at ./boot.jl:331
[13] exec_options(::Base.JLOptions) at ./client.jl:264
[14] _start() at ./client.jl:484
Test Summary: | Pass  Error  Total
Vision Models |    1      1      2
  AlexNet     |    1             1
  ResNet      |           1      1
ERROR: LoadError: Some tests did not pass: 1 passed, 0 failed, 1 errored, 0 broken.
in expression starting at /home/nabanita07/FluxModels.jl/test/runtests.jl:5
ERROR: Package FluxModels errored during testing
`inplanes` is defined and I am not sure why it is throwing an error.
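One plausible cause, sketched below (a guess based on Julia's scoping rules, not a confirmed diagnosis of `resnet`): assigning to `inplanes` anywhere inside a function makes it local to the entire function body, so a read that occurs before the first assignment throws `UndefVarError` even when a global `inplanes` exists.

```julia
inplanes = 64  # defined at global scope

function scoping_sketch()
    # UndefVarError: inplanes not defined -- the assignment two lines down
    # makes `inplanes` local for the whole function, so this read does not
    # see the global.
    outplanes = inplanes * 2
    inplanes = outplanes
    return outplanes
end
```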
ResNet: Error During Test at /home/nabanita07/FluxModels.jl/test/VisionModel/resnet.jl:6
Test threw exception
Expression: (size(ResNet18(rand(256, 256, 3, 50))) == (1000, 50), #= /home/nabanita07/FluxModels.jl/test/VisionModel/resnet.jl:7 =# @test((size(ResNet34(rand(256, 256, 3, 50))) == (1000, 50), #= /home/nabanita07/FluxModels.jl/test/VisionModel/resnet.jl:8 =# @test((size(ResNet50(rand(256, 256, 3, 50))) == (1000, 50), #= /home/nabanita07/FluxModels.jl/test/VisionModel/resnet.jl:9 =# @test((size(ResNet101(rand(256, 256, 3, 50))) == (1000, 50), #= /home/nabanita07/FluxModels.jl/test/VisionModel/resnet.jl:10 =# @test(size(ResNet152(rand(256, 256, 3, 50))) == (1000, 50)))))))))
MethodError: no method matching ResNet18(::Array{Float64,4})
Closest candidates are:
ResNet18() at /home/nabanita07/FluxModels.jl/src/VisionModel/resnet.jl:78
Stacktrace:
[1] top-level scope at /home/nabanita07/FluxModels.jl/test/VisionModel/resnet.jl:6
[2] top-level scope at /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.4/Test/src/Test.jl:1113
[3] top-level scope at /home/nabanita07/FluxModels.jl/test/VisionModel/resnet.jl:6
Test Summary: | Pass  Error  Total
Vision Models |    1      1      2
  AlexNet     |    1             1
  ResNet      |           1      1
ERROR: LoadError: Some tests did not pass: 1 passed, 0 failed, 1 errored, 0 broken.
in expression starting at /home/nabanita07/FluxModels.jl/test/runtests.jl:5
ERROR: Package FluxModels errored during testing
This is the final error. I guess something like `ResNet18(inplanes, outplanes, ...)` might fix this.
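For reference, the `MethodError` itself just says the constructor takes no arguments; with the current zero-argument `ResNet18`, the test would need to build the model first and then call it on an input, something like (a sketch, not the final test code):

```julia
using Test

m = ResNet18()                            # build the model first
y = m(rand(Float32, 256, 256, 3, 50))     # then call it on a batch
@test size(y) == (1000, 50)
```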
Now there is a method mismatch error. I haven't changed to `Float32` in test/VisionModel/resnet.jl.
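For reference, the `Float32` change is just about how the test input is built: `rand(256, 256, 3, 50)` defaults to Float64, which mismatches the model's Float32 weights and triggers the slow `conv!` fallback warned about above. A sketch of the change:

```julia
x = rand(256, 256, 3, 50)           # before: Float64 by default
x = rand(Float32, 256, 256, 3, 50)  # after: matches the Float32 weights
```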
ResNet: Error During Test at /home/nabanita07/FluxModels.jl/test/VisionModel/resnet.jl:8
Test threw exception
Expression: size(m(rand(256, 256, 3, 50))) == (1000, 50)
MethodError: no method matching (::Chain{Tuple{Conv{2,4,typeof(identity),Array{Float32,4},Array{Float32,1}},BatchNorm{typeof(relu),Array{Float32,1},Array{Float32,1},Float32}}})(::Array{Float64,4}, ::Array{Float64,4})
Closest candidates are:
Any(::Any) at /home/nabanita07/.julia/packages/Flux/IjMZL/src/layers/basic.jl:38
Stacktrace:
[1] (::SkipConnection)(::Array{Float64,4}) at /home/nabanita07/.julia/packages/Flux/IjMZL/src/layers/basic.jl:259
[2] applychain(::Tuple{SkipConnection,SkipConnection,SkipConnection,SkipConnection,SkipConnection,SkipConnection,SkipConnection,SkipConnection,SkipConnection,SkipConnection,SkipConnection,SkipConnection,SkipConnection,SkipConnection,SkipConnection,SkipConnection,SkipConnection,SkipConnection,SkipConnection,SkipConnection,SkipConnection,SkipConnection,SkipConnection,SkipConnection,SkipConnection,SkipConnection,SkipConnection,SkipConnection,SkipConnection,SkipConnection,SkipConnection,SkipConnection,SkipConnection,SkipConnection,SkipConnection,SkipConnection,SkipConnection,SkipConnection,SkipConnection,SkipConnection,SkipConnection,SkipConnection,SkipConnection,SkipConnection,SkipConnection,SkipConnection,SkipConnection,SkipConnection,SkipConnection,SkipConnection,AdaptiveMeanPool{4,2},typeof(flatten),Dense{typeof(identity),Array{Float32,2},Array{Float32,1}}}, ::Array{Float64,4}) at /home/nabanita07/.julia/packages/Flux/IjMZL/src/layers/basic.jl:36 (repeats 4 times)
[3] (::Chain{Tuple{Conv{2,2,typeof(identity),Array{Float32,4},Array{Float32,1}},BatchNorm{typeof(relu),Array{Float32,1},Array{Float32,1},Float32},MaxPool{2,2},SkipConnection,SkipConnection,SkipConnection,SkipConnection,SkipConnection,SkipConnection,SkipConnection,SkipConnection,SkipConnection,SkipConnection,SkipConnection,SkipConnection,SkipConnection,SkipConnection,SkipConnection,SkipConnection,SkipConnection,SkipConnection,SkipConnection,SkipConnection,SkipConnection,SkipConnection,SkipConnection,SkipConnection,SkipConnection,SkipConnection,SkipConnection,SkipConnection,SkipConnection,SkipConnection,SkipConnection,SkipConnection,SkipConnection,SkipConnection,SkipConnection,SkipConnection,SkipConnection,SkipConnection,SkipConnection,SkipConnection,SkipConnection,SkipConnection,SkipConnection,SkipConnection,SkipConnection,SkipConnection,SkipConnection,SkipConnection,SkipConnection,SkipConnection,AdaptiveMeanPool{4,2},typeof(flatten),Dense{typeof(identity),Array{Float32,2},Array{Float32,1}}}})(::Array{Float64,4}) at /home/nabanita07/.julia/packages/Flux/IjMZL/src/layers/basic.jl:38
[4] macro expansion at /home/nabanita07/FluxModels.jl/test/VisionModel/resnet.jl:8 [inlined]
[5] macro expansion at /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.4/Test/src/Test.jl:1113 [inlined]
[6] top-level scope at /home/nabanita07/FluxModels.jl/test/VisionModel/resnet.jl:6
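Reading the trace above: in Flux, `SkipConnection(layers, connection)(x)` computes `connection(layers(x), x)`, so a two-argument call landing on a `Chain` suggests a `Chain` was passed as the connection (second argument) rather than as the wrapped layers. That is a guess from the `MethodError`, not verified against the source; a sketch of the intended usage:

```julia
using Flux

# The connection must accept two arguments, e.g. + for residual addition:
sc = SkipConnection(Conv((3, 3), 64 => 64; pad = 1), +)
```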
Okay I fixed ResNet18. Can I have write permissions to your fork? It will be easier for me to push a commit and explain.
Surprisingly, there are no official implementations of Option A from the paper online (source). Whenever the dimensions change, the width/height are halved and the # of channels doubles. For Option B and C, this is not an issue, because the projection shortcut is used. For Option A, this was an issue with our implementation, because even though we use `cat` to match the channels, the width/height would still be off. The only solution I found (in the link above) is to pool the shortcut input before padding with zeros.
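A minimal sketch of that Option A shortcut (illustrative, not necessarily the exact code in the commit): pool with stride 2 to halve the width/height, then concatenate zero-filled channels up to the target `outplanes`.

```julia
using Flux

function shortcutA(x, outplanes)
    y = MeanPool((1, 1); stride = 2)(x)              # halve width/height
    pad = zeros(eltype(y), size(y, 1), size(y, 2),
                outplanes - size(y, 3), size(y, 4))  # zero-filled channels
    return cat(y, pad; dims = 3)                     # match channel count
end
```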
There were some other issues with basic block and bottleneck too. Here's the list of things I changed:

- Added `pad = 1` to basic block so that it preserves the width/height (see the sketch below the list).
- Changed `identity` to pool to halve the width/height, then `cat` to add channels.
- Changed `resnet` to not downsample on the very first set of repeated blocks, since the downsampling is handled by the initial max pooling.
- Switched to `Float32`, since the fallback CPU `conv!` is very slow.

There are still issues with ResNet50+ which use Option B. Instead of fixing those, I am going to explain my debug workflow, and you can try to debug and fix the remaining test failures. I saw you added print statements in `resnet`, which is good, but these will only execute during the model building stage (i.e. when you call `model = ResNet18()`). They won't tell you the dynamic size information as the input passes through the network. The strategy below is what I used to do that. It isn't the only way to debug in Flux, but it was what was useful for this error.
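For the first list item, a rough sketch of the padded basic block (names and structure are illustrative, not the exact source):

```julia
using Flux

# Two 3x3 convolutions with pad = 1 so width/height are preserved;
# the downsampling variant would use stride = 2 on the first conv.
basicblock_sketch(inplanes, outplanes) = Chain(
    Conv((3, 3), inplanes => outplanes; pad = 1),
    BatchNorm(outplanes, relu),
    Conv((3, 3), outplanes => outplanes; pad = 1),
    BatchNorm(outplanes))
```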
Prerequisite: I used Revise.jl to be able to quickly iterate as I fixed errors. You should install Revise.jl in your global environment (not within FluxModels.jl).
1. Run `]test FluxModels` to see what tests are failing. Pick a specific test case to fix (here I will assume ResNet18 is failing).
2. Start Julia in the package environment with `julia --project=.`.
3. Load `using Revise`, then import FluxModels: `using FluxModels`.
4. Build the model: `model = ResNet18()`.
5. Create a dummy input: `x = rand(Float32, 224, 224, 3, 50);` (here I used a 224x224 image because that's what's used in the paper, so matching the input sizes allows me to compare against Table 1 in the paper).
6. Pass the input through the network one layer at a time:

```julia
julia> for (i, layer) in enumerate(model)
           println("Layer $i ($(typeof(layer)))")
           println("insize: $(size(x))")
           global x = layer(x)
           println("outsize: $(size(x))")
           println()
       end
```
This prints out debug info like so:
Layer 1 (Conv{2,2,typeof(identity),Array{Float32,4},Array{Float32,1}})
insize: (224, 224, 3, 50)
outsize: (112, 112, 64, 50)
Layer 2 (BatchNorm{typeof(relu),Array{Float32,1},Array{Float32,1},Float32})
insize: (112, 112, 64, 50)
outsize: (112, 112, 64, 50)

Layer 3 (MaxPool{2,2})
insize: (112, 112, 64, 50)
outsize: (56, 56, 64, 50)

Layer 4 (SkipConnection)
insize: (56, 56, 64, 50)
ERROR: DimensionMismatch("dimensions must match: a has dims (Base.OneTo(27), Base.OneTo(27), Base.OneTo(64), Base.OneTo(50)), b has dims (Base.OneTo(56), Base.OneTo(56), Base.OneTo(64), Base.OneTo(50)), mismatch at 1")
Stacktrace:
[1] promote_shape at ./indices.jl:178 [inlined]
[2] promote_shape(::Array{Float32,4}, ::Array{Float32,4}) at ./indices.jl:169
[3] +(::Array{Float32,4}, ::Array{Float32,4}) at ./arraymath.jl:45
[4] (::SkipConnection)(::Array{Float32,4}) at /Users/darsnack/.julia/packages/Flux/IjMZL/src/layers/basic.jl:259
[6] eval(::Module, ::Any) at ./boot.jl:331
[7] eval_user_input(::Any, ::REPL.REPLBackend) at /Users/julia/buildbot/worker/package_macos64/build/usr/share/julia/stdlib/v1.4/REPL/src/REPL.jl:86
[8] run_backend(::REPL.REPLBackend) at /Users/darsnack/.julia/packages/Revise/tV8FE/src/Revise.jl:1165
7. Fix the first layer whose output isn't right. In my case, I saw that Layer 4 was wrong. The input size to Layer 4 is 56 x 56 x 64 x 50, which is correct according to Table 1. Keep in mind that even though the error may be in Layer 4, it could be because you passed in an input whose size is not correct (if it is not correct, double-check the previous layers' output sizes to see which prior layer screwed up). From the stack trace, I can see that I am trying to add a 27 x 27 x 64 x 50 array to a 56 x 56 x 64 x 50 array (this is `x` and `y` from the `identity` shortcut). I know from Table 1 that `x` should be 28 x 28 x 64 x 50. This led me to the first fix, which was adding the correct padding to `basicblock`, which controls the size of `x` (see the arithmetic sketch after this list). I also see that `y` needs to be downsampled, leading me to add the pooling operation to `identity`.
8. Implement the fixes and repeat Steps 4-8 (since you are using Revise, after saving the source file, you should see the changes).
9. Once you successfully get through Steps 4-8 w/o errors, go back to Step 1 and repeat the entire process for the next test that you want to fix.
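As a quick check on the 27 vs. 28 mismatch in step 7, the standard convolution output-size formula (a worked sketch, not code from the repo):

```julia
# out = fld(W + 2p - k, s) + 1 for window size k, padding p, stride s
convout(W, k, p, s) = fld(W + 2*p - k, s) + 1

convout(56, 3, 0, 2)  # 27 -> without padding, one short of Table 1
convout(56, 3, 1, 2)  # 28 -> pad = 1 matches Table 1
```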
Wait, so can you see them? They show up for me.
Finally all resnet models are working.
It can be merged now, right?
Yes, there is no need for code changes, but I am trying to get the CI to work before merging.
Can I help? And how?
It's an error with a pointer conversion in LLVM.jl. There's a PR to that repo to fix it, but I'm just ignoring 32-bit machines for now in the CI.
@darsnack
ResNet: Error During Test at /home/nabanita07/FluxModels.jl/test/VisionModel/resnet.jl:5
Got exception outside of a @test
UndefVarError: inplanes not defined
Stacktrace:
[1] resnet(::typeof(FluxModels.VisionModel.basicblock), ::Symbol, ::Array{Int64,1}, ::Array{Int64,1}) at /home/nabanita07/FluxModels.jl/src/VisionModel/resnet.jl:37
[2] ResNet18() at /home/nabanita07/FluxModels.jl/src/VisionModel/resnet.jl:78
[3] macro expansion at /home/nabanita07/FluxModels.jl/test/VisionModel/resnet.jl:13 [inlined]
[4] macro expansion at /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.4/Test/src/Test.jl:1113 [inlined]
[5] top-level scope at /home/nabanita07/FluxModels.jl/test/VisionModel/resnet.jl:6
[6] include(::String) at ./client.jl:439
[7] top-level scope at /home/nabanita07/FluxModels.jl/test/runtests.jl:7
[8] top-level scope at /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.4/Test/src/Test.jl:1113
[9] top-level scope at /home/nabanita07/FluxModels.jl/test/runtests.jl:6
[10] include(::String) at ./client.jl:439
[11] top-level scope at none:6
[12] eval(::Module, ::Any) at ./boot.jl:331
[13] exec_options(::Base.JLOptions) at ./client.jl:264
[14] _start() at ./client.jl:484