darsnack / FluxModels.jl

Standard (pre-trained) ML models written in Flux.jl
MIT License

Resnet model #17

Closed dnabanita7 closed 4 years ago

dnabanita7 commented 4 years ago

@darsnack

ResNet: Error During Test at /home/nabanita07/FluxModels.jl/test/VisionModel/resnet.jl:5
  Got exception outside of a @test
  UndefVarError: inplanes not defined
  Stacktrace:
   [1] resnet(::typeof(FluxModels.VisionModel.basicblock), ::Symbol, ::Array{Int64,1}, ::Array{Int64,1}) at /home/nabanita07/FluxModels.jl/src/VisionModel/resnet.jl:37
   [2] ResNet18() at /home/nabanita07/FluxModels.jl/src/VisionModel/resnet.jl:78
   [3] macro expansion at /home/nabanita07/FluxModels.jl/test/VisionModel/resnet.jl:13 [inlined]
   [4] macro expansion at /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.4/Test/src/Test.jl:1113 [inlined]
   [5] top-level scope at /home/nabanita07/FluxModels.jl/test/VisionModel/resnet.jl:6
   [6] include(::String) at ./client.jl:439
   [7] top-level scope at /home/nabanita07/FluxModels.jl/test/runtests.jl:7
   [8] top-level scope at /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.4/Test/src/Test.jl:1113
   [9] top-level scope at /home/nabanita07/FluxModels.jl/test/runtests.jl:6
   [10] include(::String) at ./client.jl:439
   [11] top-level scope at none:6
   [12] eval(::Module, ::Any) at ./boot.jl:331
   [13] exec_options(::Base.JLOptions) at ./client.jl:264
   [14] _start() at ./client.jl:484


I tried adding the current master version of Flux, but it shows the error I have mentioned on Discourse: https://discourse.julialang.org/t/flux-master-is-incompatible-with-geometricflux/43404
dnabanita7 commented 4 years ago

The AlexNet test passed, but ResNet still shows errors; will fix it.

┌ Warning: Slow fallback implementation invoked for conv!  You probably don't want this; check your datatypes.
│   yT = Float64
│   T1 = Float64
│   T2 = Float32
└ @ NNlib ~/.julia/packages/NNlib/sSn9M/src/conv.jl:206
(this warning is printed five times in the log)
ResNet: Error During Test at /home/nabanita07/FluxModels.jl/test/VisionModel/resnet.jl:5
  Got exception outside of a @test
  UndefVarError: inplanes not defined
  Stacktrace:
   [1] resnet(::typeof(FluxModels.VisionModel.basicblock), ::Symbol, ::Array{Int64,1}, ::Array{Int64,1}) at /home/nabanita07/FluxModels.jl/src/VisionModel/resnet.jl:37
   [2] ResNet18() at /home/nabanita07/FluxModels.jl/src/VisionModel/resnet.jl:78
   [3] macro expansion at /home/nabanita07/FluxModels.jl/test/VisionModel/resnet.jl:13 [inlined]
   [4] macro expansion at /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.4/Test/src/Test.jl:1113 [inlined]
   [5] top-level scope at /home/nabanita07/FluxModels.jl/test/VisionModel/resnet.jl:6
   [6] include(::String) at ./client.jl:439
   [7] top-level scope at /home/nabanita07/FluxModels.jl/test/runtests.jl:7
   [8] top-level scope at /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.4/Test/src/Test.jl:1113
   [9] top-level scope at /home/nabanita07/FluxModels.jl/test/runtests.jl:6
   [10] include(::String) at ./client.jl:439
   [11] top-level scope at none:6
   [12] eval(::Module, ::Any) at ./boot.jl:331
   [13] exec_options(::Base.JLOptions) at ./client.jl:264
   [14] _start() at ./client.jl:484

Test Summary: | Pass  Error  Total
Vision Models |    1      1      2
  AlexNet     |    1             1
  ResNet      |           1      1
ERROR: LoadError: Some tests did not pass: 1 passed, 0 failed, 1 errored, 0 broken.
in expression starting at /home/nabanita07/FluxModels.jl/test/runtests.jl:5
ERROR: Package FluxModels errored during testing

`inplanes` is defined, and I am not sure why it is throwing an error.
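
One common cause of this kind of `UndefVarError` (a guess on my part, not taken from the repo's source) is Julia's scoping rule: if `inplanes` is defined at module level but also assigned anywhere inside `resnet`, the name becomes a new local in that scope, so reading it before the first assignment throws even though the global exists. A minimal sketch with a hypothetical `resnet_sketch` function:

```julia
inplanes = 64                      # defined at module level

# Hypothetical sketch, not the repo's code: the assignment below makes
# `inplanes` a local, so the read on the preceding line throws
# `UndefVarError: inplanes not defined` on the first iteration.
function resnet_sketch(block_counts)
    planes = Int[]
    for n in block_counts
        push!(planes, inplanes)    # read before the first local assignment
        inplanes *= 2              # assignment shadows the module-level binding
    end
    return planes
end

resnet_sketch([2, 2, 2, 2])        # UndefVarError: inplanes not defined
```

Declaring `global inplanes` inside the function, or copying the value into a differently named local before the loop, avoids the shadowing.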

dnabanita7 commented 4 years ago
ResNet: Error During Test at /home/nabanita07/FluxModels.jl/test/VisionModel/resnet.jl:6
  Test threw exception
  Expression: (size(ResNet18(rand(256, 256, 3, 50))) == (1000, 50), #= /home/nabanita07/FluxModels.jl/test/VisionModel/resnet.jl:7 =# @test((size(ResNet34(rand(256, 256, 3, 50))) == (1000, 50), #= /home/nabanita07/FluxModels.jl/test/VisionModel/resnet.jl:8 =# @test((size(ResNet50(rand(256, 256, 3, 50))) == (1000, 50), #= /home/nabanita07/FluxModels.jl/test/VisionModel/resnet.jl:9 =# @test((size(ResNet101(rand(256, 256, 3, 50))) == (1000, 50), #= /home/nabanita07/FluxModels.jl/test/VisionModel/resnet.jl:10 =# @test(size(ResNet152(rand(256, 256, 3, 50))) == (1000, 50)))))))))
  MethodError: no method matching ResNet18(::Array{Float64,4})
  Closest candidates are:
    ResNet18() at /home/nabanita07/FluxModels.jl/src/VisionModel/resnet.jl:78
  Stacktrace:
   [1] top-level scope at /home/nabanita07/FluxModels.jl/test/VisionModel/resnet.jl:6
   [2] top-level scope at /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.4/Test/src/Test.jl:1113
   [3] top-level scope at /home/nabanita07/FluxModels.jl/test/VisionModel/resnet.jl:6

Test Summary: | Pass  Error  Total
Vision Models |    1      1      2
  AlexNet     |    1             1
  ResNet      |           1      1
ERROR: LoadError: Some tests did not pass: 1 passed, 0 failed, 1 errored, 0 broken.
in expression starting at /home/nabanita07/FluxModels.jl/test/runtests.jl:5
ERROR: Package FluxModels errored during testing

This is the final error. I guess something like `ResNet18(inplanes, outplanes, ...)` might fix this.
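
For reference, a sketch of how the test can exercise the models without changing the constructors (assuming `ResNet18` is exported from FluxModels): build the model first, then apply it to a dummy batch.

```julia
using Test, FluxModels

# ResNet18() takes no arguments, so ResNet18(x) has no matching method;
# construct the model first and then call it on the input.
m = ResNet18()
x = rand(Float32, 256, 256, 3, 50)   # Float32 to match the model's Float32 weights
@test size(m(x)) == (1000, 50)
```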

dnabanita7 commented 4 years ago

Now there is a method mismatch error. I haven't changed the test inputs to Float32 in test/VisionModel/resnet.jl.

ResNet: Error During Test at /home/nabanita07/FluxModels.jl/test/VisionModel/resnet.jl:8
  Test threw exception
  Expression: size(m(rand(256, 256, 3, 50))) == (1000, 50)
  MethodError: no method matching (::Chain{Tuple{Conv{2,4,typeof(identity),Array{Float32,4},Array{Float32,1}},BatchNorm{typeof(relu),Array{Float32,1},Array{Float32,1},Float32}}})(::Array{Float64,4}, ::Array{Float64,4})
  Closest candidates are:
    Any(::Any) at /home/nabanita07/.julia/packages/Flux/IjMZL/src/layers/basic.jl:38
  Stacktrace:
   [1] (::SkipConnection)(::Array{Float64,4}) at /home/nabanita07/.julia/packages/Flux/IjMZL/src/layers/basic.jl:259
   [2] applychain(::Tuple{SkipConnection,SkipConnection,SkipConnection,SkipConnection,SkipConnection,SkipConnection,SkipConnection,SkipConnection,SkipConnection,SkipConnection,SkipConnection,SkipConnection,SkipConnection,SkipConnection,SkipConnection,SkipConnection,SkipConnection,SkipConnection,SkipConnection,SkipConnection,SkipConnection,SkipConnection,SkipConnection,SkipConnection,SkipConnection,SkipConnection,SkipConnection,SkipConnection,SkipConnection,SkipConnection,SkipConnection,SkipConnection,SkipConnection,SkipConnection,SkipConnection,SkipConnection,SkipConnection,SkipConnection,SkipConnection,SkipConnection,SkipConnection,SkipConnection,SkipConnection,SkipConnection,SkipConnection,SkipConnection,SkipConnection,SkipConnection,SkipConnection,SkipConnection,AdaptiveMeanPool{4,2},typeof(flatten),Dense{typeof(identity),Array{Float32,2},Array{Float32,1}}}, ::Array{Float64,4}) at /home/nabanita07/.julia/packages/Flux/IjMZL/src/layers/basic.jl:36 (repeats 4 times)
   [3] (::Chain{Tuple{Conv{2,2,typeof(identity),Array{Float32,4},Array{Float32,1}},BatchNorm{typeof(relu),Array{Float32,1},Array{Float32,1},Float32},MaxPool{2,2},SkipConnection,SkipConnection,SkipConnection,SkipConnection,SkipConnection,SkipConnection,SkipConnection,SkipConnection,SkipConnection,SkipConnection,SkipConnection,SkipConnection,SkipConnection,SkipConnection,SkipConnection,SkipConnection,SkipConnection,SkipConnection,SkipConnection,SkipConnection,SkipConnection,SkipConnection,SkipConnection,SkipConnection,SkipConnection,SkipConnection,SkipConnection,SkipConnection,SkipConnection,SkipConnection,SkipConnection,SkipConnection,SkipConnection,SkipConnection,SkipConnection,SkipConnection,SkipConnection,SkipConnection,SkipConnection,SkipConnection,SkipConnection,SkipConnection,SkipConnection,SkipConnection,SkipConnection,SkipConnection,SkipConnection,SkipConnection,SkipConnection,SkipConnection,AdaptiveMeanPool{4,2},typeof(flatten),Dense{typeof(identity),Array{Float32,2},Array{Float32,1}}}})(::Array{Float64,4}) at /home/nabanita07/.julia/packages/Flux/IjMZL/src/layers/basic.jl:38
   [4] macro expansion at /home/nabanita07/FluxModels.jl/test/VisionModel/resnet.jl:8 [inlined]
   [5] macro expansion at /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.4/Test/src/Test.jl:1113 [inlined]
   [6] top-level scope at /home/nabanita07/FluxModels.jl/test/VisionModel/resnet.jl:6
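
Two things stand out in this trace, both guesses on my part rather than confirmed fixes. First, the inner `Chain` is being called with two arrays, which is what happens when a `Chain` is passed as the `connection` argument of `SkipConnection` instead of a two-argument function such as `+`. Second, the input is `Float64` while the weights are `Float32`, which is also what the earlier NNlib slow-fallback warnings point at. A minimal sketch with illustrative layer sizes (not the repo's actual layers):

```julia
using Flux

# SkipConnection applies `connection(layers(x), x)`, so `connection` must
# accept two arguments; a projection shortcut belongs inside that closure.
block      = Chain(Conv((3, 3), 64 => 128, pad = 1, stride = 2), BatchNorm(128, relu))
projection = Chain(Conv((1, 1), 64 => 128, stride = 2), BatchNorm(128))
sc = SkipConnection(block, (mx, x) -> mx + projection(x))

x = rand(Float32, 56, 56, 64, 1)    # Float32 input avoids the slow-fallback path
size(sc(x))                         # (28, 28, 128, 1)
```

Keeping the projection inside the connection and feeding `Float32` inputs avoids both errors in this sketch.
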
darsnack commented 4 years ago

Okay I fixed ResNet18. Can I have write permissions to your fork? It will be easier for me to push a commit and explain.

darsnack commented 4 years ago

Surprisingly, there are no official implementations of Option A from the paper online (source). Whenever the dimensions change, the width/height are halved and the # of channels doubles. For Option B and C, this is not an issue, because the projection shortcut is used. For Option A, this was an issue with our implementation, because even though we use cat to match the channels, the width/height would still be off. The only solution I found (in the link above) is to pool the shortcut input before padding with zeros.
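
To make that concrete, here is a rough sketch of the kind of Option A shortcut described above (my own illustration, not the committed code): subsample the shortcut input so width/height are halved, then concatenate zero channels so the channel count matches the block output.

```julia
using Flux

# Option A shortcut sketch: halve width/height by pooling with stride 2, then
# pad the channel dimension with zeros so the channel count doubles and the
# result can be added to the block output.
function option_a_shortcut(x)
    y = MeanPool((1, 1); stride = 2)(x)     # (W, H, C, N) -> (W/2, H/2, C, N)
    return cat(y, zero(y); dims = 3)        # -> (W/2, H/2, 2C, N)
end

size(option_a_shortcut(rand(Float32, 56, 56, 64, 1)))   # (28, 28, 128, 1)
```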

There were some other issues with basic block and bottleneck too. Here's the list of things I changed:

There are still issues with ResNet50+, which use Option B. Instead of fixing those, I am going to explain my debug workflow, and you can try to debug and fix the remaining test failures. I saw you added print statements in `resnet`, which is good, but these only execute during the model building stage (i.e. when you call `model = ResNet18()`). They won't tell you the dynamic size information as the input passes through the network. The strategy below is what I used to get that information. It isn't the only way to debug in Flux, but it was what was useful for this error.

Pre-requisite: I used Revise.jl to be able to quickly iterate as I fixed errors. You should install Revise.jl in your global environment (not within FluxModels.jl).

  1. First, run `]test FluxModels` to see which tests are failing. Pick a specific test case to fix (here I will assume ResNet18 is failing).
  2. Next, open a fresh Julia REPL in the project root directory with `julia --project=.`
  3. Start Revise with `using Revise`, then load the package with `using FluxModels`.
  4. Instantiate the model to debug: `model = ResNet18()`
  5. Create a dummy input: `x = rand(Float32, 224, 224, 3, 50);` (here I used a 224x224 image because that's what's used in the paper, so matching the input sizes lets me compare against Table 1 in the paper).
  6. Pass the input through the model layer by layer, printing debug info as you go. Here is what I typed into the REPL to do that:
    julia> for (i, layer) in enumerate(model)
       println("Layer $i ($(typeof(layer)))")
       println("insize: $(size(x))")
       global x = layer(x)
       println("outsize: $(size(x))")
       println()
       end

    This prints out debug info like so:

    
    Layer 1 (Conv{2,2,typeof(identity),Array{Float32,4},Array{Float32,1}})
    insize: (224, 224, 3, 50)
    outsize: (112, 112, 64, 50)

    Layer 2 (BatchNorm{typeof(relu),Array{Float32,1},Array{Float32,1},Float32})
    insize: (112, 112, 64, 50)
    outsize: (112, 112, 64, 50)

    Layer 3 (MaxPool{2,2})
    insize: (112, 112, 64, 50)
    outsize: (56, 56, 64, 50)

    Layer 4 (SkipConnection)
    insize: (56, 56, 64, 50)
    ERROR: DimensionMismatch("dimensions must match: a has dims (Base.OneTo(27), Base.OneTo(27), Base.OneTo(64), Base.OneTo(50)), b has dims (Base.OneTo(56), Base.OneTo(56), Base.OneTo(64), Base.OneTo(50)), mismatch at 1")
    Stacktrace:
     [1] promote_shape at ./indices.jl:178 [inlined]
     [2] promote_shape(::Array{Float32,4}, ::Array{Float32,4}) at ./indices.jl:169
     [3] +(::Array{Float32,4}, ::Array{Float32,4}) at ./arraymath.jl:45
     [4] (::SkipConnection)(::Array{Float32,4}) at /Users/darsnack/.julia/packages/Flux/IjMZL/src/layers/basic.jl:259

     [6] eval(::Module, ::Any) at ./boot.jl:331
     [7] eval_user_input(::Any, ::REPL.REPLBackend) at /Users/julia/buildbot/worker/package_macos64/build/usr/share/julia/stdlib/v1.4/REPL/src/REPL.jl:86
     [8] run_backend(::REPL.REPLBackend) at /Users/darsnack/.julia/packages/Revise/tV8FE/src/Revise.jl:1165


7. Fix the first layer whose output isn't right. In my case, I saw that Layer 4 was wrong. The input size to Layer 4 is 56 x 56 x 64 x 50, which is correct according to Table 1. Keep in mind that even though the error may be in Layer 4, it could be because you passed in an input whose size is not correct. (If it is not correct, then double check the previous layers' output sizes to see which prior layer screwed up.) From the stack trace, I can see that I am trying to add a 27 x 27 x 64 x 50 array to a 56 x 56 x 64 x 50 array (this is `x` and `y` from the `identity` shortcut). I know from Table 1 that `x` should be 28 x 28 x 64 x 50. This led me to the first fix, which was adding the correct padding to `basicblock`, which controls the size of `x` (see the sketch after this list). I also see that `y` needs to be downsampled, leading me to add the pooling operation to `identity`.
8. Implement the fixes and repeat Steps 4-8 (since you are using Revise, after saving the source file, you should see the changes).
9. Once you successfully get through Steps 4-8 w/o errors, go back to Step 1 and repeat the entire process for the next test that you want to fix.
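
As a reference for step 7, here is a rough sketch of a padded basic block (the repo's actual `basicblock` signature may differ); `pad = 1` on the 3x3 convolutions is what keeps the block output at the 56 x 56 spatial size that Table 1 expects:

```julia
using Flux

# Two 3x3 convolutions with pad = 1 preserve width/height, so the block output
# can be added to the (possibly pooled) shortcut branch.
basicblock_sketch(inplanes, outplanes) = Chain(
    Conv((3, 3), inplanes => outplanes, pad = 1),
    BatchNorm(outplanes, relu),
    Conv((3, 3), outplanes => outplanes, pad = 1),
    BatchNorm(outplanes))

size(basicblock_sketch(64, 64)(rand(Float32, 56, 56, 64, 1)))   # (56, 56, 64, 1)
```
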
darsnack commented 4 years ago

Wait so can you see them? They show up for me.


dnabanita7 commented 4 years ago

Finally, all ResNet models are working.
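
For completeness, the passing tests look roughly like this (a sketch with a smaller batch than the real test file, assuming the constructors are exported from FluxModels):

```julia
using Test, FluxModels

@testset "ResNet" begin
    for model in (ResNet18, ResNet34, ResNet50, ResNet101, ResNet152)
        m = model()
        @test size(m(rand(Float32, 256, 256, 3, 2))) == (1000, 2)
    end
end
```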

dnabanita7 commented 4 years ago

It can be merged now, right?

darsnack commented 4 years ago

Yes, there is no need for code changes, but I am trying to get the CI to work before merging.

dnabanita7 commented 4 years ago

Can I help? And how?

darsnack commented 4 years ago

It's an error with a pointer conversion in LLVM.jl. There's a PR to that repo to fix it, but I'm just ignoring 32-bit machines for now in the CI.