Conv layer not loading correctly?

jw3126 commented 3 years ago

I would like to save a Flux model to ONNX and tried this package. However, there seems to be something wrong. Here is an MWE where it seems the weights are not saved correctly. Any ideas?

using Flux
using ONNXNaiveNASflux
import ONNXRunTime

model = Conv((3,), 2=>3, Flux.identity, pad=Flux.SamePad())
model.bias .= 0
model.weight .= 0
model.weight[1,1,1] = 1

x = randn(Float32,3, 2, 1)
path = "NAS.onnx"
ONNXNaiveNASflux.save(path, model, "input" => reverse(size(x)))

y = model(x)
model_reload = ONNXRunTime.load_inference(path)
y_reload = model_reload(Dict("input" => x)) |> values |> only
@show y y_reload;

y = [0.05653854 0.0 0.0; -0.75199324 0.0 0.0; 0.0 0.0 0.0;;;]
y_reload = [0.0 0.0 0.0; 0.0 0.0 0.0; 0.0 0.0 0.0;;;]

DrChainsaw commented 3 years ago

Thanks for the report.

I can't reproduce this though.


julia> model = Conv((3,), 2=>3, Flux.identity, pad=Flux.SamePad());

julia> model.bias .= 0;

julia> model.weight .= 0;

julia> model.weight[1,1,1] = 1;

julia> x = randn(Float32,3, 2, 1);

julia> y = model(x)
3×3×1 Array{Float32, 3}:
[:, :, 1] =
 0.181512  0.0  0.0
 0.435586  0.0  0.0
 0.0       0.0  0.0

julia> remodel = load(path);

julia> remodel(x)
3×3×1 Array{Float32, 3}:
[:, :, 1] =
 0.181512  0.0  0.0
 0.435586  0.0  0.0
 0.0       0.0  0.0

(ONNXNaiveNASflux) pkg> activate test
  Activating project at `E:\Programs\julia\.julia\dev\ONNXNaiveNASflux\test`

(test) pkg> instantiate

julia> include("test\\serialize\\onnxruntime.jl")
JuReverseDims (generic function with 3 methods)

julia> onnxruntime_infer(model, x)
┌ Warning: `vendor()` is deprecated, use `BLAS.get_config()` and inspect the output instead
│   caller = npyinitialize() at numpy.jl:67
└ @ PyCall E:\Programs\julia\.julia\packages\PyCall\BD546\src\numpy.jl:67
([0.18151195 0.0 0.0; 0.43558574 0.0 0.0; 0.0 0.0 0.0;;;],)

Perhaps I'm using a too old version of onnxruntime in my tests? I would love to swap it for ONNXRuntime.jl when it has windows support btw to get rid of the python verionhandling through PyCall and Conda.

I have only tested the above on windows, but testcases run things which are very similar as the above (e.g. here) so unless it is some edge case it should have caused them to fail.

I have also checked the exported model with netron and it seems to pick up the weights too.

jw3126 commented 3 years ago

Ok thanks a lot. Then the bug is probably with ONNXRunTime. Though I also have similar test cases. I will investigate.

jw3126 commented 3 years ago

Ok I think I know what is going on. Your onnxruntime_infer interprets size(x) = (3,2,1) as batch=1, channel=2, x=3. ONNXRunTime OTOH interprets it as batch=3, channel=2, x=1. So for ONNXRunTime the only non-zero weight element is paired with a padded zero. If we take these layout issues into account, it works:

using Revise
using Flux
using ONNXNaiveNASflux
import ONNXRunTime

model = Conv((3,), 2=>3, Flux.identity, pad=1, bias=false)
model.weight .= 0
model.weight[3,1,1] = 1

x = randn(Float32,3, 2, 1)
path = "NAS.onnx"
ONNXNaiveNASflux.save(path, model, "input" => (:a,2,:b))

y = model(x)
model_reload = ONNXRunTime.load_inference(path)
y_reload = model_reload(Dict("input" => permutedims(x,(3,2,1)))) |> values |> only
@show y y_reload;

y = [0.0 0.0 0.0; -0.12048469 0.0 0.0; 1.0404693 0.0 0.0;;;]
y_reload = [0.0 0.0 0.0;;; -0.12048469 0.0 0.0;;; 1.0404693 0.0 0.0]

Concerning windows support. I don't have a windows machine available, so I was not motivated to implement it. But since windows binaries are officially provided, it should not be hard.

DrChainsaw commented 3 years ago

Yeah, I tried to keep all sizes Flux compliant so that users don't need to think about the ONNX representation. It's not onnxruntime_infer which flips the sizes, but ONNXNaiveNASflux itself. Iow, the input size (:a, 2, :b) gives the input size (:b, 2, :a) in the ONNX file.

I see how this could cause confusion for someone who comes at it from the onnx perspective, but saying "add the expected input shape of model" seems more user friendly than "add the expected input shape of the ONNX representation of model".

Your example accidentally worked due to the odd size where the only constrained size did not move, or else I think ONNXNaiveNASflux would throw an error.

DrChainsaw / ONNXNaiveNASflux.jl

Conv layer not loading correctly? #61