gortibaldik opened 1 year ago
Obviously I added the wrong code for training the net, where I didn't use the `gpu_`-transferred data and network for the training. Now the issue contains the right code.
The error is because `conv` is getting a mix of CuArray and Array input:
```
[5] conv_im2col!(
    y::SubArray{Float32, 5, CUDA.CuArray{Float32, 5, CUDA.Mem.DeviceBuffer}, Tuple{Base.Slice{Base.OneTo{Int64}}, Base.Slice{Base.OneTo{Int64}}, Base.Slice{Base.OneTo{Int64}}, UnitRange{Int64}, Base.Slice{Base.OneTo{Int64}}}, false},
    x::SubArray{Float32, 5, CUDA.CuArray{Float32, 5, CUDA.Mem.DeviceBuffer}, Tuple{Base.Slice{Base.OneTo{Int64}}, Base.Slice{Base.OneTo{Int64}}, Base.Slice{Base.OneTo{Int64}}, UnitRange{Int64}, Base.Slice{Base.OneTo{Int64}}}, false},
    w::SubArray{Float32, 5, Array{Float32, 5}, Tuple{Base.Slice{Base.OneTo{Int64}}, Base.Slice{Base.OneTo{Int64}}, Base.Slice{Base.OneTo{Int64}}, Base.Slice{Base.OneTo{Int64}}, UnitRange{Int64}}, true},
    cdims::NNlib.DenseConvDims{3, 3, 3, 6, 3}
)
```
And the reason for that is that `Loss(X, y) = crossentropy(net(X), y)` closes over `net`, not `gpu_net`.
The fact that `train!` expects two different references to the model's parameters (via the loss and via `params`) is a weird feature of this "implicit" interface. We're trying to kill it... You have Flux v0.13.9, which already supports the new way; https://fluxml.ai/Flux.jl/previews/PR2114/training/training/ is roughly the upgrade guide.
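Roughly, the contrast between the two styles looks like this (a minimal sketch with a toy model and made-up data, not your code):

```julia
using Flux
using Flux.Losses: crossentropy

model = Chain(Dense(4 => 2), softmax)  # toy model, purely for illustration
data = [(rand(Float32, 4, 8), Flux.onehotbatch(rand(0:1, 8), 0:1))]

# "Implicit" style: the loss closes over `model`, and train! gets a second
# reference to the same parameters via Flux.params -- the two can silently
# disagree, e.g. one pointing at a CPU model and the other at a GPU copy.
loss(x, y) = crossentropy(model(x), y)
Flux.train!(loss, Flux.params(model), data, Descent(0.1))

# "Explicit" style: the model is passed in exactly once, and the optimiser
# state is set up explicitly with Flux.setup.
opt_state = Flux.setup(Descent(0.1), model)
Flux.train!((m, x, y) -> crossentropy(m(x), y), model, data, opt_state)
```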
That guide is a bit too condensed for me, and I'm not too sure I caught the main gist. Now I understand that under the "implicit style" I should also define `gpu_loss(X, y) = crossentropy(gpu_net(X), y)`. What is recommended under the "explicit style"?
Do I understand correctly that the `train_model` function in my code should be rewritten in this way?
```julia
using Flux, BSON
using Flux.Losses: crossentropy

function train_model!(net, X, y;
                      loss=crossentropy,
                      opt=Descent(0.1),
                      batchsize=128,
                      n_epochs=10,
                      file_name="")
    batches = DataLoader((X, y); batchsize, shuffle=true)
    opt_state = Flux.setup(opt, net)  # explicit optimiser state for `net`
    for current_epoch in 1:n_epochs
        Flux.train!(net, batches, opt_state) do m, x, y
            loss(m(x), y)  # gradients are taken w.r.t. `m`, the model passed in
        end
    end
    # save the model
    !isempty(file_name) && BSON.bson(file_name, net=net)
end
```
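And then, to train on the GPU, I suppose I would move both the model and the data over before calling it, something like this (a sketch; the data names are just for illustration):

```julia
# Move both the model and the data to the GPU, then train as usual.
gpu_net = gpu(define_net())
gpu_X_train, gpu_y_train = gpu(X_train), gpu(y_train)
train_model!(gpu_net, gpu_X_train, gpu_y_train)
```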
Is there another best practice to which I do not adhere? Thank you :+1:
Yes that looks right to me. But I didn't run it, I hope it works!
I have problems with scalar indexing when training a neural network transferred to the GPU.
Julia Version:
Flux Version: v0.13.9
As I do not know how to create a more minimal bug example, I will show how the error occurs on the MNIST dataset:
Minimal Bug Example
I create the net like this:
```julia
function define_net()
    net = Chain(
        Conv((2, 2), 1 => 16, relu),
        flatten,
        Dense(11664, size(y_train, 1)),
        softmax,
    )
end
```
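(For 28×28 MNIST inputs, a 2×2 convolution produces a 27×27×16 output, and 27·27·16 = 11664, hence the Dense layer's input size.)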
I reshape the data into size (w, h, c, N) and normalize it.
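The exact loading code is not shown; a hypothetical sketch of this step using MLDatasets might look like:

```julia
# Hypothetical preprocessing sketch -- not the issue's exact code.
using Flux, MLDatasets

X_raw, y_labels = MLDatasets.MNIST(split=:train)[:]  # 28×28×60000 Float32 in [0, 1]
X_train = reshape(X_raw, 28, 28, 1, :)               # (w, h, c, N)
y_train = Flux.onehotbatch(y_labels, 0:9)
```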
I use this function for training:
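The function body itself is not reproduced here; based on the discussion above, a hypothetical reconstruction of the implicit-style trainer would be:

```julia
# Hypothetical reconstruction -- not the author's exact code.
using Flux, BSON
using Flux.Losses: crossentropy

# `Loss` closes over the global `net`; this is the bug diagnosed above,
# since training `gpu_net` still evaluates the CPU `net`.
Loss(X, y) = crossentropy(net(X), y)

function train_model!(net, X, y; opt=Descent(0.1), batchsize=128,
                      n_epochs=10, file_name="")
    batches = DataLoader((X, y); batchsize, shuffle=true)
    for current_epoch in 1:n_epochs
        Flux.train!(Loss, Flux.params(net), batches, opt)
    end
    !isempty(file_name) && BSON.bson(file_name, net=net)
end
```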
and here is the invocation:
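Presumably something like (hypothetical; the call itself is not shown):

```julia
net = define_net()
train_model!(net, X_train, y_train)  # hypothetical CPU invocation
```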
I have absolutely no problems running this code when I do not use the GPU, so let's now try to use the GPU:
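Again hypothetically, with the `gpu_`-prefixed names from the discussion above:

```julia
gpu_net = gpu(net)
gpu_X_train, gpu_y_train = gpu(X_train), gpu(y_train)
# Fails: `Loss` still closes over the CPU `net`, so conv receives a mix
# of CuArray inputs and Array weights.
train_model!(gpu_net, gpu_X_train, gpu_y_train)
```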
Output: the scalar indexing error, with a stack trace ending in the `conv_im2col!` call quoted at the top of this thread (note that `w` is a plain `Array` while `y` and `x` are `CuArray`s).