FluxML / FluxTraining.jl

A flexible neural net training library inspired by fast.ai
https://fluxml.ai/FluxTraining.jl
MIT License
117 stars 25 forks source link

Quickstart tutorial broken #124

Open KronosTheLate opened 2 years ago

KronosTheLate commented 2 years ago

The example Training an image classifier currently uses the following code:

xs, ys = (
    # convert each image into h*w*1 array of floats 
    [Float32.(reshape(img, 28, 28, 1)) for img in Flux.Data.MNIST.images()],
    # one-hot encode the labels
    [Float32.(Flux.onehot(y, 0:9)) for y in Flux.Data.MNIST.labels()],
)

However,

(Project) pkg> st Flux
      Status `C:\Users\Dennis Bal\ProjectFolder\Project.toml`
  [587475ba] Flux v0.13.0

julia> using Flux

julia> Flux.Data.MNIST
ERROR: UndefVarError: MNIST not defined
Stacktrace:
 [1] getproperty(x::Module, f::Symbol)
   @ Base .\Base.jl:35
 [2] top-level scope
   @ REPL[16]:1

So the example is broken. As a side note, I think the example would do great by using MLUtils instead of DataLoaders.jl and MLDataPattern. Also, Flux imports DataLoader so no need to explicitly import it.

But I take a look at the docs and try to get started. So I make the following code, that works with Flux's base capacities:

julia> using Flux

julia> using Flux: onehotbatch, onecold

julia> using FluxTraining

julia> using MLUtils: flatten, unsqueeze

julia> using MLDatasets

julia> labels = 0:9
0:9

julia> traindata = MNIST.traindata(Float32) |> x->(unsqueeze(x[1], 3), onehotbatch(x[2], labels));

julia> size.(traindata)
((28, 28, 1, 60000), (10, 60000))

julia> trainloader = DataLoader(traindata, batchsize=128);

julia> validdata = MNIST.testdata(Float32) |> x->(unsqueeze(x[1], 3), onehotbatch(x[2], labels)); 

julia> size.(validdata)
((28, 28, 1, 10000), (10, 10000))

julia> validloader = DataLoader(validdata, batchsize=128);

julia> predict = Chain(flatten, Dense(28^2, 10))
Chain(
  MLUtils.flatten,
  Dense(784 => 10),                     # 7_850 parameters
)

julia> lossfunc(x, y) = Flux.Losses.logitbinarycrossentropy(predict(x), y)
lossfunc (generic function with 1 method)

julia> optimizer=ADAM()
ADAM(0.001, (0.9, 0.999), 1.0e-8, IdDict{Any, Any}())

julia> callbacks = [Metrics(accuracy)]
1-element Vector{Metrics}:
 Metrics(Loss(), Metric(Accuracy))

julia> learner = Learner(predict, lossfunc; optimizer, callbacks)
Learner()

At this point, I start checking loss and training with Flux's train!:

julia> lossfunc(validdata...)
0.7624986f0

julia> Flux.train!(lossfunc, Flux.params(predict), trainloader, optimizer)

julia> lossfunc(validdata...)
0.11266354f0

julia> Flux.train!(lossfunc, Flux.params(predict), trainloader, optimizer)

julia> lossfunc(validdata...)
0.08880948f0

julia> Flux.train!(lossfunc, Flux.params(predict), trainloader, optimizer)

julia> lossfunc(validdata...)
0.0801171f0

Training no problem. However, when I try to train my learner, it seems like a single float is passed to predict, and not an array:

julia> fit!(learner, 1, (traindata, validdata))
Epoch 1 TrainingPhase() ...
ERROR: MethodError: no method matching flatten(::Float32)
Closest candidates are:
  flatten(::AbstractArray) at C:\Users\usrname\.julia\packages\MLUtils\QTRw7\src\utils.jl:424  
Stacktrace:
  [1] macro expansion
    @ C:\Users\usrname\.julia\packages\Zygote\Y6SC4\src\compiler\interface2.jl:0 [inlined]     
  [2] _pullback(ctx::Zygote.Context, f::typeof(flatten), args::Float32)
    @ Zygote C:\Users\usrname\.julia\packages\Zygote\Y6SC4\src\compiler\interface2.jl:9        
  [3] macro expansion
    @ C:\Users\usrname\.julia\packages\Flux\18YZE\src\layers\basic.jl:53 [inlined]
  [4] _pullback
    @ C:\Users\usrname\.julia\packages\Flux\18YZE\src\layers\basic.jl:53 [inlined]
  [5] _pullback(::Zygote.Context, ::typeof(Flux.applychain), ::Tuple{typeof(flatten), Dense{typeof(identity), Matrix{Float32}, Vector{Float32}}}, ::Float32)
    @ Zygote C:\Users\usrname\.julia\packages\Zygote\Y6SC4\src\compiler\interface2.jl:0        
  [6] _pullback
    @ C:\Users\usrname\.julia\packages\Flux\18YZE\src\layers\basic.jl:51 [inlined]
  [7] _pullback(ctx::Zygote.Context, f::Chain{Tuple{typeof(flatten), Dense{typeof(identity), Matrix{Float32}, Vector{Float32}}}}, args::Float32)
    @ Zygote C:\Users\usrname\.julia\packages\Zygote\Y6SC4\src\compiler\interface2.jl:0        
  [8] _pullback
    @ C:\Users\usrname\.julia\packages\FluxTraining\iBFSd\src\training.jl:54 [inlined]
  [9] _pullback(ctx::Zygote.Context, f::FluxTraining.var"#70#72"{FluxTraining.var"#handlefn#78"{Learner, TrainingPhase}, FluxTraining.PropDict{Any}, Learner}, args::Chain{Tuple{typeof(flatten), Dense{typeof(identity), Matrix{Float32}, Vector{Float32}}}})
    @ Zygote C:\Users\usrname\.julia\packages\Zygote\Y6SC4\src\compiler\interface2.jl:0        
 [10] _pullback
    @ C:\Users\usrname\.julia\packages\FluxTraining\iBFSd\src\training.jl:70 [inlined]
 [11] _pullback(::Zygote.Context, ::FluxTraining.var"#73#74"{FluxTraining.var"#70#72"{FluxTraining.var"#handlefn#78"{Learner, TrainingPhase}, FluxTraining.PropDict{Any}, Learner}, Chain{Tuple{typeof(flatten), Dense{typeof(identity), Matrix{Float32}, Vector{Float32}}}}})
    @ Zygote C:\Users\usrname\.julia\packages\Zygote\Y6SC4\src\compiler\interface2.jl:0        
 [12] pullback(f::Function, ps::Zygote.Params{Zygote.Buffer{Any, Vector{Any}}})
    @ Zygote C:\Users\usrname\.julia\packages\Zygote\Y6SC4\src\compiler\interface.jl:352       
 [13] gradient(f::Function, args::Zygote.Params{Zygote.Buffer{Any, Vector{Any}}})
    @ Zygote C:\Users\usrname\.julia\packages\Zygote\Y6SC4\src\compiler\interface.jl:75        
 [14] _gradient(f::FluxTraining.var"#70#72"{FluxTraining.var"#handlefn#78"{Learner, TrainingPhase}, FluxTraining.PropDict{Any}, Learner}, #unused#::ADAM, m::Chain{Tuple{typeof(flatten), Dense{typeof(identity), Matrix{Float32}, Vector{Float32}}}}, ps::Zygote.Params{Zygote.Buffer{Any, Vector{Any}}})
    @ FluxTraining C:\Users\usrname\.julia\packages\FluxTraining\iBFSd\src\training.jl:70      
 [15] (::FluxTraining.var"#69#71"{Learner})(handle::FluxTraining.var"#handlefn#78"{Learner, TrainingPhase}, state::FluxTraining.PropDict{Any})
    @ FluxTraining C:\Users\usrname\.julia\packages\FluxTraining\iBFSd\src\training.jl:53      
 [16] runstep(stepfn::FluxTraining.var"#69#71"{Learner}, learner::Learner, phase::TrainingPhase, initialstate::NamedTuple{(:xs, :ys), Tuple{Float32, Float32}})
    @ FluxTraining C:\Users\usrname\.julia\packages\FluxTraining\iBFSd\src\training.jl:133     
 [17] step!
    @ C:\Users\usrname\.julia\packages\FluxTraining\iBFSd\src\training.jl:51 [inlined]
 [18] (::FluxTraining.var"#67#68"{Learner, TrainingPhase, Tuple{Array{Float32, 4}, Flux.OneHotArray{UInt32, 10, 1, 2, Vector{UInt32}}}})(#unused#::Function)
    @ FluxTraining C:\Users\usrname\.julia\packages\FluxTraining\iBFSd\src\training.jl:24      
 [19] runepoch(epochfn::FluxTraining.var"#67#68"{Learner, TrainingPhase, Tuple{Array{Float32, 4}, Flux.OneHotArray{UInt32, 10, 1, 2, Vector{UInt32}}}}, learner::Learner, phase::TrainingPhase)     
    @ FluxTraining C:\Users\usrname\.julia\packages\FluxTraining\iBFSd\src\training.jl:105     
 [20] epoch!
    @ C:\Users\usrname\.julia\packages\FluxTraining\iBFSd\src\training.jl:22 [inlined]
 [21] fit!(learner::Learner, nepochs::Int64, ::Tuple{Tuple{Array{Float32, 4}, Flux.OneHotArray{UInt32, 10, 1, 2, Vector{UInt32}}}, Tuple{Array{Float32, 4}, Flux.OneHotArray{UInt32, 10, 1, 2, Vector{UInt32}}}})
    @ FluxTraining C:\Users\usrname\.julia\packages\FluxTraining\iBFSd\src\training.jl:168     
 [22] top-level scope
    @ REPL[51]:1

I am completely stuck as to what goes wrong. Pointers in that regard would be appreciated, but the main issue is making the example functional, and updating the packages used to load data and the utility functions that I take from MLUtils.

To improve the reliability of this package, could doc testing be used to ensure that in the future, the documentation examples actually run?

lorenzoh commented 2 years ago

Thanks for catching this and taking the time to write up a solution! That example is outdated and hasn't been touched in a while.

So the example is broken.

Flux.jl removed the included datasets a while ago, so this needs to be updated to use MLDatasets.jl.

I think the example would do great by using MLUtils instead of DataLoaders.jl and MLDataPattern. Also, Flux imports DataLoader so no need to explicitly import it.

I agree and am working on deprecating DataLoaders.jl since I've added all its functionality to MLUtils.jl. While FluxTraining.jl is agnostic of the kind of data iterator, I agree it would be good to standardize on MLUtils.jl.

However, when I try to train my learner, it seems like a single float is passed to predict, and not an array:

The setup you wrote is correct, but you then passed train_data to fit! instead of train_loader. fit! expects an iterator over batches so the latter would be correct. Passing the complete train_data which is a single array, means that the training loop will iterate over the array, thus getting a single float at every iteration.

To improve the reliability of this package, could doc testing be used to ensure that in the future, the documentation examples actually run?

This is done for the majority of code in the docs, but not for this page. The reason is that the CI does not have a GPU available to train the model, which is why the source for this page is an executed Jupyter notebook. The problem with this is that it has to be manually rerun.

lorenzoh commented 2 years ago

If you'd like to contribute your proposed changes as a PR, I'd be happy to review it!

It would boil down to:

KronosTheLate commented 2 years ago

I did not realize that you were the one behind MLUtils xD Feeling a little dumb for telling you about it now.

However, your proposed fix still does not work for me. The line that defines lossfunc turns red for me, indicating that it is the where things go wrong, even though I could not see it in the stacktrace:

Code and stacktrace ``` julia> using Flux julia> using Flux: onehotbatch, onecold julia> using FluxTraining julia> using MLUtils: flatten, unsqueeze julia> using MLDatasets julia> labels = 0:9; julia> function add_channeldim_and_onehotencode_targets(data_and_targets::Tuple) data = data_and_targets[1] targets = data_and_targets[2] return unsqueeze(data, 3), onehotbatch(targets, labels) end add_channeldim_and_onehotencode_targets (generic function with 1 method) julia> #traindata and testdata contain both inputs (pixel values) and targets (correct labels) julia> traindata = MNIST.traindata(Float32) |> add_channeldim_and_onehotencode_targets; julia> trainloader = DataLoader(traindata, batchsize=128); julia> testdata = MNIST.testdata(Float32) |> add_channeldim_and_onehotencode_targets; julia> testloader = DataLoader(testdata, batchsize=128); julia> predict = Chain(flatten, Dense(28^2, 10)); julia> lossfunc(x, y) = Flux.Losses.logitbinarycrossentropy(predict(x), y) lossfunc (generic function with 1 method) julia> optimizer = ADAM(); julia> callbacks = [Metrics(accuracy)]; julia> learner = Learner(predict, lossfunc; optimizer, callbacks) Learner() julia> lossfunc(testdata...) 0.7482163f0 julia> Flux.train!(lossfunc, Flux.params(predict), trainloader, optimizer) julia> lossfunc(testdata...) 0.111782126f0 julia> FluxTraining.fit!(learner, 1, (trainloader, testloader)) Epoch 1 TrainingPhase() ... ERROR: DimensionMismatch("A has dimensions (10,784) but B has dimensions (10,128)") Stacktrace: [1] gemm_wrapper!(C::Matrix{Float32}, tA::Char, tB::Char, A::Matrix{Float32}, B::Matrix{Float32}, _add::LinearAlgebra.MulAddMul{true, true, Bool, Bool}) @ LinearAlgebra C:\Users\Dennis Bal\.julia\juliaup\julia-1.7.2+0~x64\share\julia\stdlib\v1.7\LinearAlgebra\src\matmul.jl:643 [2] mul! @ C:\Users\Dennis Bal\.julia\juliaup\julia-1.7.2+0~x64\share\julia\stdlib\v1.7\LinearAlgebra\src\matmul.jl:169 [inlined] [3] mul! @ C:\Users\Dennis Bal\.julia\juliaup\julia-1.7.2+0~x64\share\julia\stdlib\v1.7\LinearAlgebra\src\matmul.jl:275 [inlined] [4] * @ C:\Users\Dennis Bal\.julia\juliaup\julia-1.7.2+0~x64\share\julia\stdlib\v1.7\LinearAlgebra\src\matmul.jl:160 [inlined] [5] rrule @ C:\Users\Dennis Bal\.julia\packages\ChainRules\5p7j5\src\rulesets\Base\arraymath.jl:60 [inlined] [6] rrule @ C:\Users\Dennis Bal\.julia\packages\ChainRulesCore\RbX5a\src\rules.jl:134 [inlined] [7] chain_rrule @ C:\Users\Dennis Bal\.julia\packages\Zygote\Y6SC4\src\compiler\chainrules.jl:217 [inlined] [8] macro expansion @ C:\Users\Dennis Bal\.julia\packages\Zygote\Y6SC4\src\compiler\interface2.jl:0 [inlined] [9] _pullback @ C:\Users\Dennis Bal\.julia\packages\Zygote\Y6SC4\src\compiler\interface2.jl:9 [inlined] [10] _pullback @ C:\Users\Dennis Bal\.julia\packages\Flux\18YZE\src\layers\basic.jl:160 [inlined] [11] macro expansion @ C:\Users\Dennis Bal\.julia\packages\Flux\18YZE\src\layers\basic.jl:53 [inlined] [12] _pullback @ C:\Users\Dennis Bal\.julia\packages\Flux\18YZE\src\layers\basic.jl:53 [inlined] [13] _pullback @ C:\Users\Dennis Bal\.julia\packages\Flux\18YZE\src\layers\basic.jl:51 [inlined] [14] _pullback(ctx::Zygote.Context, f::Chain{Tuple{typeof(flatten), Dense{typeof(identity), Matrix{Float32}, Vector{Float32}}}}, args::Matrix{Float32}) @ Zygote C:\Users\Dennis Bal\.julia\packages\Zygote\Y6SC4\src\compiler\interface2.jl:0 [15] _pullback @ .\REPL[123]:1 [inlined] [16] _pullback(::Zygote.Context, ::typeof(lossfunc), ::Matrix{Float32}, ::Flux.OneHotArray{UInt32, 10, 1, 2, Vector{UInt32}}) @ Zygote C:\Users\Dennis Bal\.julia\packages\Zygote\Y6SC4\src\compiler\interface2.jl:0 [17] _pullback @ C:\Users\Dennis Bal\.julia\packages\FluxTraining\iBFSd\src\training.jl:56 [inlined] [18] _pullback(ctx::Zygote.Context, f::FluxTraining.var"#70#72"{FluxTraining.var"#handlefn#78"{Learner, TrainingPhase}, FluxTraining.PropDict{Any}, Learner}, args::Chain{Tuple{typeof(flatten), Dense{typeof(identity), Matrix{Float32}, Vector{Float32}}}}) @ Zygote C:\Users\Dennis Bal\.julia\packages\Zygote\Y6SC4\src\compiler\interface2.jl:0 [19] _pullback @ C:\Users\Dennis Bal\.julia\packages\FluxTraining\iBFSd\src\training.jl:70 [inlined] [20] _pullback(::Zygote.Context, ::FluxTraining.var"#73#74"{FluxTraining.var"#70#72"{FluxTraining.var"#handlefn#78"{Learner, TrainingPhase}, FluxTraining.PropDict{Any}, Learner}, Chain{Tuple{typeof(flatten), Dense{typeof(identity), Matrix{Float32}, Vector{Float32}}}}}) @ Zygote C:\Users\Dennis Bal\.julia\packages\Zygote\Y6SC4\src\compiler\interface2.jl:0 [21] pullback(f::Function, ps::Zygote.Params{Zygote.Buffer{Any, Vector{Any}}}) @ Zygote C:\Users\Dennis Bal\.julia\packages\Zygote\Y6SC4\src\compiler\interface.jl:352 [22] gradient(f::Function, args::Zygote.Params{Zygote.Buffer{Any, Vector{Any}}}) @ Zygote C:\Users\Dennis Bal\.julia\packages\Zygote\Y6SC4\src\compiler\interface.jl:75 [23] _gradient(f::FluxTraining.var"#70#72"{FluxTraining.var"#handlefn#78"{Learner, TrainingPhase}, FluxTraining.PropDict{Any}, Learner}, #unused#::ADAM, m::Chain{Tuple{typeof(flatten), Dense{typeof(identity), Matrix{Float32}, Vector{Float32}}}}, ps::Zygote.Params{Zygote.Buffer{Any, Vector{Any}}}) @ FluxTraining C:\Users\Dennis Bal\.julia\packages\FluxTraining\iBFSd\src\training.jl:70 [24] (::FluxTraining.var"#69#71"{Learner})(handle::FluxTraining.var"#handlefn#78"{Learner, TrainingPhase}, state::FluxTraining.PropDict{Any}) @ FluxTraining C:\Users\Dennis Bal\.julia\packages\FluxTraining\iBFSd\src\training.jl:53 [25] runstep(stepfn::FluxTraining.var"#69#71"{Learner}, learner::Learner, phase::TrainingPhase, initialstate::NamedTuple{(:xs, :ys), Tuple{Array{Float32, 4}, Flux.OneHotArray{UInt32, 10, 1, 2, Vector{UInt32}}}}) @ FluxTraining C:\Users\Dennis Bal\.julia\packages\FluxTraining\iBFSd\src\training.jl:133 [26] step!(learner::Learner, phase::TrainingPhase, batch::Tuple{Array{Float32, 4}, Flux.OneHotArray{UInt32, 10, 1, 2, Vector{UInt32}}}) @ FluxTraining C:\Users\Dennis Bal\.julia\packages\FluxTraining\iBFSd\src\training.jl:51 [27] (::FluxTraining.var"#67#68"{Learner, TrainingPhase, DataLoader{Tuple{Array{Float32, 4}, Flux.OneHotArray{UInt32, 10, 1, 2, Vector{UInt32}}}, Random._GLOBAL_RNG}})(#unused#::Function) @ FluxTraining C:\Users\Dennis Bal\.julia\packages\FluxTraining\iBFSd\src\training.jl:24 [28] runepoch(epochfn::FluxTraining.var"#67#68"{Learner, TrainingPhase, DataLoader{Tuple{Array{Float32, 4}, Flux.OneHotArray{UInt32, 10, 1, 2, Vector{UInt32}}}, Random._GLOBAL_RNG}}, learner::Learner, phase::TrainingPhase) @ FluxTraining C:\Users\Dennis Bal\.julia\packages\FluxTraining\iBFSd\src\training.jl:105 [29] epoch! @ C:\Users\Dennis Bal\.julia\packages\FluxTraining\iBFSd\src\training.jl:22 [inlined] [30] fit!(learner::Learner, nepochs::Int64, ::Tuple{DataLoader{Tuple{Array{Float32, 4}, Flux.OneHotArray{UInt32, 10, 1, 2, Vector{UInt32}}}, Random._GLOBAL_RNG}, DataLoader{Tuple{Array{Float32, 4}, Flux.OneHotArray{UInt32, 10, 1, 2, Vector{UInt32}}}, Random._GLOBAL_RNG}}) @ FluxTraining C:\Users\Dennis Bal\.julia\packages\FluxTraining\iBFSd\src\training.jl:168 [31] top-level scope @ REPL[130]:1 ```
lorenzoh commented 2 years ago

Unlike Flux.jl, FluxTraining.jl separates the loss function and the model, so that model outputs can be stored. That means that the loss function going into the Learner should be just the Flux loss function, i.e. Learner(predict, Flux.Losses.logitcrossentropy).

Then it works fine:

image

I made some minor adjustments to the data loading code as well, the full code is here:

using Flux
using Flux: onehotbatch, onecold
using FluxTraining
using MLUtils: flatten, unsqueeze
using MLDatasets

const LABELS = 0:9

function preprocess((data, targets))
    return unsqueeze(data, 3), onehotbatch(targets, LABELS)
end

traindata = MNIST(Float32, :train)[:] |> preprocess
testdata = MNIST(Float32, :test)[:] |> preprocess

trainloader = DataLoader(traindata, batchsize=128);
testloader = DataLoader(testdata, batchsize=128);

model = Chain(flatten, Dense(28^2, 10));

lossfn = Flux.Losses.logitcrossentropy

optimizer = ADAM()
callbacks = [Metrics(accuracy)]

learner = Learner(predict, lossfn; optimizer, callbacks)

FluxTraining.fit!(learner, 1, (trainloader, testloader))
KronosTheLate commented 2 years ago

That makes sense. I always found it weird that I had to define the loss function in terms of the model. Also, big fan of the changes you made. It looks great.

I do have my hands full, but as some point I will find time to update the docs. But I really think that at the very least a functional MWE should be put in the docs before I will find the time to go down your checklist. If nothing else, a temporary warning and a link to this issue would do the trick.

KronosTheLate commented 2 years ago

Getting the following error from you code:

julia> using MLDatasets

julia> traindata = MNIST(Float32, :train)[:] |> preprocess
ERROR: MethodError: objects of type Module are not callable
Stacktrace:
 [1] top-level scope
   @ REPL[135]:1

I will just stick with traindata = MNIST.traindata(Float32) |> preprocess

lorenzoh commented 2 years ago

I think your version of MLDatasets.jl is not up-to-date, so you may wanna update to 0.7.1, i.e. ]add MLDatasets@0.7.1

KronosTheLate commented 2 years ago

You are right! However, with the latest version the output is of type MNIST<:MLDatasets.SupervisedDataset, not a tuple. The following works:

function preprocess(data::MNIST)
    return unsqueeze(data.features, 3), onehotbatch(data.targets, LABELS)
end

# traindata and testdata contain both inputs (pixel values) and targets (correct labels)
traindata = MNIST(Float32, :train) |> preprocess;
testdata = MNIST(Float32, :test) |> preprocess;