FluxML / Optimisers.jl

Optimisers.jl defines many standard optimisers and utilities for learning loops.
https://fluxml.ai/Optimisers.jl
MIT License

Optimiser state not moving to GPU #179

Open vpuri3 opened 3 weeks ago

vpuri3 commented 3 weeks ago
julia> using CUDA, Optimisers

julia> opt_st = Optimisers.setup(Optimisers.Adam(), rand(2))
Leaf(Adam(0.001, (0.9, 0.999), 1.0e-8), ([0.0, 0.0], [0.0, 0.0], (0.9, 0.999)))

julia> cu(opt_st).state[1]
2-element Vector{Float64}:
 0.0
 0.0

julia> cu(opt_st.state)[1]
2-element CuArray{Float32, 1, CUDA.DeviceMemory}:
 0.0
 0.0

(NeuralROMs) pkg> st CUDA
Project NeuralROMs v0.0.1
Status `~/.julia/dev/NeuralROMs.jl/Project.toml`
  [052768ef] CUDA v5.5.2

(NeuralROMs) pkg> st Optimisers
Project NeuralROMs v0.0.1
Status `~/.julia/dev/NeuralROMs.jl/Project.toml`
  [3bd65402] Optimisers v0.3.3
vpuri3 commented 3 weeks ago

Can be fixed with:

julia> using Adapt

julia> Adapt.@adapt_structure Optimisers.Leaf

julia> cu(opt_st).state[1]
2-element CuArray{Float32, 1, CUDA.DeviceMemory}:
 0.0
 0.0
CarloLucibello commented 3 weeks ago

Since Leaf is a functor, it will move to the GPU when using Flux.gpu or MLDataDevices.jl.

Also, if the input to setup is on the GPU (which is typically what you want), the state will be on the GPU as well.

So maybe there is no need for Adapt?
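
For example, both options in a minimal sketch (not from the thread; assumes a functional CUDA device):

using CUDA, Optimisers, MLDataDevices

gdev = gpu_device()

# option 1: set up on the CPU, then move the whole state tree
opt_st = Optimisers.setup(Optimisers.Adam(), rand(2))
opt_st_gpu = gdev(opt_st)   # Leaf is a functor, so its state arrays move too

# option 2: set up directly from GPU parameters
p = cu(rand(2))
opt_st_gpu2 = Optimisers.setup(Optimisers.Adam(), p)   # state allocated on the GPU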

vpuri3 commented 3 weeks ago

I routinely save the optimizer state to checkpoint files during training, so I need to move them back to the GPU when I restart training.
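
For example, a checkpoint round trip might look like this (a minimal sketch; the JLD2 calls and file name are illustrative, not from the thread):

using Optimisers, MLDataDevices, JLD2

cdev = cpu_device()
gdev = gpu_device()

# before saving: move the optimiser state to the CPU so it serialises cleanly
jldsave("checkpoint.jld2"; opt_st = cdev(opt_st))

# on restart: load the state and move it back to the GPU
opt_st = gdev(load("checkpoint.jld2", "opt_st"))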

ToucheSir commented 3 weeks ago

Yes, but that's already doable with Flux.gpu(opt_st) or gdev = MLDataDevices.gpu_device(); gdev(opt_st). Is there a reason you're not able to use either of those libraries and must use CUDA.cu directly?

vpuri3 commented 3 weeks ago

Thanks for the speedy reply @ToucheSir, my use case is below. I am using MLDataDevices.gpu_device, but something is still going wrong.

(from https://github.com/FluxML/Optimisers.jl/pull/180#issuecomment-2392223050)

That is true, the MWE works with MLDataDevices. However, we still need the Adapt functionality. Consider the case when Leaf is stored as part of a struct: then MLDataDevices.gpu_device doesn't move the state to the GPU, even if we have Adapt.@adapt_structure defined for the object.

using Optimisers, CUDA, LuxCUDA, MLDataDevices, Adapt

struct TrainState{Tp, To}
  p::Tp
  opt_st::To
end

Adapt.@adapt_structure TrainState

p = rand(2)
opt_st = Optimisers.setup(Optimisers.Adam(), p)
ts = TrainState(p, opt_st)
device = gpu_device()
device(ts).opt_st.state[1]

2-element Vector{Float64}:
 0.0
 0.0
ToucheSir commented 3 weeks ago

Per the docs, you need to define @functor or @layer for TrainState to make either function work for it. Happy to take a docs PR which clarifies this.
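
For example, with Functors (a minimal sketch of the fix being described, reusing the TrainState definition above):

using Functors, MLDataDevices

Functors.@functor TrainState   # let gpu_device/Flux.gpu recurse into the struct

ts = TrainState(p, opt_st)
device = gpu_device()
device(ts).opt_st.state[1]   # now a CuArray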

vpuri3 commented 3 weeks ago

Thanks all, I appreciate your patience in explaining this to me. I directly defined MLDataDevices methods on TrainState.

# move a TrainState to the CPU field-by-field
function (dev::MLDataDevices.CPUDevice)(state::TrainState)
    TrainState(dev(state.p), dev(state.opt_st))
end
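
The GPU direction presumably needs the analogous method; a sketch for the CUDA backend (not from the thread):

# move a TrainState to the GPU field-by-field
function (dev::MLDataDevices.CUDADevice)(state::TrainState)
    TrainState(dev(state.p), dev(state.opt_st))
end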