vpuri3 opened 3 weeks ago:
This can be fixed with:

```julia
julia> Adapt.@adapt_structure Optimisers.Leaf

julia> cu(opt_st).state[1]   # the optimizer state now moves to the GPU
2-element CuArray{Float32, 1, CUDA.DeviceMemory}:
 0.0
 0.0
```
Since `Leaf` is a functor, it will move to the GPU when using `Flux.gpu` or MLDataDevices.jl.
Also, if the input to `setup` is on the GPU (which typically is what you want), the state will be on the GPU as well.
So maybe there is no need for Adapt?
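For illustration, a minimal sketch of that last point (assuming a working CUDA setup; the variable names are just for the example):

```julia
using Optimisers, CUDA

p = cu(rand(Float32, 2))                          # parameters already on the GPU
opt_st = Optimisers.setup(Optimisers.Adam(), p)   # state arrays are created to mirror p
opt_st.state[1]                                   # CuArray: the Adam moments live on the GPU too
```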
I routinely save the optimizer state to checkpoint files during training, so I need to move it back to the GPU when I restart training.
Yes, but that's already doable with `Flux.gpu(opt_st)` or `gdev = MLDataDevices.gpu_device(); gdev(opt_st)`. Is there a reason you're not able to use either of those libraries and must use `CUDA.cu` directly?
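For reference, a minimal sketch of that round trip (the checkpoint-loading step is implied; here `opt_st` stands in for state deserialized from disk, so it starts on the CPU):

```julia
using Optimisers, MLDataDevices

# opt_st as it would come back from a checkpoint, i.e. plain CPU arrays
opt_st = Optimisers.setup(Optimisers.Adam(), rand(Float32, 2))

gdev = MLDataDevices.gpu_device()
opt_st_gpu = gdev(opt_st)   # Leaf is a functor, so the whole state tree moves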
Thanks for the speedy reply @ToucheSir, my use case is below. I am using `MLDataDevices.gpu_device`, but something is still going wrong.
(from https://github.com/FluxML/Optimisers.jl/pull/180#issuecomment-2392223050)
That is true, the MWE works with MLDataDevices. However, we still need the Adapt functionality. Consider the case where `Leaf` is stored as part of a struct: then using `MLDataDevices.gpu_device` doesn't move the state to the GPU, even though we have `Adapt.@adapt_structure` defined for the object.
```julia
using Optimisers, CUDA, LuxCUDA, MLDataDevices, Adapt

struct TrainState{Tp, To}
    p::Tp
    opt_st::To
end

Adapt.@adapt_structure TrainState

p = rand(2)
opt_st = Optimisers.setup(Optimisers.Adam(), p)
ts = TrainState(p, opt_st)

device = gpu_device()
device(ts).opt_st.state[1]
# 2-element Vector{Float64}:   <- still a plain CPU array, not a CuArray
#  0.0
#  0.0
```
Per the docs, you need to define `@functor` or `@layer` for `TrainState` to make either function work for it. Happy to take a docs PR which clarifies this.
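A minimal sketch of that fix using Functors.jl (`Flux.@layer` would play the same role):

```julia
using Functors, Optimisers, MLDataDevices

struct TrainState{Tp, To}
    p::Tp
    opt_st::To
end

Functors.@functor TrainState   # lets device-movement functions recurse into the struct

p = rand(Float32, 2)
ts = TrainState(p, Optimisers.setup(Optimisers.Adam(), p))

device = gpu_device()
device(ts).opt_st.state[1]     # now a CuArray when a GPU is available
```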
Thanks all, I appreciate your patience in explaining this to me. I directly defined MLDataDevices methods on `TrainState`:
```julia
# move each field of the TrainState to the target device
function (dev::MLDataDevices.CPUDevice)(state::TrainState)
    TrainState(dev(state.p), dev(state.opt_st))
end
```
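Presumably the GPU direction needs the same treatment; a hedged sketch of the companion method, assuming the CUDA backend (`MLDataDevices.CUDADevice`):

```julia
# assumed companion method for the CUDA backend, mirroring the CPU one above
function (dev::MLDataDevices.CUDADevice)(state::TrainState)
    TrainState(dev(state.p), dev(state.opt_st))
end
```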