CliMA / ClimaCore.jl

CliMA model dycore
https://clima.github.io/ClimaCore.jl/dev
Apache License 2.0

adapt for spaces/fields is broken #2091

Open haakon-e opened 22 hours ago

haakon-e commented 22 hours ago

Describe the bug

Adapting a (center) space produces a malformed struct. The problem only surfaces when the resulting object is printed.

As a consequence, the space cannot be used to set up a field correctly.

error message

```julia
julia> Fields.Adapt.adapt(array_type, center_space)
CenterExtrudedFiniteDifferenceSpace:
  context: Error showing value of type ClimaCore.Spaces.ExtrudedFiniteDifferenceSpace{ClimaCore.Grids.DeviceExtrudedFiniteDifferenceGrid{ClimaCore.Topologies.DeviceIntervalTopology{@NamedTuple{bottom::Int64, top::Int64}}, ClimaCore.Quadratures.GLL{4}, ClimaCore.Geometry.CartesianGlobalGeometry, ClimaCore.DataLayouts.VIJFH{ClimaCore.Geometry.LocalGeometry{(1, 2, 3), ClimaCore.Geometry.XYZPoint{Float64}, Float64, StaticArraysCore.SMatrix{3, 3, Float64, 9}}, 82, 4, CUDA.CuArray{Float64, 5, CUDA.DeviceMemory}}, ClimaCore.DataLayouts.VIJFH{ClimaCore.Geometry.LocalGeometry{(1, 2, 3), ClimaCore.Geometry.XYZPoint{Float64}, Float64, StaticArraysCore.SMatrix{3, 3, Float64, 9}}, 83, 4, CUDA.CuArray{Float64, 5, CUDA.DeviceMemory}}}, ClimaCore.Grids.CellCenter}:
ERROR: type DeviceExtrudedFiniteDifferenceGrid has no field horizontal_grid
Stacktrace:
  [1] getproperty
    @ ./Base.jl:37 [inlined]
  [2] horizontal_space(full_space::ClimaCore.Spaces.ExtrudedFiniteDifferenceSpace{…})
    @ ClimaCore.Spaces ~/.julia/packages/ClimaCore/rSpyb/src/Spaces/extruded.jl:151
  [3] show(io::IOContext{…}, space::ClimaCore.Spaces.ExtrudedFiniteDifferenceSpace{…})
    @ ClimaCore.Spaces ~/.julia/packages/ClimaCore/rSpyb/src/Spaces/extruded.jl:136
  [4] show(io::IOContext{…}, ::MIME{…}, x::ClimaCore.Spaces.ExtrudedFiniteDifferenceSpace{…})
    @ Base.Multimedia ./multimedia.jl:47
  [5] (::OhMyREPL.var"#7#8"{REPL.REPLDisplay{…}, MIME{…}, Base.RefValue{…}})(io::IOContext{Base.TTY})
    @ OhMyREPL ~/.julia/packages/OhMyREPL/bkUhZ/src/output_prompt_overwrite.jl:23
  [6] with_repl_linfo(f::Any, repl::REPL.LineEditREPL)
    @ REPL /clima/software/julia/julia-1.10.5/share/julia/stdlib/v1.10/REPL/src/REPL.jl:569
  [7] display
    @ ~/.julia/packages/OhMyREPL/bkUhZ/src/output_prompt_overwrite.jl:6 [inlined]
  [8] display
    @ /clima/software/julia/julia-1.10.5/share/julia/stdlib/v1.10/REPL/src/REPL.jl:278 [inlined]
  [9] display(x::Any)
    @ Base.Multimedia ./multimedia.jl:340
 [10] #invokelatest#2
    @ ./essentials.jl:892 [inlined]
 [11] invokelatest
    @ ./essentials.jl:889 [inlined]
 [12] print_response(errio::IO, response::Any, show_value::Bool, have_color::Bool, specialdisplay::Union{…})
    @ REPL /clima/software/julia/julia-1.10.5/share/julia/stdlib/v1.10/REPL/src/REPL.jl:315
 [13] (::REPL.var"#57#58"{REPL.LineEditREPL, Pair{Any, Bool}, Bool, Bool})(io::Any)
    @ REPL /clima/software/julia/julia-1.10.5/share/julia/stdlib/v1.10/REPL/src/REPL.jl:284
 [14] with_repl_linfo(f::Any, repl::REPL.LineEditREPL)
    @ REPL /clima/software/julia/julia-1.10.5/share/julia/stdlib/v1.10/REPL/src/REPL.jl:569
 [15] print_response(repl::REPL.AbstractREPL, response::Any, show_value::Bool, have_color::Bool)
    @ REPL /clima/software/julia/julia-1.10.5/share/julia/stdlib/v1.10/REPL/src/REPL.jl:282
 [16] (::REPL.var"#do_respond#80"{…})(s::REPL.LineEdit.MIState, buf::Any, ok::Bool)
    @ REPL /clima/software/julia/julia-1.10.5/share/julia/stdlib/v1.10/REPL/src/REPL.jl:911
 [17] #invokelatest#2
    @ ./essentials.jl:892 [inlined]
 [18] invokelatest
    @ ./essentials.jl:889 [inlined]
 [19] run_interface(terminal::REPL.Terminals.TextTerminal, m::REPL.LineEdit.ModalInterface, s::REPL.LineEdit.MIState)
    @ REPL.LineEdit /clima/software/julia/julia-1.10.5/share/julia/stdlib/v1.10/REPL/src/LineEdit.jl:2656
 [20] run_frontend(repl::REPL.LineEditREPL, backend::REPL.REPLBackendRef)
    @ REPL /clima/software/julia/julia-1.10.5/share/julia/stdlib/v1.10/REPL/src/REPL.jl:1312
 [21] (::REPL.var"#62#68"{REPL.LineEditREPL, REPL.REPLBackendRef})()
    @ REPL /clima/software/julia/julia-1.10.5/share/julia/stdlib/v1.10/REPL/src/REPL.jl:386
Some type information was truncated. Use `show(err)` to see complete types.
```
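
The stack trace shows `horizontal_space` (called from `show`) reading a `horizontal_grid` field that the device-side grid type does not have. Below is a minimal sketch of that failure mode in plain Julia. All names in it (`HostGrid`, `DeviceGrid`, `Space`, `to_device`) are hypothetical stand-ins, not ClimaCore's actual definitions, and `to_device` only imitates what `adapt` appears to do here:

```julia
# Hypothetical stand-ins for illustration; these are NOT ClimaCore's types.
struct HostGrid
    horizontal_grid::Vector{Float64}
    vertical_grid::Vector{Float64}
end

# Device-side variant that drops the host-only `horizontal_grid` field.
struct DeviceGrid
    vertical_grid::Vector{Float64}
end

struct Space{G}
    grid::G
end

# Accessor written with only the host grid layout in mind.
horizontal_space(space::Space) = space.grid.horizontal_grid

# Stand-in for `adapt`: swaps the host grid for its device-side variant.
to_device(space::Space{HostGrid}) = Space(DeviceGrid(space.grid.vertical_grid))

host_space = Space(HostGrid([0.0, 1.0], [0.0, 0.5, 1.0]))
device_space = to_device(host_space)  # construction itself succeeds

horizontal_space(host_space)          # fine on the host-side space

# The missing field is only noticed once something reads it (e.g. `show`).
adapted_ok = try
    horizontal_space(device_space)
    true
catch err
    @info "accessor failed on the adapted space" err
    false
end
```

If this matches what happens in ClimaCore, then either the accessor needs a method for the device-side grid, or adapting a space should not produce a space wrapping a device-side grid in the first place.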

To Reproduce

  1. Allocate a GPU on the clima cluster: `srun -G1 --pty bash -l`
  2. In the ClimaAtmos.jl root (main branch, julia v1.10): `julia --proj=examples`
  3. Run these julia commands:
    
    import CUDA
    import ClimaComms
    import ClimaAtmos.InitialConditions as ICs
    import ClimaAtmos as CA
    import ClimaCore: ClimaCore, Fields
    import YAML
    comms_ctx = ClimaComms.SingletonCommsContext(ClimaComms.CUDADevice())
    ClimaComms.init(comms_ctx)
    config_dict = YAML.load_file("config/model_configs/diagnostic_edmfx_trmm_box.yml")
    config = CA.AtmosConfig(config_dict; job_id="TRMM", comms_ctx)
    # simulation_init = CA.get_simulation(config)  # fails due to `KernelError: passing and using non-bitstype argument`

Dig into `get_simulation`:

    params = CA.create_parameter_set(config)
    spaces = CA.get_spaces(config.parsed_args, params, config.comms_ctx)
    center_space = spaces.center_space;
    array_type = ClimaComms.array_type(ClimaComms.device(center_space))

Adapting a grid works (GPU -> CPU -> GPU):

    grid = ClimaCore.Spaces.grid(center_space);
    Fields.Adapt.adapt(array_type, Fields.Adapt.adapt(Array, grid))

Adapting the space breaks (when the result is displayed in the REPL):

    Fields.Adapt.adapt(array_type, center_space)
    # ERROR: type DeviceExtrudedFiniteDifferenceGrid has no field horizontal_grid

Similarly:

    Fields.Adapt.adapt(Array, center_space)  # breaks with the same error



## System details

Any relevant system information:
- Julia version `v1.10`
- operating system `clima cluster`
- modules loaded on cluster (`module list`): `climacommon/2024_10_08`

haakon-e commented 22 hours ago

Note: Calling `CA.get_simulation(config)` on this config fails with `KernelError: passing and using non-bitstype argument`. If the adapt issue described above is fixed, I hope to update ClimaAtmos.jl to construct the local state on the CPU and then adapt it to the GPU. This would not be needed if we fixed interpolation on the GPU, but that seems more challenging to achieve.

The atmos_state function in ClimaAtmos.jl/src/initial_conditions/atmos_state.jl would be updated to something like:

import ClimaComms

# [...]

function atmos_state(local_state, atmos_model, center_space, face_space)
    array_type = ClimaComms.array_type(ClimaComms.device(center_space))
    cpu_center_space = Fields.Adapt.adapt(Array, center_space)
    cpu_c = atmos_center_variables.(
        local_state.(Fields.local_geometry_field(cpu_center_space)),
        atmos_model,
    )
    c = Fields.Adapt.adapt(array_type, cpu_c)

    cpu_face_space = Fields.Adapt.adapt(Array, face_space)
    cpu_f = atmos_face_variables.(
        local_state.(Fields.local_geometry_field(cpu_face_space)),
        atmos_model,
    )
    f = Fields.Adapt.adapt(array_type, cpu_f)

    Fields.FieldVector(;
        c, f,
        atmos_surface_field(
            Fields.level(face_space, Fields.half),
            atmos_model.surface_model,
        )...,
    )
end