investigate allocations

CliMA / ClimaCoupler.jl

ClimaCoupler: bringing atmosphere, land, and ocean together

Apache License 2.0

investigate allocations #683

Closed: juliasloan25 closed this issue 6 months ago

juliasloan25 commented 8 months ago

When we try to run the DYAMOND configuration on central's P100 GPUs, it fails because there isn't enough memory available during the atmos_init call. The same run works on clima's A100 GPUs, but during atmos_init we see "Effective GPU memory usage: 87.32% (69.114 GiB/79.150 GiB)". About 69 GiB of memory usage is a lot, so we need to find out where these allocations are coming from.

We can do this by placing CUDA.memory_status() calls throughout the code to see where the memory usage jumps.
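A minimal sketch of that approach, assuming CUDA.jl is installed. The helper names `log_gpu_memory` and `to_gib` are hypothetical and not part of ClimaCoupler; `CUDA.memory_status`, `CUDA.total_memory`, `CUDA.available_memory`, and `CUDA.functional` are existing CUDA.jl functions. The arguments to atmos_init are not shown in this issue, so that call is left as a comment.

```julia
using CUDA
using Printf

# Convert a byte count to GiB.
to_gib(bytes) = bytes / 2^30

# Hypothetical helper: report device memory in use at a labelled checkpoint.
# Returns the number of bytes in use, or `nothing` if no usable GPU is found.
function log_gpu_memory(label::AbstractString)
    CUDA.functional() || return nothing
    total = CUDA.total_memory()
    used = total - CUDA.available_memory()
    @printf("[%s] device memory in use: %.3f GiB / %.3f GiB\n",
            label, to_gib(used), to_gib(total))
    CUDA.memory_status()  # prints the "Effective GPU memory usage" summary
    println()
    return used
end

log_gpu_memory("before atmos_init")
# atmos_sim = atmos_init(...)  # call under investigation; arguments elided
log_gpu_memory("after atmos_init")
```

Comparing the values reported at consecutive checkpoints shows which call is responsible for a jump. Note that the numbers likely include memory that CUDA.jl's memory pool has reserved but is not actively using, so a large value does not necessarily mean that much live data.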

juliasloan25 commented 6 months ago

The coupler output table shows very similar allocations between atmos-only and coupled simulations, as of 5/1.

On GPU:
coupled simulation allocations: 3.361 GiB
atmos-only simulation allocations: 3.255 GiB

On CPU:
coupled CoupledSimulation object allocations: 0.196 GiB
atmos-only CoupledSimulation object allocations: 0.195 GiB