CliMA / Oceananigans.jl

🌊 Julia software for fast, friendly, flexible, ocean-flavored fluid dynamics on CPUs and GPUs
https://clima.github.io/OceananigansDocumentation/stable
MIT License

Help running a GPU case #1509

Closed sam12396 closed 3 years ago

sam12396 commented 3 years ago

Hi all!

With the help of Ali and a colleague at my school, I can now run Oceananigans using a GPU on my school's HPC. I am now at a step that is totally unfamiliar to me, coding for a GPU.

Based on what I have read from the Oceananigans documentation, I think the only trouble I will have is creating my forcing functions and my initial conditions. Currently, I read in data from CSVs, perform a 1D spline interpolation, and then assign the interpolated (1D) function to a 3D field function.

My understanding is that the way arrays are treated is quite different when working with GPUs and I honestly don't even know where to begin. Below is an excerpt of my run script that shows the initial and boundary condition creation with a CPU architecture.

If anyone can share resources on how to code this for a GPU or just explain it if it is easy, I would greatly appreciate the help!

ρₒ = 1025
############## Boundary conditions ###################
## Pulling the boundary conditions from the data csv
df = CSV.read("data_inputs/kma_buoy_fluxes_soulik.csv", DataFrame);

seconds = df.time[:]*86400; # convert to seconds
secs    = [tnow - seconds[1] for tnow in seconds]; # set first time stamp to 0
secs    = [s - 518400 for s in secs]; # Move 0 time stamp to later in the time series

## make the splines
spl_taux  = Spline1D(secs, df.taux/(ρₒ), k=1);
spl_tauy  = Spline1D(secs, df.tauy/(ρₒ), k=1);
spl_latHF = Spline1D(secs, df.lat_hf/(ρₒ * cᴾ),  k=1);
spl_senHF = Spline1D(secs, df.sens_hf/(ρₒ * cᴾ), k=1);

## turn the splines into functional arguments
@inline Fxn_taux(x,y,t) = spl_taux(t);
@inline Fxn_tauy(x,y,t) = spl_tauy(t);
@inline Fxn_HFlx(x,y,t) = spl_latHF(t) + spl_senHF(t); # K m⁻¹ s⁻¹, surface _temperature_ flux

############# Initial conditions #############
## Initial surface stress condition
Qo = sqrt(spl_taux(0)^2 + spl_tauy(0)^2)

## Random noise damped at top and bottom
Ξ(z) = randn() * z / model.grid.Lz * (1 + z / model.grid.Lz); # noise

## Velocity initial condition: random noise scaled by the initial stress.
uᵢ(x, y, z) = abs(Qo) * 1e-3 * Ξ(z);
wᵢ(x, y, z) = abs(Qo) * 1e-6 * Ξ(z); # This was added to reduce the scale of the w profile

## Temperature and Salinity initial condition

# Initialize from glider data
df_i = CSV.read("data_inputs/initial_prof.csv",DataFrame)
Temp = df_i.temp[:]
Salinity = df_i.sal[:]
depth=df_i.depth[:]

spl_temp = Spline1D(depth,Temp,k=1)
spl_sal = Spline1D(depth,Salinity,k=1)

@inline tempset_prof(x,y,z) = spl_temp(z)
@inline salset_prof(x,y,z) = spl_sal(z)

francispoulin commented 3 years ago

Hello @sam12396 , I am glad you have been able to run Oceananigans on GPUs. That could speed things up a great deal, depending on what you are trying to do.

Oceananigans has been written in such a way that the user does not need to do anything different in terms of setting up a problem on CPUs vs GPUs. To illustrate my point, consider the shallow water Bickley jet example here. The default is set up to run on CPUs but if you want to run it on GPUs, then it's easy. Where you define the model you need to add one line,

model = ShallowWaterModel(
    architecture=GPU(),
    timestepper=:RungeKutta3,
    advection=WENO5(),
    grid=grid,
    gravitational_acceleration=g,
    coriolis=FPlane(f=f),
    )

When you make that one change (setting the architecture in the second line above), the code will run on the GPU: producing the data, reading, and writing are then all done on the GPU, and there is nothing else for you to do. In particular, if you define your forcing and initial conditions using functions, as the examples tend to do, then nothing needs to change.
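
The same one-line change should apply to a nonhydrostatic setup like yours. A minimal sketch, assuming your grid, boundary conditions, and forcing are already defined (the names here are illustrative, not from your script):

```julia
# Identical to the CPU version except for the `architecture` keyword.
model = IncompressibleModel(
    architecture = GPU(),      # the only line that changes
    grid = grid,
    boundary_conditions = (u=u_bcs, v=v_bcs, T=T_bcs),
    forcing = (u=Fu, v=Fv, T=FT),
)
```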

So if you are using Oceananigans.jl, are you sure you need to program anything differently?

tomchor commented 3 years ago

For simple cases, like @francispoulin mentioned, changing the architecture from CPU() to GPU() is often enough. If you want to run simulations that are a bit more complicated, though, there are a few other things you might need to worry about. You can see a recent discussion about it here. Mostly, you have to define everything that is used in the model calculations as a constant, otherwise the GPU won't know what to do with it. So, for example, you will probably need to change some of your lines to

const Qo = sqrt(spl_taux(0)^2 + spl_tauy(0)^2)

## Random noise damped at top and bottom
const Hz = grid.Lz
Ξ(z) = randn() * z / Hz * (1 + z / Hz); # noise

## Velocity initial condition: random noise scaled by the initial stress.
uᵢ(x, y, z) = abs(Qo) * 1e-3 * Ξ(z);
wᵢ(x, y, z) = abs(Qo) * 1e-6 * Ξ(z); # This was added to reduce the scale of the w profile

(probably the same goes in other places too.)
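
The reason is that your functions close over these variables, and non-constant globals (or a whole `model` object) generally can't be compiled into a GPU kernel, while plain constants can. A self-contained sketch of the pattern (the depth value here is hypothetical):

```julia
# Capture a plain constant, not `model.grid.Lz`, so closures stay GPU-compilable.
const Hz = 100.0  # hypothetical domain depth in meters

# Damping envelope that vanishes at the surface (z = 0) and bottom (z = -Hz)
envelope(z) = z / Hz * (1 + z / Hz)

# Random noise damped at top and bottom
Ξ(z) = randn() * envelope(z)
```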

I recommend you first run the very simple examples from this repo's README on a GPU, to make sure that Oceananigans + GPU is working correctly. Only after that should you try to change your own setup to a GPU one. If you come across errors that you can't solve, we can take it from there :)

glwagner commented 3 years ago

@sam12396 when referencing arrays on the GPU you need to convert Array to CUDA.CuArray.

It looks like you are using an object called df.taux that comes out of the CSV package:

spl_taux  = Spline1D(secs, df.taux / ρₒ, k=1)

Provided that whatever Spline1D does is GPU-friendly (and it may not be), you may only need to convert df.taux / ρₒ to a CuArray prior to passing it to Spline1D. I don't know if this code would work, but as an example of something you might do:

using CUDA

kinematic_x_momentum_flux = Array(df.taux / ρₒ)
kinematic_x_momentum_flux = CuArray(kinematic_x_momentum_flux)

spl_taux  = Spline1D(secs, kinematic_x_momentum_flux, k=1)

The array secs may also need to be converted:

secs = CuArray(secs)

Spline1D may not work on the GPU, but there seem to be other options. A Google search for "interpolation GPU julia" turned up this page:

https://juliagpu.org/2020-10-30-cuda_2.1/

which has instructions for using some CUDA built-in interpolation functionality.
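
If Spline1D turns out not to work, another fallback is to do the interpolation with a plain function of scalars, which is usually easier to call from GPU-compatible code than a spline object. A hypothetical sketch (it assumes the knots `ts` are sorted and the query point lies within them):

```julia
# Piecewise-linear interpolation as a plain function: no spline object to
# capture, just two arrays and scalar arithmetic.
function linear_interp(ts, vs, t)
    i = clamp(searchsortedlast(ts, t), 1, length(ts) - 1)  # left knot index
    w = (t - ts[i]) / (ts[i+1] - ts[i])                    # interpolation weight
    return (1 - w) * vs[i] + w * vs[i+1]
end
```

Since you are already using `k=1` splines, this should reproduce what `Spline1D` gives you on the interior of the data.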

In summary, if you're working with arrays on the GPU, you'll need to convert them to CuArray first.

As a side note, a convenient way to program a script to be switchable from CPU to GPU easily is to include a line at the top like

DeviceArrayType = arch isa GPU ? CuArray : Array

and then use patterns like

data = Array(df.data)
data = convert(DeviceArrayType, data)

This allows you to switch between arch = GPU() and arch = CPU() with a single line at the top of your script.
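
A CPU-only illustration of the same pattern (with CUDA loaded, the ternary would select `CuArray` instead; the time stamps here are made up):

```julia
# Pick the array type once at the top of the script...
DeviceArrayType = Array  # would be `arch isa GPU ? CuArray : Array` with CUDA loaded

# ...then convert any data you read in:
seconds = collect(0.0:3600.0:86400.0)  # hypothetical hourly time stamps over one day
seconds_device = convert(DeviceArrayType, seconds)
```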

As a side, side note: Julia does not require semicolons at the end of lines!

Please don't be afraid to simply try running your script and reporting the error you get (if any!). With a concrete error, we can make concrete suggestions to solve your problem (without having to run the script ourselves, which is more work).

sam12396 commented 3 years ago

Hi all!

Thank you for the helpful advice, and sorry for my delayed response! I am using shared GPU resources on my school's HPC and have not been able to access any of them to test anything out yet, so I will update here once I can get some testing done on my side. Thank you again for the help!