Closed navidcy closed 3 years ago
Regarding multiple GPUs: calling a Problem constructor with dev=GPU() probably forced CUDA.jl to use device=0...(?)
E.g., when I asked for 3 GPUs on the HPC I got:
julia> prob = SingleLayerQG.Problem(GPU(); nx=n, ny=n+2, Lx=L, β=β, μ=μ, dt=dt, stepper=stepper)
Problem
├─────────── grid: grid (on GPU)
├───── parameters: params
├────── variables: vars
├─── state vector: sol
├─────── equation: eqn
├────────── clock: clock
└──── timestepper: FilteredRK4TimeStepper
shell> nvidia-smi
Tue Mar 9 15:20:01 2021
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 450.80.02 Driver Version: 450.80.02 CUDA Version: 11.0 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|===============================+======================+======================|
| 0 Tesla V100-SXM2... On | 00000000:3D:00.0 Off | 0 |
| N/A 35C P0 57W / 300W | 410MiB / 32510MiB | 0% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
| 1 Tesla V100-SXM2... On | 00000000:3E:00.0 Off | 0 |
| N/A 33C P0 41W / 300W | 3MiB / 32510MiB | 0% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
| 2 Tesla V100-SXM2... On | 00000000:B2:00.0 Off | 0 |
| N/A 35C P0 42W / 300W | 3MiB / 32510MiB | 0% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=============================================================================|
| 0 N/A N/A 12191 C ...ta/v45/nc3020/julia/julia 407MiB |
+-----------------------------------------------------------------------------+
julia>
I think CUDA.jl may pick this GPU by default. I think the best solution is to link to the CUDA.jl documentation for choosing a device. Users are also able to do fancier things, like run two problems side by side on different GPUs (some explanation is provided in the CUDA.jl docs for this).
Could you point to this explanation and I'll add a note in our docs.
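For in-Julia device selection, here is a minimal sketch using CUDA.jl's device! (the SingleLayerQG call mirrors the one above; the nx/Lx values are placeholders, so adapt as needed):

```julia
using CUDA
using GeophysicalFlows

# List the devices CUDA.jl can see (indices match nvidia-smi in typical setups).
for dev in CUDA.devices()
    println(dev)
end

# Switch the current task to device 1 *before* allocating any GPU arrays;
# arrays created afterwards live on that device.
CUDA.device!(1)

# The problem is then constructed on the selected device.
prob = SingleLayerQG.Problem(GPU(); nx=256, Lx=2π)
```

Note that device! only affects the calling task, so each problem's arrays must be allocated while its device is active.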
Here are some references: CUDA.device. device! is adjacent. I don't see the function devices, however.

Often the most straightforward approach to using multiple GPUs is to launch the same script with different CUDA_VISIBLE_DEVICES settings. This approach is handled outside Julia:
$ CUDA_VISIBLE_DEVICES=0 julia --project cool_script.jl
This launches Julia with only one device visible (device "0" from the output of nvidia-smi). This environment variable is described in NVIDIA's CUDA documentation:
https://developer.nvidia.com/blog/cuda-pro-tip-control-gpu-visibility-cuda_visible_devices/
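Building on that, two independent problems can run side by side, one per GPU; the script names here are hypothetical:

```shell
# Each Julia process sees only the device named in its CUDA_VISIBLE_DEVICES,
# so within each process that device appears as device 0.
CUDA_VISIBLE_DEVICES=0 julia --project run_problem_A.jl &
CUDA_VISIBLE_DEVICES=1 julia --project run_problem_B.jl &
wait  # block until both background jobs finish
```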
We should add a remark in the README and in the Docs on this.
[This was mentioned by @ranocha in their review remarks.]