desihub / gpu_specter

Scratch work for porting spectroperfectionism extractions to GPUs
BSD 3-Clause "New" or "Revised" License

Add unit tests for gpu functions #40

Closed dmargala closed 4 years ago

dmargala commented 4 years ago

This PR adds unit tests that compare the cpu and gpu versions of the functions implemented in gpu_specter.extract. All of the gpu functions in gpu_specter.extract.gpu have been updated to match the interface and functionality of their versions in gpu_specter.extract.cpu.

There are just a few functions left that do not have a gpu version, notably, gpu_specter.extract.cpu.ex2d_padded and numpy.polynomial.legendre.legval.

The new gpu_specter.extract.both.xp_ex2d_patch is a version of gpu_specter.extract.cpu.ex2d_patch that is compatible with inputs from either numpy.ndarray or cupy.ndarray.
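The array-module-agnostic pattern behind a function like xp_ex2d_patch can be sketched roughly as follows (a minimal illustration, not the actual xp_ex2d_patch implementation; xp_solve is a hypothetical helper, and cupy.get_array_module returns numpy for numpy inputs and cupy for cupy inputs):

```python
import numpy as np

try:
    import cupy
    get_array_module = cupy.get_array_module
except ImportError:
    # without cupy, every array is a numpy array
    get_array_module = lambda *args: np

def xp_solve(A, b):
    """Solve the normal equations (A^T A) x = A^T b using whichever
    array library (numpy or cupy) the inputs came from."""
    xp = get_array_module(A, b)
    ATA = A.T.dot(A)
    ATb = A.T.dot(b)
    return xp.linalg.solve(ATA, ATb)
```

The caller never branches on array type: passing cupy arrays keeps the work on the gpu, while numpy arrays fall through to the cpu implementation.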

The figure below compares the runtime for a single patch using the numba-optimized gpu_specter.extract.cpu.ex2d_patch, the new xp_ex2d_patch with cupy.ndarray inputs, and the new xp_ex2d_patch with numpy.ndarray inputs. The projection matrix shape for this patch corresponds to A4.shape = (ny, nx, nspec, nwave) = (102, 46, 5, 50). The benchmark was performed in a jupyter notebook on a cori shared gpu node.

[figure: single-patch runtime comparison]

rcthomas commented 4 years ago

The fastest runtime is in the 20ms range, is the plan to have multiple patches? I'm happier when I see the fastest implementation have enough work to take a few seconds. Maybe that's a separate test?

sbailey commented 4 years ago

Thanks.

dmargala commented 4 years ago

@rcthomas A typical full frame extraction with this patch size would require about 4700 patches. The current benchmark for a full frame extraction with this patch size on a cori haswell node with 32 mpi ranks is around 1 min.

@lastephey Those are good points. Currently, the only entry point to the cupy-enabled functions is through the test suite, which will have a problem if it's run on a non-gpu node in an environment with cupy and numba.cuda. I can add a simple check to the try-except import block to solve that problem.

This PR does not change the default behavior of the code. A developer would have to explicitly import a function from the gpu_specter.extract.gpu module or from the gpu_specter.extract.both module and pass arguments of the appropriate array type. I have a hard time imagining a developer would ever want to completely replace numpy with cupy in a heterogeneous computing environment, so I agree that we would not want to do that.

I think the idea is that a user would pass the --gpu argument to spex to specify they want to use the gpu version (like how the --mpi argument works).
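A sketch of what such a flag might look like (hypothetical; the actual spex argument parsing is not shown in this thread, and the module paths here just mirror the ones discussed above):

```python
import argparse

# hypothetical sketch of a spex-style command line interface;
# --gpu selects the gpu implementation, analogous to --mpi
parser = argparse.ArgumentParser(prog='spex')
parser.add_argument('--mpi', action='store_true', help='use mpi parallelism')
parser.add_argument('--gpu', action='store_true', help='use the gpu implementation')

args = parser.parse_args(['--gpu'])
if args.gpu:
    backend = 'gpu_specter.extract.gpu'
else:
    backend = 'gpu_specter.extract.cpu'
```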

dmargala commented 4 years ago

@sbailey I think my previous reply addresses most of your points. Sorry I didn't see it before I posted.

Regarding CPU/GPU tests, yes, I forgot to mention that they are all passing.

Setup gpu test environment:

ssh cori.nersc.gov
cd desi/gpu_specter
git checkout implement-gpu-extraction
module load esslurm python cuda
source activate desi-gpu
export PYTHONPATH=$(pwd)/py:$PYTHONPATH
salloc -C gpu -N 1 -t 30 -c 10 -G 1 -A m1759

Run the test suite:

dmargala@cgpu11:~/desi/gpu_specter> srun python -m unittest --verbose gpu_specter.test.test_suite
test_basics (gpu_specter.test.test_ex2d_patch.TestEx2dPatch) ... ok
test_compare_specter (gpu_specter.test.test_ex2d_patch.TestEx2dPatch) ... skipped 'specter not available'
test_compare_xp_cpu (gpu_specter.test.test_ex2d_patch.TestEx2dPatch) ... ok
test_compare_xp_gpu (gpu_specter.test.test_ex2d_patch.TestEx2dPatch) ... ok
test_basics (gpu_specter.test.test_projection_matrix.TestProjectionMatrix) ... ok
test_compare_gpu (gpu_specter.test.test_projection_matrix.TestProjectionMatrix) ... ok
test_compare_specter (gpu_specter.test.test_projection_matrix.TestProjectionMatrix) ... skipped 'specter not available'
test_basics (gpu_specter.test.test_psfcoeff.TestPSFCoeff) ... ok
test_compare_gpu (gpu_specter.test.test_psfcoeff.TestPSFCoeff) ... ok
test_compare_specter (gpu_specter.test.test_psfcoeff.TestPSFCoeff) ... skipped 'specter not available'
test_gpu_basics (gpu_specter.test.test_psfcoeff.TestPSFCoeff) ... ok
test_basics (gpu_specter.test.test_spots.TestPSFSpots) ... ok
test_compare_gpu (gpu_specter.test.test_spots.TestPSFSpots) ... ok
test_compare_specter (gpu_specter.test.test_spots.TestPSFSpots) ... skipped 'specter not available'

----------------------------------------------------------------------
Ran 14 tests in 4.087s

OK (skipped=4)

I also ran the test suite in an environment with specter and confirmed that those tests still pass as well.

dmargala commented 4 years ago

It looks like there is a cupy.is_available() function that returns True for me on a gpu node and False on a non-gpu node using the same conda environment with cupy. I'll update the try-except blocks in the test suite to use this.

And just to clarify, the presence of cupy/numba.cuda simply means additional tests are run. I implemented the gpu tests in the same fashion as the specter comparison tests.
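The guard described above could look something like this (a minimal sketch, not the actual test-suite code; the test class and method names are placeholders):

```python
import unittest

try:
    import cupy
    # cupy can import successfully on a node without a gpu, so an
    # ImportError check alone is not enough; cupy.is_available()
    # reports whether a cuda device is actually usable
    gpu_available = cupy.is_available()
except ImportError:
    gpu_available = False

class TestExample(unittest.TestCase):
    @unittest.skipUnless(gpu_available, 'gpu not available')
    def test_compare_gpu(self):
        # cpu-vs-gpu comparison would go here
        pass
```

On a non-gpu node the gpu tests then show up as skipped, in the same way the specter comparison tests are skipped when specter is not installed.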

lastephey commented 4 years ago

@sbailey To answer your question about a non-cupy way to check for the GPU, the best thing I can think of is the command nvidia-smi. It's a little messy but this will tell you if you have access to an (NVIDIA) gpu.

There could be something like:

import subprocess

subprocess.run(["nvidia-smi"])

For a GPU the output is:

Fri May 29 10:59:24 2020       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 440.33.01    Driver Version: 440.33.01    CUDA Version: 10.2     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Tesla V100-SXM2...  On   | 00000000:1A:00.0 Off |                    0 |
| N/A   29C    P0    39W / 300W |      0MiB / 16160MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+

And without a GPU the output is:

Traceback (most recent call last):
  File "is_there_gpu.py", line 3, in <module>
    subprocess.run(["nvidia-smi"])
  File "/usr/common/software/python/3.7-anaconda-2019.10/lib/python3.7/subprocess.py", line 472, in run
    with Popen(*popenargs, **kwargs) as process:
  File "/usr/common/software/python/3.7-anaconda-2019.10/lib/python3.7/subprocess.py", line 775, in __init__
    restore_signals, start_new_session)
  File "/usr/common/software/python/3.7-anaconda-2019.10/lib/python3.7/subprocess.py", line 1522, in _execute_child
    raise child_exception_type(errno_num, err_msg, err_filename)
FileNotFoundError: [Errno 2] No such file or directory: 'nvidia-smi': 'nvidia-smi'

It's not "clean" and would require a little work to parse, but it should report the presence of a GPU independent of framework (with the caveat that we have done module load cuda).
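Wrapping that check so it returns a boolean instead of raising could look like this (a sketch under the same caveat: it assumes nvidia-smi is on PATH when a gpu is present, e.g. after module load cuda):

```python
import shutil
import subprocess

def has_nvidia_gpu():
    """Framework-independent gpu check via nvidia-smi.

    Returns False if the nvidia-smi binary is missing (no cuda module
    loaded, or no nvidia driver) or if it exits with an error.
    """
    if shutil.which('nvidia-smi') is None:
        return False
    result = subprocess.run(['nvidia-smi'],
                            stdout=subprocess.PIPE,
                            stderr=subprocess.PIPE)
    return result.returncode == 0
```

The shutil.which guard avoids the FileNotFoundError traceback shown above, and checking the return code catches the case where the binary exists but cannot talk to a device.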