desihub / gpu_specter

Scratch work for porting spectroperfectionism extractions to GPUs
BSD 3-Clause "New" or "Revised" License

Add unit tests for gpu functions #40

Closed dmargala closed 4 years ago

dmargala commented 4 years ago

This PR adds unit tests that compare the cpu and gpu versions of the functions implemented in gpu_specter.extract. All of the gpu functions in gpu_specter.extract.gpu have been updated to match the interface and functionality of their versions in gpu_specter.extract.cpu.

There are just a few functions left that do not have a gpu version, notably, gpu_specter.extract.cpu.ex2d_padded and numpy.polynomial.legendre.legval.

The new gpu_specter.extract.both.xp_ex2d_patch is a version of gpu_specter.extract.cpu.ex2d_patch that is compatible with inputs from either numpy.ndarray or cupy.ndarray.
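The array-module-agnostic pattern behind a function like xp_ex2d_patch can be sketched roughly as follows (a minimal illustration, not the actual xp_ex2d_patch implementation; xp_solve is a hypothetical helper, and cupy.get_array_module returns numpy for numpy inputs and cupy for cupy inputs):

```python
import numpy as np

try:
    import cupy
    get_array_module = cupy.get_array_module
except ImportError:
    # without cupy, every array is a numpy array
    get_array_module = lambda *args: np

def xp_solve(A, b):
    """Solve the normal equations (A^T A) x = A^T b using whichever
    array library (numpy or cupy) the inputs came from."""
    xp = get_array_module(A, b)
    ATA = A.T.dot(A)
    ATb = A.T.dot(b)
    return xp.linalg.solve(ATA, ATb)
```

The caller never branches on array type: passing cupy arrays keeps the work on the gpu, while numpy arrays fall through to the cpu implementation.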

The figure below compares the runtime for a single patch using the numba-optimized gpu_specter.extract.cpu.ex2d_patch, the new xp_ex2d_patch with cupy.ndarray inputs, and the new xp_ex2d_patch with numpy.ndarray inputs. The projection matrix shape for this patch corresponds to A4.shape = (ny, nx, nspec, nwave) = (102, 46, 5, 50). The benchmark was performed in a jupyter notebook on a cori shared gpu node.

[figure: single-patch runtime comparison]

rcthomas commented 4 years ago

The fastest runtime is in the 20ms range, is the plan to have multiple patches? I'm happier when I see the fastest implementation have enough work to take a few seconds. Maybe that's a separate test?

sbailey commented 4 years ago

Thanks.

dmargala commented 4 years ago

@rcthomas A typical full frame extraction with this patch size would require about 4700 patches. The current benchmark for a full frame extraction with this patch size on a cori haswell node with 32 mpi ranks is around 1 min.

@lastephey Those are good points. Currently, the only entry point to the cupy-enabled functions is through the test suite, which will have a problem if it's run on a non-gpu node in an environment with cupy and numba.cuda. I can add a simple check to the try-except import block to solve that problem.

This PR does not change the default behavior of the code. A developer would have to explicitly import a function from the gpu_specter.extract.gpu module or from the gpu_specter.extract.both module and pass arguments of the appropriate array type. I have a hard time imagining a developer would ever want to completely replace numpy with cupy in a heterogeneous computing environment, so I agree that we would not want to do that.

I think the idea is that a user would pass the --gpu argument to spex to specify they want to use the gpu version (like how the --mpi argument works).
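A sketch of what such a flag might look like (hypothetical; the actual spex argument parsing is not shown in this thread, and the module paths here just mirror the ones discussed above):

```python
import argparse

# hypothetical sketch of a spex-style command line interface;
# --gpu selects the gpu implementation, analogous to --mpi
parser = argparse.ArgumentParser(prog='spex')
parser.add_argument('--mpi', action='store_true', help='use mpi parallelism')
parser.add_argument('--gpu', action='store_true', help='use the gpu implementation')

args = parser.parse_args(['--gpu'])
if args.gpu:
    backend = 'gpu_specter.extract.gpu'
else:
    backend = 'gpu_specter.extract.cpu'
```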

dmargala commented 4 years ago

@sbailey I think my previous reply addresses most of your points. Sorry I didn't see it before I posted.

Regarding CPU/GPU tests, yes, I forgot to mention that they are all passing.

Setup gpu test environment:

ssh cori.nersc.gov
cd desi/gpu_specter
git checkout implement-gpu-extraction
module load esslurm python cuda
source activate desi-gpu
export PYTHONPATH=$(pwd)/py:$PYTHONPATH
salloc -C gpu -N 1 -t 30 -c 10 -G 1 -A m1759

Run the test suite:

dmargala@cgpu11:~/desi/gpu_specter> srun python -m unittest --verbose gpu_specter.test.test_suite
test_basics (gpu_specter.test.test_ex2d_patch.TestEx2dPatch) ... ok
test_compare_specter (gpu_specter.test.test_ex2d_patch.TestEx2dPatch) ... skipped 'specter not available'
test_compare_xp_cpu (gpu_specter.test.test_ex2d_patch.TestEx2dPatch) ... ok
test_compare_xp_gpu (gpu_specter.test.test_ex2d_patch.TestEx2dPatch) ... ok
test_basics (gpu_specter.test.test_projection_matrix.TestProjectionMatrix) ... ok
test_compare_gpu (gpu_specter.test.test_projection_matrix.TestProjectionMatrix) ... ok
test_compare_specter (gpu_specter.test.test_projection_matrix.TestProjectionMatrix) ... skipped 'specter not available'
test_basics (gpu_specter.test.test_psfcoeff.TestPSFCoeff) ... ok
test_compare_gpu (gpu_specter.test.test_psfcoeff.TestPSFCoeff) ... ok
test_compare_specter (gpu_specter.test.test_psfcoeff.TestPSFCoeff) ... skipped 'specter not available'
test_gpu_basics (gpu_specter.test.test_psfcoeff.TestPSFCoeff) ... ok
test_basics (gpu_specter.test.test_spots.TestPSFSpots) ... ok
test_compare_gpu (gpu_specter.test.test_spots.TestPSFSpots) ... ok
test_compare_specter (gpu_specter.test.test_spots.TestPSFSpots) ... skipped 'specter not available'

----------------------------------------------------------------------
Ran 14 tests in 4.087s

OK (skipped=4)

I also ran the test suite in an environment with specter and confirmed that those tests still pass as well.

dmargala commented 4 years ago

It looks like there is a cupy.is_available() function that returns True for me on a gpu node and False on a non-gpu node using the same conda environment with cupy. I'll update the try-except blocks in the test suite to use this.

And just to clarify, the presence of cupy/numba.cuda simply means additional tests are run. I implemented the gpu tests in the same fashion as the specter comparison tests.
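The guard described above could look something like this (a minimal sketch, not the actual test-suite code; the test class and method names are placeholders):

```python
import unittest

try:
    import cupy
    # cupy can import successfully on a node without a gpu, so an
    # ImportError check alone is not enough; cupy.is_available()
    # reports whether a cuda device is actually usable
    gpu_available = cupy.is_available()
except ImportError:
    gpu_available = False

class TestExample(unittest.TestCase):
    @unittest.skipUnless(gpu_available, 'gpu not available')
    def test_compare_gpu(self):
        # cpu-vs-gpu comparison would go here
        pass
```

On a non-gpu node the gpu tests then show up as skipped, in the same way the specter comparison tests are skipped when specter is not installed.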

lastephey commented 4 years ago

@sbailey To answer your question about a non-cupy way to check for the GPU, the best thing I can think of is the command nvidia-smi. It's a little messy but this will tell you if you have access to an (NVIDIA) gpu.

There could be something like:

import subprocess

subprocess.run(["nvidia-smi"])

For a GPU the output is:

Fri May 29 10:59:24 2020       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 440.33.01    Driver Version: 440.33.01    CUDA Version: 10.2     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Tesla V100-SXM2...  On   | 00000000:1A:00.0 Off |                    0 |
| N/A   29C    P0    39W / 300W |      0MiB / 16160MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+

And without a GPU the output is:

Traceback (most recent call last):
  File "is_there_gpu.py", line 3, in <module>
    subprocess.run(["nvidia-smi"])
  File "/usr/common/software/python/3.7-anaconda-2019.10/lib/python3.7/subprocess.py", line 472, in run
    with Popen(*popenargs, **kwargs) as process:
  File "/usr/common/software/python/3.7-anaconda-2019.10/lib/python3.7/subprocess.py", line 775, in __init__
    restore_signals, start_new_session)
  File "/usr/common/software/python/3.7-anaconda-2019.10/lib/python3.7/subprocess.py", line 1522, in _execute_child
    raise child_exception_type(errno_num, err_msg, err_filename)
FileNotFoundError: [Errno 2] No such file or directory: 'nvidia-smi': 'nvidia-smi'

It's not "clean" and would require a little work to parse, but it should report the presence of a GPU independent of framework (with the caveat that we have done module load cuda).
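Wrapping that check so it returns a boolean instead of raising could look like this (a sketch under the same caveat: it assumes nvidia-smi is on PATH when a gpu is present, e.g. after module load cuda):

```python
import shutil
import subprocess

def has_nvidia_gpu():
    """Framework-independent gpu check via nvidia-smi.

    Returns False if the nvidia-smi binary is missing (no cuda module
    loaded, or no nvidia driver) or if it exits with an error.
    """
    if shutil.which('nvidia-smi') is None:
        return False
    result = subprocess.run(['nvidia-smi'],
                            stdout=subprocess.PIPE,
                            stderr=subprocess.PIPE)
    return result.returncode == 0
```

The shutil.which guard avoids the FileNotFoundError traceback shown above, and checking the return code catches the case where the binary exists but cannot talk to a device.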