Here are some updates to fix failing GPU comparison unit tests (see #77). These tests aren't run automatically by GitHub Actions (they require GPUs), so I'm not sure when they started failing.
The main change is to compute a mask and apply it before comparing the various extraction outputs. In a few cases I also relaxed tolerances slightly: many of these unit tests were written with very tight tolerances, and we had already relaxed some of those requirements after later comparison studies between gpu_specter and specter.
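The masking step can be sketched roughly like this (a minimal illustration, not the actual test code; the array names, the ivar-based mask convention, and the tolerance are placeholders):

```python
import numpy as np

# Toy stand-ins for CPU (specter) and GPU (gpu_specter) extraction outputs.
# The last pixel is masked (ivar == 0) and differs wildly between the two,
# which is exactly the case that was tripping up the old comparisons.
cpu_flux = np.array([1.0, 2.0, 3.0, 999.0])
gpu_flux = np.array([1.0, 2.0, 3.0 + 1e-7, 0.0])
cpu_ivar = np.array([1.0, 1.0, 1.0, 0.0])  # ivar == 0 marks a masked pixel
gpu_ivar = np.array([1.0, 1.0, 1.0, 0.0])

# Only compare pixels that are unmasked in both outputs
ok = (cpu_ivar > 0) & (gpu_ivar > 0)
assert np.allclose(cpu_flux[ok], gpu_flux[ok], rtol=1e-6)
```

Without the mask, the comparison would include the masked pixel and fail regardless of how loose the tolerance is; with it, only meaningful pixels contribute.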
Here is the output from a manual run of the tests on a single Perlmutter node using this branch.
dmargala@nid008696:~/source/desihub/gpu_specter> source /global/common/software/desi/desi_environment.sh main
dmargala@nid008696:~/source/desihub/gpu_specter> pytest py/gpu_specter
=========================================================================== test session starts ============================================================================
platform linux -- Python 3.9.7, pytest-6.2.5, py-1.11.0, pluggy-1.0.0
rootdir: /global/u2/d/dmargala/source/desihub/gpu_specter
plugins: astropy-header-0.1.2, asdf-2.7.2, filter-subpackage-0.1.1, openfiles-0.5.0, remotedata-0.3.3, arraydiff-0.3, mock-3.6.1, cov-3.0.0, hypothesis-6.36.0, doctestplus-0.11.2
collected 32 items
py/gpu_specter/test/test_core.py .... [ 12%]
py/gpu_specter/test/test_extract.py ........... [ 46%]
py/gpu_specter/test/test_linalg.py .... [ 59%]
py/gpu_specter/test/test_polynomial.py ... [ 68%]
py/gpu_specter/test/test_projection_matrix.py ... [ 78%]
py/gpu_specter/test/test_psfcoeff.py .... [ 90%]
py/gpu_specter/test/test_spots.py ... [100%]
============================================================================= warnings summary =============================================================================
py/gpu_specter/test/test_core.py::TestCore::test_compare_gpu
py/gpu_specter/test/test_core.py::TestCore::test_compare_gpu
py/gpu_specter/test/test_core.py::TestCore::test_compare_gpu
py/gpu_specter/test/test_core.py::TestCore::test_compare_gpu
py/gpu_specter/test/test_core.py::TestCore::test_gpu_batch_subbundle
py/gpu_specter/test/test_polynomial.py::TestPolynomial::test_gpu_hermevander
/global/common/software/desi/perlmutter/desiconda/20220119-2.0.1/conda/lib/python3.9/site-packages/numba/cuda/compiler.py:724: NumbaPerformanceWarning: Grid size (10) < 2 * SM count (216) will likely result in GPU under utilization due to low occupancy.
warn(NumbaPerformanceWarning(msg))
py/gpu_specter/test/test_core.py::TestCore::test_compare_gpu
py/gpu_specter/test/test_core.py::TestCore::test_compare_gpu
py/gpu_specter/test/test_core.py::TestCore::test_compare_gpu
py/gpu_specter/test/test_core.py::TestCore::test_gpu_batch_subbundle
/global/common/software/desi/perlmutter/desiconda/20220119-2.0.1/conda/lib/python3.9/site-packages/numba/cuda/compiler.py:724: NumbaPerformanceWarning: Grid size (180) < 2 * SM count (216) will likely result in GPU under utilization due to low occupancy.
warn(NumbaPerformanceWarning(msg))
py/gpu_specter/test/test_core.py::TestCore::test_compare_gpu
py/gpu_specter/test/test_core.py::TestCore::test_compare_gpu
py/gpu_specter/test/test_core.py::TestCore::test_compare_gpu
py/gpu_specter/test/test_core.py::TestCore::test_gpu_batch_subbundle
/global/common/software/desi/perlmutter/desiconda/20220119-2.0.1/conda/lib/python3.9/site-packages/numba/cuda/compiler.py:724: NumbaPerformanceWarning: Grid size (120) < 2 * SM count (216) will likely result in GPU under utilization due to low occupancy.
warn(NumbaPerformanceWarning(msg))
py/gpu_specter/test/test_core.py::TestCore::test_compare_gpu
py/gpu_specter/test/test_core.py::TestCore::test_compare_gpu
py/gpu_specter/test/test_core.py::TestCore::test_gpu_batch_subbundle
/global/common/software/desi/perlmutter/desiconda/20220119-2.0.1/conda/lib/python3.9/site-packages/numba/cuda/compiler.py:724: NumbaPerformanceWarning: Grid size (32) < 2 * SM count (216) will likely result in GPU under utilization due to low occupancy.
warn(NumbaPerformanceWarning(msg))
py/gpu_specter/test/test_core.py::TestCore::test_compare_gpu
py/gpu_specter/test/test_core.py::TestCore::test_gpu_batch_subbundle
py/gpu_specter/test/test_projection_matrix.py::TestProjectionMatrix::test_compare_gpu
/global/common/software/desi/perlmutter/desiconda/20220119-2.0.1/conda/lib/python3.9/site-packages/numba/cuda/compiler.py:724: NumbaPerformanceWarning: Grid size (36) < 2 * SM count (216) will likely result in GPU under utilization due to low occupancy.
warn(NumbaPerformanceWarning(msg))
py/gpu_specter/test/test_core.py::TestCore::test_gpu_batch_subbundle
/global/common/software/desi/perlmutter/desiconda/20220119-2.0.1/conda/lib/python3.9/site-packages/numba/cuda/compiler.py:724: NumbaPerformanceWarning: Grid size (40) < 2 * SM count (216) will likely result in GPU under utilization due to low occupancy.
warn(NumbaPerformanceWarning(msg))
py/gpu_specter/test/test_core.py::TestCore::test_gpu_batch_subbundle
/global/common/software/desi/perlmutter/desiconda/20220119-2.0.1/conda/lib/python3.9/site-packages/numba/cuda/compiler.py:724: NumbaPerformanceWarning: Grid size (45) < 2 * SM count (216) will likely result in GPU under utilization due to low occupancy.
warn(NumbaPerformanceWarning(msg))
py/gpu_specter/test/test_polynomial.py::TestPolynomial::test_gpu_legvander
py/gpu_specter/test/test_psfcoeff.py::TestPSFCoeff::test_compare_gpu
py/gpu_specter/test/test_psfcoeff.py::TestPSFCoeff::test_gpu_basics
py/gpu_specter/test/test_spots.py::TestPSFSpots::test_compare_gpu
py/gpu_specter/test/test_spots.py::TestPSFSpots::test_compare_gpu
/global/common/software/desi/perlmutter/desiconda/20220119-2.0.1/conda/lib/python3.9/site-packages/numba/cuda/compiler.py:724: NumbaPerformanceWarning: Grid size (1) < 2 * SM count (216) will likely result in GPU under utilization due to low occupancy.
warn(NumbaPerformanceWarning(msg))
py/gpu_specter/test/test_projection_matrix.py::TestProjectionMatrix::test_compare_gpu
py/gpu_specter/test/test_projection_matrix.py::TestProjectionMatrix::test_compare_gpu
py/gpu_specter/test/test_spots.py::TestPSFSpots::test_compare_gpu
py/gpu_specter/test/test_spots.py::TestPSFSpots::test_compare_gpu
/global/common/software/desi/perlmutter/desiconda/20220119-2.0.1/conda/lib/python3.9/site-packages/numba/cuda/compiler.py:724: NumbaPerformanceWarning: Grid size (12) < 2 * SM count (216) will likely result in GPU under utilization due to low occupancy.
warn(NumbaPerformanceWarning(msg))
py/gpu_specter/test/test_projection_matrix.py::TestProjectionMatrix::test_compare_gpu
/global/common/software/desi/perlmutter/desiconda/20220119-2.0.1/conda/lib/python3.9/site-packages/numba/cuda/compiler.py:724: NumbaPerformanceWarning: Grid size (6) < 2 * SM count (216) will likely result in GPU under utilization due to low occupancy.
warn(NumbaPerformanceWarning(msg))
py/gpu_specter/test/test_spots.py::TestPSFSpots::test_compare_gpu
py/gpu_specter/test/test_spots.py::TestPSFSpots::test_compare_gpu
/global/common/software/desi/perlmutter/desiconda/20220119-2.0.1/conda/lib/python3.9/site-packages/numba/cuda/compiler.py:724: NumbaPerformanceWarning: Grid size (18) < 2 * SM count (216) will likely result in GPU under utilization due to low occupancy.
warn(NumbaPerformanceWarning(msg))
-- Docs: https://docs.pytest.org/en/stable/warnings.html
================================================================ 32 passed, 34 warnings in 66.48s (0:01:06) ================================================================
dmargala@nid008696:~/source/desihub/gpu_specter> pytest --disable-warnings py/gpu_specter
=========================================================================== test session starts ============================================================================
platform linux -- Python 3.9.7, pytest-6.2.5, py-1.11.0, pluggy-1.0.0
rootdir: /global/u2/d/dmargala/source/desihub/gpu_specter
plugins: astropy-header-0.1.2, asdf-2.7.2, filter-subpackage-0.1.1, openfiles-0.5.0, remotedata-0.3.3, arraydiff-0.3, mock-3.6.1, cov-3.0.0, hypothesis-6.36.0, doctestplus-0.11.2
collected 32 items