desihub / desitarget

DESI Targeting
BSD 3-Clause "New" or "Revised" License
18 stars 23 forks source link

test_mtl mid-z QSO test failure (intermittent?) #787

Open sbailey opened 2 years ago

sbailey commented 2 years ago

Nightly integration tests at NERSC failed last night with the following error, though I haven't been able to reproduce it by rerunning tests with my own checkout at NERSC. @geordie666 please check.

======================================================================
ERROR: test_midz_no_lock_in (desitarget.test.test_mtl.TestMTL)
Test true mid-z QSOs revert to tracers when redshifts change.
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/global/common/software/desi/cori/desiconda/20200801-1.4.0-spec/code/desitarget/master/py/desitarget/test/test_mtl.py", line 317, in test_midz_no_lock_in
    pp[iimidzmtl] = pp[iilowzmtl]
ValueError: NumPy boolean array indexing assignment cannot assign 2 input values to the 1 output values where the mask is true
geordie666 commented 2 years ago

I ran python -m test_mtl 200 times at NERSC after sourcing master and I couldn't recreate this failure. It's a little hard to debug the issue when I can't recreate it, so how much should I push on this?

sbailey commented 2 years ago

Thanks for checking. Since the exception is inside the test itself (rather than inside algorithmic code that the test is testing), let's call this good enough and I'll reopen the ticket if the failure occurs again.

sbailey commented 2 years ago

@geordie666 another intermittent test_mtl failure popped up last night, but passes this morning:

======================================================================
FAIL: test_midz_no_lock_in (desitarget.test.test_mtl.TestMTL)
Test true mid-z QSOs revert to tracers when redshifts change.
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/global/common/software/desi/cori/desiconda/20200801-1.4.0-spec/code/desitarget/master/py/desitarget/test/test_mtl.py", line 320, in test_midz_no_lock_in
    self.assertTrue(np.all(mtl['PRIORITY'] == pp))
AssertionError: False is not true

----------------------------------------------------------------------

Please review that code for whether there might be side effects to the individual tests such that running them in a different order could cause one of them to break another. e.g. leaving a test file behind, or a modification to a self.blah variable that isn't cleared out by the setUp call before the next test.

If nothing pops up after a few minutes investigation, ok to close this again and we'll re-report if it comes up again.

geordie666 commented 2 years ago

Irritating. We could take the lazy option and turn off the test, as the lines of code that are being covered are stable and unlikely to change. But, led me dig some more to try to understand the problem. One solution might be to hard-code some numbers in that test, rather than inheriting them from the class.

geordie666 commented 2 years ago

@sbailey: Suggested approach. Let's skip this unit test with a decorator, for now, but leave this issue open. When I find time to drill down on a solution, I'll turn the unit test back on.

sbailey commented 2 years ago

it fails infrequently enough that it isn't a big problem. I suggest leaving it on to build up stats and examples. I can post here if/when there are other failures, but it isn't urgent, nor is it overly distracting or problematic.

If we're going to do anything short term to the code, I'd suggest augmenting the test logging so that the next time one of those two tests fail, it leaves behind more information that might be useful for debugging what happened (e.g. printing the values of pp, mtl, iilowz, iihighz, ...)