chris-hld / spaudiopy

Spatial Audio Python Package
https://spaudiopy.readthedocs.io/
MIT License

tests/test_parallel.py fails sometimes #13

Open chris-hld opened 4 years ago

chris-hld commented 4 years ago

test_parallel fails frequently, but not always. tests/test_parallel.py::test_render_bsdm seems to be causing it (most of the time). As it doesn't occur every time, it might be undefined behaviour related to the shared memory access.

I will first try to collect some tracebacks; maybe they show a pattern.

Test on macOS-latest with Python 3.6

test_jobs = None

    @pytest.mark.parametrize('test_jobs', JOB_COUNTS)
    def test_render_bsdm(test_jobs):
        sdm_p, sdm_phi, sdm_theta = [*np.random.randn(3, 1000)]
        hrirs = spa.IO.load_hrirs(fs=44100, filename='dummy')
        bsdm_l_r, bsdm_r_r = spa.sdm.render_bsdm(sdm_p, sdm_phi, sdm_theta, hrirs,
                                                 jobs_count=1)
        bsdm_l_t, bsdm_r_t = spa.sdm.render_bsdm(sdm_p, sdm_phi, sdm_theta, hrirs,
                                                 jobs_count=test_jobs)
>       assert_allclose([bsdm_l_t, bsdm_r_t], [bsdm_l_r, bsdm_r_r])
E       AssertionError: 
E       Not equal to tolerance rtol=1e-07, atol=0
E       
E       Mismatched elements: 1 / 2510 (0.0398%)
E       Max absolute difference: 0.64633057
E       Max relative difference: 1.
E        x: array([[ 1.955098,  0.205032, -1.341322, ...,  0.      ,  0.      ,
E                0.      ],
E              [ 1.955098,  0.205032, -1.341322, ...,  0.      ,  0.      ,
E                0.      ]])
E        y: array([[ 1.955098,  0.205032, -1.341322, ...,  0.      ,  0.      ,
E                0.      ],
E              [ 1.955098,  0.205032, -1.341322, ...,  0.      ,  0.      ,
E                0.      ]])
HaHeho commented 4 years ago

Strange. Does it also occasionally happen when you execute it locally, or only in the CI tests?

Is it always only one time-domain sample that is wrong?

I have not looked into the actual processing at all, but this might be an idea for narrowing the error down. The "sometimes" most likely happens due to the use of randn as input. However, splitting the work into multiple tasks should of course not alter the results. Are the data types in both branches exactly the same? Maybe it is a matter of insufficient numerical precision, which is only sometimes triggered by the noise.
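One way to decouple the flaky input from the actual bug would be to seed the RNG in the test, so every run draws the same noise (a sketch; the seed value is arbitrary and not part of the actual test):

```python
import numpy as np

# Seed NumPy's global RNG so every test run draws the same noise input.
np.random.seed(0)
sdm_p, sdm_phi, sdm_theta = np.random.randn(3, 1000)

# A second seeded draw reproduces exactly the same values, so a failure
# that still only happens "sometimes" cannot be blamed on the input.
np.random.seed(0)
sdm_p2, _, _ = np.random.randn(3, 1000)
assert np.array_equal(sdm_p, sdm_p2)
```

If the test still fails intermittently with a fixed seed, the input noise is ruled out and the nondeterminism must come from the parallel execution itself.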

chris-hld commented 4 years ago

I could also make it fail locally, but only a single time so far. It seems to be a lot less frequent when executed on a local machine. Yes, it seems that it is always exactly one sample...

chris-hld commented 4 years ago

Another traceback:

Test on ubuntu-latest with Python 3.7

=================================== FAILURES ===================================
_____________________________ test_render_bsdm[2] ______________________________

test_jobs = 2

    @pytest.mark.parametrize('test_jobs', JOB_COUNTS)
    def test_render_bsdm(test_jobs):
        sdm_p, sdm_phi, sdm_theta = [*np.random.randn(3, 1000)]
        hrirs = spa.IO.load_hrirs(fs=44100, filename='dummy')
        bsdm_l_r, bsdm_r_r = spa.sdm.render_bsdm(sdm_p, sdm_phi, sdm_theta, hrirs,
                                                 jobs_count=1)
        bsdm_l_t, bsdm_r_t = spa.sdm.render_bsdm(sdm_p, sdm_phi, sdm_theta, hrirs,
                                                 jobs_count=test_jobs)
>       assert_allclose([bsdm_l_t, bsdm_r_t], [bsdm_l_r, bsdm_r_r])
E       AssertionError: 
E       Not equal to tolerance rtol=1e-07, atol=0
E       
E       Mismatched elements: 1 / 2510 (0.0398%)
E       Max absolute difference: 1.31017305
E       Max relative difference: 1.
E        x: array([[0.916457, 2.660908, 1.464512, ..., 0.      , 0.      , 0.      ],
E              [0.916457, 2.660908, 1.464512, ..., 0.      , 0.      , 0.      ]])
E        y: array([[0.916457, 2.660908, 1.464512, ..., 0.      , 0.      , 0.      ],
E              [0.916457, 2.660908, 1.464512, ..., 0.      , 0.      , 0.      ]])
HaHeho commented 4 years ago

So it is not related to the Python version, since this is 3.7 now? (That would have been very weird anyway, if all other packages are identical.)

Does Max relative difference: 1. mean that the single value is exactly double in one case (or entirely missing, i.e. zero, in the other, which is what a lost += update would produce)?

HaHeho commented 4 years ago

I'm pretty certain this should solve it.

# part of parallel render_bsdm:
def _render_bsdm_sample(i, p, phi, theta, hrirs):
    h_l, h_r = hrirs[hrirs.nearest(phi, theta)]
    global shared_array  # numpy view onto the buffer of shared_arr
    with shared_arr.get_lock():  # synchronize access to the shared buffer
        shared_array[i:i + len(h_l), 0] += p * h_l
        shared_array[i:i + len(h_l), 1] += p * h_r
chris-hld commented 4 years ago

Hm, I am using multiprocessing.Array, which already includes a lock: https://docs.python.org/3/library/multiprocessing.html#multiprocessing.Array I am quite surprised that this particular function seems to fail while others don't.
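For reference, the granularity of that built-in lock can be seen in a small sketch (generic names, not spaudiopy code). Individual element accesses on the synchronized Array are locked, but a compound += is two such accesses with a gap in between:

```python
import multiprocessing as mp

# By default mp.Array is created with a recursive lock (an RLock),
# and each single __getitem__/__setitem__ acquires it internally.
arr = mp.Array('d', 4)  # 4 doubles, zero-initialized

arr[0] = 1.0   # one locked write
x = arr[0]     # one locked read

# `arr[0] += 1.0` desugars into a locked read followed by a locked
# write; another process can interleave between the two.  To make the
# whole read-modify-write atomic, hold the lock explicitly (re-entry
# is fine because the default lock is recursive):
with arr.get_lock():
    arr[0] += 1.0

assert arr[0] == 2.0
```

So the Array's lock guarantees per-access consistency, but it does not manage synchronization across a read-modify-write for you.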

HaHeho commented 4 years ago

Yes. I see the documentation is confusing with regard to the lock. The way I understand it so far is that it provides lock functionality for synchronized access, but the lock is not managed for you. Many examples I found online also apply the lock explicitly, in the way I showed in the code above.

I did not check all methods, but could it be that bsdm is the only one where the current state is part of the processing (i.e. it uses +=, and hence requires synchronization)? If so, this is also a strong indicator that manual locking is required. Did you try it this way, and does the test still fail occasionally?
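To make the += point concrete: x[i] += v on a shared buffer decomposes into a read, an add, and a write, and two processes interleaving between the read and the write will lose one contribution. A minimal sketch of the explicitly locked accumulation pattern (illustrative names, not the actual spaudiopy code):

```python
import multiprocessing as mp
import numpy as np

def locked_accumulate(shared_arr, i, values):
    """Add `values` into the shared buffer starting at index i, holding
    the Array's built-in lock for the whole read-modify-write."""
    with shared_arr.get_lock():  # nothing else mutates the buffer meanwhile
        # numpy view onto the shared memory, no copy; note the view itself
        # carries no lock, which is why the `with` block is needed
        view = np.frombuffer(shared_arr.get_obj())
        view[i:i + len(values)] += values

if __name__ == '__main__':
    arr = mp.Array('d', 8)  # 8 doubles, zero-initialized, with a lock
    locked_accumulate(arr, 2, np.ones(3))
    print(list(arr[:]))  # [0.0, 0.0, 1.0, 1.0, 1.0, 0.0, 0.0, 0.0]
```

The key detail is that casting to a numpy array via np.frombuffer bypasses the Array's synchronized wrapper entirely, so every write through such a view must be guarded by the lock manually.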

chris-hld commented 4 years ago

Seems it was indeed the += in-place operator. I'm also casting back to a numpy array, so the operator is not thread-safe there.

cf77fa8f9065e7ecca2f8b6667baca8fdd42392b and f5e5654b7c463710f09c92ca4c07a272c32c0a8a target this issue.

The reentrant lock seemed necessary at least on Windows, even though on my local Linux machine I couldn't make it fail with repeat 100 { pytest tests/test_parallel.py::test_render_bsdm }

Thank you @HaHeho !

chris-hld commented 4 years ago

Re-opening, because this is still not resolved. The recent failures seem to always happen on Windows.

Traceback:

Cross-Platform Test / Test on windows-latest with Python 3.6

test_jobs = 2
    @pytest.mark.parametrize('test_jobs', JOB_COUNTS)
    def test_render_bsdm(test_jobs):
        sdm_p, sdm_phi, sdm_theta = [*np.random.randn(3, 1000)]
        hrirs = spa.IO.load_hrirs(fs=44100, filename='dummy')
        bsdm_l_r, bsdm_r_r = spa.sdm.render_bsdm(sdm_p, sdm_phi, sdm_theta, hrirs,
                                                 jobs_count=1)
        bsdm_l_t, bsdm_r_t = spa.sdm.render_bsdm(sdm_p, sdm_phi, sdm_theta, hrirs,
                                                 jobs_count=test_jobs)
>       assert_allclose([bsdm_l_t, bsdm_r_t], [bsdm_l_r, bsdm_r_r])
E       AssertionError: 
E       Not equal to tolerance rtol=1e-07, atol=0
E       
E       Mismatched elements: 1 / 2510 (0.0398%)
E       Max absolute difference: 0.01022394
E       Max relative difference: 1.
E        x: array([[-1.208554, -0.140148,  0.93946 , ...,  0.      ,  0.      ,
E                0.      ],
E              [-1.208554, -0.140148,  0.93946 , ...,  0.      ,  0.      ,
E                0.      ]])
E        y: array([[-1.208554, -0.140148,  0.93946 , ...,  0.      ,  0.      ,
E                0.      ],
E              [-1.208554, -0.140148,  0.93946 , ...,  0.      ,  0.      ,
E                0.      ]])
HaHeho commented 4 years ago

Windows potentially (by default) follows a different mechanism for spawning child processes. I am not sure why this would influence the lock, though: https://docs.python.org/3/library/multiprocessing.html#contexts-and-start-methods

However, I did not really understand the way you implemented the lock there. What is the reason for using RLock() instead of Lock()? The multiprocessing documentation is pretty cryptic about the practical differences. This seems to explain them: https://stackoverflow.com/questions/22885775/what-is-the-difference-between-lock-and-rlock

According to this, I would assume that Lock() is the more reasonable choice for you (it is supposed to be faster and does not allow multiple acquisitions by the same process). Also, why not straight up use the lock that is created together with the array (with shared_arr.get_lock():)?

I would try that and see if that solves the problem.
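The Lock vs. RLock distinction can be demonstrated in a few lines (a sketch using multiprocessing's primitives directly, all within one process):

```python
import multiprocessing as mp

lock = mp.Lock()
rlock = mp.RLock()

# A plain Lock cannot be acquired twice, not even by its current owner:
assert lock.acquire(block=False)        # first acquisition succeeds
assert not lock.acquire(block=False)    # second attempt fails
lock.release()

# An RLock may be re-acquired by its owner; it counts the recursion depth:
assert rlock.acquire(block=False)
assert rlock.acquire(block=False)       # same owner may re-enter
rlock.release()
rlock.release()                         # one release per acquire
```

This re-entrancy is also why the lock that mp.Array creates by default is recursive: the Array's own synchronized element accesses can then run safely inside an explicit `with arr.get_lock():` block.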

HaHeho commented 4 years ago

Actually, https://docs.python.org/3/library/multiprocessing.html#all-start-methods ("Explicitly pass resources to child processes") explains that this might indeed be caused by the different process spawning on Windows. I would still expect my suggested way above to work, and it is the easiest to read. Otherwise, you seem to be required to pass the lock instance to the function in order for it to work properly under Windows.