j-c-c closed this issue 1 year ago.
Was able to reproduce the failing test on decaf. Here is the list of particle centers found by apple_picker.process_micrograph_centers() that do not match the reference list in the test:
(1321.0, 361.0)
(2793.0, 1239.0)
(1741.0, 1391.0)
(2703.0, 1737.0)
(861.0, 1839.0)
(965.0, 2305.0)
(749.0, 2473.0)
(699.0, 2627.0)
(2153.0, 2741.0)
(599.0, 2943.0)
(1247.0, 3039.0)
(2037.0, 3281.0)
(2501.0, 3493.0)
Each of these is off by two pixels (from a reference center) in either the x or y coordinate.
At this point, we may want to consider replacing these (hardcoded) tests with simulated data (put a couple of particle projections onto a micrograph and add noise). This way we can control the noise level, etc.
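A starting point for that simulated data could be a sketch along the following lines (the Gaussian blobs stand in for particle projections; the micrograph size, centers, and noise scale are all invented for illustration, not taken from the real pipeline):

```python
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical sketch: paste small Gaussian "particles" at known centers
# onto a blank micrograph, then add Gaussian noise we control.
mic = np.zeros((512, 512))
y, x = np.mgrid[-8:9, -8:9]
particle = np.exp(-(x**2 + y**2) / (2 * 4.0**2))  # peak value 1.0 at center

centers = [(100, 150), (300, 320), (400, 80)]  # ground-truth centers
for cy, cx in centers:
    mic[cy - 8:cy + 9, cx - 8:cx + 9] += particle

mic += rng.normal(scale=0.1, size=mic.shape)  # noise level under our control
```

With ground truth in hand, the test could assert recovered centers against `centers` directly instead of hardcoded reference values.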
It appears the culprit here is a change to np.argsort() that came with the release of NumPy 1.25. The divergent behavior occurs only on Linux, in this line of code:
idx = np.argsort(-np.reshape(score, (np.prod(score.shape)), "F"))
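For context, here is a hedged sketch of that same flatten-and-rank step with an explicit stable kind (the score matrix below is made up for illustration, not the picker's real scores):

```python
import numpy as np

score = np.array([[3, 1],
                  [3, 2]])  # hypothetical 2-D score matrix with a tie

# Flatten in Fortran (column-major) order, then rank descending.
flat = np.reshape(score, np.prod(score.shape), order="F")  # [3, 3, 1, 2]
idx = np.argsort(-flat, kind="stable")  # stable: ties keep original order

print(idx)  # [0 1 3 2]: the two tied 3s stay in their original order
```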
So far I have confirmed that the divergent behavior only occurs for the default sorting kind, i.e. kind='quicksort'; the other three sorting methods ('mergesort', 'stable', 'heapsort') maintain consistent behavior between NumPy versions.
A solution here might be to switch to a more stable sorting method. I will confirm whether the other sorting methods produce reasonable results for test_apple.py.
Good sleuthing! The question remains why quicksort changes behavior here (and only on Linux). From what I understand, all sorting algorithms are deterministic, so they should all give the same result. Maybe a bug has been introduced in NumPy?
I don't think all sorting algorithms are deterministic; that is why they mention stable ones (mainly with regard to what happens on ties).
My understanding was that stability just meant that all stable algorithms broke ties the same way. If we stick to the same algorithm and run it multiple times, we expect it to give the same result each time, even with ties, correct?
They (np) claim this, but it has not been my personal experience straddling different versions of code...
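To pin down the claim: within a single NumPy build, repeated calls on the same input do return identical results; it is only across versions, platforms, or SIMD dispatch paths that the tie order can change. A quick check of the within-build case:

```python
import numpy as np

arr = np.array([2, 2, 1, 1, 2, 1])  # deliberate ties

# Same build, same input -> same output, even for quicksort on ties.
runs = [np.argsort(arr, kind="quicksort") for _ in range(5)]
all_same = all(np.array_equal(runs[0], r) for r in runs[1:])
print(all_same)  # True
```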
Ok got it.
It appears the AVX-512 argsort handles ties in a different way:
Linux w/ AVX-512:
>>> arr = np.array([2,2,2,2,2,1,1,1,1,1,3,3,2,3,1])
>>> np.argsort(arr)
array([ 6, 7, 5, 8, 9, 14, 3, 0, 4, 1, 2, 12, 11, 10, 13])
>>> np.__version__
'1.25.0'
>>> arr = np.array([2,2,2,2,2,1,1,1,1,1,3,3,2,3,1])
>>> np.argsort(arr)
array([ 5, 6, 7, 8, 9, 14, 0, 1, 2, 3, 4, 12, 10, 11, 13])
>>> np.__version__
'1.23.5'
Mac OSX:
>>> arr = np.array([2,2,2,2,2,1,1,1,1,1,3,3,2,3,1])
>>> np.argsort(arr)
array([ 5, 6, 7, 8, 9, 14, 0, 1, 2, 3, 4, 12, 10, 11, 13])
>>> np.__version__
'1.25.0'
>>> arr = np.array([2,2,2,2,2,1,1,1,1,1,3,3,2,3,1])
>>> np.argsort(arr)
array([ 5, 6, 7, 8, 9, 14, 0, 1, 2, 3, 4, 12, 10, 11, 13])
>>> np.__version__
'1.23.5'
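For comparison, kind='stable' on the same array reproduces the pre-1.25 ordering, since stable sorts guarantee that equal elements keep their original relative order regardless of version, platform, or SIMD dispatch:

```python
import numpy as np

arr = np.array([2, 2, 2, 2, 2, 1, 1, 1, 1, 1, 3, 3, 2, 3, 1])

# Stable sort: ties keep their original relative order on every platform.
idx = np.argsort(arr, kind="stable")
print(idx)  # [ 5  6  7  8  9 14  0  1  2  3  4 12 10 11 13]
```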
Ok, this definitely comes down to the optimized argsort handling ties in a different manner. What's happening for this test is that the parameters tau1 and tau2 are set such that the number of particle windows used for SVM training falls in the middle of a set of argsort ties, meaning a few training windows will differ between the optimized and non-optimized versions.
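That cutoff effect can be shown with a toy example (the scores and the cutoff k below are invented; k plays the role of the tau1/tau2-derived window count):

```python
import numpy as np

scores = np.array([5, 5, 5, 3, 3, 3, 1])  # ties straddling the cutoff
k = 4  # hypothetical cutoff, like the tau-derived training window count

order_a = np.argsort(-scores, kind="stable")  # ties in original order
order_b = np.array([2, 1, 0, 5, 4, 3, 6])     # another valid sort order

# Both orders sort the scores identically...
assert np.array_equal(scores[order_a], scores[order_b])
# ...but the top-k window sets differ, because k lands inside a tie group.
print(sorted(order_a[:k].tolist()), sorted(order_b[:k].tolist()))
# [0, 1, 2, 3] [0, 1, 2, 5]
```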
I see two possible solutions:
1) Use a different sorting method, i.e. a stable sort. This would give the same result on all platforms.
2) Change tau1 and tau2 for this test so that the windows are chosen from a set that is not cut off in the middle of a group of ties.
Either way, I'll have to change the reference centers in the test, since slightly different particle centers are found.
A side note: I'm slightly suspicious of the score variable fed into the argsort being integers. It is calculated by the query_score method, which computes cross-correlations over query images/reference windows and returns a count of correlations above a certain threshold. Many of these counts are ties.
An alternative solution that doesn't involve changing the reference centers is altering the test to check that centers_found are within a few pixels of the reference centers.
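That check could look something like this hypothetical helper (centers_match is not an existing ASPIRE function; the 2-pixel tolerance comes from the offsets observed above):

```python
import numpy as np

def centers_match(found, reference, tol=2.0):
    """Check each found center is within tol pixels (per axis) of some
    reference center. Hypothetical helper, not part of the ASPIRE API."""
    found = np.asarray(found, dtype=float)
    reference = np.asarray(reference, dtype=float)
    # Pairwise Chebyshev distances between found and reference centers.
    d = np.abs(found[:, None, :] - reference[None, :, :]).max(axis=-1)
    return bool(np.all(d.min(axis=1) <= tol))

print(centers_match([(1321.0, 361.0)], [(1323.0, 361.0)]))  # True: off by 2 px
```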
I think maybe we can try doing both 1 and 2 together and hope for the best until we have the new MicrographSource.
Fixed by #965.
test_apple.py is failing for the test testPickCenters. I am currently attempting to reproduce the failing test locally.