Closed arnaudon closed 2 years ago
Yes, sorting is probably the best.
As an alternative, why it should guarantee any order? If user needs order, he can sort on his side. What do you think?
I agree with this, yes.
It was breaking a test randomly which was hardcoding the expected order of morphs in neuror here https://github.com/BlueBrain/NeuroR/pull/62. That was the origin of this, and I suspect it may happen again. So you think it's better to sort all tests using this function, or just sort it's output?
I'd say sort it's output where it is needed (in those failing tests).
What is the good reason for going through the pain of updating all tests to make sure they won't randomly fail in a future PR rather than a small change here?
By sorting, you are complexifying the functionality and decreasing the perf. You also need to update the docstring to explicit the expected sorted ordering. It is better to keep functions simple with a minimal functionality. Moreover if a test is failing because it expects a specific ordering, the tests is not testing what it should. I don't know what is the issue in this case but you can probably fix it by change [] to {} somewhere in the expected result.
so the simplest is to not use that function in neuror test, and just use a custom sorted one
{morphs[0]: result_morph0, morphs[1]: result_morph1}
, where morphs
is the output of this non-deterministic function, so the key/values are random and the assert fails random.what change in perf is sorted inducing?
The change is O(1) to O(N log N). It is massive. Like really massive. Try doing next(iter_morphology_files(folder))
and next(sorted(iter_morphology_files(folder))
on a folder containing a large amount of morphologies on GPFS. You could go from ~10ms to 10 sec easily.
this function is so minimal already it is near useless and seems to only breaks reproducibility
I must be using it in 20 locations lol, I wouldn't call it useless ! And I have no need of the sorting.
the test compares against {morphs[0]: result_morph0, morphs[1]: result_morph1}, where morphs is the output of this non-deterministic function, so the key/values are random and the assert fails random.
Have you tried using assert_dict_equal
from nose ? That should be invariant of the ordering. Or dictdiffer
?
Ah ok, indeed, it's massive, I didn't realize. The issue is not comparing sorted dict, it is that the dict constructed to compare result has wrong pairs of key/values if the keys are shuffled randomly. I'll put an issue in neuror to fix all the tests then.
I guess we can close this issue?
This function is sometimes non-reproducible in the ordering of morphs:
https://github.com/BlueBrain/morph-tool/blob/master/morph_tool/utils.py#L16
I think because of pathlib, maybe we can add an sorted() somewhere to ensure it is?
@asanin-epfl @adrien-berchet , what do you think?