Closed scottfleming closed 1 month ago
The shuffling is helpful for making sure that if the code fails, it fails quickly.
I agree this is invalid, we should be using our own Python RNG object to avoid contamination global seed state. My apologies for the bug.
@scottfleming https://github.com/Medical-Event-Data-Standard/meds_etl/pull/26 is a PR that will probably work, but I don't have the time to test it now.
Is the shuffling here really necessary?
If so, can we not set the seed for the user?
I'm generally against silently setting the seed and instead am in favor of the user setting it on their end if they want determinism. Setting the seed silently like this can result in nasty bugs when a function like this is embedded in a larger program that assumes the random seed isn't reset every time the function is called.
https://github.com/Medical-Event-Data-Standard/meds_etl/blob/9ec2f3e20ddbc01ceab999ccc9f4782e7485e73a/src/meds_etl/omop.py#L726