Closed EdwardBerman closed 1 week ago
You're right, a more modern approach was required. Instead of providing one seed for n maps (which would produce no variance), I pass a new seed, offset by +i in the loop, to give each map a random but reproducible shuffle. This can be seen in the new generate_multiple_shear_dfs() function here:
https://github.com/GeorgeVassilakis/SMPy/blob/758b640f009f3e7f6555373bdceaa0e798399355/SMPy/utils.py#L227
Thanks for your help @EdwardBerman :)
Are you positive that you need to make several seeds? I don't think there's anything wrong with that, but I'm fairly certain you can create a reproducible sequence of random shuffles with just one seed.
I believe so @EdwardBerman. As far as I can tell, if I passed one seed to the _shuffle_ra_dec() function, the list of galaxies would be shuffled with random.seed(seed=42) for every realization of the num_shuffles (denoted i) shuffled maps, meaning that there would be 0 variance, and they'd get shuffled the same way every time.
Seeding each of the i shuffled maps in the for loop on line 226 with its own seed: https://github.com/GeorgeVassilakis/SMPy/blob/45edeaccad2a5d0a60d0c1c97ce5a2f0e8b1491f/smpy/utils.py#L226
First shuffled map out of i maps gets passed a seed of 42, making a shuffled map with seed 42.
Second shuffled map out of i maps gets passed a seed of 43, making a shuffled map with seed 43.
etc.
Passing the same seed to every call instead:
First shuffled map out of i maps gets passed a seed of 42, making a shuffled map with seed 42.
Second shuffled map out of i maps also gets passed a seed of 42, making it identical to the first one.
etc.
So, because _shuffle_ra_dec() gets called num_shuffles times, I believe it needs a new seed every time, because each call is independent of the maps created before and after it. This is how I've wrapped my brain around it; let me know if my reasoning makes sense to you. If not, let's discuss further what's right, because I want this to be done properly!
@EdwardBerman Sayan is telling me that you're right, so I'll check that out and probably remove the extra code if it's redundant. Standby!
Okay! We can discuss in person, but yeah, I agree with Sayan.
Consider this minimal example:
import random

seed = 42
random.seed(seed)

data = [1, 2, 3, 4, 5]
shuffles = []
for _ in range(3):
    shuffled_data = data[:]
    random.shuffle(shuffled_data)
    shuffles.append(shuffled_data)

for i, shuffle in enumerate(shuffles, 1):
    print(f"Shuffle {i}: {shuffle}")
Every time I run this in some test.py, I get the same sequence of three shuffles, but the three shuffles within a run are not identical to each other. I get
Shuffle 1: [4, 2, 3, 5, 1]
Shuffle 2: [4, 3, 1, 5, 2]
Shuffle 3: [4, 2, 3, 1, 5]
and then one more time as a sanity check
Shuffle 1: [4, 2, 3, 5, 1]
Shuffle 2: [4, 3, 1, 5, 2]
Shuffle 3: [4, 2, 3, 1, 5]
See how shuffle 1 matches shuffle 1 across both runs, and the same for 2 and 3, but shuffles 1, 2, and 3 are not the same as each other.
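To put the three seeding strategies from this thread side by side, here is a small sketch. The helper shuffled_copy() is a hypothetical stand-in, not the actual _shuffle_ra_dec() from SMPy; it only assumes a helper that optionally reseeds the global random module before shuffling.

```python
import random


def shuffled_copy(values, seed=None):
    """Hypothetical stand-in for a per-call shuffle helper.

    If a seed is given, the global generator is reseeded before shuffling.
    """
    if seed is not None:
        random.seed(seed)
    out = list(values)
    random.shuffle(out)
    return out


data = list(range(10))
num_shuffles = 3

# (a) Reseed with the SAME seed on every call: every shuffle is identical.
same_seed = [shuffled_copy(data, seed=42) for _ in range(num_shuffles)]

# (b) Reseed with seed + i on every call: reproducible, one seed per map.
offset_seed = [shuffled_copy(data, seed=42 + i) for i in range(num_shuffles)]

# (c) Seed ONCE before the loop and never reseed: also reproducible,
#     because each shuffle advances the generator state.
random.seed(42)
seed_once = [shuffled_copy(data) for _ in range(num_shuffles)]

for name, result in [("same seed", same_seed),
                     ("seed + i", offset_seed),
                     ("seed once", seed_once)]:
    print(name)
    for i, shuffle in enumerate(result, 1):
        print(f"  Shuffle {i}: {shuffle}")
```

Only strategy (a) gives zero variance. Strategies (b) and (c) are both reproducible from a single starting seed, so the extra per-map seeds are not required, although they do no harm.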
Potentially also make the seed an input to the config, so a user can convince themself that the SnR is not a fluke from one realization of sampling random shuffles.
np.random.seed(42) should do the trick, though there may be a more modern approach. For example, if you put this in a file and run it, you will get the same result each time. Similarly, this approach will give you the same SnR map.
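One candidate for the "more modern approach" is NumPy's Generator API, which keeps the random state in a local object instead of the global np.random state. The sketch below is only an illustration: shuffle_realizations() and the config keys "seed" and "num_shuffles" are hypothetical names, not part of SMPy.

```python
import numpy as np


def shuffle_realizations(values, num_shuffles, seed=None):
    """Return num_shuffles shuffled copies of values from one seeded generator.

    Hypothetical helper: one Generator is created from the seed and reused,
    so each realization differs but the whole sequence is reproducible.
    """
    rng = np.random.default_rng(seed)
    values = np.asarray(values)
    return [rng.permutation(values) for _ in range(num_shuffles)]


# Hypothetical config: the seed is a user input, so a user can change it
# and check that the result is not a fluke of one set of shuffles.
config = {"seed": 42, "num_shuffles": 3}

ra = np.linspace(0.0, 1.0, 10)
run_a = shuffle_realizations(ra, config["num_shuffles"], config["seed"])
run_b = shuffle_realizations(ra, config["num_shuffles"], config["seed"])

# Same seed -> same sequence of shuffles across runs.
print(all(np.array_equal(a, b) for a, b in zip(run_a, run_b)))
```

Passing seed=None draws fresh entropy from the operating system, which would give a different set of shuffles on each run.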