Closed lele394 closed 8 months ago
Thank you for this post.
Quick reply:
Adding a seed argument might be OK provided there is a compelling case against using np.random.seed itself.
Details:
The random number generators in rft1d are meant to be controlled using np.random.seed like this:
import numpy as np
import rft1d
np.random.seed(0)
a = rft1d.randn1d(5, 101, 25)
b = rft1d.randn1d(5, 101, 25) # different from "a"
np.random.seed(0)
c = rft1d.randn1d(5, 101, 25) # same as "a"
Using an extra seed parameter as you suggest is OK, but I am a bit reluctant to do this because it suggests... np.random.seed (which it isn't).
As a counter-example, imagine this:
np.random.seed(0)
x = np.random.randn(10)
np.random.seed(0)
a = rft1d.randn1d(5, 101, 25, seed=1234)
y = np.random.randn(10) # different from "x"
That is, the user will need to keep track of BOTH np.random.seed calls AND seed values submitted to rft1d.randn1d, and the new seed parameter could render external calls to np.random.seed meaningless.
I think that this type of use-case suggests that RNG seeding should not be handled inside rft1d, but please feel free to provide an example or two of how a seed keyword argument could be useful beyond simply calls to np.random.seed.
Hello,
Thanks for your reply. I did try using numpy's random.seed() without any success. The seed was always new when using randn1d. My use case is really about reproducibility of the results. It may be an issue on my part, and I haven't invested much time looking into it as it was just a prototype. I'll be back on it Thursday and will take a better look at the way I implemented it.
My solution wasn't to add new parameters to randn1d, but to add a file separate from random that could be called something like SetRandom. I basically just duplicated the file and modified it to my liking. Another solution could be to add a generate_seeded_sample function with a seed parameter defaulting to None, and a check on that parameter when generating samples with the new function. That would solve the problem of keeping track of 2 seeds, while allowing the user to pass the seed they want to use directly as an argument.
I actually wasn't aware that random generation was controlled using numpy. I did take a look in the wiki without finding any mention of it. I figured that out by looking through the code. Whichever way you'd rather go, I'd recommend adding a small section describing how to introduce reproducibility when using the library.
Hope that helps, Léo
I'd recommend adding a small section describing how to introduce reproducibility when using the library.
I agree. I will add a reproducibility example to the online documentation. Before I do that I'd like to try to resolve this issue to ensure that rft1d covers intended reproducibility use cases...
I understand your idea regarding generate_seeded_sample, but I don't think this solves the problem given the counter-example above unless generate_seeded_sample gets then sets the RNG state. However, I am struggling to think of an example where this might be necessary.
Can you please provide a code snippet that demonstrates why generate_seeded_sample might be useful beyond use of np.random.seed?
Or do you think it would be sufficient to add a reproducibility example to the documentation?
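For reference, "gets then sets the RNG state" could be sketched as below. This is only an illustration under stated assumptions: smooth_noise is a hypothetical stand-in for rft1d.randn1d (plain NumPy, Gaussian-smoothed white noise), and generate_seeded_sample is the hypothetical function discussed in this thread, not part of rft1d. The only real API relied on is NumPy's np.random.get_state / np.random.set_state.

```python
import numpy as np


def smooth_noise(n_nodes, fwhm):
    """Hypothetical stand-in for rft1d.randn1d: smoothed noise from the GLOBAL NumPy RNG."""
    sd = fwhm / np.sqrt(8 * np.log(2))          # convert FWHM to a standard deviation
    x = np.arange(n_nodes) - n_nodes // 2
    kernel = np.exp(-0.5 * (x / sd) ** 2)
    kernel /= kernel.sum()
    return np.convolve(np.random.randn(n_nodes), kernel, mode="same")


def generate_seeded_sample(n_nodes, fwhm, seed=None):
    """Hypothetical wrapper: seed locally, then restore the caller's RNG state."""
    if seed is None:
        return smooth_noise(n_nodes, fwhm)      # behave like an ordinary unseeded call
    state = np.random.get_state()               # get the caller's global RNG state
    try:
        np.random.seed(seed)
        return smooth_noise(n_nodes, fwhm)
    finally:
        np.random.set_state(state)              # set it back, so outer seeding still holds


# The counter-example from earlier in the thread no longer breaks outer seeding:
np.random.seed(0)
x = np.random.randn(10)
np.random.seed(0)
a = generate_seeded_sample(101, 25, seed=1234)
y = np.random.randn(10)                         # same as "x", because the state was restored
```

With this pattern the seed argument and external np.random.seed calls no longer interfere, at the cost of extra machinery that plain np.random.seed calls may make unnecessary.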
Sorry, due to the project I'm working on, I can't share any code snippet right now. I can however give you an overall view of what I need that for. I'm basically creating training datasets for a neural network. The first reason why I needed this feature was to do a parameter sweep to build a "map" of the "zoom" of the impact of the parameter. Sadly, I need to set the seed before every run to have consistent results. Using that map I can then define parameter ranges for my program. I basically use multiple random Gaussian fields the same way you'd use Perlin noise octaves to create terrain height maps, but in 1D.
The second is that I'd like to study the impacts of other parameters of my program on the output of said neural networks while not changing the others (including the RGFs). Due to the size of the data we're talking about, let's just say that saving it to the hard drive is not really feasible, as it will also be shared with other people. I'd like to avoid having to send multiple gigs of data, and would rather ship a script able to generate my dataset.
My solution using generate_seeded_sample is actually irrelevant now, I may not have implemented my seed the right way. As I said it was nothing more than a prototype I pieced together as a proof of concept. Adding a reproducibility example to the documentation will most likely be sufficient.
By the way, I have spotted weird abnormalities when plotting the "map" I'm talking about above. For complete transparency, I don't really understand all the maths behind randn1d. But I did spot an oddity: there are something like 3 "bands" when sweeping the smooth parameter, before it goes completely badonkers (see linked image). I'll open a separate issue when I get to it, since it has nothing to do with reproducibility. The plot axes are not to scale and not linear. I can spot 3 bands though (0-150, 150-350, 350-550). Is that expected behavior?
I don't quite understand what the horizontal and vertical axes represent in the map above, so please do indeed open a separate issue with a description of the axes, and preferably also with a colormap.
Back to the random seed issue:
From your description it sounds like the problem can be solved just by calling np.random.seed before each call to rft1d.random functions, something like this:
for i in range(1000):
    np.random.seed(i)
    a = rft1d.randn1d(8, 101, 25)
Does this adequately describe your use case?
Does this give you the seeding control you need?
Does this adequately describe your use case?
Not quite, see below.
for i in range(1000):
    np.random.seed(1234)
    a = rft1d.randn1d(1, 1000, i)
I sweep the smoothing parameter, not the seed. It's not relevant to our issue though.
Does this give you the seeding control you need?
Yes that works. The error was on my part. That issue is solved on my end. I believe once you get a reproducibility example in the documentation, we'll be able to close that thread. Thanks a lot for your answers.
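A minimal sketch of the sweep described above, with the seed reset before every field so that only the smoothing parameter changes between rows. It assumes a hypothetical stand-in, smooth_noise, in place of rft1d.randn1d so the example runs with NumPy alone; with rft1d installed, the stand-in call would be replaced by rft1d.randn1d(1, n_nodes, fwhm).

```python
import numpy as np


def smooth_noise(n_nodes, fwhm):
    """Hypothetical stand-in for rft1d.randn1d(1, n_nodes, fwhm), using the global NumPy RNG."""
    sd = fwhm / np.sqrt(8 * np.log(2))          # convert FWHM to a standard deviation
    x = np.arange(n_nodes) - n_nodes // 2
    kernel = np.exp(-0.5 * (x / sd) ** 2)
    kernel /= kernel.sum()
    return np.convolve(np.random.randn(n_nodes), kernel, mode="same")


def sweep(seed, n_nodes, fwhm_values):
    """Return one row per smoothing value, all generated from the same seed."""
    rows = []
    for fwhm in fwhm_values:
        np.random.seed(seed)                    # reset before EVERY field, as in the thread
        rows.append(smooth_noise(n_nodes, fwhm))
    return np.vstack(rows)


fwhm_values = list(range(1, 51, 7))             # illustrative smoothing values only
field_map = sweep(1234, 200, fwhm_values)       # rows: smoothing value, columns: field position
```

Because the script regenerates the same map from the seed alone, it can be shared in place of the generated data.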
OK, thank you for confirming!
I see what you mean now by sweeping the smoothness parameter. Although not directly related to this issue please note the following:
Large smoothness values relative to the field size (i.e. a large fwhm / field_size) may not yield accurate fields. So if you use a field size of 1000 as in your example you should probably stop at around i=500.
You can also set pad=False like this:
a = rft1d.randn1d(1, 1000, 300, pad=False)
Setting pad=False like this achieves two things:
a = rft1d.randn1d(1, 1000, 300)
b = np.hstack([a,a,a,a]) # repeating random field
np.random.seed(123)
a0 = rft1d.randn1d(1, 1000, 50)
np.random.seed(123)
a1 = rft1d.randn1d(1, 1000, 300) # not directly comparable to "a0" because fields are padded
np.random.seed(123)
b0 = rft1d.randn1d(1, 1000, 50, pad=False)
np.random.seed(123)
b1 = rft1d.randn1d(1, 1000, 300, pad=False) # "b1" and "b0" are more directly comparable than are "a1" and "a0"
I have added a reproducibility example here and made this example accessible from the main examples menu. Do you think this is now clearer?
Yes, what you added is exactly what I was missing. Reproducibility and the right way to do so is now correctly documented, and should make it clear for future users.
Looks like we're done here, I'll close the issue.
Hello,
I've recently had to use the rft1d library to construct Gaussian random fields. I needed to perform a sweep on the smoothing parameter to build a heatmap of my fields for different values. Since I couldn't pass a seed to the generator I couldn't do it. I did however modify the library by duplicating the random file and modifying both generators. I added a seed parameter to the constructor, and simply set that seed at the beginning of each generate_sample().
Modified generator __init__.
Modified start of generate_sample().
Are you interested in me pushing that change?
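For readers without the modified files, the change described in this post (a seed passed to the constructor and applied at the start of each generate_sample()) might look roughly like the sketch below. The class name SeededGenerator1D and the smoothing code are assumptions made for illustration; this is not rft1d's actual generator code, nor the code from the original post.

```python
import numpy as np


class SeededGenerator1D:
    """Hypothetical generator illustrating a constructor-level seed (not rft1d code)."""

    def __init__(self, n_nodes=101, fwhm=25.0, seed=None):
        self.n_nodes = n_nodes
        self.seed = seed                        # None means "leave the global RNG alone"
        sd = fwhm / np.sqrt(8 * np.log(2))      # convert FWHM to a standard deviation
        x = np.arange(n_nodes) - n_nodes // 2
        kernel = np.exp(-0.5 * (x / sd) ** 2)
        self.kernel = kernel / kernel.sum()

    def generate_sample(self):
        if self.seed is not None:
            np.random.seed(self.seed)           # re-seeds the GLOBAL NumPy RNG on every call
        noise = np.random.randn(self.n_nodes)
        return np.convolve(noise, self.kernel, mode="same")


gen = SeededGenerator1D(seed=7)
first = gen.generate_sample()
second = gen.generate_sample()                  # identical to "first": the seed is reapplied
```

Note that seeding at the start of every call makes a seeded generator return the same field each time, and it also resets the global RNG state for any other code using np.random.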