Closed tcld closed 9 years ago
Let's start looking for alternatives:
There is a problem left: I looked at the implementation of the noise module, and that one makes "hardcoded" use of random, too.
Overall I think this is not a big problem, just one that should be fixed. I wanted to flesh out this https://github.com/Mindwerks/worldengine/pull/133 anyway to find a completely deterministic way of generating maps, even if the order of the steps differed. So I will do this all in one swoop, then. Might be done within a couple hours or a couple days, I don't know yet how much time I can put into it.
Ok, but that is used during world generation, right? While our tests use pre-generated worlds and then just generate maps from those worlds, correct? Anyway it is almost impossible to get the same worlds given the same seed on different platforms because of the C++ code of plate-tectonics. We tried by adopring a spefic library for random generation, to use fixed size types (an int or a long can have different sizes on different architectures) but still the results are not the same because floating point operations produce slightly different results depending on the optimization done by the compiler. So basically we gave up on that. We should just get the same maps given the same world and for that we should use just python code and I was thinking the noise module was not involved in that phase.
(closed by mistake, sorry)
Right, noise is only used during generation (I think). But referring to my last message, it is still kind of a problem since there is no way to replace the RNG that is used by the noise-module.
You are right. We can most likely fix the Python side. As for C++, did you try implementing your own RNG?
Yes
On Wed, Oct 21, 2015 at 8:53 AM, tcld notifications@github.com wrote:
Right, noise is only used during generation (I think). But referring to my last message, it is still kind of a problem since there is no way to replace the RNG that is used by the noise-module.
You are right. We can most likely fix the Python side. As for C++, did you try implementing your own RNG?
— Reply to this email directly or view it on GitHub https://github.com/Mindwerks/worldengine/issues/145#issuecomment-149798428 .
Website at http://tomassetti.me http://www.federico-tomassetti.it GitHub https://github.com/ftomassetti
Oh, well. Then let's forget about that part permanently.^^
I just noticed this: https://hg.python.org/cpython/file/3.4/Lib/random.py
* The random() method is implemented in C, executes in a single Python step,
and is, therefore, threadsafe.
This might actually be difficult without getting some kind of high-level RNG going. There is probably something like that available somewhere, but I guess it is going to be quite slow.
Python's default RNG is supposed to be predictable and uses Mersenne-Twister by default.
Keep in mind that the rng is seeded by time by default when imported. So if we want to be predictable, we need to seed the rng with a timestamp and keep this timestamp in our world format.
Read the first post in here. The problem is that the seed does not make for predictable output.
And if you did the same thing twice in the same Python environment?
Python 2.7.9 (default, Apr 2 2015, 15:33:21)
Type "copyright", "credits" or "license" for more information.
IPython 2.3.0 -- An enhanced Interactive Python.
? -> Introduction and overview of IPython's features.
%quickref -> Quick reference.
help -> Python's own help system.
object? -> Details about 'object', use 'object??' for extra details.
In [1]: import random
In [2]: random.seed(0)
In [3]: random.randint(0, 1000000)
Out[3]: 844422
In [4]: random.randint(0, 1000000)
Out[4]: 757955
In [5]: random.randint(0, 1000000)
Out[5]: 420572
In [6]: random.randint(0, 1000000)
Out[6]: 258917
Looks like python2 is consistent across all platforms.
I think I've solved our problem: set the version in seed :)
random.seed(a=None, version=2)
Initialize the random number generator.
If a is omitted or None, the current system time is used. If randomness sources are provided by the operating system, they are used instead of the system time (see the os.urandom() function for details on availability).
If a is an int, it is used directly.
With version 2 (the default), a str, bytes, or bytearray object gets converted to an int and all of its bits are used. With version 1, the hash() of a is used instead.
Changed in version 3.2: Moved to the version 2 scheme which uses all of the bits in a string seed.
I updated the first post so the code I used can easily be copied and pasted.
@psi29a Oh, great! I forgot about that, even though I read through the code while being on the train yesterday. Now we have to use version 1? Or just tell all Python versions to use version 2?
EDIT: Python 2's seed() doesn't support the version argument for me.
EDIT2: And Python 3's output doesn't really care about the argument.
Oh, maybe this doesn't solve anything after all. Seems like the version argument is only effective when providing strings etc. as seeds.
Sorry, I was mistaken... here is the real reason: https://groups.google.com/forum/#!topic/comp.lang.python/KwALjKjF6Y4
That means that under Python 2 you still call seed(0)
? In one of your posts above your output was 844422, though.
EDIT: Ok, I'll read. :)
"The integer methods in the random module now do a better job of producing uniform distributions. Previously, they computed selections with int(n*random()) which had a slight bias whenever n was not a power of two. Now, multiple selections are made from a range up to the next power of two and a selection is kept only when it falls within the range 0 <= x < n. The functions and methods affected are randrange(), randint(), choice(), shuffle() and sample()."
https://docs.python.org/3/library/random.html#notes-on-reproducibility <-- uses weasels words here, so we can't trust it across python versions.
According to that we could use random.random() and always be safe. But I will take a look around for a different RNG at some point.
EDIT: Indeed, random()'s output does not differ. Maybe a fix to the ancient map-drawing would be a quick solution to the failing tests, then. Fall back to random() and the somehow extrapolate whatever is necessary.
Yes, random.random() is safe across platforms.
I think we should just implement our own randrang instead. I say we take python3's version and just embed it in our common.py
When we drop python2 support in the future, we just drop our version of randrang and use the default.
Sounds good. I will start doing that once I feel safe continuing to build on top of my fork again.
I just had a bit of a read. There are several solutions to the problem:
Python 2.7.6 (default, Jun 22 2015, 17:58:13)
[GCC 4.8.2] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import numpy
>>> numpy.random.seed(0)
>>> for i in range(0, 10):
... numpy.random.randint(1000000)
...
985772
305711
435829
117952
963395
152315
882371
359783
304137
122579
Python 3.4.3 (default, Oct 14 2015, 20:28:29)
[GCC 4.8.4] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import numpy
>>> numpy.random.seed(0)
>>> for i in range(0, 10):
... numpy.random.randint(1000000)
...
985772
305711
435829
117952
963395
152315
882371
359783
304137
122579
To get the noise-module in line, we could copy the code from here: https://github.com/caseman/noise/blob/master/perlin.py and replace its random-module with whatever we choose to use. (I don't know how the licensing for sth. like that works, though.)
Another thing to think about is that we could sub-class one of the RNGs and then provide each part of worldengines generation with its own RNG. Throw some changes to the noise-generator in there and the output would stay reproducible even if the order in which the steps during generation are called were changed (or a step repeated, for example, which might happen when using a GUI).
I kind of like approach 2 and the all-inclusive way of handling this.
I'm honestly pro numpy. The problem was in randint being differently implemented in python2 and python3. Looks like numpy works on both and I'm happy with that. It is one less maintenance problem for us.
As for caseman's noise, we can fix it ourselves then ask for a PR. We've worked with him in the past.
Oh, that is great news. numpy certainly is a very quick and simple fix. Might even speed some things up.
There is only a single call to randint() in the source of noise, so an additional parameter for a custom function would already be enough of a change. If that is something the rest of the world would not oppose to.^^
EDIT: I hope the noise-package I linked to is actually the one used by worldengine.
Casey is quick. I asked him if he was willing to accept a PR (or just do it himself) and how he would like it to be done and he replied with two suggestions. I will pick one and work this out tomorrow. Then, once the numpy-stuff is all done (including the second PR, which will probably take a while since I got entangled in changing the worldengine.draw-module quite a bit), I could create a small fix that replaces all random-calls with numpy.random-calls.
I submitted my PR to caseman/noise and, while working on it, I noticed that worldengine is using snoise2, not noise2. As far as I understand, that is a C-method inside the noise-module that comes without any randomization at all (somebody correct me if I'm wrong). So that part kind of fixed itself.^^ But if we ever decided to use noise2 (and my PR was merged) we could easily do so without being afraid of the RNG. :P
That leaves the calls to randint() etc. within worldengine to be replaced by something more reproducible. I might branch that off of https://github.com/Mindwerks/worldengine/pull/138 or even master so it can be merged without checking all my other code. I expect this change to be trivial. (Famous last words.^^)
I hope the problem will be fixed with this https://github.com/Mindwerks/worldengine/pull/146
Yes, the snoise2 passes through to C which isn't random. :)
I will consider this problem solved and close the issue.
I have not seen numpy random give the same output given the same input in my own recent tests on another project. You folks might want to test your PRND on different platforms with the same input.
[EDIT:] I see in a referenced pull requests that you folks did do tests? So, you are getting the same output given the same input on different platforms?
As stated here https://github.com/Mindwerks/worldengine/pull/138#issuecomment-149794285 there seems to be a change in the random number generator between Python 2 and Python 3.
Starting from the same seed, the outputs between the two versions differ.
This might be due to the different versions of gcc that have been used to compile the Python versions (I don't know how to test that right now) but since the maps here https://github.com/Mindwerks/worldengine/pull/138#issuecomment-149794285 are extremely similar I would expect that platec isn't influenced by the whole problem.
EDIT Here the pure code, if you want to quickly redo this: