Mindwerks / worldengine

World generator using simulation of plates, rain shadow, erosion, etc.
MIT License
988 stars 129 forks source link

random module gives varying output for different Python versions #145

Closed tcld closed 8 years ago

tcld commented 8 years ago

As stated here https://github.com/Mindwerks/worldengine/pull/138#issuecomment-149794285 there seems to be a change in the random number generator between Python 2 and Python 3.

Starting from the same seed, the outputs between the two versions differ.

Python 3.4.3 (default, Oct 14 2015, 20:28:29) 
[GCC 4.8.4] on linux
>>> import random
>>> random.seed(0)
>>> for x in range(0, 10):
...     random.randint(0, 1000000)
885440
403958
794772
933488
441001
42450
271493
536110
509532
424604
Python 2.7.6 (default, Jun 22 2015, 17:58:13) 
[GCC 4.8.2] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import random
>>> random.seed(0)
>>> for x in range(0, 10):
...     random.randint(0, 1000000)
844422
757955
420572
258917
511275
404934
783799
303313
476597
583382

This might be due to the different versions of gcc that have been used to compile the Python versions (I don't know how to test that right now) but since the maps here https://github.com/Mindwerks/worldengine/pull/138#issuecomment-149794285 are extremely similar I would expect that platec isn't influenced by the whole problem.

EDIT Here the pure code, if you want to quickly redo this:

import random
random.seed(0)
for x in range(0, 10):
    random.randint(0, 1000000)
import random
random.seed(0, version = 2)
for x in range(0, 10):
    random.randint(0, 1000000)
ftomassetti commented 8 years ago

Let's start looking for alternatives:

tcld commented 8 years ago

There is a problem left: I looked at the implementation of the noise module, and that one makes "hardcoded" use of random, too.

tcld commented 8 years ago

Overall I think this is not a big problem, just one that should be fixed. I wanted to flesh out this https://github.com/Mindwerks/worldengine/pull/133 anyway to find a completely deterministic way of generating maps, even if the order of the steps differed. So I will do this all in one swoop, then. Might be done within a couple hours or a couple days, I don't know yet how much time I can put into it.

ftomassetti commented 8 years ago

Ok, but that is used during world generation, right? While our tests use pre-generated worlds and then just generate maps from those worlds, correct? Anyway it is almost impossible to get the same worlds given the same seed on different platforms because of the C++ code of plate-tectonics. We tried by adopring a spefic library for random generation, to use fixed size types (an int or a long can have different sizes on different architectures) but still the results are not the same because floating point operations produce slightly different results depending on the optimization done by the compiler. So basically we gave up on that. We should just get the same maps given the same world and for that we should use just python code and I was thinking the noise module was not involved in that phase.

ftomassetti commented 8 years ago

(closed by mistake, sorry)

tcld commented 8 years ago

Right, noise is only used during generation (I think). But referring to my last message, it is still kind of a problem since there is no way to replace the RNG that is used by the noise-module.

You are right. We can most likely fix the Python side. As for C++, did you try implementing your own RNG?

ftomassetti commented 8 years ago

Yes

On Wed, Oct 21, 2015 at 8:53 AM, tcld notifications@github.com wrote:

Right, noise is only used during generation (I think). But referring to my last message, it is still kind of a problem since there is no way to replace the RNG that is used by the noise-module.

You are right. We can most likely fix the Python side. As for C++, did you try implementing your own RNG?

— Reply to this email directly or view it on GitHub https://github.com/Mindwerks/worldengine/issues/145#issuecomment-149798428 .

Website at http://tomassetti.me http://www.federico-tomassetti.it GitHub https://github.com/ftomassetti

tcld commented 8 years ago

Oh, well. Then let's forget about that part permanently.^^

tcld commented 8 years ago

I just noticed this: https://hg.python.org/cpython/file/3.4/Lib/random.py

* The random() method is implemented in C, executes in a single Python step,

  and is, therefore, threadsafe.

This might actually be difficult without getting some kind of high-level RNG going. There is probably something like that available somewhere, but I guess it is going to be quite slow.

psi29a commented 8 years ago

Python's default RNG is supposed to be predictable and uses Mersenne-Twister by default.

Keep in mind that the rng is seeded by time by default when imported. So if we want to be predictable, we need to seed the rng with a timestamp and keep this timestamp in our world format.

tcld commented 8 years ago

Read the first post in here. The problem is that the seed does not make for predictable output.

psi29a commented 8 years ago

And if you did the same thing twice in the same Python environment?

psi29a commented 8 years ago
Python 2.7.9 (default, Apr  2 2015, 15:33:21) 
Type "copyright", "credits" or "license" for more information.

IPython 2.3.0 -- An enhanced Interactive Python.
?         -> Introduction and overview of IPython's features.
%quickref -> Quick reference.
help      -> Python's own help system.
object?   -> Details about 'object', use 'object??' for extra details.

In [1]: import random

In [2]: random.seed(0)

In [3]: random.randint(0, 1000000)
Out[3]: 844422

In [4]: random.randint(0, 1000000)
Out[4]: 757955

In [5]: random.randint(0, 1000000)
Out[5]: 420572

In [6]: random.randint(0, 1000000)
Out[6]: 258917

Looks like python2 is consistent across all platforms.

psi29a commented 8 years ago

I think I've solved our problem: set the version in seed :)

random.seed(a=None, version=2)
Initialize the random number generator.

If a is omitted or None, the current system time is used. If randomness sources are provided by the operating system, they are used instead of the system time (see the os.urandom() function for details on availability).

If a is an int, it is used directly.

With version 2 (the default), a str, bytes, or bytearray object gets converted to an int and all of its bits are used. With version 1, the hash() of a is used instead.

Changed in version 3.2: Moved to the version 2 scheme which uses all of the bits in a string seed.
tcld commented 8 years ago

I updated the first post so the code I used can easily be copied and pasted.

@psi29a Oh, great! I forgot about that, even though I read through the code while being on the train yesterday. Now we have to use version 1? Or just tell all Python versions to use version 2?

EDIT: Python 2's seed() doesn't support the version argument for me.

EDIT2: And Python 3's output doesn't really care about the argument.

tcld commented 8 years ago

Oh, maybe this doesn't solve anything after all. Seems like the version argument is only effective when providing strings etc. as seeds.

psi29a commented 8 years ago

Sorry, I was mistaken... here is the real reason: https://groups.google.com/forum/#!topic/comp.lang.python/KwALjKjF6Y4

tcld commented 8 years ago

That means that under Python 2 you still call seed(0)? In one of your posts above your output was 844422, though. EDIT: Ok, I'll read. :)

psi29a commented 8 years ago

"The integer methods in the random module now do a better job of producing uniform distributions. Previously, they computed selections with int(n*random()) which had a slight bias whenever n was not a power of two. Now, multiple selections are made from a range up to the next power of two and a selection is kept only when it falls within the range 0 <= x < n. The functions and methods affected are randrange(), randint(), choice(), shuffle() and sample()."

psi29a commented 8 years ago

https://docs.python.org/3/library/random.html#notes-on-reproducibility <-- uses weasels words here, so we can't trust it across python versions.

tcld commented 8 years ago

According to that we could use random.random() and always be safe. But I will take a look around for a different RNG at some point.

EDIT: Indeed, random()'s output does not differ. Maybe a fix to the ancient map-drawing would be a quick solution to the failing tests, then. Fall back to random() and the somehow extrapolate whatever is necessary.

psi29a commented 8 years ago

Yes, random.random() is safe across platforms.

I think we should just implement our own randrang instead. I say we take python3's version and just embed it in our common.py

When we drop python2 support in the future, we just drop our version of randrang and use the default.

tcld commented 8 years ago

Sounds good. I will start doing that once I feel safe continuing to build on top of my fork again.

tcld commented 8 years ago

I just had a bit of a read. There are several solutions to the problem:

  1. Use numpy.random.
Python 2.7.6 (default, Jun 22 2015, 17:58:13) 
[GCC 4.8.2] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import numpy
>>> numpy.random.seed(0)
>>> for i in range(0, 10):
...     numpy.random.randint(1000000)
... 
985772
305711
435829
117952
963395
152315
882371
359783
304137
122579
Python 3.4.3 (default, Oct 14 2015, 20:28:29) 
[GCC 4.8.4] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import numpy
>>> numpy.random.seed(0)
>>> for i in range(0, 10):
...     numpy.random.randint(1000000)
... 
985772
305711
435829
117952
963395
152315
882371
359783
304137
122579

To get the noise-module in line, we could copy the code from here: https://github.com/caseman/noise/blob/master/perlin.py and replace its random-module with whatever we choose to use. (I don't know how the licensing for sth. like that works, though.)

Another thing to think about is that we could sub-class one of the RNGs and then provide each part of worldengines generation with its own RNG. Throw some changes to the noise-generator in there and the output would stay reproducible even if the order in which the steps during generation are called were changed (or a step repeated, for example, which might happen when using a GUI).

I kind of like approach 2 and the all-inclusive way of handling this.

psi29a commented 8 years ago

I'm honestly pro numpy. The problem was in randint being differently implemented in python2 and python3. Looks like numpy works on both and I'm happy with that. It is one less maintenance problem for us.

As for caseman's noise, we can fix it ourselves then ask for a PR. We've worked with him in the past.

tcld commented 8 years ago

Oh, that is great news. numpy certainly is a very quick and simple fix. Might even speed some things up.

There is only a single call to randint() in the source of noise, so an additional parameter for a custom function would already be enough of a change. If that is something the rest of the world would not oppose to.^^

EDIT: I hope the noise-package I linked to is actually the one used by worldengine.

tcld commented 8 years ago

Casey is quick. I asked him if he was willing to accept a PR (or just do it himself) and how he would like it to be done and he replied with two suggestions. I will pick one and work this out tomorrow. Then, once the numpy-stuff is all done (including the second PR, which will probably take a while since I got entangled in changing the worldengine.draw-module quite a bit), I could create a small fix that replaces all random-calls with numpy.random-calls.

tcld commented 8 years ago

I submitted my PR to caseman/noise and, while working on it, I noticed that worldengine is using snoise2, not noise2. As far as I understand, that is a C-method inside the noise-module that comes without any randomization at all (somebody correct me if I'm wrong). So that part kind of fixed itself.^^ But if we ever decided to use noise2 (and my PR was merged) we could easily do so without being afraid of the RNG. :P

That leaves the calls to randint() etc. within worldengine to be replaced by something more reproducible. I might branch that off of https://github.com/Mindwerks/worldengine/pull/138 or even master so it can be merged without checking all my other code. I expect this change to be trivial. (Famous last words.^^)

tcld commented 8 years ago

I hope the problem will be fixed with this https://github.com/Mindwerks/worldengine/pull/146

psi29a commented 8 years ago

Yes, the snoise2 passes through to C which isn't random. :)

tcld commented 8 years ago

I will consider this problem solved and close the issue.

earthbound19 commented 6 years ago

I have not seen numpy random give the same output given the same input in my own recent tests on another project. You folks might want to test your PRND on different platforms with the same input.

[EDIT:] I see in a referenced pull requests that you folks did do tests? So, you are getting the same output given the same input on different platforms?