agnwinds / python

This is the repository for Python, the radiative transfer code used to winds in AGN and other syatems
GNU General Public License v3.0
26 stars 25 forks source link

Apparently different random numbers on different machines #239

Closed Higginbottom closed 6 years ago

Higginbottom commented 7 years ago

I've noticed subtle differences in simulations carried out on different machines. In particular heating rates computed directly from photon fields. In partcular: My Macbook = My iMac Endjinn = Iridis My macs != the Iridis.

Checking I note that the two classes of machine generate different photons.

If this is is just down to different random number lists then this is not necessarily a problem, but should be noted. If it is however down to something different in the code, then this is a worry.

Higginbottom commented 7 years ago

I've tested a simple random number code on my mac and endjinn.

There are two possible random number generators rand() and random() I have two compilers on my mac gcc (4.8.4) and clang (7.0.2) One compiler on enjin gcc (5.4.0)

both commands give the same set of random numbers on endjinn

the two compilers give the same results for each of the commands on the mac. But the two commands give different numbers, and neither set are the same as on endjinn.

Hmmmmm

OK, so I then got a 'platform independent' random number generator - the Mersenne Twister - and coded that up. It gives the same set of random numbers for all compilers and all machines.

On the web, people are not confident that rand() should give the same results cross-platform. So, the question is, do we implement a MT style random number generator in python to assure identical photons for the same seed on different machines. I should really test to se that with a MT random number generator I get identical results on various machines, to ensure that the differences I saw originally are indeed explained by the random number 'issue"

smangham commented 7 years ago

So whilst gsl has a Mersenne Twister generator it sounds like the SFMT implementation is better- twice as fast, and not as vulnerable to being poorly configured: http://www.math.sci.hiroshima-u.ac.jp/~m-mat/MT/SFMT/speed.html There's libraries here, including ones aimed specifically at generating doubles that are faster. In particular, it has SIMD versions which seem to use 'Single Instruction, Multiple Data' capabilities added to CPUs in the early 2000s. It's probably safe for us to assume that most computers we're using can do them, but the MT implementation in GSL seems to be from ~2002 so probably does not use them. http://www.math.sci.hiroshima-u.ac.jp/~m-mat/MT/SFMT/ Given how little time random number generation takes a ~4x speedup may not be worth the effort of including a new library, though.

Higginbottom commented 7 years ago

Here is my tiny c program, and some data files generated by doing

gcc --version > file.dat rand_test >> file.dat

I had to change all the filenames to txt to allow git to upload them......

rand_test.txt

nick_endjinn_gcc.txt nick_iridis_gcc.txt nicks_imac_clang.txt nicks_imac_gcc.txt nicks_macbook_pro_clang.txt nicks_macbook_pro_gcc.txt

It takes 11 seconds for my mac to generate 1 billion random numbers. I suspect that we will not be adding significant overheads to python by using a slower generator!

Higginbottom commented 6 years ago

Idly looking at this again, Its easy enough to set up a meursenne random number generator, but its a lot of work to crawl through the code changing all the places where we need random numbers. In particular, there are a lot of slightly different random number calls - do they all generate the same range?

e.g.

in cylind_var, we have a random number that looks like this:

(rand () / (MAXRAND - 0.5))

but also one that looks like this

(rand () / MAXRAND)

there are lots that look like

((rand () + 0.5) / MAXRAND)

each of these presumably is designed to exclude 1, or 0 or go from 0 to 1 inclusive etc. These all need logging and understanding. It would be a really good idea to write a little subroutine to make random numbers with some kind of human readable requirement....

In agn.c theres something that looks like

if (rand () > MAXRAND / 2)

which is presumably tossing a coin...

Anyway - lots of fiddling to do.....

kslong commented 6 years ago

@Higginbottom - So I thought we ought to continue this discussion here. So is the current thought to use the gsl mersenne twister generator, which seems sensible given that we are already using gsl?

Also, just to be certain, have we confirmed that these do produce the same random numbers on iridis and other machines? I agree with Nick is a very positive thing for ongoing development.

Next, looking at the discussion here, and also spending a bit of time on it this AM. I agree that some of our choices of boundaries, adding offsets are at least unexplained. My suspicion is that some are just historical, while others are needed. If we are going to update, and I think we are I would like us to commit to understanding this.

Next question. If I understand the gsl documentation, the actual choice of random number generator is/can be made using environment variables. Is this what we want?

Finally, do we want to use the native gsl calls, or do we intend to use a wrapper, that is create a routine, say, get_random, that encapsulates the gsl calls. The pro is that it would make the code more readable. The con as I see it is that ther are various gsl calls already that have options, and we might want to use those. There are also routines that generate non-uniform distributions, that presumable we would want to use.

The comment is related to #341

Except for using this as an opportunity to understand the offset stuff, I don't have strong opinions about this.

Higginbottom commented 6 years ago

I think having a wrapper would be good - apart from anything it means we only have to change one bit of code to make modifications. I'm sure that the different calls are really just to either avoid zeros or ones. But I totally agree that we should go through each one and see what we really want make sure that we do it correctly and consistently. Of course, having one wrapper subroutine would make this easier. I'll double check today that the same random numbers are made on all the system I have access to. I'm unsure as to wether it is best to set the generator type inside python, or use enviorment variables. I guess the latter is a bit more risky - we would want to hard-wire the environment variables in the makefile rather than get users to set them themselves....

Higginbottom commented 6 years ago

I've tested this little code on Iridis, My MacBook, My iMac and Endjinn. You get the same set of random numbers on each for the meursenne twister. Give it a try if you like! Here is the c random.c.txt and here is a little compiler - just do sh shell.sh and assuming you have $PYTHON set up normally, it should compile - this is showing my ignorance of makefiles etc :-) shell.sh.txt This large file is a set of 100 000 random numbers. Lines with NEW use the gsl random number generator, lines with OLD use the old one. macbook.dat.txt

So, to see if my fix works, first run the file and stuck the output in a file

./random.exe > my_file.dat

then do

diff my_file mackbook.txt > grep NEW

there should be no output - all the gsl random numbers are idential

if you do

diff my_file mackbook.txt > grep OLD - then on a Mac I would predict you also get no output, but on a line box I'd expect 100 000 differences..

Cheers!!!

Higginbottom commented 6 years ago

So, thinking about coding up the random number generator. The first few instances I've seen (in bb.c) all would have originally ended up with numbers from 0 to 1 inclusive. Using the gsl_rng_uniform_pos code excludes 0 and 1 (basically by internally rejecting 0 or randmax if they turn up). So the minimum would be 1/(2^32-1) = 2.32e-10 and the max would be 1-2.32e-10. So very close to 0 and 1!! Sometimes we want -0.5 to 0.5 - but this is only when we want to decide if a photon is emitted above of below the disk. It seems that having our little wrapper have options would slow down execution - there will be an if test every time you call a number. So I propose making the wrapper ultra simple just:

double xrand() { double num num = gsl_rng_uniform_pos(rng); return(num); } so all it does is make (0-1) and then keep the -0.5 in the local code to convert to -0.5 to 0.5

otherwise you'd end up with if statements, or if you call with min max you'd have to do something much more complicated.....

jhmatthews commented 6 years ago

Hi Nick. We could have an additional function called flip_coin() or similar, which might make things more explicit. I guess I would favour calling the routine random_number() or something, rather than xrand() but no biggie.

I think all you need for min and max is

double xrand(double min, double max)
{
double num = gsl_rng_uniform_pos(ring);
double x = min + ((max - min) * num); 
return(x);
}

but again not overly important.

kslong commented 6 years ago

I agree with James. This is what I had in mind, and is very readable. This is effectively what the program was doing previously. The optimizer should make it just as fast as if it were coded in line. Knox

From: James Matthews notifications@github.com Reply-To: agnwinds/python reply@reply.github.com Date: Wednesday, February 14, 2018 at 6:18 AM To: agnwinds/python python@noreply.github.com Cc: Knox Long long@stsci.edu, Comment comment@noreply.github.com Subject: Re: [agnwinds/python] Apparently different random numbers on different machines (#239)

Hi Nick. We could have an additional function called flip_coin() or similar, which might make things more explicit. I guess I would favour calling the routine random_number() or something, rather than xrand() but no biggie.

I think all you need for min and max is

double xrand(double min, double max)

{

double num = gsl_rng_uniform_pos(ring);

double x = min + ((max - min) * num);

return(x);

}

but again not overly important.

— You are receiving this because you commented. Reply to this email directly, view it on GitHubhttps://github.com/agnwinds/python/issues/239#issuecomment-365590617, or mute the threadhttps://github.com/notifications/unsubscribe-auth/ACaeVQeaQN8-ZGtvVN5BljjmaKvZ8CtMks5tUs8ugaJpZM4L8uaa.

Higginbottom commented 6 years ago

Jolly good! Must have had a bit of a brain freeze. I like the idea of a flip coin - but it will want to return +ve or -ve for heads or tails - so just the same really as calling random_nmber(-0.5,0.5). Will stick with just one routine.... N

Higginbottom commented 6 years ago

Of course the min max approach fixes the 'odd' dfreq but of code. One can just do random_number(freqmin, freqmax).

Heres an odd one..

costheta = 2. * (rand () / MAXRAND) - 1.

is it just me, or is this just a very odd way of getting a random number between 0 and 1???

rand()/MAXRAND=[0,1]

times two=[0,2]

-1 = [0,1]

I fear I'm going mad, but I can't see why one wouldn't just have used rand()/MAXRAND....

so I'm using random_number(0.0,1.0)...

jhmatthews commented 6 years ago

that should be random_number(-1.0,1.0) which is what 2. * (rand () / MAXRAND) - 1 gives you.

Higginbottom commented 6 years ago

Oh my. 0-1 is -1 isn’t it. That’s it. I’m officially an idiot.

Higginbottom commented 6 years ago

Interestingly, initial implementation crashed the regression tests with a set fault. Tracked it down and it was because dvds_ave, called from define_wind, in turn called in python.c just before where we set up the random numbers makes a call to randvec. I guess rand() works even if you don't initialise the random number generator, since define_wind is called just before we set up the random number generator in python.c To fix it - we need to call init_random (my little routine that initialises the generator) before we call define_wind. I can't see any problem with this - it means shifting all the random number gubbins from line 774 to line 742....

Can anyone see a major issue with moving these lines up ?

Nick

jhmatthews commented 6 years ago

Seems sensible!

kslong commented 6 years ago

Sounds good to me. K

From: James Matthews notifications@github.com Reply-To: agnwinds/python reply@reply.github.com Date: Thursday, February 15, 2018 at 11:44 AM To: agnwinds/python python@noreply.github.com Cc: Knox Long long@stsci.edu, Comment comment@noreply.github.com Subject: Re: [agnwinds/python] Apparently different random numbers on different machines (#239)

Seems sensible!

— You are receiving this because you commented. Reply to this email directly, view it on GitHubhttps://github.com/agnwinds/python/issues/239#issuecomment-366005861, or mute the threadhttps://github.com/notifications/unsubscribe-auth/ACaeVSBWIBk7ZPzqDafac5azC2VeOsEvks5tVGz5gaJpZM4L8uaa.

Higginbottom commented 6 years ago

the random branch now has a version that runs - all the regression tests pass. Cant see how to get travis to try and build it without doing a pull to dev - thats the next step I guess!

kslong commented 6 years ago

I’m happy to do this, if you are. Knox

From: Higginbottom notifications@github.com Reply-To: agnwinds/python reply@reply.github.com Date: Thursday, February 15, 2018 at 1:42 PM To: agnwinds/python python@noreply.github.com Cc: Knox Long long@stsci.edu, Comment comment@noreply.github.com Subject: Re: [agnwinds/python] Apparently different random numbers on different machines (#239)

the random branch now has a version that runs - all the regression tests pass. Cant see how to get travis to try and build it without doing a pull to dev - thats the next step I guess!

— You are receiving this because you commented. Reply to this email directly, view it on GitHubhttps://github.com/agnwinds/python/issues/239#issuecomment-366039330, or mute the threadhttps://github.com/notifications/unsubscribe-auth/ACaeVZ6xj3I_QC_86va2D5vxO2Y38q-5ks5tVIiagaJpZM4L8uaa.

kslong commented 6 years ago

Should have said, I am happy for YOU to do this.

From: Higginbottom notifications@github.com Reply-To: agnwinds/python reply@reply.github.com Date: Thursday, February 15, 2018 at 1:42 PM To: agnwinds/python python@noreply.github.com Cc: Knox Long long@stsci.edu, Comment comment@noreply.github.com Subject: Re: [agnwinds/python] Apparently different random numbers on different machines (#239)

the random branch now has a version that runs - all the regression tests pass. Cant see how to get travis to try and build it without doing a pull to dev - thats the next step I guess!

— You are receiving this because you commented. Reply to this email directly, view it on GitHubhttps://github.com/agnwinds/python/issues/239#issuecomment-366039330, or mute the threadhttps://github.com/notifications/unsubscribe-auth/ACaeVZ6xj3I_QC_86va2D5vxO2Y38q-5ks5tVIiagaJpZM4L8uaa.

kslong commented 6 years ago

@Higginbottom @jhmatthews Can this issue regarding random numbers be closed.

Higginbottom commented 6 years ago

closed by pull request #344