argiopetech / base

Bayesian Analysis for Stellar Evolution
http://webfac.db.erau.edu/~vonhippt/base9/

Add MT warmup #38

Closed: argiopetech closed this issue 10 years ago

argiopetech commented 10 years ago

Per "Good Practice in (Pseudo) Random Number Generation for Bioinformatics Applications" (http://www0.cs.ucl.ac.uk/staff/d.jones/GoodPracticeRNG.pdf), the Mersenne Twister's randomness properties can suffer if it is seeded with a simple seed. From the binary viewpoint, the more zeroes there are starting from the MSB and moving toward the LSB, the simpler the value.

Proposed remedy

Instead of using user-provided seeds directly, hash them before seeding. Additionally, generate and discard between several hundred and several thousand values to "warm up" the PRNG.
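
A minimal sketch of the remedy, in Python rather than the project's own code. The names `hashed_seed`, `make_rng`, and `WARMUP_DRAWS`, the choice of SHA-256, and the warmup count of 1000 are all assumptions made for illustration; the issue does not specify any of them. CPython's `random.Random` is a Mersenne Twister and already scrambles integer seeds on its own, so this only shows the shape of the approach.

```python
import hashlib
import random

# Hypothetical constant: the issue only asks for "several hundred to
# several thousand" discarded values.
WARMUP_DRAWS = 1000


def hashed_seed(user_seed: int) -> int:
    """Map a (possibly simple) user seed to a 32-bit value with well-mixed bits."""
    digest = hashlib.sha256(str(user_seed).encode("ascii")).digest()
    return int.from_bytes(digest[:4], "big")


def make_rng(user_seed: int) -> random.Random:
    """Seed a Mersenne Twister with the hashed seed, then discard warmup draws."""
    rng = random.Random(hashed_seed(user_seed))
    for _ in range(WARMUP_DRAWS):
        rng.getrandbits(32)  # generate one 32-bit word and throw it away
    return rng


# Seed 1 is "simple": as a 32-bit value it has 31 leading zero bits.
rng = make_rng(1)
print(rng.random())
```

Because the hash and the warmup count are fixed, calling `make_rng` twice with the same seed yields the same stream.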

tedvh commented 10 years ago

Good thought. Would there still be a way to get the same result twice if one wanted to? This could be useful for testing purposes.

Ted von Hippel

Department of Physical Sciences Embry-Riddle Aeronautical University 600 S. Clyde Morris Boulevard Daytona Beach, FL 32114-3900 386-226-7751

argiopetech commented 10 years ago

The proposed method should still give deterministic results as long as we don't change the hash function or the number of warmup iterations. I intend both to be hard-coded.
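
To illustrate why both values need to stay fixed, here is a small Python check, again only a sketch. The helper names, SHA-256, and the warmup counts are hypothetical. It shows that the same seed with the same warmup count reproduces a run, while changing the warmup count by even one draw shifts the stream.

```python
import hashlib
import random


def make_rng(user_seed: int, warmup: int) -> random.Random:
    # Hypothetical helper: hash the seed, seed the Mersenne Twister,
    # then discard `warmup` 32-bit draws.
    digest = hashlib.sha256(str(user_seed).encode("ascii")).digest()
    rng = random.Random(int.from_bytes(digest[:4], "big"))
    for _ in range(warmup):
        rng.getrandbits(32)
    return rng


def first_draws(user_seed: int, warmup: int, n: int = 5) -> list:
    """Return the first n 32-bit draws after warmup."""
    rng = make_rng(user_seed, warmup)
    return [rng.getrandbits(32) for _ in range(n)]


# Same seed, same warmup count: the run is reproducible.
assert first_draws(42, warmup=1000) == first_draws(42, warmup=1000)

# Same seed, different warmup count: the stream is shifted, so results change.
assert first_draws(42, warmup=1000) != first_draws(42, warmup=1001)
```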

tedvh commented 10 years ago

OK. And is there a way to start a run off differently in case one wants to do that?

argiopetech commented 10 years ago

The current --seed CLI flag and the seed: YAML field will remain as they are; they'll just be mapped to a (hopefully) more complex number internally.
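
As a rough picture of that mapping, the Python sketch below parses a `--seed` flag unchanged and derives the internal value from it. The function names, the SHA-256 hash, and the default of 0 are assumptions for illustration, not the project's actual code; a value read from the seed: YAML field would go through the same mapping.

```python
import argparse
import hashlib


def internal_seed(user_seed: int) -> int:
    """Hypothetical mapping from the user-facing seed to the value the PRNG sees."""
    digest = hashlib.sha256(str(user_seed).encode("ascii")).digest()
    return int.from_bytes(digest[:4], "big")


def parse_seed(argv: list) -> int:
    """Read the user-facing seed exactly as the user typed it."""
    parser = argparse.ArgumentParser()
    # Hypothetical default; the issue does not say what the default seed is.
    parser.add_argument("--seed", type=int, default=0)
    return parser.parse_args(argv).seed


# Different user seeds still start different runs; only the internal value changes.
for argv in (["--seed", "1"], ["--seed", "2"]):
    user_seed = parse_seed(argv)
    print(f"--seed {user_seed} -> internal 0x{internal_seed(user_seed):08x}")
```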


Elliot Robinson Email: elliot.robinson@argiopetech.com Phone: (321) 252-9660

tedvh commented 10 years ago

ah, gotcha.
