Perl / perl5

đŸȘ The Perl programming language
https://dev.perl.org/perl5/
Other
1.85k stars 527 forks source link

rand() on Windows only uses 15 bits of entropy #12620

Closed p5pRT closed 10 years ago

p5pRT commented 11 years ago

Migrated from rt.perl.org#115928 (status was 'resolved')

Searchable as RT115928$

p5pRT commented 11 years ago

From Perl@ResonatorSoft.org

Created by Perl@ResonatorSoft.org

The Win32 version of Perl doesn't seem to generate random numbers very well at all\, especially in comparison to the Linux version.

Here's an example script detailing the problem​:

my %nums; my $cnt = 0; foreach my $i (1 .. 1_000_000) {   my $num = rand;   $cnt++ if ($nums{$num});   $nums{$num} = 1; } print "$cnt out of 1\,000\,000\n";

The purpose of the script is to reveal how many duplicate numbers were generated out of a random sample of one million total numbers. For Windows\, this will output "967232 out of 1\,000\,000" every time. In other words\, there are only 32768 possible combinations of "between 0 and 1" floats that rand() will generate.

If I test this same script on a Debian Linux box\, I actually get no dupes\, meaning that it has a MUCH higher entropy.

Perl Info ``` Flags: category=core severity=medium Site configuration information for perl 5.16.0: Configured by strawberry-perl at Mon May 21 22:08:30 2012. Summary of my perl5 (revision 5 version 16 subversion 0) configuration: Platform: osname=MSWin32, osvers=4.0, archname=MSWin32-x86-multi-thread uname='Win32 strawberry-perl 5.16.0.1 #1 Mon May 21 22:07:30 2012 i386' config_args='undef' hint=recommended, useposix=true, d_sigaction=undef useithreads=define, usemultiplicity=define useperlio=define, d_sfio=undef, uselargefiles=define, usesocks=undef use64bitint=undef, use64bitall=undef, uselongdouble=undef usemymalloc=n, bincompat5005=undef Compiler: cc='gcc', ccflags =' -s -O2 -DWIN32 -DPERL_TEXTMODE_SCRIPTS -DPERL_IMPLICIT_CONTEXT -DPERL_IMPLICIT_SYS -fno-strict-aliasing -mms-bitfields', optimize='-s -O2', cppflags='-DWIN32' ccversion='', gccversion='4.6.3', gccosandvers='' intsize=4, longsize=4, ptrsize=4, doublesize=8, byteorder=1234 d_longlong=undef, longlongsize=8, d_longdbl=define, longdblsize=12 ivtype='long', ivsize=4, nvtype='double', nvsize=8, Off_t='long long', lseeksize=8 alignbytes=8, prototype=define Linker and Libraries: ld='g++', ldflags ='-s -L"C:\strawberry\perl\lib\CORE" -L"C:\strawberry\c\lib"' libpth=C:\strawberry\c\lib C:\strawberry\c\i686-w64-mingw32\lib libs=-lmoldname -lkernel32 -luser32 -lgdi32 -lwinspool -lcomdlg32 -ladvapi32 -lshell32 -lole32 -loleaut32 -lnetapi32 -luuid -lws2_32 -lmpr -lwinmm -lversion -lodbc32 -lodbccp32 -lcomctl32 perllibs=-lmoldname -lkernel32 -luser32 -lgdi32 -lwinspool -lcomdlg32 -ladvapi32 -lshell32 -lole32 -loleaut32 -lnetapi32 -luuid -lws2_32 -lmpr -lwinmm -lversion -lodbc32 -lodbccp32 -lcomctl32 libc=, so=dll, useshrplib=true, libperl=libperl516.a gnulibc_version='' Dynamic Linking: dlsrc=dl_win32.xs, dlext=dll, d_dlsymun=undef, ccdlflags=' ' cccdlflags=' ', lddlflags='-mdll -s -L"C:\strawberry\perl\lib\CORE" -L"C:\strawberry\c\lib"' Locally applied patches: @INC for perl 5.16.0: C:/STRAWBERRY/perl/site/lib/MSWin32-x86-multi-thread C:/STRAWBERRY/perl/site/lib C:/STRAWBERRY/perl/vendor/lib C:/STRAWBERRY/perl/lib . Environment for perl 5.16.0: HOME (unset) LANG (unset) LANGUAGE (unset) LD_LIBRARY_PATH (unset) LOGDIR (unset) PATH=C:\Windows\SYSTEM32;C:\Windows;C:\Windows\SYSTEM32\WBEM;C:\Windows\SYSTEM32\WINDOWSPOWERSHELL\V1.0\;C:\STRAWBERRY\C\BIN;C:\STRAWBERRY\PERL\SITE\BIN;C:\STRAWBERRY\PERL\BIN;C:\CYGWIN\BIN;C:\CYGWIN\USR\BIN;C:\Program Files\TortoiseSVN\bin;C:\Program Files\Graphviz 2.28\bin;C:\Program Files\TortoiseGit\bin;c:\Program Files\Microsoft SQL Server\100\Tools\Binn\VSShell\Common7\IDE\;c:\Program Files\Microsoft SQL Server\100\Tools\Binn\;c:\Program Files\Microsoft Visual Studio 9.0\Common7\IDE\PrivateAssemblies\;c:\Program Files\Microsoft SQL Server\100\DTS\Binn\;C:\Program Files\Nmap PERL_BADLANG (unset) SHELL (unset) ```
p5pRT commented 11 years ago

From @wchristian

I replicated the behavior with ActivePerl v5.12.4 (MSWin32-x86-multi- thread).

p5pRT commented 11 years ago

The RT System itself - Status changed from 'new' to 'open'

p5pRT commented 11 years ago

From @Leont

On Tue\, Nov 27\, 2012 at 3​:59 PM\, Perl@​ResonatorSoft.org \perlbug\-followup@​perl\.org wrote​:

The Win32 version of Perl doesn't seem to generate random numbers very well at all\, especially in comparison to the Linux version.

Here's an example script detailing the problem​:

my %nums; my $cnt = 0; foreach my $i (1 .. 1_000_000) { my $num = rand; $cnt++ if ($nums{$num}); $nums{$num} = 1; } print "$cnt out of 1\,000\,000\n";

The purpose of the script is to reveal how many duplicate numbers were generated out of a random sample of one million total numbers. For Windows\, this will output "967232 out of 1\,000\,000" every time. In other words\, there are only 32768 possible combinations of "between 0 and 1" floats that rand() will generate.

If I test this same script on a Debian Linux box\, I actually get no dupes\, meaning that it has a MUCH higher entropy.

Perl already chooses between a bunch of random number sources. If there's something obviously better available we could use that.

That said\, I don't really think this is really severity=medium. rand() doesn't make any promises on strength\, and if you need something stronger there are modules on CPAN that can provide just that.

Leon

p5pRT commented 11 years ago

From @wchristian

On Tue Nov 27 07​:19​:53 2012\, LeonT wrote​:

If there's something obviously better available we could use that.

On Windows XP and higher\, there is RtlGenRandom\, would that fit the bill?

http​://msdn.microsoft.com/en-us/library/aa387694(v=vs.80).aspx http​://blogs.msdn.com/b/michael_howard/archive/2005/01/14/353379.aspx

p5pRT commented 11 years ago

From @Leont

On Tue\, Nov 27\, 2012 at 4​:40 PM\, Christian Walde via RT \perlbug\-followup@​perl\.org wrote​:

If there's something obviously better available we could use that.

On Windows XP and higher\, there is RtlGenRandom\, would that fit the bill?

http​://msdn.microsoft.com/en-us/library/aa387694(v=vs.80).aspx http​://blogs.msdn.com/b/michael_howard/archive/2005/01/14/353379.aspx

Looks workable\, though it's linking is horrible even by Windows standards.

Leon

p5pRT commented 11 years ago

From BitCard@ResonatorSoft.org

On Tue Nov 27 07​:49​:28 2012\, LeonT wrote​:

Looks workable\, though it's linking is horrible even by Windows standards.

BTW\, what does Perl currently use for random functionality on Windows?
Honestly\, I'm surprised that Windows library routines have that few combinations.

p5pRT commented 11 years ago

From @wchristian

On Tue Nov 27 07​:49​:28 2012\, LeonT wrote​:

On Tue\, Nov 27\, 2012 at 4​:40 PM\, Christian Walde via RT \perlbug\-followup@​perl\.org wrote​:

If there's something obviously better available we could use that.

On Windows XP and higher\, there is RtlGenRandom\, would that fit the bill?

http​://msdn.microsoft.com/en-us/library/aa387694(v=vs.80).aspx http​://blogs.msdn.com/b/michael_howard/ archive/2005/01/14/353379.aspx

Looks workable\, though it's linking is horrible even by Windows standards.

Well\, as the doc says\, if you don't mind a bit of extra memory use\, you can use CryptGenRandom [ http​://msdn.microsoft.com/en-us/library/ aa379942(v=vs.85).aspx ] as a source. Am i remembering that right that Windows XP is the oldest Windows supported by Perl? Because if so\, then that should be no issue.

p5pRT commented 11 years ago

From @wchristian

On Tue Nov 27 07​:53​:01 2012\, SineSwiper wrote​:

On Tue Nov 27 07​:49​:28 2012\, LeonT wrote​:

Looks workable\, though it's linking is horrible even by Windows standards.

BTW\, what does Perl currently use for random functionality on Windows?
Honestly\, I'm surprised that Windows library routines have that few combinations.

Windows currently supplies a rand() in the stdlib that uses 15 bits and is only meant as a sort of placeholder\, since you are meant to select your own random generator if you actually care about it. Looks like Perl simply uses that straight-up.

p5pRT commented 11 years ago

From @craigberry

On Tue\, Nov 27\, 2012 at 10​:11 AM\, Christian Walde via RT \perlbug\-followup@​perl\.org wrote​:

On Tue Nov 27 07​:53​:01 2012\, SineSwiper wrote​:

On Tue Nov 27 07​:49​:28 2012\, LeonT wrote​:

Looks workable\, though it's linking is horrible even by Windows standards.

BTW\, what does Perl currently use for random functionality on Windows? Honestly\, I'm surprised that Windows library routines have that few combinations.

Windows currently supplies a rand() in the stdlib that uses 15 bits and is only meant as a sort of placeholder\, since you are meant to select your own random generator if you actually care about it. Looks like Perl simply uses that straight-up.

It might be as simple as changing the configuration code (e.g.\, in win32/config_H.vc and friends) that looks like​:

#define Drand01() (rand()/(double)((unsigned)1\<\<RANDBITS)) /**/ #define Rand_seed_t unsigned /**/ #define seedDrand01(x) srand((Rand_seed_t)x) /**/ #define RANDBITS 15 /**/

to point to something more capable.

p5pRT commented 11 years ago

From @kmx

On 27.11.2012 16​:40\, Christian Walde via RT wrote​:

On Tue Nov 27 07​:19​:53 2012\, LeonT wrote​:

If there's something obviously better available we could use that. On Windows XP and higher\, there is RtlGenRandom\, would that fit the bill?

http​://msdn.microsoft.com/en-us/library/aa387694(v=vs.80).aspx http​://blogs.msdn.com/b/michael_howard/archive/2005/01/14/353379.aspx

What about CryptGenRandom (available on WinXP+)

{   BYTE randomdata[256];   HCRYPTPROV hCryptProv = 0;   if (CryptAcquireContextW(&hCryptProv\, NULL\, NULL\, PROV_RSA_FULL\, CRYPT_VERIFYCONTEXT)) {   if (CryptGenRandom(hCryptProv\, sizeof(randomdata)\, randomdata)) {   /* success */   }   else {   /* failure */   }   CryptReleaseContext(hCryptProv\, 0);   } }

http​://msdn.microsoft.com/en-us/library/windows/desktop/aa379942%28v=vs.85%29.aspx

-- kmx

p5pRT commented 11 years ago

From @demerphq

On 27 November 2012 20​:43\, kmx \kmx@&#8203;atlas\.cz wrote​:

On 27.11.2012 16​:40\, Christian Walde via RT wrote​:

On Tue Nov 27 07​:19​:53 2012\, LeonT wrote​:

If there's something obviously better available we could use that.

On Windows XP and higher\, there is RtlGenRandom\, would that fit the bill?

http​://msdn.microsoft.com/en-us/library/aa387694(v=vs.80).aspx http​://blogs.msdn.com/b/michael_howard/archive/2005/01/14/353379.aspx

What about CryptGenRandom (available on WinXP+)

{ BYTE randomdata[256]; HCRYPTPROV hCryptProv = 0; if (CryptAcquireContextW(&hCryptProv\, NULL\, NULL\, PROV_RSA_FULL\, CRYPT_VERIFYCONTEXT)) { if (CryptGenRandom(hCryptProv\, sizeof(randomdata)\, randomdata)) { /* success */ } else { /* failure */ } CryptReleaseContext(hCryptProv\, 0); } }

http​://msdn.microsoft.com/en-us/library/windows/desktop/aa379942%28v=vs.85%29.aspx

Just curious but wouldnt something like that be pretty slow?

I would think that ideally the Win32 version of Perl should "out of the box" have a reasonably fast RNG.

Yves

-- perl -Mre=debug -e "/just|another|perl|hacker/"

p5pRT commented 11 years ago

From @kmx

On 27.11.2012 20​:52\, demerphq wrote​:

On 27 November 2012 20​:43\, kmx \kmx@&#8203;atlas\.cz wrote​:

On 27.11.2012 16​:40\, Christian Walde via RT wrote​:

On Tue Nov 27 07​:19​:53 2012\, LeonT wrote​:

If there's something obviously better available we could use that. On Windows XP and higher\, there is RtlGenRandom\, would that fit the bill?

http​://msdn.microsoft.com/en-us/library/aa387694(v=vs.80).aspx http​://blogs.msdn.com/b/michael_howard/archive/2005/01/14/353379.aspx

What about CryptGenRandom (available on WinXP+)

{ BYTE randomdata[256]; HCRYPTPROV hCryptProv = 0; if (CryptAcquireContextW(&hCryptProv\, NULL\, NULL\, PROV_RSA_FULL\, CRYPT_VERIFYCONTEXT)) { if (CryptGenRandom(hCryptProv\, sizeof(randomdata)\, randomdata)) { /* success */ } else { /* failure */ } CryptReleaseContext(hCryptProv\, 0); } }

http​://msdn.microsoft.com/en-us/library/windows/desktop/aa379942%28v=vs.85%29.aspx

Just curious but wouldnt something like that be pretty slow?

I would think that ideally the Win32 version of Perl should "out of the box" have a reasonably fast RNG.

[resending as I forget to Cc​: the list]

CryptGenRandom itself is IMO pretty fast the trouble might be with the time spent in CryptAcquireContextW/CryptReleaseContext.

-- kmx

p5pRT commented 11 years ago

From @bulk88

On Tue Nov 27 08​:11​:01 2012\, Mithaldu wrote​:

On Tue Nov 27 07​:53​:01 2012\, SineSwiper wrote​:

On Tue Nov 27 07​:49​:28 2012\, LeonT wrote​:

Looks workable\, though it's linking is horrible even by Windows standards.

BTW\, what does Perl currently use for random functionality on Windows?
Honestly\, I'm surprised that Windows library routines have that few combinations.

Windows currently supplies a rand() in the stdlib that uses 15 bits and is only meant as a sort of placeholder\, since you are meant to select your own random generator if you actually care about it. Looks like Perl simply uses that straight-up.

Why not call the CRT's rand twice or thrice\, once for low 15 bits\, 2nd for high 15 bits\, and 3rd for remaining 2?

This is the implementation of rand in MS's CRT http​://code.google.com/searchframe#S3vzerue4i0/trunk/reactos/lib/sdk/crt/math/rand.c&type=cs&l=10 . ntdll's RtlRandom and RtlRandomEx are similar to CRT's rand. They make no further calls. Going through RtlGenRandom or anything else in advapi will be atleast a dozen if not a 100 times slower (or more\, I didn't benchmark it\, I know it is 100s of calls tho and eventually an ioctl/DeviceIoControl to ksecdd.sys driver for the randomness) by my educated guess. A faster idea is to use QueryPerformanceCounter each time\, or save a QueryPerformanceCounter as a seed. MS itself uses tick counts (GetTickCount makes no calls\, just polls a global/static updated by the kernel on every context switch) or QueryPerformanceCounter as seeds for RtlRandom and RtlRandomEx. Also see http​://code.google.com/searchframe#S3vzerue4i0/trunk/reactos/lib/sdk/crt/startup/gs_support.c&q=__security_init_cookie%20package​:reactos-mirror\.googlecode\.com&l=50

-- bulk88 ~ bulk88 at hotmail.com

p5pRT commented 11 years ago

From @wchristian

On Wed Nov 28 00​:15​:00 2012\, bulk88 wrote​:

Why not call the CRT's rand twice or thrice\, once for low 15 bits\, 2nd for high 15 bits\, and 3rd for remaining 2?

That sounds like a very reasonable solution. Can you maybe implement a very small c program to generate a million numbers like that and dump them to STDOUT to check how many collisions result from that algorithm?

p5pRT commented 11 years ago

From @demerphq

On 28 November 2012 14​:43\, Christian Walde via RT \perlbug\-followup@&#8203;perl\.org wrote​:

On Wed Nov 28 00​:15​:00 2012\, bulk88 wrote​:

Why not call the CRT's rand twice or thrice\, once for low 15 bits\, 2nd for high 15 bits\, and 3rd for remaining 2?

That sounds like a very reasonable solution. Can you maybe implement a very small c program to generate a million numbers like that and dump them to STDOUT to check how many collisions result from that algorithm?

I am just curious why would that be preferable to just coding your own longer period LCG?

Afaik they amount to a handful of lines of C.

Yves

-- perl -Mre=debug -e "/just|another|perl|hacker/"

p5pRT commented 11 years ago

From @wchristian

On Wed Nov 28 00​:15​:00 2012\, bulk88 wrote​:

Why not call the CRT's rand twice or thrice\, once for low 15 bits\, 2nd for high 15 bits\, and 3rd for remaining 2?

Alright\, i talked this through on #p5p and here's the summary as i understood it​:

First off\, the offending property of the windows RNG is granularity in this case. In this case the granularity is a maximum of 15 bits\, which provides a number of unique results that is far below even a modest sample size.

Further\, in Perl the case is that the random bits gotten from the system are stuffed into a double float\, bounded as [0\,1[. This further reduces the actual number of bits used from the RNG\, as such a float can only contain 53 bits.

On linux\, drand48 is used\, which provides 48 bits of entropy.

demerphq suggested that it might be possible to select an RNG that could be implemented for Perl in C\, and then be used for both Linux and Windows. Since the granularity of the current linux generator is below the granularity that a double can provide\, this seems to be a viable path. However i think this is something that should be carefully considered by people with the appropiate amount of knowledge and experience\, which will take some time and thus might unnecessarily delay a resolution for windows. As such i think this should definitely be considered in a separate ticket.

To the issue at hand\, granularity on windows\, a solution that seems useful and provides appropiate granularity would be this​: https​:// gist.github.com/956faeafb71b2ac735aa

I am unsure if it could be extended to provide 53 bits of entropy in a double even on 32 bit systems and how it would be patched into Perl as it is. bulk88\, i've been told by demerphq that you have the chops and the platform for this. Could you please look into this?

p5pRT commented 11 years ago

From @craigberry

On Wed\, Nov 28\, 2012 at 7​:51 AM\, demerphq \demerphq@&#8203;gmail\.com wrote​:

On 28 November 2012 14​:43\, Christian Walde via RT \perlbug\-followup@&#8203;perl\.org wrote​:

On Wed Nov 28 00​:15​:00 2012\, bulk88 wrote​:

Why not call the CRT's rand twice or thrice\, once for low 15 bits\, 2nd for high 15 bits\, and 3rd for remaining 2?

That sounds like a very reasonable solution. Can you maybe implement a very small c program to generate a million numbers like that and dump them to STDOUT to check how many collisions result from that algorithm?

I am just curious why would that be preferable to just coding your own longer period LCG?

Afaik they amount to a handful of lines of C.

The FreeBSD implementation of drand48() is indeed short and sweet even if you follow the links a couple of levels deep​:

\<http​://fxr.watson.org/fxr/source/gen/drand48.c?v=FREEBSD-LIBC;im=bigexcerpts>

It might just work verbatim on Windows.

p5pRT commented 11 years ago

From @kmx

On 28.11.2012 17​:11\, Craig A. Berry wrote​:

On Wed\, Nov 28\, 2012 at 7​:51 AM\, demerphq \demerphq@&#8203;gmail\.com wrote​:

On 28 November 2012 14​:43\, Christian Walde via RT \perlbug\-followup@&#8203;perl\.org wrote​:

On Wed Nov 28 00​:15​:00 2012\, bulk88 wrote​:

Why not call the CRT's rand twice or thrice\, once for low 15 bits\, 2nd for high 15 bits\, and 3rd for remaining 2? That sounds like a very reasonable solution. Can you maybe implement a very small c program to generate a million numbers like that and dump them to STDOUT to check how many collisions result from that algorithm? I am just curious why would that be preferable to just coding your own longer period LCG?

Afaik they amount to a handful of lines of C. The FreeBSD implementation of drand48() is indeed short and sweet even if you follow the links a couple of levels deep​:

\<http​://fxr.watson.org/fxr/source/gen/drand48.c?v=FREEBSD-LIBC;im=bigexcerpts>

It might just work verbatim on Windows.

Another candidate (long period\, good speed\, good randomness) that can be quite easy to implement is MT19937 https://en.wikipedia.org/wiki/Mersenne_twister

The actual implementation (approx. 100 lines of code) can look like this​: https://metacpan.org/source/FANGLY/Math-Random-MT-1.16/_mt.c

-- kmx

p5pRT commented 11 years ago

From @bulk88

On Wed Nov 28 05​:43​:55 2012\, Mithaldu wrote​:

On Wed Nov 28 00​:15​:00 2012\, bulk88 wrote​:

Why not call the CRT's rand twice or thrice\, once for low 15 bits\, 2nd for high 15 bits\, and 3rd for remaining 2?

That sounds like a very reasonable solution. Can you maybe implement a very small c program to generate a million numbers like that and dump them to STDOUT to check how many collisions result from that algorithm?

I am going to try to fix this by spreading the bits of whatever rand a perl is using across the integer bits of a NV (usually 53 bits for a 64 bit double). I plan to test it as XSUBs first and I will post those here. -- bulk88 ~ bulk88 at hotmail.com

p5pRT commented 11 years ago

From @paulg1973

FWIW\, the C-1990 and C-1999 standards define the rand() function as follows (relevant language extracted)​:

"The rand function computes a sequence of pseudo-random integers in the range 0 to RAND_MAX. ... The value of the RAND_MAX shall be at least 32767."

The Stratus OpenVOS implementation of rand() also limits itself to a RAND_MAX of 32767; I suspect this is due to a desire to maintain compatibility with legacy C programs. Once these limits are defined and widely used\, it is impossible to change them without breaking working code.

PG

p5pRT commented 11 years ago

From @hvds

"Christian Walde via RT" \perlbug\-followup@&#8203;perl\.org wrote​: :To the issue at hand\, granularity on windows\, a solution that seems useful :and provides appropiate granularity would be this​: https​:// :gist.github.com/956faeafb71b2ac735aa

If the RNG has a period of 2^15\, I believe this solution will have the same period. (If the RNG had a nearby period that happened to be a multiple of 3\, I believe this solution would have a period only a third as long.)

Hugo

p5pRT commented 11 years ago

From @tonycoz

On Wed\, Nov 28\, 2012 at 12​:54​:36PM -0500\, Green\, Paul wrote​:

FWIW\, the C-1990 and C-1999 standards define the rand() function as follows (relevant language extracted)​:

"The rand function computes a sequence of pseudo-random integers in the range 0 to RAND_MAX. ... The value of the RAND_MAX shall be at least 32767."

The Stratus OpenVOS implementation of rand() also limits itself to a RAND_MAX of 32767; I suspect this is due to a desire to maintain compatibility with legacy C programs. Once these limits are defined and widely used\, it is impossible to change them without breaking working code.

That doesn't prevent the implementation storing more than 15 bits of entropy.

It could simply return the lower 15 bits of the result from a 32 or larger LCRNG implementation\, as the simplest example.

Tony

p5pRT commented 11 years ago

From BitCard@ResonatorSoft.org

On Wed Nov 28 09​:55​:18 2012\, paulg wrote​:

FWIW\, the C-1990 and C-1999 standards define the rand() function as follows (relevant language extracted)​:

"The rand function computes a sequence of pseudo-random integers in the range 0 to RAND_MAX. ... The value of the RAND_MAX shall be at least 32767." The Stratus OpenVOS implementation of rand() also limits itself to a RAND_MAX of 32767; I suspect this is due to a desire to maintain compatibility with legacy C programs. Once these limits are defined and widely used\, it is impossible to change them without breaking working code.

Perl isn't C. We document that rand returns "a random fractional number greater than or equal to 0 and less than the value of EXPR". While bits of entropy isn't documented here\, I think we can all agree that 15 bits is too low.

I like bulk77's solution. It's fast and portably achieves the maximum entropy based on IV length. Any crypto-security issues or special algorithms can be covered in other modules.

p5pRT commented 11 years ago

From @ap

* bulk88 via RT \perlbug\-followup@&#8203;perl\.org [2012-11-28 09​:20]​:

Why not call the CRT's rand twice or thrice\, once for low 15 bits\, 2nd for high 15 bits\, and 3rd for remaining 2?

That doesn’t increase the entropy available\, it just draws from the pool faster. I don’t have my randomness down well enough to reason it through with the necessary rigour but I’m pretty sure that at best that approach does nothing to increase the quality of the random numbers and likely it will decrease it.

Don’t.

Randomness is a treacherous sea – easy to get it wrong\, hard to notice when you do\, and with far-reaching consequences. (Debian patched some random number generator and caused half the world to have to mint new SSH keys a few years down the line.)

Find a syscall that draws from a deeper pool or put in some other known algorithm with a greater amount of hidden state or really just about any other approach except this one.

Or just leave it as it is. If it’s bad\, just let it look bad. Being bad is not a problem (absent promises about quality – and perl makes none) but looking better than it is would be.

The documentation already covers everything it needs to\, so nothing to be done there either.

Regards\, -- Aristotle Pagaltzis // \<http​://plasmasturm.org/>

p5pRT commented 11 years ago

From @ap

* Aristotle Pagaltzis \pagaltzis@&#8203;gmx\.de [2012-11-29 03​:01]​:

Randomness is a treacherous sea – easy to get it wrong\, hard to notice when you do\, and with far-reaching consequences. (Debian patched some random number generator and caused half the world to have to mint new SSH keys a few years down the line.)

And just as I send this mail\, I notice that last week’s LWN has come out from under the paywall​:

  LCE​: Don’t play dice with random numbers   https://lwn.net/Articles/525459/

Regards\, -- Aristotle Pagaltzis // \<http​://plasmasturm.org/>

p5pRT commented 11 years ago

From @doy

On Thu\, Nov 29\, 2012 at 03​:01​:00AM +0100\, Aristotle Pagaltzis wrote​:

* bulk88 via RT \perlbug\-followup@&#8203;perl\.org [2012-11-28 09​:20]​:

Why not call the CRT's rand twice or thrice\, once for low 15 bits\, 2nd for high 15 bits\, and 3rd for remaining 2?

That doesn’t increase the entropy available\, it just draws from the pool faster. I don’t have my randomness down well enough to reason it through with the necessary rigour but I’m pretty sure that at best that approach does nothing to increase the quality of the random numbers and likely it will decrease it.

Don’t.

Randomness is a treacherous sea – easy to get it wrong\, hard to notice when you do\, and with far-reaching consequences. (Debian patched some random number generator and caused half the world to have to mint new SSH keys a few years down the line.)

Find a syscall that draws from a deeper pool or put in some other known algorithm with a greater amount of hidden state or really just about any other approach except this one.

Or just leave it as it is. If it’s bad\, just let it look bad. Being bad is not a problem (absent promises about quality – and perl makes none) but looking better than it is would be.

The documentation already covers everything it needs to\, so nothing to be done there either.

As was pointed out on IRC\, there are actually two orthogonal concerns here - how long until the sequence of random numbers repeats\, and how many distinct numbers are able to be generated. The proposed solution here will increase the second while potentially (although not necessarily) decreasing the first. This isn't necessarily a bad tradeoff\, depending on what the period actually is (it could easily be far greater than the number of distinct numbers that can be generated\, if the generator keeps hidden internal state). There's nothing inherently problematic about this idea\, as long as you're aware that it's only going to increase the potential range of numbers generated\, not increase the randomness itself.

That said\, I do tend to agree with Yves here - if we're actually concerned about the quality of the random numbers that are avaialble\, good rngs are pretty trivial to implement (or steal)\, so we may as well just do that if want to go down this road.

-doy

p5pRT commented 11 years ago

From @bulk88

On Wed Nov 28 09​:41​:59 2012\, bulk88 wrote​:

On Wed Nov 28 05​:43​:55 2012\, Mithaldu wrote​:

On Wed Nov 28 00​:15​:00 2012\, bulk88 wrote​:

Why not call the CRT's rand twice or thrice\, once for low 15 bits\, 2nd for high 15 bits\, and 3rd for remaining 2?

That sounds like a very reasonable solution. Can you maybe implement a very small c program to generate a million numbers like that and dump them to STDOUT to check how many collisions result from that algorithm?

I am going to try to fix this by spreading the bits of whatever rand a perl is using across the integer bits of a NV (usually 53 bits for a 64 bit double). I plan to test it as XSUBs first and I will post those here.

Windows only right now. ______________________________________________________--

NV RunRand(...) PREINIT​:   LARGE_INTEGER my_beg;   LARGE_INTEGER my_end;   LARGE_INTEGER my_freq;   NV value;   int io; CODE​:   if (!PL_srand_called) {   (void)seedDrand01((Rand_seed_t)seed());   PL_srand_called = TRUE;   }   RETVAL = 0.0;   QueryPerformanceFrequency(&my_freq);  
  QueryPerformanceCounter(&my_beg);   for(io=0; io \< 10000000; io++)   {   int i;   NV randnv = 0.0;   const int nvofbits = NV_OVERFLOW_BITS;   const NV overflowat = NV_OVERFLOWS_INTEGERS_AT;   NV mulby;   value = 1.0;   for(i=0; i \< nvofbits; i += RANDBITS){   NV randraw = rand();   if (nvofbits - i \< RANDBITS) {   /* In the last loop iteration\, we have   to cut off the high bits\, so we don't overflow the mantissa   */   /* scalb is slower than pow */   NV mask = Perl_pow(2\, nvofbits - i > RANDBITS ? RANDBITS : nvofbits - i) - 1;   randraw = Perl_fmod(randraw\, mask); /* & the bits */  
  }   randraw *= Perl_pow(2\, i); /* \<\< the bits */   randnv += randraw; /* | the bits */   }   mulby = (randnv / overflowat); /* >> so randnv is between 0.0 and 1.0 */   value *= mulby;   RETVAL += value;   }   QueryPerformanceCounter(&my_end);   printf("new time=%f\, opt=" OPTIMIZE "\n"\, ((double)(my_end.QuadPart-my_beg.QuadPart))/(double)my_freq.QuadPart);  
  QueryPerformanceCounter(&my_beg);   for(io=0; io \< 10000000; io++){   value = 1.0;   value *= Drand01();   RETVAL += value;   }   QueryPerformanceCounter(&my_end);   printf("old time=%f\, opt=" OPTIMIZE "\n"\, ((double)(my_end.QuadPart-my_beg.QuadPart))/(double)my_freq.QuadPart); OUTPUT​:   RETVAL _____________________________________________________

new time=6.786747\, opt= -O1 -G7 -GL -Oi -Og old time=0.167446\, opt= -O1 -G7 -GL -Oi -Og _____________________________________________________ That is 42 times slower than the previous implementation. I already moved the bit masking into a conditional\, that shaved off 4 seconds. scalb made things .2 to .5 second worse. A hand written pow loop in a function took 8 seconds instead of the 6. A header file is included that RunRand requires. nvoverflowbits.h was created using a script\, but Visual C had a "parser stack overflow" (C1026) when upto 256 bits where in the conditional tree. I guess I have to somehow cache the Perl_pow results in global statics and calculate them once per process startup. The whole loop can be unrolled actually\, but it wouldn't be portable if it is unrolled by hand.

-- bulk88 ~ bulk88 at hotmail.com

p5pRT commented 11 years ago

From @bulk88

nvoverflowbits.h

p5pRT commented 11 years ago

From @bulk88

On Wed Nov 28 23​:13​:32 2012\, bulk88 wrote​:

I guess I have to somehow cache the Perl_pow results in global statics and calculate them once per process startup. The whole loop can be unrolled actually\, but it wouldn't be portable if it is unrolled by hand.

On the collisions test posted by Brendan Byrd\, the new rand returned 0.

-- bulk88 ~ bulk88 at hotmail.com

p5pRT commented 11 years ago

From @kmx

On 29.11.2012 8​:13\, bulk88 via RT wrote​:

On Wed Nov 28 09​:41​:59 2012\, bulk88 wrote​:

On Wed Nov 28 05​:43​:55 2012\, Mithaldu wrote​:

On Wed Nov 28 00​:15​:00 2012\, bulk88 wrote​:

Why not call the CRT's rand twice or thrice\, once for low 15 bits\, 2nd for high 15 bits\, and 3rd for remaining 2? That sounds like a very reasonable solution. Can you maybe implement a very small c program to generate a million numbers like that and dump them to STDOUT to check how many collisions result from that algorithm? I am going to try to fix this by spreading the bits of whatever rand a perl is using across the integer bits of a NV (usually 53 bits for a 64 bit double). I plan to test it as XSUBs first and I will post those here. Windows only right now. ...

bulk88\, could you please check in your benchmark how slow will be the following straightforward implementation?

double drand53() {   unsigned int a = rand() & 0x1FFF; /* 13 bits */   unsigned int b = rand() & 0x1FFF; /* 13 bits */   unsigned int c = rand() & 0x1FFF; /* 13 bits */   unsigned int d = rand() & 0x3FFF; /* 14 bits */   /* (a * (2^40) + b * (2^27) + c * (2^14) + d) / (2^53) */   return (a*1099511627776.0+b*134217728.0+c*16384.0+d)/9007199254740992.0; }

-- kmx

p5pRT commented 11 years ago

From perl@profvince.com

As was pointed out on IRC\, there are actually two orthogonal concerns here - how long until the sequence of random numbers repeats\, and how many distinct numbers are able to be generated. The proposed solution here will increase the second while potentially (although not necessarily) decreasing the first.

If the period of the RNG is not coprime with the number of calls needed to make a 32 bits random integer with that algorithm\, then not only you effectively divide the period of the RNG by that number\, but the resulting distribution is also not uniform. And of course you have no reliable way to know what the period is\, even though it's likely to be a power of 2.

My opinion on this is : just leave it as is. The current implementation is enough for basic uses\, and implementing a better RNG for rand() is not useful for people that want random numbers with good statistical properties because that would be yet another black box (like nobody with serious accuracy concerns will ever use the complex type from C99).

Vincent

p5pRT commented 11 years ago

From @wchristian

On Thu Nov 29 03​:59​:27 2012\, perl@​profvince.com wrote​:

My opinion on this is : just leave it as is. The current implementation is enough for basic uses

I strongly disagree with this statement and would like you to explain exactly what you consider to be "basic uses".

Just to be clear about this​: Anytime a script tries to select items randomly from the entirety of a collection of 32769 items\, it will not work\, because only 32768 of those can be accessed with rand(). Worse\, this failure will be silent.

p5pRT commented 11 years ago

From @kmx

On 29.11.2012 12​:58\, Vincent Pit wrote​:

As was pointed out on IRC\, there are actually two orthogonal concerns here - how long until the sequence of random numbers repeats\, and how many distinct numbers are able to be generated. The proposed solution here will increase the second while potentially (although not necessarily) decreasing the first.

If the period of the RNG is not coprime with the number of calls needed to make a 32 bits random integer with that algorithm\, then not only you effectively divide the period of the RNG by that number\, but the resulting distribution is also not uniform. And of course you have no reliable way to know what the period is\, even though it's likely to be a power of 2.

My opinion on this is : just leave it as is. The current implementation is enough for basic uses\, and implementing a better RNG for rand() is not useful for people that want random numbers with good statistical properties because that would be yet another black box (like nobody with serious accuracy concerns will ever use the complex type from C99).

You are probably right if talking about perl's rand function as rand() on MS Windows is perhaps so poor that creating ugly workarounds is not worth the effort (especially when implementing or "stealing" some of the well established PRNG is IMO comparable amount of work).

The other thing (perhaps not related to this RT) is the fact that the same 15bit-low-entropy-short-period-not-so-excelent rand() is on MS Windows used also in Perl_get_hash_seed(..)

-- kmx

p5pRT commented 11 years ago

From @bulk88

On Thu Nov 29 03​:04​:29 2012\, kmx@​atlas.cz wrote​:

bulk88\, could you please check in your benchmark how slow will be the following straightforward implementation?

double drand53() { unsigned int a = rand() & 0x1FFF; /* 13 bits */ unsigned int b = rand() & 0x1FFF; /* 13 bits */ unsigned int c = rand() & 0x1FFF; /* 13 bits */ unsigned int d = rand() & 0x3FFF; /* 14 bits */ /* (a * (2^40) + b * (2^27) + c * (2^14) + d) / (2^53) */ return (a*1099511627776.0+b*134217728.0+c*16384.0+d)/9007199254740992.0; }

-- kmx

VC 2003 32 bit x86 ___________________________________________________________________________ NV RunRand(...) PREINIT​:   LARGE_INTEGER my_beg;   LARGE_INTEGER my_end;   LARGE_INTEGER my_freq;   NV value;   int io; CODE​:   if (!PL_srand_called) {   (void)seedDrand01((Rand_seed_t)seed());   PL_srand_called = TRUE;   }   RETVAL = 0.0;   QueryPerformanceFrequency(&my_freq);  
  QueryPerformanceCounter(&my_beg);   for(io=0; io \< 10000000; io++)   {   int i;   NV randnv = 0.0;   const int nvofbits = NV_OVERFLOW_BITS;   const NV overflowat = NV_OVERFLOWS_INTEGERS_AT;   NV mulby;   value = 1.0;   for(i=0; i \< nvofbits; i += RANDBITS){   NV randraw = rand();   if (nvofbits - i \< RANDBITS) {   /* In the last loop iteration\, we have   to cut off the high bits\, so we don't overflow the mantissa   */   /* scalb is slower than pow */   NV mask = Perl_pow(2\, nvofbits - i > RANDBITS ? RANDBITS : nvofbits - i) - 1;   randraw = Perl_fmod(randraw\, mask); /* & the bits */  
  }   randraw *= Perl_pow(2\, i); /* \<\< the bits */   randnv += randraw; /* | the bits */   }   mulby = (randnv / overflowat); /* >> so randnv is between 0.0 and 1.0 */   value *= mulby;   RETVAL += value;   }   QueryPerformanceCounter(&my_end);   printf("new time=%f\, opt=" OPTIMIZE "\n"\, ((double)(my_end.QuadPart-my_beg.QuadPart))/(double)my_freq.QuadPart);  
  QueryPerformanceCounter(&my_beg);   for(io=0; io \< 10000000; io++){   value = 1.0;   value *= Drand01();   RETVAL += value;   }   QueryPerformanceCounter(&my_end);   printf("old time=%f\, opt=" OPTIMIZE "\n"\, ((double)(my_end.QuadPart-my_beg.QuadPart))/(double)my_freq.QuadPart);

  QueryPerformanceCounter(&my_beg);   for(io=0; io \< 10000000; io++){
  unsigned int a = rand() & 0x1FFF; /* 13 bits */   unsigned int b = rand() & 0x1FFF; /* 13 bits */   unsigned int c = rand() & 0x1FFF; /* 13 bits */   unsigned int d = rand() & 0x3FFF; /* 14 bits */   /* (a * (2^40) + b * (2^27) + c * (2^14) + d) / (2^53) */   value = 1.0;   value *= (a*1099511627776.0+b*134217728.0+c*16384.0+d)/9007199254740992.0;   RETVAL += value;   }   QueryPerformanceCounter(&my_end);   printf("new kmx time=%f\, opt=" OPTIMIZE "\n"\, ((double)(my_end.QuadPart-my_beg.QuadPart))/(double)my_freq.QuadPart);   QueryPerformanceCounter(&my_beg);   for(io=0; io \< 10000000; io++){
  /* (a * (2^40) + b * (2^27) + c * (2^14) + d) / (2^53) */   value = 1.0;   value *= ((rand() & 0x1FFF)*1099511627776.0   +(rand() & 0x1FFF)*134217728.0   +(rand() & 0x1FFF)*16384.0   +(rand() & 0x3FFF))/9007199254740992.0;   RETVAL += value;   }   QueryPerformanceCounter(&my_end);   printf("new kmx bulk88 1 time=%f\, opt=" OPTIMIZE "\n"\, ((double)(my_end.QuadPart-my_beg.QuadPart))/(double)my_freq.QuadPart);   QueryPerformanceCounter(&my_beg);   for(io=0; io \< 10000000; io++){
  unsigned short a = rand() & 0x1FFF; /* 13 bits */   unsigned short b = rand() & 0x1FFF; /* 13 bits */   unsigned short c = rand() & 0x1FFF; /* 13 bits */   unsigned short d = rand() & 0x3FFF; /* 14 bits */   /* (a * (2^40) + b * (2^27) + c * (2^14) + d) / (2^53) */   value = 1.0;   value *= (a*1099511627776.0+b*134217728.0+c*16384.0+d)/9007199254740992.0;   RETVAL += value;   }   QueryPerformanceCounter(&my_end);   printf("new kmx bulk88 2 time=%f\, opt=" OPTIMIZE "\n"\, ((double)(my_end.QuadPart-my_beg.QuadPart))/(double)my_freq.QuadPart); OUTPUT​:   RETVAL ____________________________________________________________________ new time=6.196008\, opt= -O1 -G7 -GL -Oi -Og -arch​:SSE2 old time=0.159792\, opt= -O1 -G7 -GL -Oi -Og -arch​:SSE2 new kmx time=0.668545\, opt= -O1 -G7 -GL -Oi -Og -arch​:SSE2 new kmx bulk88 1 time=0.638748\, opt= -O1 -G7 -GL -Oi -Og -arch​:SSE2 new kmx bulk88 2 time=0.846463\, opt= -O1 -G7 -GL -Oi -Og -arch​:SSE2 ____________________________________________________________________ new time=6.336704\, opt= -O1 -G7 -GL -Oi -Og old time=0.159920\, opt= -O1 -G7 -GL -Oi -Og new kmx time=0.665955\, opt= -O1 -G7 -GL -Oi -Og new kmx bulk88 1 time=0.680728\, opt= -O1 -G7 -GL -Oi -Og new kmx bulk88 2 time=0.820330\, opt= -O1 -G7 -GL -Oi -Og ____________________________________________________________________

Explicit shorts were slower due to extra movzx instructions. The other 2 kmx/kmx derived implementations I didn't investigate. The kmx implementation is windows specific. There is one more implementation that is windows specific that I think will be faster. I will post it later today or tomorrow. -- bulk88 ~ bulk88 at hotmail.com

p5pRT commented 11 years ago

From Perl@ResonatorSoft.org

On Tue\, Nov 27\, 2012 at 10​:19 AM\, Leon Timmermans via RT \perlbug\-followup@&#8203;perl\.org wrote​:

Perl already chooses between a bunch of random number sources. If there's something obviously better available we could use that.

That said\, I don't really think this is really severity=medium. rand() doesn't make any promises on strength\, and if you need something stronger there are modules on CPAN that can provide just that.

True\, but the entropy seems to be abysmally low here\, especially since the -V settings indicate that the float bytes aren't the bottleneck here. I would at least expect 24 to 32 bits of entropy. Not looking for crypto security here.

I don't care too much about the severity\, but only having 32K possible random numbers still strikes me as a bug\, not a "feature".

-- Brendan Byrd \Perl@&#8203;ResonatorSoft\.org Brendan Byrd \BBYRD@&#8203;CPAN\.org

p5pRT commented 11 years ago

From @bulk88

On Thu Nov 29 11​:14​:15 2012\, bulk88 wrote​:

Explicit shorts were slower due to extra movzx instructions. The other 2 kmx/kmx derived implementations I didn't investigate. The kmx implementation is windows specific. There is one more implementation that is windows specific that I think will be faster. I will post it later today or tomorrow.

"bulk88 3" algorithm is the fastest and least portable way\, it also will also create the random double without ever touching the FPU (probably still need to add *1.0 multiply bypass) and without a final division to bring it to 0.0 thru 1.0. "just rand" is close (but not exactly) to the least possible operations\, just calling rand 4 times and using the returned val for something trivial. I tried using Benchmark cmpthese but the results were 80% +/- random each time\, if the same xsub ref is passed multiple times to cmpthese\, there is upto 50% difference between runs of the same xsub with different lavels in the same invocation of cmpthese\, so I am ignoring those numbers from Benchmark. Attaching new code since its getting too large to paste in. bulk88 1 and bulk88 2 algos I never checked if they actually distribute the bits correctly. bulk88 3 I checked by watching hex digits but I didn't analyze it on a per bit level\, line to line (I'll do that eventually).

____________________________________________________________________ old time=0.344397\, opt= -O1 -G7 -GL -Oi -Og 9998993.931366 new kmx time=1.345092\, opt= -O1 -G7 -GL -Oi -Og 10000433.102156 new kmx bulk88 1 time=1.343367\, opt= -O1 -G7 -GL -Oi -Og 9999833.635918 new kmx bulk88 2 time=1.644626\, opt= -O1 -G7 -GL -Oi -Og 9999610.299361 new kmx bulk88 3 time=1.257518\, opt= -O1 -G7 -GL -Oi -Og 2812480.285862 just rand time=1.140518\, opt= -O1 -G7 -GL -Oi -Og 1310798772909.000000 ____________________________________________________________________ -- bulk88 ~ bulk88 at hotmail.com

p5pRT commented 11 years ago

From @bulk88

11-29-12rand.xs

p5pRT commented 11 years ago

From @kmx

On 30.11.2012 2​:44\, bulk88 via RT wrote​:

... I never checked if they actually distribute the bits correctly. bulk88 3 I checked by watching hex digits but I didn't analyze it on a per bit level\, line to line (I'll do that eventually).

Throwing random 8 bytes over raw double representation does not work as you expect.

I have tested your code and resulting double values are from range​: 0.06-0.24 (not even uniformly distributed)

If you want to construct really random double from random bytes without involving FPU have a look for example at function int_pair_to_real_inclusive() in http​://svn.ruby-lang.org/cgi-bin/viewvc.cgi/trunk/random.c?view=markup

(I apologise in advance if it is too offensive to link ruby source code in p5p list)

-- kmx

p5pRT commented 11 years ago

From @ap

* kmx \kmx@&#8203;atlas\.cz [2012-11-30 12​:25]​:

(I apologise in advance if it is too offensive to link ruby source code in p5p list)

Why?

How else do you steal?

p5pRT commented 11 years ago

From @craigberry

On Tue\, Nov 27\, 2012 at 1​:52 PM\, demerphq \demerphq@&#8203;gmail\.com wrote​:

On 27 November 2012 20​:43\, kmx \kmx@&#8203;atlas\.cz wrote​:

On 27.11.2012 16​:40\, Christian Walde via RT wrote​:

On Tue Nov 27 07​:19​:53 2012\, LeonT wrote​:

On Windows XP and higher\, there is RtlGenRandom\, would that fit the bill?

And the CRT provides this as rand_s(). See \<http​://msdn.microsoft.com/en-us/library/sxtz2fa8%28v=vs.110%29.aspx>. I've attached a proof-of-concept patch using rand_s() that out of laziness just pastes​:

static double win32_drand01(void) {   unsigned int val;   (void)rand_s(&val);   return (double)val; }

into perl.h. Obviously that would have to be moved to win32/win32.c and made visible\, but this was enough to do a quick build and see what it gets us. The obvious advantage is that it's almost no code to maintain and immediately provides 32 bits of entropy instead of 15. The obvious disadvantage is ...

Just curious but wouldnt something like that be pretty slow?

Yes\, rand_s() takes about 6 times as long as rand()\, using this as a benchmark​:

C​:\perlgit>type randbench.pl use Benchmark;

my $cnt = 0; my $t0 = new Benchmark; my $num = 0; foreach my $i (1 .. 10_000_000) {   $num = rand;   $cnt++; } my $t1 = new Benchmark; my $td = timediff($t1\, $t0); print "$cnt iterations in "\, timestr($td)\, "\n"; print "The last random # generated was $num\n";

I would think that ideally the Win32 version of Perl should "out of the box" have a reasonably fast RNG.

I agree\, but more randomness is likely to cost *something* in run time and/or maintenance time.

p5pRT commented 11 years ago

From @craigberry

use_rand_s.patch ```diff diff --git a/perl.h b/perl.h index b13521a..a0fbcb0 100644 --- a/perl.h +++ b/perl.h @@ -5889,7 +5889,11 @@ extern void moncontrol(int); (KEEP THIS LAST IN perl.h!) */ - +static double win32_drand01(void) { + unsigned int val; + (void)rand_s(&val); + return (double)val; +} #endif /* Include guard */ /* diff --git a/win32/Makefile b/win32/Makefile index 2c3d52d..9695b79 100644 --- a/win32/Makefile +++ b/win32/Makefile @@ -485,7 +485,7 @@ LIBBASEFILES = $(LIBBASEFILES) bufferoverflowU.lib LIBFILES = $(LIBBASEFILES) $(LIBC) #EXTRACFLAGS = -nologo -GF -W4 -wd4127 -wd4706 -EXTRACFLAGS = -nologo -GF -W3 +EXTRACFLAGS = -nologo -GF -W3 -D_CRT_RAND_S CFLAGS = $(EXTRACFLAGS) $(INCLUDES) $(DEFINES) $(LOCDEFS) \ $(PCHFLAGS) $(OPTIMIZE) LINK_FLAGS = -nologo -nodefaultlib $(LINK_DBG) \ diff --git a/win32/config.vc b/win32/config.vc index 0e975fd..d478104 100644 --- a/win32/config.vc +++ b/win32/config.vc @@ -543,7 +543,7 @@ direntrytype='struct direct' dlext='dll' dlsrc='dl_win32.xs' doublesize='8' -drand01='(rand()/(double)((unsigned)1<
p5pRT commented 11 years ago

From @nwc10

On Fri\, Nov 30\, 2012 at 10​:17​:24AM -0600\, Craig A. Berry wrote​:

On Tue\, Nov 27\, 2012 at 1​:52 PM\, demerphq \demerphq@&#8203;gmail\.com wrote​:

On 27 November 2012 20​:43\, kmx \kmx@&#8203;atlas\.cz wrote​:

On 27.11.2012 16​:40\, Christian Walde via RT wrote​:

On Tue Nov 27 07​:19​:53 2012\, LeonT wrote​:

On Windows XP and higher\, there is RtlGenRandom\, would that fit the bill?

And the CRT provides this as rand_s(). See

Which seems great. But

\<http​://msdn.microsoft.com/en-us/library/sxtz2fa8%28v=vs.110%29.aspx>.

  The rand_s function writes a pseudorandom integer in the range 0 to   UINT_MAX to the input pointer. The rand_s function uses the operating   system to generate cryptographically secure random numbers. It does not   use the seed generated by the srand function\, nor does it affect the   random number sequence used by rand.

which means that this would bust the documented behaviour for srand​:

  Sets and returns the random number seed for the C\ operator.

So\, what are we trying to achieve? And what guarantees do we want to keep?

I would think that ideally the Win32 version of Perl should "out of the box" have a reasonably fast RNG.

I agree\, but more randomness is likely to cost *something* in run time and/or maintenance time.

I agree too.

I think that the answer to the first question is fairly unequivocably "better random numbers from rand() on Win32"\, and\, *specifically*\, more than 15 bits.

I personally think that it's very useful to maintain the relationship between srand() and rand() - ie that rand() is a *pseudo*-random number generator\, and that it's possible to fix and repeat the sequence by using srand(). (I've used this in the past)

If that is true (and we don't want to re-write perlfunc.pod to remove long standing functionality from srand()) then sadly that Win32 CRT function is *not* useful.

I'm also very very wary of trying to roll *anything* ourselves. There will be plenty of ways to really screw this up down the line. So\, I think the way to go is either

1) for Win32\, adopt an existing BSD-licensed PRG which provides 32\, if not   48 bits of result (and hopefully a lot more entropy internally)   What is in OpenBSD's libc?

2) for everything\, switch to Mersenne Twister. If it's the default for PHP\,   Python and Ruby\, surely it's good enough for Perl?

Nicholas Clark

p5pRT commented 11 years ago

From @wchristian

On Fri Nov 30 08​:18​:19 2012\, craig.a.berry@​gmail.com wrote​:

Yes\, rand_s() takes about 6 times as long as rand()\, using this as a benchmark​:

I don't think that's the correct speed difference to test. Instead rand should be compared between windows and linux on similar hardware. If a better perl rand() on windows is 6 times slower than the current rand()\, but the current rand() on windows is 12 times faster than the linux rand ()\, then the difference does not really matter. :)

p5pRT commented 11 years ago

From @craigberry

On Fri\, Nov 30\, 2012 at 10​:34 AM\, Nicholas Clark \nick@&#8203;ccl4\.org wrote​:

I personally think that it's very useful to maintain the relationship between srand() and rand() - ie that rand() is a *pseudo*-random number generator\, and that it's possible to fix and repeat the sequence by using srand(). (I've used this in the past)

Microsoft has other APIs where the input value is used as part of the seed​:

\<http​://msdn.microsoft.com/en-us/library/windows/desktop/aa379942(v=vs.85).aspx>

It's possible rand_s() does that too\, though it's not documented.

I'm also very very wary of trying to roll *anything* ourselves. There will be plenty of ways to really screw this up down the line.

Yes. I think this thread has been highly information on how not to implement an RNG :-).

So\, I think the way to go is either

1) for Win32\, adopt an existing BSD-licensed PRG which provides 32\, if not 48 bits of result (and hopefully a lot more entropy internally) What is in OpenBSD's libc?

Earlier in the thread\, I said​:

The FreeBSD implementation of drand48() is indeed short and sweet even if you follow the links a couple of levels deep​:

\<http​://fxr.watson.org/fxr/source/gen/drand48.c?v=FREEBSD-LIBC;im=bigexcerpts>

It might just work verbatim on Windows.

2) for everything\, switch to Mersenne Twister. If it's the default for PHP\, Python and Ruby\, surely it's good enough for Perl?

Perhaps TinyMT (\<http​://www.math.sci.hiroshima-u.ac.jp/~m-mat/MT/TINYMT/index.html>) would be adequate for a basic (and portable) random number generator.

p5pRT commented 11 years ago

From @bulk88

On Fri Nov 30 03​:22​:59 2012\, kmx@​atlas.cz wrote​:

Throwing random 8 bytes over raw double representation does not work as you expect.

I have tested your code and resulting double values are from range​: 0.06-0.24 (not even uniformly distributed)

If you want to construct really random double from random bytes without involving FPU have a look for example at function int_pair_to_real_inclusive() in http​://svn.ruby-lang.org/cgi-bin/viewvc.cgi/trunk/random.c?view=markup

(I apologise in advance if it is too offensive to link ruby source code in p5p list)

-- kmx

I had a feeling the first 2 digits were not as random as later digits prior to you identifying. You proved me correct. The algorithm was supposed to generate a number between .9999 and ~.0125 but not smaller than .0125000000\, but a bug probably narrowed that range. There was a design issue whether to make the exponent random\, but then int(rand(100)) nearly always returns 0 rather than an equal distribution between 0 to 100. I created a new version of it which uses FP subtraction to cut off 1st unrandom digit\, and writing up some digit distribution tests\, the old 15 bit rand has a very visible difference with my tests. I will post it after some more testing and benchmarking on my end.

-- bulk88 ~ bulk88 at hotmail.com

p5pRT commented 11 years ago

From BitCard@ResonatorSoft.org

On Wed Nov 28 18​:01​:44 2012\, aristotle wrote​:

Don’t.

Randomness is a treacherous sea – easy to get it wrong\, hard to notice when you do\, and with far-reaching consequences. (Debian patched some random number generator and caused half the world to have to mint new SSH keys a few years down the line.)

Math hard\, so don't bother? Sorry\, but I strongly disagree. A word of caution is perfectly valid\, as RNGs do need some careful thought for implementation. But\, it can be done.

Also\, it's already broken right now.

That doesn’t increase the entropy available\, it just draws from the pool faster. I don’t have my randomness down well enough to reason it through with the necessary rigour but I’m pretty sure that at best that approach does nothing to increase the quality of the random numbers and likely it will decrease it.

Eh? What does it matter if it draws from the pool faster? We're not talking about seed prediction (or crypto-level RNGs).

And we're talking about bitshifting on a binary boundary. (In other words\, the range is from 0 to 15-bits full.) As long as that is true\, appending the RNG results is perfectly valid\, giving you a completely random number that uses all bits in all places.

--

I am glad that this bug is getting quite a bit of attention. Keep up the hard work!

p5pRT commented 11 years ago

From @ap

* Brendan Byrd via RT \perlbug\-followup@&#8203;perl\.org [2012-12-01 05​:50]​:

On Wed Nov 28 18​:01​:44 2012\, aristotle wrote​:

Don’t.

Randomness is a treacherous sea – easy to get it wrong\, hard to notice when you do\, and with far-reaching consequences. (Debian patched some random number generator and caused half the world to have to mint new SSH keys a few years down the line.)

Math hard\, so don't bother? Sorry\, but I strongly disagree. A word of caution is perfectly valid\, as RNGs do need some careful thought for implementation. But\, it can be done.

Where did you see me saying that? What I said is\, this math harder than you think\, so don’t just follow the first intuition that pops into your head​: if you want to do something about it\, approach the topic with the necessary humility. Of course it can be done\, just not in 15 minutes.

(And you can always copy the work of someone who has already sweat the details. I won’t raise any concerns over that. I would like it in fact.)

If you just meant the “Don’t”\, it was only about one specific approach\, per preceding context.

Also\, it's already broken right now.

Yes\, so don’t smear lipstick on it – fix it or leave it.

That doesn’t increase the entropy available\, it just draws from the pool faster. I don’t have my randomness down well enough to reason it through with the necessary rigour but I’m pretty sure that at best that approach does nothing to increase the quality of the random numbers and likely it will decrease it.

Eh? What does it matter if it draws from the pool faster? We're not talking about seed prediction (or crypto-level RNGs).

And we're talking about bitshifting on a binary boundary. (In other words\, the range is from 0 to 15-bits full.) As long as that is true\, appending the RNG results is perfectly valid\, giving you a completely random number that uses all bits in all places.

If the RNG cycle length shares prime factors with the number of times you call the RNG to construct a single random number\, then all you did is shorten the effective cycle by those factors. And by definition the length of the cycle is the upper bound on how many distinct numbers you can generate. So the result would be a much wider spread of many fewer distinct numbers.

If you’re throwing away any bits you got from the RNG\, you can also end up exposing a non-uniformity in the distribution that was not previously apparent. (I think there is also some risk of this happening in general but I’d have to bone up further on my math to substantiate this.)

Regards\, -- Aristotle Pagaltzis // \<http​://plasmasturm.org/>

p5pRT commented 11 years ago

From BitCard@ResonatorSoft.org

On Sat\, Dec 1\, 2012 at 2​:46 AM\, A. Pagaltzis via RT \perlbug\-followup@&#8203;perl\.org wrote​:

(And you can always copy the work of someone who has already sweat the details. I won’t raise any concerns over that. I would like it in fact.)

I'm open to that as long as there isn't a noticeable slowdown to the original function\, and the RNG code isn't specific to a type of need.

Yes\, so don’t smear lipstick on it – fix it or leave it.

Agreed.

If the RNG cycle length shares prime factors with the number of times you call the RNG to construct a single random number\, then all you did is shorten the effective cycle by those factors. And by definition the length of the cycle is the upper bound on how many distinct numbers you can generate. So the result would be a much wider spread of many fewer distinct numbers.

I'll admit that I only know enough about RNGs to be dangerous\, but I'm still not seeing the problem here.

Let's say that I have a RNG that picks a single digit between 0 and 9. It has an equal chance (1​:10) to pick each digit. I call that 50 times\, append it to the growing number\, and then turn it into a float by putting a dot at the beginning.

Bulk88's approach is essentially the same thing\, except it's doing it in binary with binary boundaries.

If you’re throwing away any bits you got from the RNG\, you can also end up exposing a non-uniformity in the distribution that was not previously apparent. (I think there is also some risk of this happening in general but I’d have to bone up further on my math to substantiate this.)

Yes\, this is valid concern\, which is why it's important to do this on a binary boundary with the right information on the bits of entropy. If the number of values is\, say\, 0x00 to 0xFF\, and we try this on a 16-bit boundary\, then we would be adding extra binary zeros\, polluting the RNG.

-- Brendan Byrd \Perl@&#8203;ResonatorSoft\.org Brendan Byrd \BBYRD@&#8203;CPAN\.org

p5pRT commented 11 years ago

From @doy

On Sat\, Dec 01\, 2012 at 05​:31​:22AM -0800\, Brendan Byrd via RT wrote​:

On Sat\, Dec 1\, 2012 at 2​:46 AM\, A. Pagaltzis via RT \perlbug\-followup@&#8203;perl\.org wrote​:

(And you can always copy the work of someone who has already sweat the details. I won’t raise any concerns over that. I would like it in fact.)

I'm open to that as long as there isn't a noticeable slowdown to the original function\, and the RNG code isn't specific to a type of need.

Yes\, so don’t smear lipstick on it – fix it or leave it.

Agreed.

If the RNG cycle length shares prime factors with the number of times you call the RNG to construct a single random number\, then all you did is shorten the effective cycle by those factors. And by definition the length of the cycle is the upper bound on how many distinct numbers you can generate. So the result would be a much wider spread of many fewer distinct numbers.

I'll admit that I only know enough about RNGs to be dangerous\, but I'm still not seeing the problem here.

Let's say that I have a RNG that picks a single digit between 0 and 9. It has an equal chance (1​:10) to pick each digit. I call that 50 times\, append it to the growing number\, and then turn it into a float by putting a dot at the beginning.

Bulk88's approach is essentially the same thing\, except it's doing it in binary with binary boundaries.

If you’re throwing away any bits you got from the RNG\, you can also end up exposing a non-uniformity in the distribution that was not previously apparent. (I think there is also some risk of this happening in general but I’d have to bone up further on my math to substantiate this.)

Yes\, this is valid concern\, which is why it's important to do this on a binary boundary with the right information on the bits of entropy. If the number of values is\, say\, 0x00 to 0xFF\, and we try this on a 16-bit boundary\, then we would be adding extra binary zeros\, polluting the RNG.

How the values are distributed is another concern. A common issue with bad RNGs is that they have much less entropy in the low bits than in the high bits. This isn't typically noticeable when you use it in the way perl does\, since the low bits are only seen if you care about a lot of precision in the random floating point numbers (the first couple decimal places would still be reasonably distributed)\, but if you start appending values in this way\, it can move the problem up to a point where it could be noticeable (if\, say\, you took the values from the lowest two bits + a full 15 bits + a full 15 bits to get the full random number). These are the kinds of issues people are talking about. Using an existing\, already written and tested implementation instead would sidestep this whole issue.

-doy