eXist-db / exist

eXist Native XML Database and Application Platform
https://exist-db.org
GNU Lesser General Public License v2.1
429 stars 179 forks source link

seeded random-number-generator results should be reproducible across executions and instances #3912

Open line-o opened 3 years ago

line-o commented 3 years ago

Describe the bug

When fn:random-number-generator#1 is given any atomic value as a seed the resulting sequence of random numbers should be reproducible between two invocations.

Given fn:random-number-generator(1) in existdb 5.3.0-SNAPSHOT the first random number that is returned is always, predictably 5.31250253323421e-1

Moreover that is true for all following numbers as well. It is just unpredictable which number comes next until this was evaluated.

[
  fn:random-number-generator(1)?number,
  fn:random-number-generator(1)?next()?number
]

should evaluate to (at least for a specific version of existdb)

[5.31250253323421e-1, 5.313156098200126e-1]

This is very important to a whole range of applications working with seeded randoms. For testability, for generative algorithms and more.

While working on dicey I found a range of seeds where the assumptions above do not hold.

affected classes of seeds:

BONUS: fn:random-number-generator(0) (xs:integer zero as seed) will only return zeros!

Expected behavior

The outcome of a random-number-generator seeded with any of the above to produce predictable random values across code executions and instances of existdb (at least of the same version).

To Reproduce

xquery version "3.1";

module namespace fn-rng="http://exist-db.org/xquery/test/fnRandomNumberGenerator";

declare namespace test="http://exist-db.org/xquery/xqsuite";

declare variable $fn-rng:long-seed := 123456789;
declare variable $fn-rng:text-seed := 'sample seed';
declare variable $fn-rng:date-seed := xs:date('1970-01-01');
declare variable $fn-rng:dateTime-seed := xs:dateTime('1970-01-01T00:00:00.000Z');
declare variable $fn-rng:number-as-string-seed := "1234567890";
declare variable $fn-rng:long-string-seed := "When life hands you dirt plant seeds.";
declare variable $fn-rng:big-number-seed := 9223372036854775808;
declare variable $fn-rng:zero-seed := 0;

(: special seeds  :)

(:~
 : the probability of the first one hundred consecutive zeros in a sequence of random
 : decimals between 0 and 1 is very low
 :)
declare
    %test:assertNotEquals(0)
function fn-rng:seed-zero-yields-random-values () {
    let $random := fn:random-number-generator($fn-rng:zero-seed)
    return
        fold-left(1 to 100,
            map {"sequence": (), "generator": $random},
            function ($result, $ignore) {
                map {
                    "sequence": ($result?sequence, $result?generator?number),
                    "generator": $result?generator?next()
                }
            })
        => map:get("sequence")
        => sum()
};

(:~
 : This test documents an implementation detail.
 : It is an interesting fact to know that 123456789 and "123456789" are treated as the same seed.
 : It is also important to know, if the behaviour of the implementation changes in this regard.
 :)
declare
    %test:assertTrue
function fn-rng:seed-number () {
    fn:random-number-generator($fn-rng:long-seed)?number eq
    fn:random-number-generator($fn-rng:number-as-string-seed)?number
};

declare 
    %test:assertTrue
function fn-rng:seed-text-predictable () {
    fn:random-number-generator($fn-rng:text-seed)?number eq
    fn:random-number-generator($fn-rng:text-seed)?number
};

declare
    %test:assertTrue
function fn-rng:seed-date-predictable () {
    fn:random-number-generator($fn-rng:date-seed)?number eq
    fn:random-number-generator($fn-rng:date-seed)?number
};

declare
    %test:assertTrue
function fn-rng:seed-dateTime-predictable () {
    fn:random-number-generator($fn-rng:dateTime-seed)?number eq
    fn:random-number-generator($fn-rng:dateTime-seed)?number
};

declare
    %test:assertTrue
function fn-rng:seed-current-dateTime-predictable () {
    fn:random-number-generator(fn:current-dateTime())?number eq
    fn:random-number-generator(fn:current-dateTime())?number
};

declare
    %test:assertTrue
function fn-rng:deterministic () {
    fn:random-number-generator($fn-rng:long-seed)?number eq 
    fn:random-number-generator($fn-rng:long-seed)?number
};

declare
    %test:assertTrue
function fn-rng:deterministic-next () {
    fn:random-number-generator($fn-rng:long-seed)?next()?number eq 
    fn:random-number-generator($fn-rng:long-seed)?next()?number
};

declare
    %test:assertTrue
function fn-rng:deterministic-long-string-seed () {
    fn:random-number-generator($fn-rng:long-string-seed)?number eq
    fn:random-number-generator($fn-rng:long-string-seed)?number
};

declare
    %test:assertTrue
function fn-rng:deterministic-big-number-seed () {
    fn:random-number-generator($fn-rng:big-number-seed)?number eq
    fn:random-number-generator($fn-rng:big-number-seed)?number
};

declare
    %test:assertTrue
function fn-rng:deterministic-big-number-seed-minus-one () {
    fn:random-number-generator($fn-rng:big-number-seed -1)?number eq
    fn:random-number-generator($fn-rng:big-number-seed -1)?number
};

declare
    %test:assertTrue
function fn-rng:deterministic-decimal-seed () {
    fn:random-number-generator(1.1)?number eq
    fn:random-number-generator(1.1)?number
};

declare
    %test:assertTrue
function fn-rng:decimal-seeds-yield-other-results () {
    (
        fn:random-number-generator(1.1)?number,
        fn:random-number-generator(1.11)?number,
        fn:random-number-generator(1.111)?number,
        fn:random-number-generator(1.1111)?number
    )
    != fn:random-number-generator(1)?number
};

Context (please always complete the following information):

Additional context

Specification

https://www.w3.org/XML/Group/qtspecs/specifications/xpath-functions-31/html/Overview.html#func-random-number-generator

It is recommended that when the same seed is provided explicitly, the same random number sequence should be delivered even in different execution scopes; while if no seed is provided, the processor should choose a seed that is likely to be different from one execution scope to another. (The same effect can be achieved explicitly by using fn:current-dateTime() as a seed.)

joewiz commented 3 years ago

Can you comment on whether your expectations are rooted in the spec (relevant language, links), or whether the spec itself is, er, underspecified? Have you tested other implementations?

line-o commented 3 years ago

Good point @joewiz I added the relevant quote from the xquery 3.1 spec to the issue description. So I will change this to a feature as it is only recommended to be predictable between executions.

line-o commented 3 years ago

I want to state that the recommended behaviour is implemented for seeds with xs:integer values between 1 and 9223372036854775807.

line-o commented 3 years ago

I have not had a look at negative numeric values yet.