fvutils / pyvsc

Python packages providing a library for Verification Stimulus and Coverage
https://fvutils.github.io/pyvsc
Apache License 2.0
113 stars 26 forks source link

Random stability features needed #117

Closed msmftc closed 2 years ago

msmftc commented 3 years ago

PyVSC needs to add features to support random stability. Random stability is the idea that randomized tests should be exactly reproducible (given a fixed random seed), and that changes to a small feature of a test should not affect randomization outside of that feature.

PyVSC cannot support random stability right now because a single, global random number generator (RNG) supplies all random numbers. If a test has several randobj instances then the sequence of random values produced by a particular instance in a loop of randomize() calls depends on the order of ALL randomize() calls on ALL randobj's in the test. This causes a few problems:

A solution would be to provide methods that pass an RNG instance or a seed into a randobj. All randomization for that randobj would need to use its private RNG. I've examined PyVSC source code, and it looks like this solution has been started, but is not complete. Is there a plan to complete it?

I've written an example to demonstrate random (in)stability. The example has two different randobj's, and randomize() is called ten times on each. A command-line parameter alters the order of randomize() calls between the two randobj's. If the example was random stable, then the sequence of numbers from each randobj would always be the same, regardless of randomize() order. Since the test is not random stable, the sequences vary.

import sys
import random
import vsc

@vsc.randobj
class RandByte:
    def __init__(self):
        self.byte = vsc.rand_bit_t(8)

    def __str__(self):
        return("byte = %d".format(self.byte))

@vsc.randobj
class RandWord:
    def __init__(self):
        self.word = vsc.rand_bit_t(16)

    def __str__(self):
        return("word = %d".format(self.word))

def main():
    random.seed(100)   # randobj's should not depend on global seed, but they currently do.

    randByte = RandByte()   # randobj methods should set either an RNG instance or a seed. 
    randWord = RandWord()
    byte_l = []
    word_l = []

    # Order of randomize() calls should not affect random sequence from a particular
    # randobj, but it currently does.
    if(len(sys.argv) > 1 and sys.argv[1] == '1'):
        for i in range(10):
            randByte.randomize()
            randWord.randomize()
            byte_l.append(randByte.byte)
            word_l.append(randWord.word)
    elif(len(sys.argv) > 1 and sys.argv[1] == '2'):
        for i in range(10):
            randWord.randomize()
            word_l.append(randWord.word)
        for i in range(10):
            randByte.randomize()
            byte_l.append(randByte.byte)
    else:
        sys.exit("First command-line argument must be '1' or '2'.")

    print("Byte list")
    for i in range(10):
        print("  byte {}:  {}".format(i, byte_l[i]))
    print("Word list")
    for i in range(10):
        print("  word {}:  {}".format(i, word_l[i]))

main()
$ python randStability1.py 1
Byte list
  byte 0:  89
  byte 1:  104
  byte 2:  209
  byte 3:  63
  byte 4:  121
  byte 5:  175
  byte 6:  222
  byte 7:  112
  byte 8:  243
  byte 9:  19
Word list
  word 0:  26802
  word 1:  24719
  word 2:  52830
  word 3:  25249
  word 4:  37991
  word 5:  52656
  word 6:  42828
  word 7:  59728
  word 8:  13635
  word 9:  6471
$ python randStability1.py 2
Byte list
  byte 0:  175
  byte 1:  205
  byte 2:  222
  byte 3:  167
  byte 4:  112
  byte 5:  233
  byte 6:  243
  byte 7:  53
  byte 8:  19
  byte 9:  25
Word list
  word 0:  22904
  word 1:  26802
  word 2:  26721
  word 3:  24719
  word 4:  53699
  word 5:  52830
  word 6:  16265
  word 7:  25249
  word 8:  31093
  word 9:  37991
mballance commented 3 years ago

Hi @msmftc, I've been giving this more thought. Here are some design and requirement notes on this feature. Please review and let me know your thoughts.

Requirements

Operation

Applying this to your example should look something like the below:

import sys
import random
import vsc

@vsc.randobj
class RandByte:
    def __init__(self):
        self.byte = vsc.rand_bit_t(8)

    def __str__(self):
        return("byte = %d".format(self.byte))

@vsc.randobj
class RandWord:
    def __init__(self):
        self.word = vsc.rand_bit_t(16)

    def __str__(self):
        return("word = %d".format(self.word))

def main():
    rs = RandState(100) # Create a random state from a numeric seed

    randByte = RandByte()   # randobj methods should set either an RNG instance or a seed. 
    randByte.set_randstate(rs) # Sets the randstate to the current state of 'rs'
    randWord = RandWord()
    randWord.set_randstate(rs) # Sets the randstate to the current state of 'rs'

    # Note: because a copy of the randstate is taken, the subsequent randomizations
    # of 'randByte' and 'randWord' are independent. 
    # Note, also, that subsequent randomizations do not mutate the state held in 'rs'.

    byte_l = []
    word_l = []

    # Order of randomize() calls should not affect random sequence from a particular
    # randobj, but it currently does.
    if(len(sys.argv) > 1 and sys.argv[1] == '1'):
        for i in range(10):
            randByte.randomize()
            randWord.randomize()
            byte_l.append(randByte.byte)
            word_l.append(randWord.word)
    elif(len(sys.argv) > 1 and sys.argv[1] == '2'):
        for i in range(10):
            randWord.randomize()
            word_l.append(randWord.word)
        for i in range(10):
            randByte.randomize()
            byte_l.append(randByte.byte)
    else:
        sys.exit("First command-line argument must be '1' or '2'.")

    print("Byte list")
    for i in range(10):
        print("  byte {}:  {}".format(i, byte_l[i]))
    print("Word list")
    for i in range(10):
        print("  word {}:  {}".format(i, word_l[i]))

main()
msmftc commented 3 years ago

@mballance, these features look good, but I recommend adding a couple more options for seeding RandState objects. The goal of these options is to uniquely seed each RandState obj while preserving random stability.

First, there could be an option to seed RandObj with the combination of an integer and a string, for example:

rs = RandState(100, "core[0].worker[3].opc")

The integer and string are hashed together to produce a unique seed for the RandState. UVM uses this technique. The integer is generally a global seed that is set for the whole test, and the string is generally a hierarchical pathname for the object being seeded. Assuming every randomizable object has a unique pathname, then each gets a unique seed. When a test is altered, seeds stay constant for any object whose pathname is unchanged. The order in which objects are seeded does not affect randomization.

Second, RandState could have a method that returns random integers so that a RandState can be seeded from another RandState. For example:

rs1 = RandState(100)
rs2 = RandState(rs1.randInt())

This enables a hierarchical method of seeding RandObj's, similar to what SystemVerilog does. Imagine a test that uses many threads, and each thread has the ability to start additional threads. The test could be written such that the initial thread is given an integer seed. The thread creates a RandState with that seed (rs_top = RandState(seed)), then calls rs_top.randInt() to seed a new RandState for each RandObj in the thread. When the thread needs to start child threads, it again calls rs_top.randInt() to provide seeds to those children. This keeps randomization stable for all threads even though we cannot predict the program order in which grandchild threads will be started.

If you include these options in RandState then your plan should work well.

mballance commented 3 years ago

Hi @msmftc, I'm good with adding a 'randInt' method to RandState. That certainly makes sense.

I'm wondering if the string-based seed creation could be kept separate from PyVSC. PyVSC itself doesn't maintain a globally-unique instance path for all randomizable objects. That's somewhat dependent on the user's test environment, so the user will need to generate appropriate string paths anyway. I had a look at UVM seed management, and I could see using a similar approach (ie a singleton object that relates paths to seed values) in Python. Can you manage this in your environment -- at least to start?

Thanks, Matthew

msmftc commented 3 years ago

@mballance,

I'm not suggesting that PyVSC manage instance paths for randomizable objects. That is the user's responsibility. I'm recommending that RandState have a constructor or method that hashes an arbitrary integer with an arbitrary string to create a unique random seed. The rest of the work is up to the user (setting a global seed, tracking unique pathnames for each object).

Including the hashing in RandState saves users from needing to provide their own hash functions, which have potential to be implemented badly.

mballance commented 3 years ago

@msmftc, I'm good with that approach: provide a utility method on RandState for constructing a RandState object from the combination of an integer and string seed. Since it seems we're agreed on the specifics, I'll proceed to begin implementation.

mballance commented 2 years ago

Hi @msmftc, The 0.6.3 release contains an implementation of explicit random-stability management. There is a brief discussion of the new features in the documentation: https://pyvsc.readthedocs.io/en/latest/methods.html#managing-random-stability. However, it's probably well-worth taking a look at the new tests for the feature as well. For example, here is an example showing creation of a rand-state object from an integer seed and string component: https://github.com/fvutils/pyvsc/blob/02302b461c44079eaf46d77923cb3e7e98eec920/ve/unit/test_randstate.py#L52-L92

I'll leave this open for now until you're able to try things out.

-Matthew

msmftc commented 2 years ago

@mballance,

I've begun testing the random stability feature. I wrote a simple example test showing that PyVSC provides random repeatability within a single run, but is completely unrepeatable across multiple runs. I've included the example below.

I looked at the PyVSC source code in rand_state.py and found the likely root cause. In method mkFromSeed() you are seeding a RandState object with seed + hash(strval). In Python 3.3 and later, the hash() of a given string changes from run to run. We need a hash function that doesn't change.

The following example demonstrates repeatability within a run, unrepeatability across multiple runs, and different hash() results across multiple runs:

import vsc

@vsc.randobj
class Selector():
    def __init__(self):
        self.number = vsc.rand_bit_t(8)

contextName = "test[n1]/process"
selector = Selector()
for i in range(2):
    randState = vsc.RandState.mkFromSeed(1, contextName)
    selector.set_randstate(randState)

    numStr = f"Selection {i}:"
    for j in range(10):
        selector.randomize()
        numStr += f"\t{int(selector.number)}"

    print(numStr)

print(f"hash({contextName}):  {hash(contextName)}")

Run 1:

Selection 0:    93      135     136     153     228     129     12      48      244     153
Selection 1:    93      135     136     153     228     129     12      48      244     153
hash(test[n1]/process):  -714692566165798307

Run 2:

Selection 0:    120     24      164     207     60      186     195     30      58      58
Selection 1:    120     24      164     207     60      186     195     30      58      58
hash(test[n1]/process):  4775314088417376212

An alternative method for getting an integer seed in mkFromSeed() could be to use a random.Random() instance, which can be seeded with any string. For example:

def mkFromSeed(cls, seed, strVal=None):
    if strval is not None:
        rng = random.Random()
        rng.seed(f"{seed} + {strVal}")
        seed = rng.randInt(0, 0xFFFFFFFF)
    return RandState(seed)

However, I think there is a better way to implement the RandState class. Rather than using your own XOR and shift operations to produce random integers, you could build RandState around an instance of random.Random() and take advantage of its pertinent methods: seed(), getstate(), setstate(), and randint(). That would take advantage of the Mersenne Twister RNG algorithm, which is well-tested. A possible downside to this approach is memory consumption - I don't know how much memory each random.Random() instance consumes, but it could be several kilobytes.

I could contribute code if you need help.

mballance commented 2 years ago

Hi @msmftc, I had overlooked that hash(str) isn't deterministic across runs. I'm certainly open to using Random() as the core of RandState. Early on with PyVSC, I faced some random-stability issues that I thought were connected to using multiple Random instances (which is why I moved to using only a single instance). However, those issues may have been due to use of other features.

If you're open to contributing code to use 'Random' as the core of RandState, I'll be happy to accept it.

msmftc commented 2 years ago

I've submitted a pull request that implements RandState with Random() objects providing random number generation. I've tested it in my private code and it fixes the random stability bug.

mballance commented 2 years ago

Thanks, @msmftc. I've merged your PR, and started the release process for 0.6.5. I'll close this issue for now.