Closed msmftc closed 2 years ago
Hi @msmftc, I've been giving this more thought. Here are some design and requirement notes on this feature. Please review and let me know your thoughts.
- RandState object for capturing and propagating random state
- RandState from a numeric seed
- RandState for a randomizable object
- RandState of a randomizable object (assume state mutates as randomizations occur)
- set_randstate and get_randstate method.
- random package as the seed.
- RandState provides a 'clone' method that supports saving a copy of the random state for later use.

Applying this to your example should look something like the below:
import sys
import random
import vsc

@vsc.randobj
class RandByte:
    def __init__(self):
        self.byte = vsc.rand_bit_t(8)

    def __str__(self):
        return "byte = {}".format(self.byte)

@vsc.randobj
class RandWord:
    def __init__(self):
        self.word = vsc.rand_bit_t(16)

    def __str__(self):
        return "word = {}".format(self.word)

def main():
    rs = vsc.RandState(100)  # Create a random state from a numeric seed
    randByte = RandByte()  # randobj methods should set either an RNG instance or a seed.
    randByte.set_randstate(rs)  # Sets the randstate to the current state of 'rs'
    randWord = RandWord()
    randWord.set_randstate(rs)  # Sets the randstate to the current state of 'rs'
    # Note: because a copy of the randstate is taken, the subsequent randomizations
    # of 'randByte' and 'randWord' are independent.
    # Note, also, that subsequent randomizations do not mutate the state held in 'rs'.
    byte_l = []
    word_l = []
    # Order of randomize() calls should not affect random sequence from a particular
    # randobj, but it currently does.
    if len(sys.argv) > 1 and sys.argv[1] == '1':
        # Mode 1: interleave randomize() calls on the two objects.
        for i in range(10):
            randByte.randomize()
            randWord.randomize()
            byte_l.append(randByte.byte)
            word_l.append(randWord.word)
    elif len(sys.argv) > 1 and sys.argv[1] == '2':
        # Mode 2: randomize all words first, then all bytes.
        for i in range(10):
            randWord.randomize()
            word_l.append(randWord.word)
        for i in range(10):
            randByte.randomize()
            byte_l.append(randByte.byte)
    else:
        sys.exit("First command-line argument must be '1' or '2'.")
    print("Byte list")
    for i in range(10):
        print(" byte {}: {}".format(i, byte_l[i]))
    print("Word list")
    for i in range(10):
        print(" word {}: {}".format(i, word_l[i]))

main()
@mballance, these features look good, but I recommend adding a couple more options for seeding RandState objects. The goal of these options is to uniquely seed each RandState obj while preserving random stability.
First, there could be an option to seed RandObj with the combination of an integer and a string, for example:
rs = RandState(100, "core[0].worker[3].opc")
The integer and string are hashed together to produce a unique seed for the RandState. UVM uses this technique. The integer is generally a global seed that is set for the whole test, and the string is generally a hierarchical pathname for the object being seeded. Assuming every randomizable object has a unique pathname, then each gets a unique seed. When a test is altered, seeds stay constant for any object whose pathname is unchanged. The order in which objects are seeded does not affect randomization.
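A minimal sketch of the integer-plus-string idea, using only the Python standard library. The helper name seed_from_path and the choice of SHA-256 are assumptions made for illustration; this is not PyVSC's API, and UVM's own hashing differs.

```python
import hashlib
import random


def seed_from_path(global_seed: int, path: str) -> int:
    """Hypothetical helper: derive a 32-bit seed from an integer and a string."""
    # SHA-256 gives the same digest in every run and on every machine.
    digest = hashlib.sha256(f"{global_seed}:{path}".encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big")


# One global seed for the test, one pathname per randomizable object.
opc_seed = seed_from_path(100, "core[0].worker[3].opc")
alu_seed = seed_from_path(100, "core[0].worker[3].alu")

# Each object gets its own generator; the order of creation does not matter.
opc_rng = random.Random(opc_seed)
alu_rng = random.Random(alu_seed)
print(opc_rng.randint(0, 255), alu_rng.randint(0, 255))
```

Because the seed depends only on the global integer and the pathname, adding or removing other objects leaves the seed of an unchanged pathname untouched.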
Second, RandState could have a method that returns random integers so that a RandState can be seeded from another RandState. For example:
rs1 = RandState(100)
rs2 = RandState(rs1.randInt())
This enables a hierarchical method of seeding RandObj's, similar to what SystemVerilog does. Imagine a test that uses many threads, and each thread has the ability to start additional threads. The test could be written such that the initial thread is given an integer seed. The thread creates a RandState with that seed (rs_top = RandState(seed)), then calls rs_top.randInt() to seed a new RandState for each RandObj in the thread. When the thread needs to start child threads, it again calls rs_top.randInt() to provide seeds to those children. This keeps randomization stable for all threads even though we cannot predict the program order in which grandchild threads will be started.
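The hierarchical scheme can be sketched without PyVSC. Here SeedSource is a hypothetical stand-in for a RandState with a randInt-style method, and plain function calls stand in for threads. The key assumption is that each parent draws all child seeds in a fixed order before any child starts.

```python
import random


class SeedSource:
    """Hypothetical stand-in for a RandState that can hand out child seeds."""

    def __init__(self, seed: int):
        self._rng = random.Random(seed)

    def rand_int(self) -> int:
        return self._rng.randint(0, 0xFFFFFFFF)


def run_node(seed, name, depth, results, reverse_children=False):
    rs_top = SeedSource(seed)
    # Draw seeds in a fixed order: this node's own object first, then children.
    obj_rng = random.Random(rs_top.rand_int())
    child_seeds = [rs_top.rand_int() for _ in range(2)] if depth > 0 else []
    results[name] = [obj_rng.randint(0, 255) for _ in range(3)]
    # Children can now start in any order; each already holds its seed.
    children = list(enumerate(child_seeds))
    if reverse_children:
        children.reverse()
    for idx, child_seed in children:
        run_node(child_seed, f"{name}.{idx}", depth - 1, results, reverse_children)


forward, backward = {}, {}
run_node(100, "top", 2, forward)
run_node(100, "top", 2, backward, reverse_children=True)
print(forward == backward)  # True: start order does not change any node's values
```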
If you include these options in RandState then your plan should work well.
Hi @msmftc, I'm good with adding a 'randInt' method to RandState. That certainly makes sense.
I'm wondering if the string-based seed creation could be kept separate from PyVSC. PyVSC itself doesn't maintain a globally-unique instance path for all randomizable objects. That's somewhat dependent on the user's test environment, so the user will need to generate appropriate string paths anyway. I had a look at UVM seed management, and I could see using a similar approach (i.e., a singleton object that relates paths to seed values) in Python. Can you manage this in your environment -- at least to start?
Thanks, Matthew
@mballance,
I'm not suggesting that PyVSC manage instance paths for randomizable objects. That is the user's responsibility. I'm recommending that RandState have a constructor or method that hashes an arbitrary integer with an arbitrary string to create a unique random seed. The rest of the work is up to the user (setting a global seed, tracking unique pathnames for each object).
Including the hashing in RandState saves users from needing to provide their own hash functions, which have potential to be implemented badly.
@msmftc, I'm good with that approach: provide a utility method on RandState for constructing a RandState object from the combination of an integer and string seed. Since it seems we're agreed on the specifics, I'll proceed to begin implementation.
Hi @msmftc, The 0.6.3 release contains an implementation of explicit random-stability management. There is a brief discussion of the new features in the documentation: https://pyvsc.readthedocs.io/en/latest/methods.html#managing-random-stability. However, it's probably well worth taking a look at the new tests for the feature as well. For example, this test shows creation of a rand-state object from an integer seed and string component: https://github.com/fvutils/pyvsc/blob/02302b461c44079eaf46d77923cb3e7e98eec920/ve/unit/test_randstate.py#L52-L92
I'll leave this open for now until you're able to try things out.
-Matthew
@mballance,
I've begun testing the random stability feature. I wrote a simple example test showing that PyVSC provides random repeatability within a single run, but is completely unrepeatable across multiple runs. I've included the example below.
I looked at the PyVSC source code in rand_state.py and found the likely root cause. In method mkFromSeed() you are seeding a RandState object with seed + hash(strval). In Python 3.3 and later, the hash() of a given string changes from run to run. We need a hash function that doesn't change.
The following example demonstrates repeatability within a run, unrepeatability across multiple runs, and different hash() results across multiple runs:
import vsc

@vsc.randobj
class Selector():
    def __init__(self):
        self.number = vsc.rand_bit_t(8)

contextName = "test[n1]/process"
selector = Selector()
for i in range(2):
    randState = vsc.RandState.mkFromSeed(1, contextName)
    selector.set_randstate(randState)
    numStr = f"Selection {i}:"
    for j in range(10):
        selector.randomize()
        numStr += f"\t{int(selector.number)}"
    print(numStr)
print(f"hash({contextName}): {hash(contextName)}")
Run 1:
Selection 0: 93 135 136 153 228 129 12 48 244 153
Selection 1: 93 135 136 153 228 129 12 48 244 153
hash(test[n1]/process): -714692566165798307
Run 2:
Selection 0: 120 24 164 207 60 186 195 30 58 58
Selection 1: 120 24 164 207 60 186 195 30 58 58
hash(test[n1]/process): 4775314088417376212
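The differing hash() values are consistent with Python's string hash randomization, which is controlled by the PYTHONHASHSEED environment variable. The sketch below shows the effect by launching two child interpreters with different PYTHONHASHSEED values and comparing the built-in hash() against zlib.crc32, used here only as one example of a run-independent function. The helper name run_with_hash_seed is made up for this illustration.

```python
import os
import subprocess
import sys

SNIPPET = (
    "import zlib; s = 'test[n1]/process'; "
    "print(hash(s), zlib.crc32(s.encode('utf-8')))"
)


def run_with_hash_seed(hash_seed: str):
    """Run SNIPPET in a fresh interpreter with the given PYTHONHASHSEED."""
    env = dict(os.environ, PYTHONHASHSEED=hash_seed)
    result = subprocess.run(
        [sys.executable, "-c", SNIPPET],
        env=env, capture_output=True, text=True, check=True,
    )
    builtin_hash, crc = result.stdout.split()
    return int(builtin_hash), int(crc)


a = run_with_hash_seed("1")
b = run_with_hash_seed("2")
print("hash() equal across runs:", a[0] == b[0])  # expected to differ
print("crc32 equal across runs:", a[1] == b[1])
```

Fixing PYTHONHASHSEED in the environment would also make hash() repeatable, but that pushes the burden onto every user's launch scripts rather than solving it in the library.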
An alternative method for getting an integer seed in mkFromSeed() could be to use a random.Random() instance, which can be seeded with any string. For example:
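import random

@classmethod
def mkFromSeed(cls, seed, strVal=None):
    if strVal is not None:
        # Seeding Random with a str is repeatable across runs,
        # unlike the built-in hash() of a str.
        rng = random.Random()
        rng.seed(f"{seed} + {strVal}")
        seed = rng.randint(0, 0xFFFFFFFF)
    return RandState(seed)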
However, I think there is a better way to implement the RandState class. Rather than using your own XOR and shift operations to produce random integers, you could build RandState around an instance of random.Random() and take advantage of its pertinent methods: seed(), getstate(), setstate(), and randint(). That would take advantage of the Mersenne Twister RNG algorithm, which is well-tested. A possible downside to this approach is memory consumption - I don't know how much memory each random.Random() instance consumes, but it could be several kilobytes.
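As a rough illustration of that design, here is a sketch of a state object wrapped around random.Random. The class RandStateSketch and its method names are hypothetical and do not reproduce PyVSC's actual RandState; the sketch only shows how seed(), getstate(), setstate(), and randint() could cover seeding, cloning, and drawing integers.

```python
import random


class RandStateSketch:
    """Hypothetical sketch; not PyVSC's actual RandState."""

    def __init__(self, seed):
        # random.Random accepts int or str seeds; both are repeatable across runs.
        self._rng = random.Random(seed)

    @classmethod
    def mk_from_seed(cls, seed, str_val=None):
        if str_val is not None:
            return cls(f"{seed} + {str_val}")
        return cls(seed)

    def rand_int(self, low=0, high=0xFFFFFFFF):
        return self._rng.randint(low, high)

    def clone(self):
        # Copy the full generator state so the clone replays the same sequence.
        copy = RandStateSketch(0)
        copy._rng.setstate(self._rng.getstate())
        return copy


rs = RandStateSketch.mk_from_seed(1, "test[n1]/process")
saved = rs.clone()
first = [rs.rand_int(0, 255) for _ in range(5)]
replay = [saved.rand_int(0, 255) for _ in range(5)]
print(first == replay)  # True: the clone reproduces the original sequence
```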
I could contribute code if you need help.
Hi @msmftc, I had overlooked that hash(str) isn't deterministic across runs. I'm certainly open to using Random() as the core of RandState. Early on with PyVSC, I faced some random-stability issues that I thought were connected to using multiple Random instances (which is why I moved to using only a single instance). However, those issues may have been due to use of other features.
If you're open to contributing code to use 'Random' as the core of RandState, I'll be happy to accept it.
I've submitted a pull request that implements RandState with Random() objects providing random number generation. I've tested it in my private code and it fixes the random stability bug.
Thanks, @msmftc. I've merged your PR, and started the release process for 0.6.5. I'll close this issue for now.
PyVSC needs to add features to support random stability. Random stability is the idea that randomized tests should be exactly reproducible (given a fixed random seed), and that changes to a small feature of a test should not affect randomization outside of that feature.
PyVSC cannot support random stability right now because a single, global random number generator (RNG) supplies all random numbers. If a test has several randobj instances then the sequence of random values produced by a particular instance in a loop of randomize() calls depends on the order of ALL randomize() calls on ALL randobj's in the test. This causes a few problems:
A solution would be to provide methods that pass an RNG instance or a seed into a randobj. All randomization for that randobj would need to use its private RNG. I've examined PyVSC source code, and it looks like this solution has been started, but is not complete. Is there a plan to complete it?
I've written an example to demonstrate random (in)stability. The example has two different randobj's, and randomize() is called ten times on each. A command-line parameter alters the order of randomize() calls between the two randobj's. If the example were random stable, then the sequence of numbers from each randobj would always be the same, regardless of randomize() order. Since the test is not random stable, the sequences vary.
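The underlying effect can be reproduced with the standard library alone. The sketch below does not use PyVSC; draw_sequences is a made-up helper that compares one shared generator against one private generator per object under two call orders, mirroring the two modes of the example.

```python
import random


def draw_sequences(order, shared):
    """Return the values each named object sees for a given call order."""
    shared_rng = random.Random(100)
    private = {"byte": random.Random(1), "word": random.Random(2)}
    limits = {"byte": 0xFF, "word": 0xFFFF}
    seen = {"byte": [], "word": []}
    for name in order:
        # Either every object pulls from one generator, or each uses its own.
        rng = shared_rng if shared else private[name]
        seen[name].append(rng.randint(0, limits[name]))
    return seen


interleaved = ["byte", "word"] * 10        # like mode '1' in the example
grouped = ["word"] * 10 + ["byte"] * 10    # like mode '2' in the example

# Shared generator: each object's sequence depends on the global call order.
print(draw_sequences(interleaved, True) == draw_sequences(grouped, True))
# Private generators: each object's sequence is independent of call order.
print(draw_sequences(interleaved, False) == draw_sequences(grouped, False))  # True
```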