Loss of random distribution in if/then/else constraint

msmftc commented 3 years ago

Below is an example program that constrains the property "disp" to the range [0, 4096]. However, the generated value is always exactly 4096. This seems related to the if/then/else constraint. If I remove if/then/else and just constrain that disp <= 4096, then the program generates a reasonable distribution.

import vsc

@vsc.randobj
class BranchInstr:
    def __init__(self):
        self.type = vsc.rand_bit_t(1)
        self.disp = vsc.rand_bit_t(22)

    @vsc.constraint
    def short_offset_cnstr(self):
        with vsc.if_then(self.type == 0):
            self.disp <= 4096
        with vsc.else_then:
            self.disp <= 4096

    def __str__(self):
        return(f"type = {self.type}, displacement = {self.disp}")

branchInstr = BranchInstr()
for i in range(32):
    branchInstr.randomize()
    print(branchInstr)

type = 1, displacement = 4096
type = 1, displacement = 4096
type = 1, displacement = 4096
type = 1, displacement = 4096
type = 1, displacement = 4096
type = 1, displacement = 4096
type = 1, displacement = 4096
type = 1, displacement = 4096
type = 1, displacement = 4096
type = 0, displacement = 4096
type = 1, displacement = 4096
type = 0, displacement = 4096
type = 1, displacement = 4096
type = 0, displacement = 4096
type = 1, displacement = 4096
type = 0, displacement = 4096
type = 0, displacement = 4096
type = 1, displacement = 4096
type = 0, displacement = 4096
type = 1, displacement = 4096
type = 1, displacement = 4096
type = 1, displacement = 4096
type = 0, displacement = 4096
type = 0, displacement = 4096
type = 1, displacement = 4096
type = 0, displacement = 4096
type = 1, displacement = 4096
type = 0, displacement = 4096
type = 1, displacement = 4096
type = 1, displacement = 4096
type = 1, displacement = 4096
type = 1, displacement = 4096

Now, without the if/then/else:

import vsc

@vsc.randobj
class BranchInstr:
    def __init__(self):
        self.type = vsc.rand_bit_t(1)
        self.disp = vsc.rand_bit_t(22)

    @vsc.constraint
    def short_offset_cnstr(self):
        self.disp <= 4096

    def __str__(self):
        return(f"type = {self.type}, displacement = {self.disp}")

branchInstr = BranchInstr()
for i in range(32):
    branchInstr.randomize()
    print(branchInstr)

type = 0, displacement = 591
type = 1, displacement = 2415
type = 0, displacement = 2851
type = 0, displacement = 2186
type = 1, displacement = 1936
type = 1, displacement = 280
type = 0, displacement = 1236
type = 0, displacement = 2130
type = 0, displacement = 2114
type = 0, displacement = 1769
type = 0, displacement = 1330
type = 0, displacement = 324
type = 0, displacement = 3939
type = 0, displacement = 407
type = 0, displacement = 2126
type = 0, displacement = 2928
type = 1, displacement = 2507
type = 1, displacement = 2777
type = 1, displacement = 3435
type = 0, displacement = 430
type = 0, displacement = 1888
type = 0, displacement = 3048
type = 0, displacement = 2496
type = 0, displacement = 18
type = 1, displacement = 2309
type = 0, displacement = 1763
type = 1, displacement = 693
type = 1, displacement = 2654
type = 0, displacement = 6
type = 0, displacement = 2660
type = 0, displacement = 2940
type = 1, displacement = 768

My PyVSC version is pyvsc-0.5.5.20210822.1

msmftc commented 3 years ago

Some of my program's indentation was lost during copy-and-paste. Sorry about that. The program is simple enough that you can infer the original indentation.

mballance commented 3 years ago

Hi @msmftc, I've been making improvements to the randomization scheme, and believe that what is present in 0.5.9 results in a good distribution. I'll leave this open for a bit in case you have near-term feedback.

Best Regards, Matthew

msmftc commented 3 years ago

@mballance,

This is a good improvement. The original example (above) now produces a good distribution, as do several other tests that I wrote. All parts of the random spaces are sampled.

However, I do have a test where the distribution is unexpectedly skewed. I've included the code and it's coverage results below. In this test the random property "disp" should be evenly distributed across the range [0 - 4095], and also evenly distributed across the range [8192 - 16383]. Instead, the sub-range [8192 - 12291] is sampled about 3x more often than the sub-range [12292 - 16383]. Do you understand why this happens? Is there a reasonable fix to get better distribution?

import vsc

@vsc.randobj
class BranchInstr:
    def __init__(self):
        self.type = vsc.rand_bit_t(1)
        self.disp = vsc.rand_bit_t(22)

    @vsc.constraint
    def short_offset_cnstr(self):
        with vsc.if_then(self.type == 0):
            self.disp < 4096
        with vsc.else_then:
            self.disp >= 8192
            self.disp < 16384

    def __str__(self):
        return(f"type = {self.type}, displacement = {self.disp}")

@vsc.covergroup
class BranchInstr_cg:
    def __init__(self):
        self.with_sample(item = BranchInstr())

        self.type = vsc.coverpoint(self.item.type)

        self.disp_cp = vsc.coverpoint(self.item.disp, bins = {
            "type0_disp" : vsc.bin_array([16], [0, 4095]),
            "type1_disp" : vsc.bin_array([32], [8192, 16383])
        })

branchInstr = BranchInstr()
branchInstr_cg = BranchInstr_cg()

for i in range(1024):
    branchInstr.randomize()
    branchInstr_cg.sample(branchInstr)
    # print(branchInstr)

vsc.report_coverage(details=True)

TYPE BranchInstr_cg : 100.000000%
    CVP type : 100.000000%
    Bins:
        type[0] : 481
        type[1] : 543
    CVP disp_cp : 100.000000%
    Bins:
        type0_disp[0] : 16
        type0_disp[1] : 33
        type0_disp[2] : 29
        type0_disp[3] : 25
        type0_disp[4] : 33
        type0_disp[5] : 27
        type0_disp[6] : 39
        type0_disp[7] : 31
        type0_disp[8] : 25
        type0_disp[9] : 27
        type0_disp[10] : 23
        type0_disp[11] : 40
        type0_disp[12] : 38
        type0_disp[13] : 32
        type0_disp[14] : 29
        type0_disp[15] : 34
        type1_disp[0] : 30
        type1_disp[1] : 34
        type1_disp[2] : 28
        type1_disp[3] : 32
        type1_disp[4] : 23
        type1_disp[5] : 20
        type1_disp[6] : 26
        type1_disp[7] : 19
        type1_disp[8] : 24
        type1_disp[9] : 26
        type1_disp[10] : 35
        type1_disp[11] : 21
        type1_disp[12] : 34
        type1_disp[13] : 32
        type1_disp[14] : 23
        type1_disp[15] : 28
        type1_disp[16] : 8
        type1_disp[17] : 4
        type1_disp[18] : 8
        type1_disp[19] : 3
        type1_disp[20] : 5
        type1_disp[21] : 9
        type1_disp[22] : 7
        type1_disp[23] : 9
        type1_disp[24] : 8
        type1_disp[25] : 2
        type1_disp[26] : 8
        type1_disp[27] : 4
        type1_disp[28] : 8
        type1_disp[29] : 9
        type1_disp[30] : 11
        type1_disp[31] : 5

qzcx commented 3 years ago

I'm guessing it is not ordering the solve order here like you would expect. So half the time it solves disp first and always uses the higher range since type is unsolved. The other half of the time it does the coin flip. This would lead to the 1:3 ratio mentions.

This is just speculation, but maybe specifying the solve order would help fix it

msmftc commented 3 years ago

@qzcx, I ran further experiments with explicit solve order, but that did not change the skewed distribution. I tried both vsc.solve_order(self.type, self.disp) and vsc.solve_order(self.disp, self.type).

@mballance, I think you have fixed the original issue, even if there is still a lesser issue that causes skewed distributions. You can close this issue if you wish.

mballance commented 3 years ago

@qzcx, I'll file a separate issue for the new testcase, since it is worth digging into. I understand why the distribution is skewed, and can make a meaningful improvement to the results. That said, I'm still thinking whether there's a more-general approach that might better.

In case you're interested, here's an explanation of what's happening and what I changed. The new randomization-swizzling code introduces randomness by constraining bit ranges to be equal to the corresponding bits in a random value. Up to 6 ranges are formed. Any value-range constraints that conflict are dropped (ie they're soft constraints).

In the example, the 'dist' field is 22 bits wide. With 6 ranges, there are 5 ranges with 3 bits and one range with 7 bits. The initial algorithm always placed the 'overflow' range on the upper bits. In this case, that's the range that is most likely to not match the randomly-selected value. In that case, the constraint is dropped and Boolector will typically select '0' for those bits.

I've experimenting with randomly placing the 'overflow' range on the upper bits and on the lower bits. This results in an improved distribution on this example, but I think there might be a more-general approach.

Thanks again for raising this case!

fvutils / pyvsc

Loss of random distribution in if/then/else constraint #107