David-Durst / aetherling

Create auto-scheduled data-parallel pipelines in hardware with user-friendly Python
MIT License

Linebuffer RAM Not Being Written To When CE On #17

Closed David-Durst closed 5 years ago

David-Durst commented 5 years ago

I'm having issues with the RAM in my linebuffer when I toggle the CE off and on repeatedly.

In the chains of downsampling stencils that I'm generating from Haskell (https://github.com/David-Durst/aetherling/blob/master/tests/haskell/downsampleStencilChain1Per64.py), I'm enabling/disabling the CE according to the ready/valid signals. When the downstream stencil is ready and the upstream stencil is valid, I'm enabling the CE. Otherwise, I'm disabling it.
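For reference, the CE rule described above can be sketched in plain Python. This is only an illustrative model of the gating condition, not the code generated from Haskell; the function name `clock_enable` is hypothetical.

```python
def clock_enable(upstream_valid: bool, downstream_ready: bool) -> bool:
    """Assert CE only when upstream data is valid and downstream can accept it."""
    return upstream_valid and downstream_ready


# Enumerate all four ready/valid combinations to show when CE is asserted.
for valid in (False, True):
    for ready in (False, True):
        ce = clock_enable(valid, ready)
        print(f"valid={valid!s:5} ready={ready!s:5} -> CE={ce}")
```

CE is high in exactly one of the four cases, so any stall on either side of a stencil makes CE flicker, which is the situation the repro test below imitates.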

However, the linebuffers in these stencils are incorrectly outputting 0.

I've created a repro of the problem in this test: https://github.com/David-Durst/aetherling/blob/master/tests/helper_test_readyvalid.py#L18. The test runs my linebuffer with CE on every even clock and off every odd clock. As you can see in the test output below (see step i20 in particular), the RAM is never being written to even though its WE is set.

I don't think this is an issue with Magma, as my attempts to reproduce this bug using just the RAM primitive (https://github.com/David-Durst/aetherling/blob/d27f05846583749d05b5d58665950b3129dca559/tests/helper_test_readyvalid.py#L139-L209) have been unsuccessful. In those tests, the RAM accepts input when its CE is enabled.
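As a point of comparison, the behavior expected from a RAM under a flickering CE can be modeled in plain Python. This is a simplified behavioral sketch, not the Magma/CoreIR RAM primitive: `RamModel` and `run_flicker` are hypothetical names, and a combinational read is assumed for simplicity.

```python
class RamModel:
    """Behavioral model of a synchronous-write RAM with CE and WE (illustrative only)."""

    def __init__(self, depth: int):
        self.mem = [0] * depth

    def clock(self, ce: bool, we: bool, waddr: int, wdata: int) -> None:
        # A write is committed on a clock edge only when both CE and WE are high.
        if ce and we:
            self.mem[waddr] = wdata

    def read(self, raddr: int) -> int:
        # Assumption: combinational read. A real RAM may register its read port.
        return self.mem[raddr]


def run_flicker(cycles: int = 8) -> list:
    """Drive the model with CE on even clocks and off on odd clocks."""
    ram = RamModel(depth=4)
    seen = []
    for i in range(cycles):
        ce = (i % 2 == 0)
        # Mirror the repro: value 1 is presented when CE is on, 2 when it is off.
        ram.clock(ce=ce, we=ce, waddr=0, wdata=1 if ce else 2)
        seen.append(ram.read(0))
    return seen


print(run_flicker())
```

In this model the value written on an even clock stays in memory through the following odd clock, so address 0 never reads back as 0 after the first write. The trace below shows RDATA staying at 0 even on cycles where RAM WE is True.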

However, I'm stuck debugging and, @leonardt, if you have a chance, could you look at this and make sure I'm seeing this correctly? I recognize you're busy, but if you have a few spare cycles I would appreciate the help. Maybe I wired up my debugging ports wrong and my test output is incorrect?

Numargs=1
In Run Generators
Done running generators
Numargs=1
Numargs=1
In Run Generators
Done running generators
Numargs=1
Numargs=1
In Run Generators
Done running generators
Numargs=1
Numargs=1
In Run Generators
Done running generators
Numargs=1
Numargs=1
In Run Generators
Done running generators
Numargs=1
Numargs=1
In Run Generators
Done running generators
Numargs=1
Numargs=1
In Run Generators
Done running generators
Numargs=1
Numargs=1
In Run Generators
Done running generators
Numargs=1
In Run Generators
Done running generators
Numargs=1
In Run Generators
Done running generators
Numargs=1
Numargs=1
In Run Generators
Done running generators
Numargs=1
Numargs=1
In Run Generators
Done running generators
Numargs=1
In Run Generators
Done running generators
Numargs=1
Numargs=1
In Run Generators
Done running generators
Numargs=1
In Run Generators
Done running generators
Numargs=1
In Run Generators
Done running generators
Numargs=1
In Run Generators
Done running generators
Numargs=1
In Run Generators
Done running generators
Numargs=1
In Run Generators
Done running generators
Numargs=1
In Run Generators
Done running generators
Numargs=1
In Run Generators
Done running generators
Numargs=1
Numargs=1
In Run Generators
Done running generators
Numargs=1
Numargs=1
In Run Generators
Done running generators
Numargs=1
Numargs=1
In Run Generators
Done running generators
Numargs=1
Numargs=1
Numargs=1
In Run Generators
Done running generators
Numargs=1
Numargs=1
Numargs=1
Numargs=1
Numargs=2
Numargs=1
Numargs=1
Numargs=1
Numargs=1
Numargs=1
Numargs=1
Numargs=2
Numargs=1
Numargs=1
Numargs=1
Found raddr
Found raddr
Starting topological sort
topo_order.size() = 276
numVertices(g)    = 276
i0
undelayed out: [0, 0, 0, 1]
out: [0, 0, 0, 1]
valid: False
db CE: False
db WE: False
db RDATA: [0, 0, 0, 0]
db WDATA: [0, 0, 0, 1]
db RADDR: 0
db WADDR: 0
db RAM WE: False

i1
undelayed out: [0, 0, 0, 1]
out: [0, 0, 0, 1]
valid: False
db CE: False
db WE: False
db RDATA: [0, 0, 0, 0]
db WDATA: [0, 0, 0, 1]
db RADDR: 0
db WADDR: 0
db RAM WE: False

i2
undelayed out: [0, 0, 1, 1]
out: [0, 0, 1, 1]
valid: False
db CE: False
db WE: False
db RDATA: [0, 0, 0, 0]
db WDATA: [0, 0, 1, 1]
db RADDR: 0
db WADDR: 0
db RAM WE: False

i3
undelayed out: [0, 0, 1, 1]
out: [0, 0, 1, 1]
valid: False
db CE: False
db WE: False
db RDATA: [0, 0, 0, 0]
db WDATA: [0, 0, 1, 1]
db RADDR: 0
db WADDR: 0
db RAM WE: False

i4
undelayed out: [0, 0, 1, 1]
out: [0, 0, 1, 1]
valid: False
db CE: False
db WE: False
db RDATA: [0, 0, 0, 0]
db WDATA: [0, 0, 1, 1]
db RADDR: 0
db WADDR: 0
db RAM WE: False

i5
undelayed out: [0, 0, 1, 1]
out: [0, 0, 1, 1]
valid: False
db CE: False
db WE: False
db RDATA: [0, 0, 0, 0]
db WDATA: [0, 0, 1, 1]
db RADDR: 0
db WADDR: 0
db RAM WE: False

i6
undelayed out: [0, 0, 1, 1]
out: [0, 0, 1, 1]
valid: False
db CE: False
db WE: False
db RDATA: [0, 0, 0, 0]
db WDATA: [0, 0, 1, 1]
db RADDR: 0
db WADDR: 0
db RAM WE: False

i7
undelayed out: [0, 0, 1, 1]
out: [0, 0, 1, 1]
valid: False
db CE: False
db WE: False
db RDATA: [0, 0, 0, 0]
db WDATA: [0, 0, 1, 1]
db RADDR: 0
db WADDR: 0
db RAM WE: False

i8
undelayed out: [0, 0, 1, 1]
out: [0, 0, 1, 1]
valid: False
db CE: False
db WE: False
db RDATA: [0, 0, 0, 0]
db WDATA: [0, 0, 1, 1]
db RADDR: 0
db WADDR: 0
db RAM WE: False

i9
undelayed out: [0, 0, 1, 1]
out: [0, 0, 1, 1]
valid: False
db CE: False
db WE: False
db RDATA: [0, 0, 0, 0]
db WDATA: [0, 0, 1, 1]
db RADDR: 0
db WADDR: 0
db RAM WE: False

i10
undelayed out: [0, 0, 1, 1]
out: [0, 0, 1, 1]
valid: False
db CE: False
db WE: False
db RDATA: [0, 0, 0, 0]
db WDATA: [0, 0, 1, 1]
db RADDR: 0
db WADDR: 0
db RAM WE: False

i11
undelayed out: [0, 0, 1, 1]
out: [0, 0, 1, 1]
valid: False
db CE: False
db WE: False
db RDATA: [0, 0, 0, 0]
db WDATA: [0, 0, 1, 1]
db RADDR: 0
db WADDR: 0
db RAM WE: False

i12
undelayed out: [0, 0, 1, 1]
out: [0, 0, 1, 1]
valid: False
db CE: False
db WE: False
db RDATA: [0, 0, 0, 0]
db WDATA: [0, 0, 1, 1]
db RADDR: 0
db WADDR: 0
db RAM WE: False

i13
undelayed out: [0, 0, 1, 1]
out: [0, 0, 1, 1]
valid: False
db CE: False
db WE: False
db RDATA: [0, 0, 0, 0]
db WDATA: [0, 0, 1, 1]
db RADDR: 0
db WADDR: 0
db RAM WE: False

i14
undelayed out: [0, 0, 1, 1]
out: [0, 0, 1, 1]
valid: False
db CE: False
db WE: False
db RDATA: [0, 0, 0, 0]
db WDATA: [0, 0, 1, 1]
db RADDR: 0
db WADDR: 0
db RAM WE: False

i15
undelayed out: [0, 0, 1, 1]
out: [0, 0, 1, 1]
valid: False
db CE: False
db WE: False
db RDATA: [0, 0, 0, 0]
db WDATA: [0, 0, 1, 1]
db RADDR: 0
db WADDR: 0
db RAM WE: False

i16
undelayed out: [0, 1, 1, 1]
out: [0, 1, 1, 1]
valid: False
db CE: False
db WE: False
db RDATA: [0, 0, 0, 0]
db WDATA: [0, 1, 1, 1]
db RADDR: 0
db WADDR: 0
db RAM WE: False

i17
undelayed out: [0, 1, 1, 1]
out: [0, 1, 1, 1]
valid: False
db CE: False
db WE: False
db RDATA: [0, 0, 0, 0]
db WDATA: [0, 1, 1, 1]
db RADDR: 0
db WADDR: 0
db RAM WE: False

i18
undelayed out: [1, 1, 1, 1]
out: [1, 1, 1, 1]
valid: True
db CE: True
db WE: True
db RDATA: [0, 0, 0, 0]
db WDATA: [1, 1, 1, 1]
db RADDR: 0
db WADDR: 0
db RAM WE: True

i19
undelayed out: [1, 1, 1, 1]
out: [1, 1, 1, 1]
valid: False
db CE: False
db WE: False
db RDATA: [0, 0, 0, 0]
db WDATA: [1, 1, 1, 1]
db RADDR: 0
db WADDR: 0
db RAM WE: False

i20
undelayed out: [1, 1, 1, 1]
out: [0, 0, 0, 0]
valid: False
db CE: True
db WE: True
db RDATA: [0, 0, 0, 0]
db WDATA: [1, 1, 1, 1]
db RADDR: 0
db WADDR: 0
db RAM WE: True

i21
undelayed out: [1, 1, 1, 1]
out: [0, 0, 0, 0]
valid: False
db CE: False
db WE: False
db RDATA: [0, 0, 0, 0]
db WDATA: [1, 1, 1, 1]
db RADDR: 0
db WADDR: 0
db RAM WE: False

i22
undelayed out: [1, 1, 1, 1]
out: [0, 0, 0, 0]
valid: False
db CE: True
db WE: True
db RDATA: [0, 0, 0, 0]
db WDATA: [1, 1, 1, 1]
db RADDR: 0
db WADDR: 0
db RAM WE: True

i23
undelayed out: [1, 1, 1, 1]
out: [0, 0, 0, 0]
valid: False
db CE: False
db WE: False
db RDATA: [0, 0, 0, 0]
db WDATA: [1, 1, 1, 1]
db RADDR: 0
db WADDR: 0
db RAM WE: False

i24
undelayed out: [1, 1, 1, 1]
out: [0, 0, 0, 0]
valid: False
db CE: True
db WE: True
db RDATA: [0, 0, 0, 0]
db WDATA: [1, 1, 1, 1]
db RADDR: 0
db WADDR: 0
db RAM WE: True

i25
undelayed out: [1, 1, 1, 1]
out: [0, 0, 0, 0]
valid: False
db CE: False
db WE: False
db RDATA: [0, 0, 0, 0]
db WDATA: [1, 1, 1, 1]
db RADDR: 0
db WADDR: 0
db RAM WE: False

i26
undelayed out: [1, 1, 1, 1]
out: [0, 0, 0, 0]
valid: True
db CE: True
db WE: False
db RDATA: [0, 0, 0, 0]
db WDATA: [1, 1, 1, 1]
db RADDR: 1
db WADDR: 0
db RAM WE: False

F
tests/test_readyvalid.py:17 (test_2dlb_flicker_ce_with_2x2_stride)
(True != 0 or 0 != 0)

Expected :0 or 0 != 0)
Actual   :(True

def test_2dlb_flicker_ce_with_2x2_stride():
        scope = Scope()
        c = coreir.Context()
        cirb = CoreIRBackend(c)

        testcircuit = DefineTwoDimensionalLineBuffer(cirb, Array(8, In(Bit)), 1, 1, 2, 2, 8, 8, 2, 2, 0, 0, True)

        sim = CoreIRSimulator(testcircuit, testcircuit.CLK, context=c,
                              namespaces=["aetherlinglib", "commonlib", "mantle", "coreir", "global"])

        for i in range(100000):
            if i % 2 == 0:
                sim.set_value(testcircuit.I[0][0], int2seq(1, 8), scope)
                sim.set_value(testcircuit.CE, 1, scope)
            else:
                sim.set_value(testcircuit.I[0][0], int2seq(2, 8), scope)
                sim.set_value(testcircuit.CE, 0, scope)
            sim.evaluate()
            sim.advance_cycle()
            sim.evaluate()
            print("i" + str(i))
            print("undelayed out: " + str([seq2int(sim.get_value(testcircuit.undelayedO[0][r][c], scope)) for r in range(2) for c in range(2)]))
            print("out: " + str([seq2int(sim.get_value(testcircuit.O[0][r][c], scope)) for r in range(2) for c in range(2)]))
            print("valid: " + str(sim.get_value(testcircuit.valid, scope)))
            print("db CE: " + str(sim.get_value(testcircuit.dbCE, scope)))
            print("db WE: " + str(sim.get_value(testcircuit.dbWE, scope)))
            print("db RDATA: " + str([seq2int(sim.get_value(testcircuit.RDATA, scope)[0][r][c]) for r in range(2) for c in range(2)]))
            print("db WDATA: " + str([seq2int(sim.get_value(testcircuit.WDATA, scope)[0][r][c]) for r in range(2) for c in range(2)]))
            print("db RADDR: " + str(seq2int(sim.get_value(testcircuit.RADDR, scope)[0])))
            print("db WADDR: " + str(seq2int(sim.get_value(testcircuit.WADDR, scope)[0])))
            print("db RAM WE: " + str(sim.get_value(testcircuit.RAMWE, scope)))
            print("")
            print("")

            # for some reason, lb going to 0 when flickering valid on and off for ce
            for r in range(2):
                for c in range(2):
>                   assert (sim.get_value(testcircuit.valid, scope) == 0 or seq2int(sim.get_value(testcircuit.O[0][r][c], scope)) != 0)
E                   assert (True == 0 or 0 != 0)
E                    +  where True = <bound method CoreIRSimulator.get_value of <magma.simulator.coreir_simulator.CoreIRSimulator object at 0x10759f160>>(TwoDimensionalLineBuffer_Array_8_In_Bit__type_1x1pxPerClock_2x2window_8x8img_2x2stride_0x0origin.valid, <magma.scope.Scope object at 0x1071c65f8>)
E                    +    where <bound method CoreIRSimulator.get_value of <magma.simulator.coreir_simulator.CoreIRSimulator object at 0x10759f160>> = <magma.simulator.coreir_simulator.CoreIRSimulator object at 0x10759f160>.get_value
E                    +    and   TwoDimensionalLineBuffer_Array_8_In_Bit__type_1x1pxPerClock_2x2window_8x8img_2x2stride_0x0origin.valid = TwoDimensionalLineBuffer_Array_8_In_Bit__type_1x1pxPerClock_2x2window_8x8img_2x2stride_0x0origin = DefineCircuit("TwoD...E, TwoDimensionalLineBuffer_Array_8_In_Bit__type_1x1pxPerClock_2x2window_8x8img_2x2stride_0x0origin.RAMWE)\nEndCircuit().valid
E                    +  and   0 = seq2int([False, False, False, False, False, False, ...])
E                    +    where [False, False, False, False, False, False, ...] = <bound method CoreIRSimulator.get_value of <magma.simulator.coreir_simulator.CoreIRSimulator object at 0x10759f160>>(TwoDimensionalLineBuffer_Array_8_In_Bit__type_1x1pxPerClock_2x2window_8x8img_2x2stride_0x0origin.O[0][0][0], <magma.scope.Scope object at 0x1071c65f8>)
E                    +      where <bound method CoreIRSimulator.get_value of <magma.simulator.coreir_simulator.CoreIRSimulator object at 0x10759f160>> = <magma.simulator.coreir_simulator.CoreIRSimulator object at 0x10759f160>.get_value

test_readyvalid.py:55: AssertionError

David-Durst commented 5 years ago

Also, the debug ports are wired up here: https://github.com/David-Durst/aetherling/blob/master/aetherling/modules/delayed_buffer.py#L77-L83, https://github.com/David-Durst/aetherling/blob/master/aetherling/modules/delayed_buffer.py#L113-L114, and https://github.com/David-Durst/aetherling/blob/master/aetherling/modules/delayed_buffer.py#L144-L145

leonardt commented 5 years ago

It could be helpful to dump the coreir JSON to make sure everything is wired up properly once it's been compiled, but that depends on how readable the output is (not sure how big the design is).
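If the JSON is too large to read by eye, a small script can list only the connections that touch a given port. The sketch below assumes the usual CoreIR JSON layout (`namespaces` → `modules` → `connections`, each connection being a pair of dotted port paths); the helper names, the example design, and the port name `wen` are made up for illustration and should be checked against the actual dump.

```python
import json


def load_design(path: str) -> dict:
    """Load a CoreIR JSON dump from disk (the path is supplied by the caller)."""
    with open(path) as f:
        return json.load(f)


def connections_touching(design: dict, needle: str) -> list:
    """Return (module, endpoint_a, endpoint_b) for connections mentioning `needle`."""
    hits = []
    for ns_name, ns in design.get("namespaces", {}).items():
        for mod_name, mod in ns.get("modules", {}).items():
            for a, b in mod.get("connections", []):
                if needle in a or needle in b:
                    hits.append((f"{ns_name}.{mod_name}", a, b))
    return hits


# Tiny hand-written design in the assumed layout, standing in for a real dump.
example = {
    "top": "global.Top",
    "namespaces": {
        "global": {
            "modules": {
                "Top": {
                    "instances": {"ram0": {"modref": "global.Ram"}},
                    "connections": [
                        ["self.CE", "ram0.wen"],
                        ["self.I", "ram0.wdata"],
                    ],
                }
            }
        }
    },
}

for hit in connections_touching(example, "wen"):
    print(hit)
```

With a real dump, `connections_touching(load_design(path), "...")` would narrow the output to the write-enable and clock-enable wiring of the RAM, which is the part in question here.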

David-Durst commented 5 years ago

It appears that there were a number of issues at play. @rdaly525, @THofstee, and @rsetaluri have all provided a ton of help here. I will post a post-mortem (and likely a bug report on the fault/CoreIR simulator) when this is all wrapped up. However, at this point I think a little more digging is necessary before making that post.

David-Durst commented 5 years ago

https://github.com/David-Durst/aetherling/commit/8450c2b8beb61008f42633b3212af20c5dbbe489 fixes this.