cocotb / cocotb

cocotb, a coroutine based cosimulation library for writing VHDL and Verilog testbenches in Python
https://www.cocotb.org
BSD 3-Clause "New" or "Revised" License

Clock Stopping #2653

Closed raysalemi closed 3 years ago

raysalemi commented 3 years ago

I'm using the cocotb Clock class to drive the clock in my design, which I fork in this test:

@cocotb.test()
async def test_alu(dut):
    clock = Clock(dut.clk, 2, units="us")
    cocotb.fork(clock.start())
    proxy = CocotbProxy(dut)
    ConfigDB().set(None, "*", "PROXY", proxy)
    await proxy.reset()
    cocotb.fork(proxy.driver_bfm())
    cocotb.fork(proxy.cmd_mon_bfm())
    cocotb.fork(proxy.result_mon_bfm())
    await FallingEdge(dut.clk)
    await uvm_root().run_test("AluTest")

However, there is a point in my simulation where I await putting something in a queue and the clock stops. This happens inside the run_test coroutine at the end of the test.

Why would an await in the run_test stop the clock from running? This is the code that stops the sim:

    async def send_op(self, aa, bb, op):
        print(f"PUTTING: aa:{aa:x} bb:{bb:x}, op:{op:x}")
        await self.driver_queue.put((aa, bb, op))

And here is where the queue should be getting emptied, but since the clock has stopped, the read never happens:

        while True:
            await FallingEdge(self.dut.clk)
            if self.dut.start.value == 0 and self.dut.done.value == 0:
                try:
                    (aa, bb, op) = self.driver_queue.get_nowait()
                except QueueEmpty:
                    pass

Any thoughts on what I'm doing wrong?

raysalemi commented 3 years ago

I tried forking uvm_root().run_test("AluTest") instead of awaiting it, but got the same hang.

ktbarrett commented 3 years ago

Are you sure it's hanging, and not really crashing or finishing without printing an error message? If you fork that last statement, the test should end immediately. Does run_test use threads? Is it deadlocking? I would use a remote Python debugger like pdb_attach.

raysalemi commented 3 years ago

Are you sure it's hanging, and not really crashing or finishing without printing an error message? If you fork that last statement, the test should end immediately. Does run_test use threads? Is it deadlocking? I would use a remote Python debugger like pdb_attach.

All my code runs and returns, but when I try to wait for the end of sim by counting clocks I hang. My $monitor in Verilog shows that the clock has stopped, or at least it stops printing to the screen. I need to suspend the job and kill it.

Is there any way cocotb is stopping an HDL-generated clock?

    async def run_phase(self):
        self.raise_objection()
        seqr = ConfigDB().get(self, "", "SEQR")
        dut = ConfigDB().get(self, "", "DUT")
        seq = AluSeq("seq")
        await seq.start(seqr)
        print("Counting Clock Cycles")
        await ClockCycles(dut.clk, 10)
        print("ClockCycles done")
        self.drop_objection()

Here's the output. The last thing printed is Counting Clock Cycles:

reset_n: z  clk: 0 A: zz B:zz, op: z start: z, done x
reset_n: z  clk: 1 A: zz B:zz, op: z start: z, done x
reset_n: 0  clk: 0 A: 0 B:0, op: 0 start: z, done x
reset_n: 0  clk: 1 A: 0 B:0, op: 0 start: z, done 0
reset_n: 1  clk: 0 A: 0 B:0, op: 0 start: z, done 0
reset_n: 1  clk: 1 A: 0 B:0, op: 0 start: z, done 0
AWAITING FALLING EDGE 0
finish_item -> cmd_tr : A: 0x12 OP: ADD (1) B: 0xb7
Sending command: cmd_tr : A: 0x12 OP: ADD (1) B: 0xb7
PUTTING: aa:12 bb:b7, op:1
HAVE PUT THE COMMAND
DEBUG: tinyalu_uvm.py(47)[uvm.uvm_test_top.env.driver]: Sent command: cmd_tr : A: 0x12 OP: ADD (1) B: 0xb7
PAST ITEM DONE
finish_item done -> cmd_tr : A: 0x12 OP: ADD (1) B: 0xb7
Counting Clock Cycles
^Z
zsh: suspended  make
(base) raysalemi@RayProMac TinyALU % kill -9 %1 %2 %3 %4 %5
eric-wieser commented 3 years ago

Do you have a complete example somewhere? It's hard to debug this type of thing when your examples are full of calls to functions not in your snippet.

raysalemi commented 3 years ago

Do you have a complete example somewhere? It's hard to debug this type of thing when your examples are full of calls to functions not in your snippet.

Yes. Thanks for taking a look!

It is here: https://github.com/pyuvm/pyuvm/tree/broken

It's in examples/TinyALU, and after you set PYTHONPATH to the pyuvm directory, make should simply demonstrate the hang.

I'm running with Icarus on the Mac.

raysalemi commented 3 years ago

This code seems to demonstrate that the clock only runs if there is a cocotb Trigger monitoring it:

@cocotb.test()
async def stuck(dut):
    """does nothing but sleep"""
    await FallingEdge(dut.clk)
    dut.reset_n <= 0
    dut.A <= 0
    dut.B <= 0
    dut.op <= 0
    await FallingEdge(dut.clk)
    dut.reset_n <= 1
    await FallingEdge(dut.clk)
    print("clock fall 1")
    await FallingEdge(dut.clk)
    print("clock fall 2")
    await ClockCycles(dut.clk, 10)
    cocotb.fork(asleep())
    print("AT END")

The clock stops while the asleep() coroutine is running even if it is forked.

async def asleep(sec=5):
    time.sleep(sec)

I think that the simulation stops unless there is a cocotb Trigger waiting on the clock. In my case I am blocked awaiting a queue.put() and expecting the clock to run and empty the queue so I can proceed.

Is this expected behavior?

I'll try to create a simple test case.

eric-wieser commented 3 years ago

You should not use time.sleep in a cocotb coroutine; it pauses all coroutines until the sleep finishes. Additionally, it rarely makes sense to sleep for wall time in a simulation; surely you want to sleep for simulation time?
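
The difference can be reproduced with plain asyncio, whose cooperative scheduler cocotb's resembles. This is an illustrative sketch, not cocotb code; the names clock, blocked, and total are invented here:

```python
import asyncio
import time

async def clock(ticks):
    # Toggles forever; it only makes progress when the event loop has control.
    while True:
        ticks.append(1)
        await asyncio.sleep(0.005)

async def main():
    ticks = []
    asyncio.create_task(clock(ticks))
    time.sleep(0.2)           # blocks every task: the clock cannot tick
    blocked = len(ticks)
    await asyncio.sleep(0.2)  # suspends only this task: the clock keeps ticking
    return blocked, len(ticks)

blocked, total = asyncio.run(main())
print(blocked, total)  # blocked stays at 0, total climbs well past it
```

In a cocotb test, the analogue of `await asyncio.sleep(...)` is awaiting a `Timer`, which waits in simulation time rather than wall time.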

raysalemi commented 3 years ago

I've created a test case with a producer and consumer and I think I've found my problem:

This version of the test case hangs:

@cocotb.test()
async def stuck(dut):
    """ test clock hang"""
    pc = ProdCon(dut)
    await pc.producer()
    await pc.consumer()

This version works:

@cocotb.test()
async def stuck(dut):
    """ test clock hang"""
    pc = ProdCon(dut)
    cocotb.fork(pc.producer())
    cocotb.fork(pc.consumer())
    await cocotb.triggers.ClockCycles(dut.clk, 10)

I think I imagined that the await statement would give way to the scheduler when it blocked. But that is clearly not what happens, and when you think about it, it doesn't happen that way with SystemVerilog tasks either.

I suspect the problem is that I need to be forking something that I am awaiting.

raysalemi commented 3 years ago

That's the rub, actually. I have everything forked, and still the consuming thread is not consuming in my larger testbench.

raysalemi commented 3 years ago

I finally discovered a bug deep in pyuvm where I had called the cocotb queue with a block=False argument instead of using the _nowait variant.

ktbarrett commented 3 years ago

async/await is co-operative scheduling, meaning: if a thread blocks a coroutine from reaching an await statement, cocotb's scheduler cannot switch tasks to allow the clock task to run. There are very few reasons for calls to threaded API in cocotb. time.sleep is very wrong here and has no place in cocotb tests.

raysalemi commented 3 years ago

async/await is co-operative scheduling, meaning: if a thread blocks a coroutine from reaching an await statement, cocotb's scheduler cannot switch tasks to allow the clock task to run. There are very few reasons for calls to threaded API in cocotb. time.sleep is very wrong here and has no place in cocotb tests.

It was actually just an example; I don't have it in my actual code, which, sadly, is still hanging. I don't understand why, because all the coroutines are forked, yet it still hangs on a queue.put().

raysalemi commented 3 years ago

The uvm_root().run_test fork relies upon the driver_bfm fork to empty the queue so it can unblock, but the simulator seems to freeze up in the run_test fork.

    cocotb.fork(proxy.driver_bfm())
    cocotb.fork(proxy.cmd_mon_bfm())
    cocotb.fork(proxy.result_mon_bfm())
    cocotb.fork(uvm_root().run_test("AluTest"))
    await proxy.done.wait()
marlonjames commented 3 years ago

Using a remote debugger should help you trace the execution of things. While it is hanging you can break into the debugger and see what is running. You can see this answer in a related discussion topic for some options, including pdb_attach as mentioned earlier.

raysalemi commented 3 years ago

Yes. I'm seeing that the driver_bfm never sees the negative clock edge it needs. If I comment out the run_test, the clock works normally.

How would a forked coroutine block the simulator?

raysalemi commented 3 years ago

I've got it boiled down to this: the clock goes high, and then it waits for my forked coroutine to finish running before it goes low again. That means that if my code waits for the clock, it hangs. But I should be able to wait for the clock, right?

Why would the simulator treat my coroutine as something that has to return before it will continue?

CLOCK TOGGLED 1
CLOCK TOGGLED 0
CLOCK TOGGLED 1
CLOCK TOGGLED 0
CLOCK TOGGLED 1
WAITING FOR FALLING EDGE
STARTING ITEM
STARTED ITEM
CALLING FINISH_ITEM
DEBUG: tinyalu_uvm.py(48)[uvm.uvm_test_top.env.driver]: Sent command: cmd_tr : A: 0x9c OP: ADD (1) B: 0x1a
FINISHED ITEM
STARTING ITEM
STARTED ITEM
CALLING FINISH_ITEM
DEBUG: tinyalu_uvm.py(48)[uvm.uvm_test_top.env.driver]: Sent command: cmd_tr : A: 0xf8 OP: AND (2) B: 0xee
FINISHED ITEM
STARTING ITEM
STARTED ITEM
CALLING FINISH_ITEM
DEBUG: tinyalu_uvm.py(48)[uvm.uvm_test_top.env.driver]: Sent command: cmd_tr : A: 0x16 OP: XOR (3) B: 0x82
FINISHED ITEM
STARTING ITEM
STARTED ITEM
CALLING FINISH_ITEM
DEBUG: tinyalu_uvm.py(48)[uvm.uvm_test_top.env.driver]: Sent command: cmd_tr : A: 0xfd OP: MUL (4) B: 0x45
FINISHED ITEM
CLOCK TOGGLED 0
raysalemi commented 3 years ago

If I replace my test with this then it works:

    async def put_stuff(self):
        await self.driver_queue.put((1,1,1))
        await self.driver_queue.put((2,2,3))
        await self.driver_queue.put((3,3,3))
        await self.driver_queue.put((1,2,4))

So there is something in my test that causes it to grab the clock. Hmmm.

marlonjames commented 3 years ago

Every running task in the simulator thread (the main test coroutine and all forked tasks) must suspend execution by awaiting a Trigger before the cocotb scheduler returns to the simulator to continue. The Python-only triggers like waiting on Event, Lock or Queue methods may cause many suspends and wakeups of tasks before the scheduler returns to the simulator, as they can communicate as much as they like while the simulation is paused. Once every task is dependent on a simulator Trigger, such as Timer, RisingEdge/FallingEdge, etc, either by directly awaiting one or by waiting on an Event being set by a task that is waiting on a simulator Trigger, the scheduler returns to the simulator for further execution of the simulator until such time as the next callback occurs.

In pyuvm/examples/hang_bug/tinyalu_cocotb.py in your example repo, await pc.producer() runs the producer() coroutine inside the main test task, and since the Queue has maxsize = 1, it can't put() a new item after the iteration of the loop, so it happily suspends and waits for another task to call get() and make room in the Queue. Because no other user tasks are running that can call get(), cocotb returns to the simulator, which hangs or continues simulation forever.
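
That deadlock shape can be reduced to a few lines of plain asyncio (an analogy, not the pyuvm code; inline and forked are invented names):

```python
import asyncio

async def producer(q):
    for i in range(4):
        await q.put(i)          # suspends once the single slot is full

async def consumer(q, out):
    for _ in range(4):
        out.append(await q.get())

async def inline():
    # Mirrors `await pc.producer()`: no consumer is running yet,
    # so the second put() waits forever (bounded here by a timeout).
    q = asyncio.Queue(maxsize=1)
    await asyncio.wait_for(producer(q), timeout=0.1)

async def forked():
    # Mirrors forking both tasks: producer and consumer hand off items.
    q = asyncio.Queue(maxsize=1)
    out = []
    await asyncio.gather(producer(q), consumer(q, out))
    return out

try:
    asyncio.run(inline())
    deadlocked = False
except asyncio.TimeoutError:
    deadlocked = True

items = asyncio.run(forked())
print(deadlocked, items)  # True [0, 1, 2, 3]
```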

raysalemi commented 3 years ago

Every running task in the simulator thread (the main test coroutine and all forked tasks) must suspend execution by awaiting a Trigger before the cocotb scheduler returns to the simulator to continue. The Python-only triggers like waiting on Event, Lock or Queue methods may cause many suspends and wakeups of tasks before the scheduler returns to the simulator, as they can communicate as much as they like while the simulation is paused. Once every task is dependent on a simulator Trigger, such as Timer, RisingEdge/FallingEdge, etc, either by directly awaiting one or by waiting on an Event being set by a task that is waiting on a simulator Trigger, the scheduler returns to the simulator for further execution of the simulator until such time as the next callback occurs.

In pyuvm/examples/hang_bug/tinyalu_cocotb.py in your example repo, await pc.producer() runs the producer() coroutine inside the main test task, and since the Queue has maxsize = 1, it can't put() a new item after the iteration of the loop, so it happily suspends and waits for another task to call get() and make room in the Queue. Because no other user tasks are running that can call get(), cocotb returns to the simulator, which hangs or continues simulation forever.

So the problem is that the software is blocking on a queue's put statement instead of a Trigger?

ktbarrett commented 3 years ago

since the Queue has maxsize = 1, it can't put() a new item after the iteration of the loop

Oof. Beware the single-depth queue; there be demons (deadlock).

marlonjames commented 3 years ago

The problem is most likely one of two things:

  1. One or more tasks is running, or blocking on a function call, and hasn't awaited a cocotb Trigger. So things are stuck in Python land and not letting the simulator continue.
  2. All coroutines are waiting on Triggers that are never hit and the simulator continues to run forever in the background, which looks like a hang.
marlonjames commented 3 years ago

You could check that execution returns to the simulator by placing a break point here: https://github.com/cocotb/cocotb/blob/2b5b820c4ded3cb22852993471dcb99b1eadf326/cocotb/scheduler.py#L359 That should be the last Python code before it returns through the simulator module -> GPI -> simulator callback.

raysalemi commented 3 years ago

So the list of triggers is all the clock ones (Rising, Falling, Cycles, etc.) and Timer?

The cocotb Event triggers do not return flow to the simulator, correct?


raysalemi commented 3 years ago

This solved the problem. My send_op routine used to look like this:

    async def send_op(self, aa, bb, op):
        self.driver_queue.put((aa, bb, op))

You can see the blocking put. I had no idea that the block would stop the whole simulator; I figured it would only stop this thread, as in the SystemVerilog paradigm.

The new one looks like this:

    async def send_op(self, aa, bb, op):
        while True:
            await FallingEdge(self.dut.clk)
            try:
                self.driver_queue.put_nowait((aa, bb, op))
                return
            except QueueFull:
                continue

Thank you all!

eric-wieser commented 3 years ago

What type is driver_queue in that example, @raysalemi? Is it a threading.Queue or a cocotb.queue?

Edit: I worked this out myself.

eric-wieser commented 3 years ago

You can see the blocking put. I had no idea that the block would stop the whole simulator,

Your problem is that your queue is a UVMQueue which extends queue.Queue (a threading primitive) instead of cocotb.Queue. Cocotb uses coroutines, not threads; so any kind of thread-based synchronization will cause the entire simulator to hang until the threads are unblocked.

Is there a reason you're not using cocotb.queue instead? This will only block the current coroutine.
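
The difference between the two queue types can be demonstrated with plain asyncio standing in for cocotb's scheduler (an illustrative sketch; none of these names come from pyuvm):

```python
import asyncio
import queue

async def clock(ticks):
    # Stand-in for the Clock task: ticks only when the loop has control.
    while True:
        ticks.append(1)
        await asyncio.sleep(0.005)

async def main():
    ticks = []
    asyncio.create_task(clock(ticks))

    # Thread-based queue: put() on a full queue blocks the whole thread,
    # so the event loop (and the "clock") is frozen until it gives up.
    tq = queue.Queue(maxsize=1)
    tq.put("a")
    try:
        tq.put("b", timeout=0.2)
    except queue.Full:
        pass
    blocked = len(ticks)

    # Coroutine-based queue: put() merely suspends this task,
    # so the clock keeps running in the meantime.
    aq = asyncio.Queue(maxsize=1)
    await aq.put("a")
    try:
        await asyncio.wait_for(aq.put("b"), timeout=0.2)
    except asyncio.TimeoutError:
        pass
    return blocked, len(ticks)

blocked, total = asyncio.run(main())
print(blocked, total)  # the clock only advances during the coroutine wait
```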

eric-wieser commented 3 years ago

I would go as far as saying that any use of the threading module in your library is a design flaw that will bite you with a hang when you least expect it. I've successfully written threaded code for cocotb exactly once, and the only reason I did so was because cocotb.external and cocotb.function were broken at that point in time, and I needed similar functionality. They should be fixed now though.

raysalemi commented 3 years ago

You can see the blocking put. I had no idea that the block would stop the whole simulator,

Your problem is that your queue is a UVMQueue which extends queue.Queue (a threading primitive) instead of cocotb.Queue. Cocotb uses coroutines, not threads; so any kind of thread-based synchronization will cause the entire simulator to hang until the threads are unblocked.

Is there a reason you're not using cocotb.queue instead? This will only block the current coroutine.

The UVMQueue extends cocotb.queue.

class UVMQueue(cocotb.queue.Queue):
    """
    The UVM
...

This is why I was surprised at the hang.

raysalemi commented 3 years ago

I would go as far as saying that any use of the threading module in your library is a design flaw that will bite you with a hang when you least expect it. I've successfully written threaded code for cocotb exactly once, and the only reason I did so was because cocotb.external and cocotb.function were broken at that point in time, and I needed similar functionality. They should be fixed now though.

I've completely removed the threading module from pyuvm. It now depends upon cocotb.

Which again is why I was surprised that my put() hung the simulation.

eric-wieser commented 3 years ago

Oh, I was looking at the wrong branch; apologies. This code is wrong, though:

This solved the problem. My send_op routine used to look like this:

    async def send_op(self, aa, bb, op):
        self.driver_queue.put((aa, bb, op))

it should look like

    async def send_op(self, aa, bb, op):
        await self.driver_queue.put((aa, bb, op))
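
The consequence of the missing await can be shown with plain asyncio (a sketch; this send_op is a stand-in, not the pyuvm one). Calling an async function only creates a coroutine object; nothing executes until it is awaited or scheduled:

```python
import asyncio
import warnings

async def send_op(q, item):
    await q.put(item)

async def main():
    q = asyncio.Queue()
    with warnings.catch_warnings():
        warnings.simplefilter("ignore", RuntimeWarning)
        send_op(q, (1, 1, 1))    # bug: creates a coroutine object, never runs it
    before = q.qsize()
    await send_op(q, (1, 1, 1))  # fix: the put actually executes
    return before, q.qsize()

before, after = asyncio.run(main())
print(before, after)  # 0 1
```
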
raysalemi commented 3 years ago

But should it hang? That's the problem: the cocotb.queue is hanging the simulation. Is that supposed to happen?

raysalemi commented 3 years ago

My sim is hanging again. As you point out, I wasn't actually awaiting send_op.

marlonjames commented 3 years ago

So the list of triggers are all the clock ones (Rising, Falling, Cycles, etc) and Timer? The cocotb Event triggers do not return flow to the simulator, correct?

No, await e.wait() will suspend that task. Same with await q.put(). When all tasks are suspended, simulation continues.

Each time the simulator gives control to cocotb, it is because something of interest happened in the simulation. Awaiting a simulator Trigger is how cocotb tasks say "wake me up when this happens".

So ultimately only simulator triggers are the driving events for your cocotb test.
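
This wake-up chain can be sketched with plain asyncio: a task suspended on Event.wait() resumes only because some other task, itself driven by a time-based trigger, eventually sets the event (illustrative names throughout):

```python
import asyncio

async def waiter(ev, log):
    await ev.wait()            # suspends this task; does not block others
    log.append("woke")

async def driver(ev, log):
    await asyncio.sleep(0.05)  # analog of a simulator Trigger firing
    log.append("set")
    ev.set()                   # wakes the waiter

async def main():
    ev = asyncio.Event()
    log = []
    await asyncio.gather(waiter(ev, log), driver(ev, log))
    return log

log = asyncio.run(main())
print(log)  # ['set', 'woke']
```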

raysalemi commented 3 years ago

I've found the last hang. I was waiting on a cocotb Event. I've solved it by creating an event that uses clock-based triggers:

class UVMEvent(Event):

    def __init__(self, name, clock):
        super().__init__(name=name)
        self.trigger = Timer(10, units="step")

    async def wait(self):
        while True:
            await self.trigger
            if self.is_set:
                return self.fired
            else:
                continue
eric-wieser commented 3 years ago

That is unlikely to be the right solution to your problem.

raysalemi commented 3 years ago

It all works now. I think it's the only solution, since my code must await clock triggers.
