**Closed** · raysalemi closed this issue 3 years ago
I tried forking `uvm_root().run_test("AluTest")` instead of awaiting it, but got the same hang.
Are you sure it's hanging, and not actually crashing or finishing without printing an error message? If you fork that last statement, the test should end immediately. Does `run_test` use threads? Is it deadlocking? I would use a remote Python debugger like `pdb_attach`.
All my code runs and returns, but when I try to wait for the end of sim by counting clocks, I hang. My `$monitor` in Verilog shows that the clock has stopped, or at least it stops printing to the screen. I need to suspend the job and kill it.
Is there any way cocotb is stopping an HDL-generated clock?
```python
async def run_phase(self):
    self.raise_objection()
    seqr = ConfigDB().get(self, "", "SEQR")
    dut = ConfigDB().get(self, "", "DUT")
    seq = AluSeq("seq")
    await seq.start(seqr)
    print("Counting Clock Cycles")
    await ClockCycles(dut.clk, 10)
    print("ClockCycles done")
    self.drop_objection()
```
Here's the output. The last thing printed is `Counting Clock Cycles`:
```text
reset_n: z clk: 0 A: zz B:zz, op: z start: z, done x
reset_n: z clk: 1 A: zz B:zz, op: z start: z, done x
reset_n: 0 clk: 0 A: 0 B:0, op: 0 start: z, done x
reset_n: 0 clk: 1 A: 0 B:0, op: 0 start: z, done 0
reset_n: 1 clk: 0 A: 0 B:0, op: 0 start: z, done 0
reset_n: 1 clk: 1 A: 0 B:0, op: 0 start: z, done 0
AWAITING FALLING EDGE 0
finish_item -> cmd_tr : A: 0x12 OP: ADD (1) B: 0xb7
Sending command: cmd_tr : A: 0x12 OP: ADD (1) B: 0xb7
PUTTING: aa:12 bb:b7, op:1
HAVE PUT THE COMMAND
DEBUG: tinyalu_uvm.py(47)[uvm.uvm_test_top.env.driver]: Sent command: cmd_tr : A: 0x12 OP: ADD (1) B: 0xb7
PAST ITEM DONE
finish_item done -> cmd_tr : A: 0x12 OP: ADD (1) B: 0xb7
Counting Clock Cycles
^Z
zsh: suspended make
(base) raysalemi@RayProMac TinyALU % kill -9 %1 %2 %3 %4 %5
```
Do you have a complete example somewhere? It's hard to debug this type of thing when your examples are full of calls to functions not in your snippet.
Yes. Thanks for taking a look!
It is here: https://github.com/pyuvm/pyuvm/tree/broken
It's in `examples/TinyALU`, and after you set `PYTHONPATH` to the pyuvm directory, `make` should simply demonstrate the hang.
I'm running with Icarus on the Mac.
This code seems to demonstrate that the clock only runs if there is a cocotb `Trigger` monitoring it:
```python
@cocotb.test()
async def stuck(dut):
    """does nothing but sleep"""
    await FallingEdge(dut.clk)
    dut.reset_n <= 0
    dut.A <= 0
    dut.B <= 0
    dut.op <= 0
    await FallingEdge(dut.clk)
    dut.reset_n <= 1
    await FallingEdge(dut.clk)
    print("clock fall 1")
    await FallingEdge(dut.clk)
    print("clock fall 2")
    await ClockCycles(dut.clk, 10)
    cocotb.fork(asleep())
    print("AT END")
```
The clock stops while the `asleep()` coroutine is running, even if it is forked.

```python
async def asleep(sec=5):
    time.sleep(sec)
```
I think that the simulation stops unless there is a cocotb `Trigger` waiting on the clock. In my case I am blocked awaiting a `queue.put()` and expecting the clock to run and empty the queue so I can proceed.
Is this expected behavior?
I'll try to create a simple test case.
You should not use `time.sleep` in a cocotb coroutine; it pauses all coroutines until the sleep finishes. Additionally, it rarely makes sense to sleep for wall time in a simulation; surely you want to sleep for simulation time?
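To see why, here is a plain-asyncio sketch; cocotb's scheduler is cooperative in the same way. All the names below are illustrative, not cocotb APIs: `asyncio.sleep` plays the role of cocotb's `Timer`, and the `ticker` task stands in for a clock task.

```python
import asyncio
import time

results = {}

async def ticker(counter):
    # Stands in for the clock task: one "edge" per scheduler turn.
    for _ in range(20):
        counter.append(1)
        await asyncio.sleep(0.01)

async def blocking_sleep():
    time.sleep(0.3)           # never yields: every other task is frozen too

async def cooperative_sleep():
    await asyncio.sleep(0.3)  # suspends only this task: the ticker keeps running

async def demo(sleeper, label):
    counter = []
    clock = asyncio.ensure_future(ticker(counter))
    await sleeper()
    results[label] = len(counter)  # ticks that happened while we "slept"
    await clock

async def main():
    await demo(blocking_sleep, "time.sleep")
    await demo(cooperative_sleep, "asyncio.sleep")

asyncio.run(main())
print(results)
```

The blocking version records zero ticks during the sleep, because the event loop never gets a chance to run the ticker; the cooperative version lets the ticker run freely.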
I've created a test case with a producer and consumer and I think I've found my problem:
This version of the test case hangs:
```python
@cocotb.test()
async def stuck(dut):
    """test clock hang"""
    pc = ProdCon(dut)
    await pc.producer()
    await pc.consumer()
```
This version works:
```python
@cocotb.test()
async def stuck(dut):
    """test clock hang"""
    pc = ProdCon(dut)
    cocotb.fork(pc.producer())
    cocotb.fork(pc.consumer())
    await cocotb.triggers.ClockCycles(dut.clk, 10)
```
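The same difference can be sketched with plain asyncio, which schedules cooperatively the way cocotb does. This is an illustrative stand-in, not the `ProdCon` code from the repo: `asyncio.gather` plays the role of `cocotb.fork`, and a bounded timeout makes the sequential deadlock observable.

```python
import asyncio

async def producer(q):
    for i in range(4):
        await q.put(i)            # suspends whenever the depth-1 queue is full

async def consumer(q, out):
    for _ in range(4):
        out.append(await q.get())

async def main():
    # Sequential version: the producer cannot drain its own queue, so
    # awaiting it directly blocks forever (bounded here by a timeout).
    q = asyncio.Queue(maxsize=1)
    try:
        await asyncio.wait_for(producer(q), timeout=0.2)
        sequential = "finished"
    except asyncio.TimeoutError:
        sequential = "stuck on put()"

    # Concurrent version: both tasks run side by side and interleave fine.
    q2 = asyncio.Queue(maxsize=1)
    out = []
    await asyncio.gather(producer(q2), consumer(q2, out))
    return sequential, out

sequential, out = asyncio.run(main())
print(sequential, out)
```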
I think I imagined the `await` statement would give way to the scheduler when it blocked. But that is clearly not what happens, and when you think about it, it doesn't happen that way with SystemVerilog tasks either.
I suspect the problem is that I need to fork something that I am awaiting.
That's the rub, actually. I have everything forked, and still the consuming thread is not consuming in my larger testbench.
I finally discovered a bug deep in pyuvm where I had called the cocotb queue with the `block=False` argument instead of using `no_wait`.
`async`/`await` is co-operative scheduling, meaning: if a thread blocks a coroutine from reaching an `await` statement, cocotb's scheduler cannot switch tasks to allow the clock task to run. There are very few reasons for calls to threaded APIs in cocotb. `time.sleep` is very wrong here and has no place in cocotb tests.
It was actually just an example. I don't have it in my actual code, which, sadly, is still hanging. I don't understand why, because all the coroutines are forked. Yet it still hangs on a `queue.put`.
The `uvm_root().run_test` fork is relying upon the `driver_bfm` fork to empty the queue so it can unblock. But the simulator seems to freeze up in the `run_test` fork.
```python
cocotb.fork(proxy.driver_bfm())
cocotb.fork(proxy.cmd_mon_bfm())
cocotb.fork(proxy.result_mon_bfm())
cocotb.fork(uvm_root().run_test("AluTest"))
await proxy.done.wait()
```
Using a remote debugger should help you trace the execution of things. While it is hanging, you can break into the debugger and see what is running. See this answer in a related discussion topic for some options, including `pdb_attach` as mentioned earlier.
Yes. I'm seeing that the `driver_bfm` is never seeing the negative clock edge it needs. If I comment out the `run_test`, the clock works normally.
How would a forked coroutine block the simulator?
I've got it boiled down to this. The clock goes high, and then it waits for my forked coroutine to finish running before it goes low again. That means if my code waits for the clock it hangs, but I should be able to wait for the clock, right?
Why would the simulator treat my coroutine as something that has to return before it will continue going?
```text
CLOCK TOGGLED 1
CLOCK TOGGLED 0
CLOCK TOGGLED 1
CLOCK TOGGLED 0
CLOCK TOGGLED 1
WAITING FOR FALLING EDGE
STARTING ITEM
STARTED ITEM
CALLING FINISH_ITEM
DEBUG: tinyalu_uvm.py(48)[uvm.uvm_test_top.env.driver]: Sent command: cmd_tr : A: 0x9c OP: ADD (1) B: 0x1a
FINISHED ITEM
STARTING ITEM
STARTED ITEM
CALLING FINISH_ITEM
DEBUG: tinyalu_uvm.py(48)[uvm.uvm_test_top.env.driver]: Sent command: cmd_tr : A: 0xf8 OP: AND (2) B: 0xee
FINISHED ITEM
STARTING ITEM
STARTED ITEM
CALLING FINISH_ITEM
DEBUG: tinyalu_uvm.py(48)[uvm.uvm_test_top.env.driver]: Sent command: cmd_tr : A: 0x16 OP: XOR (3) B: 0x82
FINISHED ITEM
STARTING ITEM
STARTED ITEM
CALLING FINISH_ITEM
DEBUG: tinyalu_uvm.py(48)[uvm.uvm_test_top.env.driver]: Sent command: cmd_tr : A: 0xfd OP: MUL (4) B: 0x45
FINISHED ITEM
CLOCK TOGGLED 0
```
If I replace my test with this then it works:
```python
async def put_stuff(self):
    await self.driver_queue.put((1, 1, 1))
    await self.driver_queue.put((2, 2, 3))
    await self.driver_queue.put((3, 3, 3))
    await self.driver_queue.put((1, 2, 4))
```
So there is something in my test that causes it to grab the clock. Hmmm.
Every running task in the simulator thread (the main test coroutine and all forked tasks) must suspend execution by `await`ing a `Trigger` before the cocotb scheduler returns to the simulator to continue.

The Python-only triggers, like waiting on `Event`, `Lock`, or `Queue` methods, may cause many suspends and wakeups of tasks before the scheduler returns to the simulator, as they can communicate as much as they like while the simulation is paused.

Once every task is dependent on a simulator `Trigger`, such as `Timer`, `RisingEdge`/`FallingEdge`, etc., either by directly `await`ing one or by waiting on an `Event` being set by a task that is waiting on a simulator `Trigger`, the scheduler returns to the simulator for further execution until the next callback occurs.

In `pyuvm/examples/hang_bug/tinyalu_cocotb.py` in your example repo, `await pc.producer()` runs the `producer()` coroutine inside the main test task, and since the `Queue` has `maxsize = 1`, it can't `put()` a new item after the first iteration of the loop, so it happily suspends and waits for another task to call `get()` and make room in the `Queue`. Because no other user tasks are running that can call `get()`, cocotb returns to the simulator, which hangs or continues simulation forever.
So the problem is that the software is blocking on a queue's put statement instead of a Trigger?
> since the Queue has maxsize = 1, it can't put() a new item after the iteration of the loop
Oof. Beware the single depth queue, there be demons (deadlock).
The problem is most likely one of two things:

- One or more tasks is running, or blocking on a function call, and hasn't awaited a cocotb Trigger. So things are stuck in Python land and not letting the simulator continue.
- All coroutines are waiting on Triggers that are never hit, and the simulator continues to run forever in the background, which looks like a hang.
You could check that execution returns to the simulator by placing a breakpoint here: https://github.com/cocotb/cocotb/blob/2b5b820c4ded3cb22852993471dcb99b1eadf326/cocotb/scheduler.py#L359 That should be the last Python code before it returns through the simulator module -> GPI -> simulator callback.
So the list of triggers is all the clock ones (Rising, Falling, Cycles, etc.) and `Timer`?
The cocotb `Event` triggers do not return flow to the simulator, correct?
Ray
On Sun, Aug 8, 2021 at 9:24 PM Marlon James @.***> wrote:

> The problem is most likely one of two things:
>
> - One or more tasks is running, or blocking on a function call, and hasn't awaited a cocotb Trigger. So things are stuck in Python land and not letting the simulator continue.
> - All coroutines are waiting on Triggers that are never hit and the simulator continues to run forever in the background, which looks like a hang.
This solved the problem. My `send_op` routine used to look like this:

```python
async def send_op(self, aa, bb, op):
    self.driver_queue.put((aa, bb, op))
```
You can see the blocking put. I had no idea that the block would stop the whole simulator; I figured it would only stop this thread, a SystemVerilog paradigm.
The new one looks like this:
```python
async def send_op(self, aa, bb, op):
    while True:
        await FallingEdge(self.dut.clk)
        try:
            self.driver_queue.put_nowait((aa, bb, op))
            return
        except QueueFull:
            continue
```
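Since `cocotb.queue.Queue` mirrors `asyncio.Queue`, the `put_nowait`/`QueueFull` behavior this loop relies on can be checked with plain asyncio. This is a sketch of the queue semantics only, not the pyuvm code:

```python
import asyncio

q = asyncio.Queue(maxsize=1)
q.put_nowait("first")          # succeeds: the queue has room
try:
    q.put_nowait("second")     # the depth-1 queue is already full
    status = "accepted"
except asyncio.QueueFull:
    status = "QueueFull raised"
print(status, q.qsize())
```

`put_nowait` never suspends; it either enqueues immediately or raises, which is why the retry loop above can park itself on a clock edge between attempts.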
Thank you all!
What type is `driver_queue` in that example, @raysalemi? Is it a `threading.Queue` or a `cocotb.queue`?
Edit: I worked this out myself.
> You can see the blocking put. I had no idea that the block would stop the whole simulator,
Your problem is that your queue is a `UVMQueue`, which extends `queue.Queue` (a threading primitive) instead of `cocotb.Queue`. Cocotb uses coroutines, not threads; so any kind of thread-based synchronization will cause the entire simulator to hang until the threads are unblocked.
Is there a reason you're not using `cocotb.queue` instead? This will only block the current coroutine.
I would go as far as saying that any use of the `threading` module in your library is a design flaw that will bite you with a hang when you least expect it. I've successfully written threaded code for cocotb exactly once, and the only reason I did so was because `cocotb.external` and `cocotb.function` were broken at that point in time, and I needed similar functionality. They should be fixed now, though.
The `UVMQueue` extends `cocotb.queue`:

```python
class UVMQueue(cocotb.queue.Queue):
    """
    The UVM
    ...
```

This is why I was surprised at the hang.
I've completely removed the `threading` module from pyuvm. It now depends upon cocotb.
Which again is why I was surprised that my `put()` hung the simulation.
Oh, I was looking at the wrong branch, apologies. This code is wrong though:

> This solved the problem. My `send_op` routine used to look like this:
>
> ```python
> async def send_op(self, aa, bb, op):
>     self.driver_queue.put((aa, bb, op))
> ```

It should look like:

```python
async def send_op(self, aa, bb, op):
    await self.driver_queue.put((aa, bb, op))
```
But should it hang? That's the problem. The cocotb.queue is hanging the simulation. Is that supposed to happen?
My sim is hanging again. As you point out, I wasn't actually awaiting `send_op`.
> So the list of triggers are all the clock ones (Rising, Falling, Cycles, etc) and Timer? The cocotb Event triggers do not return flow to the simulator, correct?
No, `await e.wait()` will suspend that task. Same with `await q.put()`. When all tasks are suspended, simulation continues.
Each time the simulator gives control to cocotb, it is because something of interest happened in the simulation. `await`ing a simulator `Trigger` is how cocotb tasks say "wake me up when this happens".
So ultimately only simulator triggers are the driving events for your cocotb test.
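A plain-asyncio sketch of that `Event` behavior (the cooperative model is the same as cocotb's; here a short `asyncio.sleep` stands in for a simulator `Trigger` firing, and the names are illustrative):

```python
import asyncio

async def waiter(ev, log):
    log.append("waiting")
    await ev.wait()                # suspends this task only
    log.append("woken")

async def setter(ev, log):
    await asyncio.sleep(0.01)      # stands in for a simulator Trigger firing
    log.append("setting")
    ev.set()                       # wakes the waiter; no polling loop needed

async def main():
    ev = asyncio.Event()
    log = []
    await asyncio.gather(waiter(ev, log), setter(ev, log))
    return log

log = asyncio.run(main())
print(log)
```

The waiter suspends cooperatively until the setter fires the event; nothing busy-waits, and no task blocks the scheduler.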
I've found the last hang. I was waiting on a `cocotb.Event`. I've solved it by creating an event that uses clock-based triggers:
```python
class UVMEvent(Event):
    def __init__(self, name, clock):
        super().__init__(name=name)
        self.trigger = Timer(10, units="step")

    async def wait(self):
        while True:
            await self.trigger
            if self.is_set:
                return self.fired
            else:
                continue
```
That is unlikely to be the right solution to your problem.
It all works now. I think it's the only solution, since my code must await clock triggers.
On Mon, Aug 9, 2021 at 1:22 PM Eric Wieser @.***> wrote:
> That is unlikely to be the right solution to your problem.
I'm using the cocotb `Clock` class to drive the clock in my design, which I fork in this test:

However, there is a point in my simulation where I await putting something in a queue and the clock stops. This happens inside the `run_test` coroutine at the end of the test.

Why would an await in the `run_test` stop the clock from running? This is the code that stops the sim:

And here is where the queue should be getting emptied, but since the clock has stopped, I'm not doing the read.
Any thoughts on what I'm doing wrong?