Symptom
The test test_custom_entry, which tests that custom GPI entry points work, fails when using Riviera-PRO: it hangs at shutdown, never going beyond the following messages printed on stdout:
# KERNEL: sample_module: mybit has been updated, new value is 1
# KERNEL: sample_module: mybits has been updated, new value is 11
# KERNEL: Simulation has finished. There are no more test vectors to simulate.
endsim
Analysis (symptoms)
GDB shows the hang to be in the threading module:
(gdb) py-bt
Traceback (most recent call first):
File "/usr/lib64/python3.11/threading.py", line 1583, in _shutdown
lock.acquire()
Reducing the test case shows that the test hangs as soon as the logging module is imported, which itself employs some locks.
Analysis (root cause)
Riviera-PRO (all versions that I tested, up to the latest 2023.10) calls the cbEndOfSimulation callback from a different thread than the other VPI callbacks. The similar VHPI callback shows the same behavior in my testing.
gpi_entry_point(), a VPI entry point function (registered in vlog_startup_routines), is called from the "Main simulator" thread.
gpi_embed_end(), a function registered with VPI as the end-of-simulation callback (cbEndOfSimulation), is called from a "riviera_simulat" thread.
VPI (and VHPI) are single-threaded APIs: all callbacks should come from the same simulator thread. That's something cocotb is relying on here, and that's why the test fails.
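The single-thread assumption can be made explicit with a small guard. The following is only a minimal sketch, not code from cocotb: the class ThreadAffinityGuard and its methods are hypothetical names, and a plain Python thread stands in for the simulator thread that delivers cbEndOfSimulation.

```python
import threading


class ThreadAffinityGuard:
    """Hypothetical helper: remembers which thread ran the entry point."""

    def __init__(self):
        self._owner_ident = None

    def bind(self):
        # Call this once from the entry point to record the owning thread.
        self._owner_ident = threading.get_ident()

    def same_thread(self):
        # True only if the caller runs on the thread that called bind().
        return self._owner_ident == threading.get_ident()


guard = ThreadAffinityGuard()
guard.bind()  # stands in for the entry point call
on_entry_thread = guard.same_thread()

# A second thread stands in for the thread delivering the shutdown callback.
results = []
worker = threading.Thread(target=lambda: results.append(guard.same_thread()))
worker.start()
worker.join()

print("entry thread check:", on_entry_thread)
print("other thread check:", results[0])
```

A check like this in a custom entry point would turn the silent hang into an immediate, visible diagnostic.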
Impact
This issue only affects users with a custom GPI entry point, which does only a subset of the work done in cocotb's regular entry point. As soon as we use the "regular" GPI entry point with the scheduler and all of the "real" machinery, this problem disappears. We're likely ending up in another shutdown path in Riviera in these cases.
Steps to reproduce
# Use a version of cocotb with more debug information added.
git clone https://github.com/imphil/cocotb.git cocotb-repo-3467
cd cocotb-repo-3467
git checkout riviera-repro-3467
# Get a development environment and build cocotb.
nox -s dev -- /bin/bash -l
# Run the test from a clean state.
cd tests/test_cases/test_custom_entry
git clean -xdf .
COCOTB_SCHEDULER_DEBUG=1 SIM=riviera TOPLEVEL_LANG=verilog make
killall vsimsa # to end the hanging process
Notice test output like
# COUT: thread for entry_func: <_MainThread(MainThread, started 140364295960256)>
# COUT: thread for _sim_event: <_DummyThread(Dummy-1, started daemon 140364531877568)>
It is expected that in both cases MainThread is printed.
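For readers wondering where the _DummyThread in that output comes from: Python's threading module creates such a placeholder object whenever threading.current_thread() is called on a thread that was not started through threading.Thread. The sketch below reproduces this in pure Python, as an illustration only. It uses the low-level _thread module to stand in for a thread created by the simulator, and the function name probe_foreign_thread is made up for this example.

```python
import _thread
import threading


def probe_foreign_thread():
    """Return the object threading.current_thread() yields on a thread
    that was not created via threading.Thread."""
    result = {}
    done = threading.Event()

    def probe():
        # Runs on a thread unknown to the threading module.
        result["thread"] = threading.current_thread()
        done.set()

    # _thread.start_new_thread bypasses threading.Thread, similar to a
    # thread created by native (simulator) code.
    _thread.start_new_thread(probe, ())
    done.wait(timeout=5)
    return result["thread"]


foreign = probe_foreign_thread()
print("main thread object:   ", type(threading.main_thread()).__name__)
print("foreign thread object:", type(foreign).__name__)
```

The second line reports a _DummyThread, matching what the _sim_event callback prints under Riviera-PRO.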
Use gdb to dig deeper:
$ COCOTB_ATTACH=30 COCOTB_SCHEDULER_DEBUG=1 SIM=riviera TOPLEVEL_LANG=verilog make
Waiting for 30 seconds - attach to PID 32529 with your debugger
$ # in another terminal
$ gdb -p 32529