cocotb / cocotb

cocotb, a coroutine based cosimulation library for writing VHDL and Verilog testbenches in Python
https://www.cocotb.org
BSD 3-Clause "New" or "Revised" License

Riviera-PRO: cbEndOfSimulation callbacks don't (always) come from main thread #3467

Closed imphil closed 4 months ago

imphil commented 1 year ago

Symptom

The test test_custom_entry, which tests that custom GPI entry points work, fails when using Riviera-PRO: it hangs at shutdown, never going beyond the following messages printed on stdout:

# KERNEL: sample_module: mybit has been updated, new value is 1
# KERNEL: sample_module: mybits has been updated, new value is 11
# KERNEL: Simulation has finished. There are no more test vectors to simulate.
endsim

Analysis (symptoms)

GDB shows the hang to be in the threading module:

(gdb) py-bt
Traceback (most recent call first):
  File "/usr/lib64/python3.11/threading.py", line 1583, in _shutdown
    lock.acquire()

Reducing the test case shows that the test hangs as soon as the logging module is imported, which itself employs some locks.

Analysis (root cause)

Riviera-PRO (all versions that I tested, up to the latest 2023.10) calls the cbEndOfSimulation callback from a different thread than the other VPI callbacks. The similar VHPI callback shows the same behavior in my testing.

VPI (and VHPI) are single-threaded APIs: all callbacks should come from the same simulator thread. That's something cocotb is relying on here, and that's why the test fails.
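The single-thread assumption can be checked from Python with a small guard. The following is only a sketch, not code from cocotb: `on_entry`, `on_sim_event` and `call_from_other_thread` are hypothetical names, and the second thread is an ordinary `threading.Thread` standing in for whatever thread the simulator uses for the end-of-simulation callback.

```python
import threading

# Thread id recorded when the (hypothetical) entry point runs.
_entry_thread_id = None


def on_entry():
    """Hypothetical entry point: remember which thread called us."""
    global _entry_thread_id
    _entry_thread_id = threading.get_ident()


def on_sim_event():
    """Hypothetical simulator event callback.

    Returns True if it runs on the same thread as on_entry(), False otherwise.
    """
    if _entry_thread_id is None:
        raise RuntimeError("on_entry() was never called")
    return threading.get_ident() == _entry_thread_id


def call_from_other_thread(func):
    """Run func on a separate thread and return its result (stand-in for the simulator)."""
    results = []
    worker = threading.Thread(target=lambda: results.append(func()))
    worker.start()
    worker.join()
    return results[0]


if __name__ == "__main__":
    on_entry()
    print("same thread:", on_sim_event())
    print("other thread:", call_from_other_thread(on_sim_event))
```

A check like this only detects the mismatch; it does not make the shutdown path safe.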

Impact

This issue only affects users with a custom GPI entry point, which does only a subset of the work done in cocotb's regular entry point. As soon as we use the "regular" GPI entry point with the scheduler and all of the "real" machinery, the problem disappears. We're likely ending up in a different shutdown path in Riviera-PRO in these cases.

Steps to reproduce

# Use a version of cocotb with more debug information added.
git clone https://github.com/imphil/cocotb.git cocotb-repo-3467
cd cocotb-repo-3467
git checkout riviera-repro-3467

# Get a development environment and build cocotb.
nox -s dev -- /bin/bash -l

# Run the test from a clean state.
cd tests/test_cases/test_custom_entry
git clean -xdf .
COCOTB_SCHEDULER_DEBUG=1 SIM=riviera TOPLEVEL_LANG=verilog make

killall vsimsa # to end the hanging process

Notice test output like

# COUT: thread for entry_func: <_MainThread(MainThread, started 140364295960256)>
# COUT: thread for _sim_event: <_DummyThread(Dummy-1, started daemon 140364531877568)>

It is expected that in both cases MainThread is printed.
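For context on the `_DummyThread` in that output: Python's `threading.current_thread()` hands out a `_DummyThread` object when it is called on a thread that was not started through the `threading` module. The sketch below reproduces that in plain Python, using the low-level `_thread` module as a stand-in for a thread created by the simulator. The function name `describe_foreign_thread` is made up for this illustration and is not part of the repro branch.

```python
import _thread
import threading


def describe_foreign_thread():
    """Report how `threading` sees a thread it did not create.

    Returns (class name of current_thread(), whether it is the main thread),
    both evaluated on a thread started via the low-level _thread module.
    """
    done = _thread.allocate_lock()
    done.acquire()
    result = []

    def worker():
        try:
            current = threading.current_thread()
            result.append(
                (type(current).__name__, current is threading.main_thread())
            )
        finally:
            done.release()  # signal the caller that the worker has finished

    _thread.start_new_thread(worker, ())
    done.acquire()  # block until the worker released the lock
    return result[0]


if __name__ == "__main__":
    print("main thread:", threading.current_thread())
    print("foreign thread:", describe_foreign_thread())
```

So seeing `_DummyThread` for `_sim_event` is consistent with the callback arriving on a thread that Python itself never started.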

Use gdb to dig deeper:

$ COCOTB_ATTACH=30 COCOTB_SCHEDULER_DEBUG=1 SIM=riviera TOPLEVEL_LANG=verilog make
Waiting for 30 seconds - attach to PID 32529 with your debugger
$ # in another terminal
$ gdb -p 32529

marlonjames commented 1 year ago

This looks related to https://github.com/cocotb/cocotb/issues/1859

imphil commented 1 year ago

Filed SPT82314 with Aldec.

imphil commented 4 months ago

Fixed in Riviera-PRO 2024.04 and confirmed on my machine. Closing.