carlosedp opened this issue 3 months ago
Maybe after #4158 lands, we can find some other high-performance solution with DPI.
I ran a few experiments on my local workstation (Ubuntu 22.04, 5950X).
| Configuration | Time |
|---|---|
| chiselsim (baseline) | 16 sec |
| chiseltest (default backend, treadle?) | 2.5 sec |
| chiseltest (Verilator backend) | 12 sec |
| chiselsim (with `Files.createTempDirectory` removed, commit) | 6.5 sec |
| chiselsim (with all tests fused into a single test, commit) | 2.5 sec |
It took 16 sec to run `./mill chiselv.test.testOnly chiselv.ALUSpec`, which is 6-7x slower than chiseltest (treadle) in my environment. When I removed `createTempDirectory` in `EphemeralSimulator`, the run time dropped to 6.5 sec. Even though my environment is Linux, I wouldn't be surprised if macOS has a similar file I/O issue.
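One way to act on the `createTempDirectory` finding, sketched under the assumption that the per-test cost comes from creating a fresh temporary directory for every simulation: create a single root directory lazily and give each test its own subdirectory under it. `SharedWorkspace` and `workspaceFor` are hypothetical names for illustration, not ChiselSim's API, and whether reuse is safe depends on what the simulator writes into that directory.

```scala
import java.nio.file.{Files, Path}

// Hypothetical helper, not part of ChiselSim: the temporary root directory is
// created once per JVM instead of once per test.
object SharedWorkspace {
  // `lazy val` defers creation until first use, then caches the result
  // (initialization of a lazy val is thread-safe in Scala).
  lazy val root: Path = Files.createTempDirectory("chiselsim-shared")

  // Each test gets its own subdirectory so concurrently running tests
  // do not overwrite each other's generated files.
  def workspaceFor(testName: String): Path = {
    val dir = root.resolve(testName.replaceAll("[^A-Za-z0-9_-]", "_"))
    Files.createDirectories(dir)
  }
}
```

For example, `SharedWorkspace.workspaceFor("ALU add")` returns `<tmp>/chiselsim-shared.../ALU_add`, and a second test reuses the same root instead of paying for another `createTempDirectory` call.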
One issue here is that Verilog is generated and compiled for every test (13 times in this case), so 6.5 sec seems like a reasonable amount of overhead. When I fused all tests into a single test, it took 2.5 sec to run. For comparison, I also timed chiseltest with the Verilator backend, which took 12 sec (though that uses the SFC, so it is not an apples-to-apples comparison).
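Fusing tests avoids repeating Verilog generation and compilation. A related approach, sketched here with hypothetical names (`CompileCache`, `simulatorFor`) and a stand-in `build` that only counts invocations, is to cache the built simulator per design so that tests sharing the same module and parameters compile it only once. This assumes the tests really do elaborate an identical design; in a real cache the key would have to cover the module's parameters as well.

```scala
import scala.collection.mutable

// Hypothetical per-design build cache. `build` stands in for "generate Verilog
// and compile the simulator"; here it only records how often it ran.
object CompileCache {
  private val cache = mutable.HashMap.empty[String, String]
  private var builds = 0

  private def build(design: String): String = {
    builds += 1
    s"sim-$design" // placeholder for the path of a compiled simulator
  }

  // Synchronized so concurrently running tests cannot trigger duplicate builds.
  def simulatorFor(design: String): String = synchronized {
    cache.getOrElseUpdate(design, build(design))
  }

  def buildCount: Int = synchronized(builds)
}
```

With 13 tests against the same `ALU` design, `build` runs once and the remaining 12 calls are served from the cache.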
So actionable items for us would be:
- Confirm whether `createTempDirectory` is actually causing the regression, and fix it.

I have also noticed that a single-threaded chiselsim testbench runs ~4 times slower than a similar testbench using chiseltest, with or without multithreaded tasks. This is the test execution alone, not including compilation time. A Python cocotb testbench with the same functionality and number of test iterations, but with concurrent drivers/monitors and additional checks, also runs 4-5 times faster than a more primitive chiselsim version.
At least on my system (macOS, arm64, fast NVMe drive), I realized that the file I/O for generating the execution script is partly to blame. Currently, the execution script is always enabled, and the simulator continues to fill up the script file with unuseful messages even when an `executionScriptLimit` of, say, `0` is specified. Disabling the script gives me about a 1.5x-2x speedup. Still not nearly as good as cocotb or chiseltest.
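A minimal sketch of gating the execution script behind a flag, modelled on the `executionScriptEnabled` flag from the WIP PR. The `ExecutionScript` class, its constructor parameters, and the limit semantics are hypothetical, not svsim's implementation. Taking the command by name means the text is not even formatted when recording is off, and a limit of `Some(0)` writes nothing at all.

```scala
import java.io.Writer

// Hypothetical recorder modelled on the proposed `executionScriptEnabled` flag;
// not svsim's implementation. `limit = None` means "no limit".
final class ExecutionScript(out: Writer, enabled: Boolean, limit: Option[Int]) {
  private var written = 0

  // `command` is by-name, so the string is only built when it will be written.
  def record(command: => String): Unit =
    if (enabled && limit.forall(written < _)) {
      out.write(command + "\n")
      written += 1
    }

  def linesWritten: Int = written
}
```

The design choice here is that both the formatting cost and the file write are skipped when disabled, rather than formatting every command and discarding it afterwards.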
Another thing to investigate is svsim's choice of a text-based protocol and its use of stdio for communication between Scala and the simulation executable. Depending on how much data is exchanged, the data and conversion overhead could be significant. I wouldn't be surprised if the impact varied across platforms and operating systems.
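To make the conversion-overhead point concrete, here is a sketch of two hypothetical encodings of a single "set port to value" message: a newline-terminated text line with a hex value, and a fixed 12-byte binary record. Neither is svsim's actual wire format. For small values the byte counts are similar, so the cost that would matter is mostly the per-message formatting and parsing on both sides of the pipe, multiplied by the number of messages.

```scala
import java.nio.ByteBuffer
import java.nio.charset.StandardCharsets

// Two hypothetical encodings of a "set port <id> to <value>" message.
// Neither is svsim's real protocol; they only illustrate the trade-off.
object WireFormats {
  // Text: "P <id> <hex value>\n". Needs formatting and parsing per message.
  def encodeText(portId: Int, value: BigInt): Array[Byte] =
    s"P $portId ${value.toString(16)}\n".getBytes(StandardCharsets.US_ASCII)

  def decodeText(bytes: Array[Byte]): (Int, BigInt) = {
    val fields = new String(bytes, StandardCharsets.US_ASCII).trim.split(' ')
    (fields(1).toInt, BigInt(fields(2), 16))
  }

  // Binary: 4-byte port id + 8-byte value (so values are limited to 64 bits).
  def encodeBinary(portId: Int, value: Long): Array[Byte] =
    ByteBuffer.allocate(12).putInt(portId).putLong(value).array()

  def decodeBinary(bytes: Array[Byte]): (Int, Long) = {
    val buf = ByteBuffer.wrap(bytes)
    (buf.getInt(), buf.getLong())
  }
}
```

For instance, setting port 3 to `255` is 7 bytes as text (`"P 3 ff\n"`) and 12 bytes as binary, but the text path also runs a string interpolation, a split, and a base-16 parse for every message.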
PS: this WIP PR includes adding an `executionScriptEnabled` flag to disable the execution script, which could easily be split out into a quick standalone PR.
Thanks for looking into this, everyone, and for all of your efforts to improve it!
SVSim as the engine underneath ChiselSim definitely has some overhead, some of which we should fix, some of which we can mitigate.
Excellent observation on the execution script, @kammoh; we should probably disable it by default. I'll nitpick the characterization of it as "unuseful messages", though, since they are intended for simulation replay, which is a pretty neat debugging feature: https://github.com/chipsalliance/chisel/tree/main/svsim#make-replay.
Making the protocol more efficient might help some. One of the design decisions of SVSim is to use inter-process communication (see README), which has overhead when there is a lot of communication, especially on every cycle. SVSim is optimized for some amount of decoupling, and the raw API essentially supports clocking the design until a simple "port equals value" condition is met. This does not lend itself well to peek/poke-style testing (hence the measured slowdowns) but works well when there is some decoupling. At SiFive we decouple quite a bit (doing some of the testing logic in Chisel itself) and see no slowdown, but this is not necessarily the best API for ChiselSim.
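The cost difference between peek/poke-style and decoupled tests can be illustrated with a toy model that only counts request/response round trips across the process boundary. `ToySimulator`, `runUntilDone`, and the one-round-trip-per-call accounting are assumptions for illustration, loosely modelled on the "clock until a port equals a value" capability described above; this is not the SVSim API.

```scala
// Toy model, not the SVSim API: every call from the test into the simulator
// is counted as one request/response round trip over IPC.
final class ToySimulator(doneAtCycle: Int) {
  private var cycle = 0
  private var trips = 0

  def roundTrips: Int = trips

  // Peek/poke style: one round trip per clock step and one per peek.
  def step(): Unit = { trips += 1; cycle += 1 }
  def peekDone(): Boolean = { trips += 1; cycle >= doneAtCycle }

  // Decoupled style: the simulator clocks itself until the condition holds
  // (or maxCycles elapse), so the whole wait is a single round trip.
  def runUntilDone(maxCycles: Int): Int = {
    trips += 1
    val start = cycle
    while (cycle < doneAtCycle && cycle - start < maxCycles) cycle += 1
    cycle - start
  }
}

object RoundTripDemo {
  def peekPokeStyle(doneAt: Int): Int = {
    val sim = new ToySimulator(doneAt)
    while (!sim.peekDone()) sim.step()
    sim.roundTrips
  }

  def decoupledStyle(doneAt: Int): Int = {
    val sim = new ToySimulator(doneAt)
    sim.runUntilDone(maxCycles = 10000)
    sim.roundTrips
  }
}
```

Waiting 1000 cycles for a done signal costs 2001 round trips in the peek/poke loop (1001 peeks plus 1000 steps) versus 1 in the decoupled version, which is why per-cycle IPC tends to dominate tightly coupled tests.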
If we want highly coupled peek/poke-style tests to work well, we probably just need to avoid IPC. That may require an alternative backend to SVSim. We need an alternative backend eventually to support Arcillator (CIRCT's native simulator), which may itself be another source of speedups. Another possibility could be making it convenient to express "agents" that are pure Chisel (so they can be compiled into the simulation) but seamlessly interoperate with ChiselSim to give some decoupling.
Just throwing some ideas out there; thanks, everyone, for looking into this!
While converting some tests to ChiselSim, I noticed that they run about 20x slower than with chiseltest.
As a comparison, running on ChiselSim:
and on chiseltest:
I followed the migration guide, which resulted in the following simple changes:
The file is from https://github.com/carlosedp/chiselv/blob/main/chiselv/test/src/ALUSpec.scala
Type of issue: Bug Report
Please tell us about your environment:
Chisel 6.4.0 on macOS Sonoma 14.5. Verilator 5.024 2024-04-05 rev UNKNOWN.REV