Closed by Rodrigodd 2 months ago
I ran the PR and everything is fine there. Thanks for the detailed descriptions you write; future generations will be grateful :) I'll add from my side: I'm pleased that the results go to the emulator, since you have probably noticed that this is one of the topics I'm interested in. I also started writing an SM83 simulator in C; a small part of what I have managed to do so far (the decoder) is already in the repository. But first I want to add all the accumulated materials on the SoC.
For the past couple of weeks, I have been developing a way to speed up the simulation. What I have come up with is using Yosys to automatically rewrite the Verilog code, optimizing out the usage of tri-states (and making other small changes) and generating code that can be simulated using Verilator.
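To illustrate what optimizing out tri-states can mean in general, here is a minimal Python sketch. It is not the Yosys pass itself and not necessarily how the rewrite in this PR works. It assumes a one-bit bus with several drivers, each an (enable, value) pair, where exactly one driver is enabled at a time; under that assumption the `z`-resolving bus is equivalent to plain AND-OR logic, which a two-state simulator can evaluate directly. The function names are hypothetical.

```python
from itertools import product

def tristate_bus(drivers):
    """Reference model: each driver is (enable, value); a disabled driver outputs 'z'."""
    driven = [value for enable, value in drivers if enable]
    if not driven:
        return "z"  # nobody drives the bus
    if any(v != driven[0] for v in driven):
        return "x"  # two enabled drivers disagree
    return driven[0]

def and_or_bus(drivers):
    """Tri-state-free rewrite: OR together (enable AND value) over all drivers."""
    result = 0
    for enable, value in drivers:
        result |= enable & value
    return result

# The two models agree whenever exactly one of two drivers is enabled.
for e0, v0, e1, v1 in product((0, 1), repeat=4):
    if e0 + e1 == 1:
        drivers = [(e0, v0), (e1, v1)]
        assert tristate_bus(drivers) == and_or_bus(drivers)
```

When no driver is enabled the two models differ (`z` versus `0`), which is why an undriven bus needs a storage element to hold its last value.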
I use Yosys to:

- [...] `z` as one of their inputs).
- [...] `BusKeeper` (which encompasses all `BusKeeper`s) with a `$dlatch` (Yosys internal cell).
- [...] `dff` inside `Sequencer` to ensure Verilator simulates the signals in the correct order.

I had to implement some features in Yosys to achieve what I needed, and they are currently in this branch. The changes mentioned above are:

- [...] `tribuf -propagate`.
- [...] `$delay` cell, which translates to a delay in Verilog.
- [...] `opt_merge_wires` pass. In the end it was not needed, so it is now just a small optimization; I may remove it if I am not able to upstream it.

In the next couple of days, I will try to clean up and upstream those changes. For now, you need to build Yosys from my fork to use it.
With these changes, I can now run the simulation about 35 times faster than before. Note that I also fixed the clock frequency used in the simulation: it was previously set to 20 MHz, which is 4.8 times faster than the original Game Boy. The effective speedup is therefore about 168 times (35 × 4.8) over my previous measurement, from around 8 hours per simulated second to currently 2.35 minutes per simulated second.
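The arithmetic behind these figures can be checked with a short Python sketch. The Game Boy clock value of 4.194304 MHz is an assumption on my part (the standard DMG clock, not stated in the text); the other numbers come from the paragraph above, and the variable and function names are hypothetical.

```python
GAME_BOY_CLOCK_HZ = 4_194_304   # assumed: standard DMG clock, 2**22 Hz
OLD_SIM_CLOCK_HZ = 20_000_000   # clock previously used in the simulation
RAW_SPEEDUP = 35                # measured speedup at equal clock settings
MINUTES_PER_SIM_SECOND = 2.35   # current wall-clock cost of one simulated second

# Ratio between the old simulation clock and the real clock, rounded as in the text.
clock_ratio = round(OLD_SIM_CLOCK_HZ / GAME_BOY_CLOCK_HZ, 1)

# Effective speedup once the clock correction is taken into account.
effective_speedup = RAW_SPEEDUP * clock_ratio

def wall_minutes(simulated_seconds):
    """Estimated wall-clock minutes needed to simulate the given time span."""
    return simulated_seconds * MINUTES_PER_SIM_SECOND

print(f"clock ratio: {clock_ratio}")
print(f"effective speedup: {effective_speedup:.0f}x")
print(f"wall time for 60 simulated seconds: {wall_minutes(60):.0f} min")
```

The estimate scales linearly with simulated time, so it only gives a rough planning figure for longer test ROMs.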
This means that I can simulate the full `cpu_instrs.mem` test in around 2h30min, which is a much more manageable timeframe. However, I still haven't run it. I plan to make a tool for comparing wave files, and then compare the simulated run with wave files generated by my emulator (which I also need to finish implementing).

Below are my benchmark results, simulating 4917 µs. I also tried to compile the Verilator simulation with Profile-Guided Optimization (PGO), but it didn't significantly speed up the simulation.