Closed klin02 closed 1 month ago
Tested on single-core and dual-core XiangShan. Single-core: the number of Batch calls drops by 40%, and simulation speed increases from 450 kHz to 490 kHz. Dual-core: Batch is supported for the first time, and simulation speed increases from 140 kHz to 200 kHz.
With Batch enabled, dual-core XiangShan needs only a single data package (containing DiffState and the step trigger) to communicate with software, which makes it easier to migrate between platforms (such as Palladium and FPGA).
Previously, we treated the data collected in the same cycle as a whole and ended batch assembly whenever the step data was longer than the available space. This left bubbles in the transmission and could not handle step data longer than the maximum width of a single transmission.
This change supports splitting step data by collector: the part that fits is appended to the output, and the remainder is saved in the state. To shorten the logic length, we divide the complex logic into three stages.
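The difference between whole-step packing and per-collector splitting can be illustrated with a small software model. This is only a sketch under stated assumptions, not the actual logic in this change: the function names `pack_whole_steps` and `pack_split`, the `(collector, payload)` tuple layout, and the byte-based `max_width` are all hypothetical, and each single collector's payload is assumed to fit in one transmission.

```python
def pack_whole_steps(steps, max_width):
    """Old behaviour (modelled): a step is never split across batches."""
    batches, current, used = [], [], 0
    for step in steps:
        size = sum(len(payload) for _, payload in step)
        if size > max_width:
            # A step wider than one transmission cannot be sent at all.
            raise ValueError("step data longer than max width")
        if used + size > max_width:
            # End the batch early; the unused space is a bubble.
            batches.append(current)
            current, used = [], 0
        current.extend(step)
        used += size
    if current:
        batches.append(current)
    return batches


def pack_split(steps, max_width):
    """New behaviour (modelled): split a step at collector boundaries."""
    batches, current, used = [], [], 0
    for step in steps:
        for collector, payload in step:
            if len(payload) > max_width:
                raise ValueError("single collector longer than max width")
            if used + len(payload) > max_width:
                # Emit what fits; the remaining collectors start the next batch.
                batches.append(current)
                current, used = [], 0
            current.append((collector, payload))
            used += len(payload)
    if current:
        batches.append(current)
    return batches


# Three steps of 6 bytes each (two 3-byte collectors), 9 bytes per transmission.
steps = [[("a", b"\x00" * 3), ("b", b"\x00" * 3)] for _ in range(3)]
old = pack_whole_steps(steps, max_width=9)
new = pack_split(steps, max_width=9)
```

In this model, whole-step packing sends one 6-byte step per batch and wastes 3 bytes each time, while the split version fills both of its batches completely. A step that is wider than `max_width` is rejected by the first function but simply spans several batches in the second.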
Note that step data may be split across different batch funcs and must be read as a whole, so we avoid buffer-zone switching when Batch is enabled.
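Why a split step must be read from one continuous buffer can be sketched as follows. This is a hypothetical consumer-side model, not the code in this change: the function name `read_steps` and the `is_last` flag marking the final fragment of a step are assumptions made purely for illustration.

```python
def read_steps(batches):
    """Reassemble complete steps from fragments spread over several batches."""
    steps = []
    partial = []  # kept across batch boundaries; never swapped mid-step
    for batch in batches:
        for collector, payload, is_last in batch:
            partial.append((collector, payload))
            if is_last:
                # All fragments of this step have arrived; hand it over whole.
                steps.append(partial)
                partial = []
    return steps


# The second step starts in the first batch and finishes in the second one.
batches = [
    [("a", b"1", False), ("b", b"2", True), ("a", b"3", False)],
    [("b", b"4", True)],
]
result = read_steps(batches)
```

If the reader switched to a different buffer between the two batches, the fragment `("a", b"3")` would be separated from `("b", b"4")` and the second step could not be read as a whole.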