Severson-Group / AMDC-Firmware

Embedded system code (C and Verilog) which runs the AMDC Hardware
http://docs.amdc.dev/firmware
BSD 3-Clause "New" or "Revised" License

Turn on Cortex-A9 program flow prediction #211

Open npetersen2 opened 2 years ago

npetersen2 commented 2 years ago

Per the Zynq 7000 TRM:

Page 67 discusses branch prediction on the ARM Cortex-A9 core. The core has very capable built-in branch prediction hardware, but it appears to be disabled by default:

Users can enable program flow prediction by setting the Z bit in the CP15 c1 Control register to 1. Before switching the program flow prediction on, a BTAC flush operation must be performed which has the additional effect of setting the GHB into a known state.
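As a rough sketch of the sequence the TRM describes (flush the BTAC first, then set the Z bit), the C below assumes a bare-metal, privileged-mode build for the Cortex-A9. The flush uses the CP15 "invalidate all branch predictors" operation (BPIALL, `c7, c5, 6`), and the System Control Register is accessed as `c1, c0, 0`. The function and variable names are hypothetical, and the non-ARM branch only provides host stand-ins so the ordering logic can be compiled and tested off-target.

```c
#include <stdint.h>

/* Z bit (program flow prediction enable) is bit 11 of SCTLR */
#define SCTLR_Z_BIT (1u << 11)

#if defined(__arm__) && !defined(__linux__)
/* Assumed: bare-metal Cortex-A9 running in a privileged mode */
static inline uint32_t sctlr_read(void) {
    uint32_t v;
    __asm__ volatile("mrc p15, 0, %0, c1, c0, 0" : "=r"(v));
    return v;
}

static inline void sctlr_write(uint32_t v) {
    __asm__ volatile("mcr p15, 0, %0, c1, c0, 0" : : "r"(v) : "memory");
    __asm__ volatile("isb" ::: "memory"); /* make the new setting take effect */
}

static inline void btac_flush(void) {
    /* BPIALL: invalidate the entire branch predictor array (BTAC) */
    __asm__ volatile("mcr p15, 0, %0, c7, c5, 6" : : "r"(0) : "memory");
    __asm__ volatile("dsb\n\tisb" ::: "memory");
}
#else
/* Host stand-ins: record register value and operation order for testing */
uint32_t host_sctlr = 0;
unsigned host_op_count = 0, host_flush_at = 0, host_write_at = 0;

static uint32_t sctlr_read(void) { return host_sctlr; }
static void sctlr_write(uint32_t v) { host_sctlr = v; host_write_at = ++host_op_count; }
static void btac_flush(void) { host_flush_at = ++host_op_count; }
#endif

/* Flush the BTAC, then set only the Z bit, leaving all other SCTLR bits intact */
void enable_program_flow_prediction(void) {
    btac_flush();
    uint32_t sctlr = sctlr_read();
    sctlr |= SCTLR_Z_BIT;
    sctlr_write(sctlr);
}
```

On the real hardware this would presumably run once early in start-up, before any timing-critical code. It is worth checking first whether the BSP start-up code already sets the Z bit.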

After poking around the ARM TRM for the Cortex-A9:

Page 75 explains that setting bit 11 (the Z bit) of the System Control Register is what we want:

Enables program flow prediction:
0: Program flow prediction disabled. This is the reset value.
1: Program flow prediction enabled.
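Because the register is a 32-bit word and only bit 11 matters here, a read-modify-write should change that bit alone. A minimal sketch of the bit arithmetic (mask `1 << 11 = 0x800`), with hypothetical helper names:

```c
#include <stdbool.h>
#include <stdint.h>

#define SCTLR_Z_BIT (1u << 11) /* 0x00000800 */

/* True if this SCTLR value has program flow prediction enabled */
static inline bool sctlr_flow_prediction_enabled(uint32_t sctlr) {
    return (sctlr & SCTLR_Z_BIT) != 0u;
}

/* Return sctlr with only the Z bit changed; every other bit is preserved */
static inline uint32_t sctlr_set_flow_prediction(uint32_t sctlr, bool enable) {
    return enable ? (sctlr | SCTLR_Z_BIT) : (sctlr & ~SCTLR_Z_BIT);
}
```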

So it looks like hardware branch prediction is indeed off by default. I'm not sure whether there are any issues with simply turning it on, but it should improve run-time performance.

We should come up with a benchmark workload and time it before and after enabling program flow prediction. It would be cool to see an actual improvement!
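One possible shape for that workload is sketched below: a loop whose inner branch follows a regular pattern that a dynamic predictor should be able to learn, plus a timing wrapper that takes the tick source as a function pointer. The names are hypothetical. `host_ticks()` uses the standard `clock()` so the sketch runs on a PC; on the AMDC, a hardware timer (for example the global timer read through the Xilinx standalone BSP's `XTime_GetTime()`) would be the assumed replacement.

```c
#include <stdint.h>
#include <time.h>

/* Tick source: clock() on a host, a hardware timer on the target (assumed) */
typedef uint64_t (*tick_fn)(void);

uint64_t host_ticks(void) {
    return (uint64_t)clock();
}

/* Branch-heavy loop: the inner branch is taken every 4th iteration,
 * a regular pattern a dynamic branch predictor can learn. */
uint32_t branchy_workload(uint32_t iterations) {
    uint32_t acc = 0;
    for (uint32_t i = 0; i < iterations; i++) {
        if ((i & 3u) == 0u) {
            acc += i;
        } else {
            acc ^= i << 1;
        }
    }
    return acc;
}

/* Time one run; the checksum is returned so the result stays live
 * and the loop is not optimized away. */
uint64_t time_workload(tick_fn now, uint32_t iterations, uint32_t *checksum) {
    uint64_t start = now();
    *checksum = branchy_workload(iterations);
    uint64_t end = now();
    return end - start;
}
```

Running `time_workload()` once with the Z bit clear and once after enabling it, with the same iteration count, should give directly comparable numbers. It is worth checking the disassembly, since the compiler may turn the `if` into conditional instructions and remove the very branch being measured.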

npetersen2 commented 2 years ago

Another note on a similar subject:

From page 69 of the Xilinx Zynq 7000 TRM:

The Cortex-A9 load/store unit supports speculative data pre-fetching which monitors sequential accesses made by program and starts fetching the next expected line before it has been requested. This feature is enabled in the cp15 Auxiliary Control register (DP bit). The pre-fetched lines can be dropped before allocation, and PLD instruction has higher priority.
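The same read-modify-write pattern would apply to the DP bit. The sketch below assumes DP is bit 2 of the Cortex-A9 Auxiliary Control Register (ACTLR, CP15 `c1, c0, 1`), as listed in the ARM Cortex-A9 TRM (worth double-checking for the core revision on the Zynq), and that the code runs in a secure privileged mode. Preserving the other bits matters because ACTLR also holds settings such as the SMP bit. The helper names are hypothetical.

```c
#include <stdint.h>

/* ACTLR bit 2: L1 data-side prefetch enable ("DP" in the Zynq TRM).
 * Bit position assumed from the Cortex-A9 TRM; verify for your core revision. */
#define ACTLR_DP_BIT (1u << 2)

/* Set DP while preserving every other ACTLR bit (e.g. SMP, FW) */
static inline uint32_t actlr_with_dside_prefetch(uint32_t actlr) {
    return actlr | ACTLR_DP_BIT;
}

#if defined(__arm__) && !defined(__linux__)
/* Bare-metal only: assumed to run in a secure privileged mode */
void enable_dside_prefetch(void) {
    uint32_t actlr;
    __asm__ volatile("mrc p15, 0, %0, c1, c0, 1" : "=r"(actlr));
    actlr = actlr_with_dside_prefetch(actlr);
    __asm__ volatile("mcr p15, 0, %0, c1, c0, 1" : : "r"(actlr) : "memory");
    __asm__ volatile("isb" ::: "memory");
}
#endif
```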