Use remote snitch icache

Konste11ation commented 2 months ago

This PR updates the file for the synthesis flow

Using the gitrepo snitch_icache

We run synthesis at the host level, where the snitch_icache files are defined as a remote git repo. The current snax cluster, however, marks snitch_icache as local files. As a result, the synthesis script includes the snitch_icache modules twice, which causes a problem. To solve this, snitch_icache is now also marked as a git repo in the snax cluster. The parameters are slightly different, and the IOs are the same except for the event outputs.

The old snitch_icache


module snitch_icache #(
/// Number of request (fetch) ports
parameter int NR_FETCH_PORTS = -1,
/// L0 Cache Line Count
parameter int L0_LINE_COUNT = -1,
/// Cache Line Width
parameter int LINE_WIDTH = -1,
/// The number of cache lines per set. Power of two; >= 2.
parameter int LINE_COUNT = -1,
/// The set associativity of the cache. Power of two; >= 1.
parameter int SET_COUNT = 1,
/// Fetch interface address width. Same as FILL_AW; >= 1.
parameter int FETCH_AW = -1,
/// Fetch interface data width. Power of two; >= 8.
parameter int FETCH_DW = -1,
/// Fill interface address width. Same as FETCH_AW; >= 1.
parameter int FILL_AW = -1,
/// Fill interface data width. Power of two; >= 8.
parameter int FILL_DW = -1,
/// This reduces area impact at the cost of
/// increased hassle of having latches in
/// the design.
/// i_snitch_icache/gen_prefetcher*i_snitch_icache_l0/data*/Q
parameter bit EARLY_LATCH = 0,
/// Tag width of the data determining logic, this can reduce the
/// the critical path into the L0 cache when small. The trade-off
/// is a higher miss-rate in case the smaller tag matches more
/// tags. The tag must be smaller than the necessary L0 tag.
/// If configured to `-1` the entire tag is used, effectively
/// disabling this feature.
parameter int L0_EARLY_TAG_WIDTH = -1,
/// Operate L0 cache in slower clock-domain
parameter bit ISO_CROSSING      = 1,
/// Configuration input types for memory cuts used in implementation.
parameter type sram_cfg_data_t  = logic,
parameter type sram_cfg_tag_t   = logic,

parameter type axi_req_t = logic,
parameter type axi_rsp_t = logic
) (
input  logic clk_i,
input  logic clk_d2_i,
input  logic rst_ni,

input  logic                               enable_prefetching_i,
output snitch_icache_pkg::icache_events_t [NR_FETCH_PORTS-1:0] icache_events_o,

input  logic [NR_FETCH_PORTS-1:0]               flush_valid_i,
output logic [NR_FETCH_PORTS-1:0]               flush_ready_o,

input  logic [NR_FETCH_PORTS-1:0][FETCH_AW-1:0] inst_addr_i,
output logic [NR_FETCH_PORTS-1:0][FETCH_DW-1:0] inst_data_o,
input  logic [NR_FETCH_PORTS-1:0]               inst_cacheable_i,
input  logic [NR_FETCH_PORTS-1:0]               inst_valid_i,
output logic [NR_FETCH_PORTS-1:0]               inst_ready_o,
output logic [NR_FETCH_PORTS-1:0]               inst_error_o,

input  sram_cfg_data_t  sram_cfg_data_i,
input  sram_cfg_tag_t   sram_cfg_tag_i,

output axi_req_t axi_req_o,
input  axi_rsp_t axi_rsp_i
);

The snitch_icache from git

module snitch_icache import snitch_icache_pkg::*; #(
/// Number of request (fetch) ports
parameter int unsigned NR_FETCH_PORTS = -1,
/// L0 Cache Line Count (L0 is fully associative)
parameter int unsigned L0_LINE_COUNT = -1,
/// Cache Line Width
parameter int unsigned LINE_WIDTH = -1,
/// The number of cache lines per set. Power of two; >= 2.
parameter int unsigned LINE_COUNT = -1,
/// The set associativity of the cache. Power of two; >= 1.
parameter int unsigned WAY_COUNT = 1,
/// Fetch interface address width. Same as FILL_AW; >= 1.
parameter int unsigned FETCH_AW = -1,
/// Fetch interface data width. Power of two; >= 8.
parameter int unsigned FETCH_DW = -1,
/// Fill interface address width. Same as FETCH_AW; >= 1.
parameter int unsigned FILL_AW = -1,
/// Fill interface data width. Power of two; >= 8.
parameter int unsigned FILL_DW = -1,
/// Allow fetches to have priority over prefetches for L0 to L1
parameter bit FETCH_PRIORITY = 1'b0,
/// Merge L0-L1 fetches if requesting the same address
parameter bit MERGE_FETCHES = 1'b0,
/// Serialize the L1 lookup (parallel tag/data lookup by default)
parameter bit SERIAL_LOOKUP = 0,
/// Replace the L1 tag banks with latch-based SCM.
parameter bit L1_TAG_SCM = 0,
/// Number of pending response beats for the L1 cache.
parameter int unsigned NUM_AXI_OUTSTANDING = 2,
/// This reduces area impact at the cost of
/// increased hassle of having latches in
/// the design.
/// i_snitch_icache/gen_prefetcher*i_snitch_icache_l0/data*/Q
parameter bit EARLY_LATCH = 0,
/// Tag width of the data determining logic, this can reduce the
/// the critical path into the L0 cache when small. The trade-off
/// is a higher miss-rate in case the smaller tag matches more
/// tags. The tag must be smaller than the necessary L0 tag.
/// If configured to `-1` the entire tag is used, effectively
/// disabling this feature.
parameter int L0_EARLY_TAG_WIDTH = -1,
/// Operate L0 cache in slower clock-domain
parameter bit ISO_CROSSING      = 1,
/// Configuration input types for memory cuts used in implementation.
parameter type sram_cfg_data_t  = logic,
parameter type sram_cfg_tag_t   = logic,

parameter type axi_req_t = logic,
parameter type axi_rsp_t = logic
) (
input  logic clk_i,
input  logic clk_d2_i,
input  logic rst_ni,

input  logic                               enable_prefetching_i,
output icache_l0_events_t [NR_FETCH_PORTS-1:0] icache_l0_events_o,
output icache_l1_events_t                      icache_l1_events_o,

input  logic [NR_FETCH_PORTS-1:0]               flush_valid_i,
output logic [NR_FETCH_PORTS-1:0]               flush_ready_o,

input  logic [NR_FETCH_PORTS-1:0][FETCH_AW-1:0] inst_addr_i,
output logic [NR_FETCH_PORTS-1:0][FETCH_DW-1:0] inst_data_o,
input  logic [NR_FETCH_PORTS-1:0]               inst_cacheable_i,
input  logic [NR_FETCH_PORTS-1:0]               inst_valid_i,
output logic [NR_FETCH_PORTS-1:0]               inst_ready_o,
output logic [NR_FETCH_PORTS-1:0]               inst_error_o,

input  sram_cfg_data_t  sram_cfg_data_i,
input  sram_cfg_tag_t   sram_cfg_tag_i,

output axi_req_t axi_req_o,
input  axi_rsp_t axi_rsp_i
);

The first difference is that SET_COUNT is renamed to WAY_COUNT.

Old:

/// The set associativity of the cache. Power of two; >= 1.
parameter int SET_COUNT = 1,

New:

/// The set associativity of the cache. Power of two; >= 1.
parameter int unsigned WAY_COUNT = 1,

This change is documented in https://github.com/KULeuven-MICAS/cluster_icache/blob/main/Changelog.md: "Rename SET_COUNT to WAY_COUNT to correct terminology, as it reflects the number of ways in a set."

The new version also adds the following parameters:

/// Allow fetches to have priority over prefetches for L0 to L1
parameter bit FETCH_PRIORITY = 1'b0,
/// Merge L0-L1 fetches if requesting the same address
parameter bit MERGE_FETCHES = 1'b0,
/// Serialize the L1 lookup (parallel tag/data lookup by default)
parameter bit SERIAL_LOOKUP = 0,
/// Replace the L1 tag banks with latch-based SCM.
parameter bit L1_TAG_SCM = 0,
/// Number of pending response beats for the L1 cache.
parameter int unsigned NUM_AXI_OUTSTANDING = 2,

These parameters can safely be left at their default values.

Another change is a small difference in snitch_icache_pkg.sv: the event struct is renamed while its fields stay the same.
The old one has

typedef struct packed {
  logic l0_miss;
  logic l0_hit;
  logic l0_prefetch;
  logic l0_double_hit;
  logic l0_stall;
} icache_events_t;

the new one has

typedef struct packed {
  logic l0_miss;
  logic l0_hit;
  logic l0_prefetch;
  logic l0_double_hit;
  logic l0_stall;
} icache_l0_events_t;
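
Taken together, the interface differences above could look roughly like this at an instantiation site. This is only a sketch and is not copied from snitch_hive.sv: the wrapper name icache_wrap_sketch, all parameter values, and the tie-offs are hypothetical placeholders, and it assumes the remote snitch_icache and snitch_icache_pkg are on the compile path. It shows WAY_COUNT replacing SET_COUNT, the new parameters left at their defaults, and the event outputs using the renamed types.

```systemverilog
// Hypothetical wrapper for illustration only; not part of snax_cluster.
module icache_wrap_sketch #(
  parameter int unsigned NrFetchPorts = 2,      // placeholder value
  parameter type         axi_req_t    = logic,
  parameter type         axi_rsp_t    = logic
) (
  input  logic clk_i,
  input  logic clk_d2_i,
  input  logic rst_ni,

  // Event types now come from the remote snitch_icache_pkg
  // (previously a single icache_events_t array).
  output snitch_icache_pkg::icache_l0_events_t [NrFetchPorts-1:0] icache_l0_events_o,
  output snitch_icache_pkg::icache_l1_events_t                    icache_l1_events_o,

  input  logic [NrFetchPorts-1:0][31:0] inst_addr_i,
  output logic [NrFetchPorts-1:0][31:0] inst_data_o,
  input  logic [NrFetchPorts-1:0]       inst_valid_i,
  output logic [NrFetchPorts-1:0]       inst_ready_o,
  output logic [NrFetchPorts-1:0]       inst_error_o,

  output axi_req_t axi_req_o,
  input  axi_rsp_t axi_rsp_i
);

  snitch_icache #(
    .NR_FETCH_PORTS (NrFetchPorts),
    .L0_LINE_COUNT  (8),    // placeholder
    .LINE_WIDTH     (256),  // placeholder
    .LINE_COUNT     (128),  // placeholder
    .WAY_COUNT      (2),    // was .SET_COUNT in the local copy
    .FETCH_AW       (32),   // placeholder
    .FETCH_DW       (32),   // placeholder
    .FILL_AW        (32),   // placeholder
    .FILL_DW        (64),   // placeholder
    // FETCH_PRIORITY, MERGE_FETCHES, SERIAL_LOOKUP, L1_TAG_SCM and
    // NUM_AXI_OUTSTANDING are intentionally left at their defaults.
    .axi_req_t      (axi_req_t),
    .axi_rsp_t      (axi_rsp_t)
  ) i_snitch_icache (
    .clk_i,
    .clk_d2_i,
    .rst_ni,
    .enable_prefetching_i (1'b1),  // placeholder tie-off
    .icache_l0_events_o,
    .icache_l1_events_o,
    .flush_valid_i        ('0),    // placeholder tie-off
    .flush_ready_o        (),
    .inst_addr_i,
    .inst_data_o,
    .inst_cacheable_i     ('1),    // placeholder tie-off
    .inst_valid_i,
    .inst_ready_o,
    .inst_error_o,
    .sram_cfg_data_i      ('0),
    .sram_cfg_tag_i       ('0),
    .axi_req_o,
    .axi_rsp_i
  );

endmodule
```

Because the fields of icache_l0_events_t match those of the old icache_events_t, existing consumers of the per-port events should only need the type name updated; icache_l1_events_o is an additional output that can be left unconnected if it is not needed.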


There are a few things to update in snax:
1.1 Update the Bender file to use the remote git repo
1.2 Rename SET_COUNT to WAY_COUNT in snitch_hive.sv
1.3 Rename icache_events_t to icache_l0_events_t

2. Update the Bender target for synthesis of the cluster only
I added the synthesis Bender target "tc_sram_cluster_only" to distinguish synthesis of host + cluster from synthesis of the cluster only.

rgantonio commented 2 months ago

So after discussing with @Konste11ation, he'll first fork the repo into the KU Leuven group and add the necessary changes. Then it should be okay!

rgantonio commented 2 months ago

@Konste11ation please ping us again before proceeding so we make sure things are still working as intended.

rgantonio commented 2 months ago

@Konste11ation please test it once you have pushed

https://github.com/KULeuven-MICAS/cluster_icache/pull/1

First 😄

KULeuven-MICAS / snax_cluster

Use remote snitch icache #339