KULeuven-MICAS / snax_cluster

A heterogeneous accelerator-centric compute cluster
Apache License 2.0

Use remote snitch icache #339

Closed Konste11ation closed 2 months ago

Konste11ation commented 2 months ago

This PR updates the files used by the synthesis flow:

  1. Using the git repo snitch_icache

We run synthesis at the host level, where the snitch_icache files are defined as a remote git repo. The current snax cluster, however, marks snitch_icache as local files. As a result, the synthesis script includes the snitch_icache modules twice, which causes a problem. To solve this, snitch_icache is now also marked as a git repo in the snax cluster. The parameters differ slightly, and the I/Os are the same except for the event outputs (see below).

The old snitch_icache:

    
    module snitch_icache #(
    /// Number of request (fetch) ports
    parameter int NR_FETCH_PORTS = -1,
    /// L0 Cache Line Count
    parameter int L0_LINE_COUNT = -1,
    /// Cache Line Width
    parameter int LINE_WIDTH = -1,
    /// The number of cache lines per set. Power of two; >= 2.
    parameter int LINE_COUNT = -1,
    /// The set associativity of the cache. Power of two; >= 1.
    parameter int SET_COUNT = 1,
    /// Fetch interface address width. Same as FILL_AW; >= 1.
    parameter int FETCH_AW = -1,
    /// Fetch interface data width. Power of two; >= 8.
    parameter int FETCH_DW = -1,
    /// Fill interface address width. Same as FETCH_AW; >= 1.
    parameter int FILL_AW = -1,
    /// Fill interface data width. Power of two; >= 8.
    parameter int FILL_DW = -1,
    /// This reduces area impact at the cost of
    /// increased hassle of having latches in
    /// the design.
    /// i_snitch_icache/gen_prefetcher*i_snitch_icache_l0/data*/Q
    parameter bit EARLY_LATCH = 0,
    /// Tag width of the data determining logic, this can reduce the
    /// the critical path into the L0 cache when small. The trade-off
    /// is a higher miss-rate in case the smaller tag matches more
    /// tags. The tag must be smaller than the necessary L0 tag.
    /// If configured to `-1` the entire tag is used, effectively
    /// disabling this feature.
    parameter int L0_EARLY_TAG_WIDTH = -1,
    /// Operate L0 cache in slower clock-domain
    parameter bit ISO_CROSSING      = 1,
    /// Configuration input types for memory cuts used in implementation.
    parameter type sram_cfg_data_t  = logic,
    parameter type sram_cfg_tag_t   = logic,
    
    parameter type axi_req_t = logic,
    parameter type axi_rsp_t = logic
    ) (
    input  logic clk_i,
    input  logic clk_d2_i,
    input  logic rst_ni,
    
    input  logic                               enable_prefetching_i,
    output snitch_icache_pkg::icache_events_t [NR_FETCH_PORTS-1:0] icache_events_o,
    
    input  logic [NR_FETCH_PORTS-1:0]               flush_valid_i,
    output logic [NR_FETCH_PORTS-1:0]               flush_ready_o,
    
    input  logic [NR_FETCH_PORTS-1:0][FETCH_AW-1:0] inst_addr_i,
    output logic [NR_FETCH_PORTS-1:0][FETCH_DW-1:0] inst_data_o,
    input  logic [NR_FETCH_PORTS-1:0]               inst_cacheable_i,
    input  logic [NR_FETCH_PORTS-1:0]               inst_valid_i,
    output logic [NR_FETCH_PORTS-1:0]               inst_ready_o,
    output logic [NR_FETCH_PORTS-1:0]               inst_error_o,
    
    input  sram_cfg_data_t  sram_cfg_data_i,
    input  sram_cfg_tag_t   sram_cfg_tag_i,
    
    output axi_req_t axi_req_o,
    input  axi_rsp_t axi_rsp_i
    );

The snitch_icache from git:

    module snitch_icache import snitch_icache_pkg::*; #(
    /// Number of request (fetch) ports
    parameter int unsigned NR_FETCH_PORTS = -1,
    /// L0 Cache Line Count (L0 is fully associative)
    parameter int unsigned L0_LINE_COUNT = -1,
    /// Cache Line Width
    parameter int unsigned LINE_WIDTH = -1,
    /// The number of cache lines per set. Power of two; >= 2.
    parameter int unsigned LINE_COUNT = -1,
    /// The set associativity of the cache. Power of two; >= 1.
    parameter int unsigned WAY_COUNT = 1,
    /// Fetch interface address width. Same as FILL_AW; >= 1.
    parameter int unsigned FETCH_AW = -1,
    /// Fetch interface data width. Power of two; >= 8.
    parameter int unsigned FETCH_DW = -1,
    /// Fill interface address width. Same as FETCH_AW; >= 1.
    parameter int unsigned FILL_AW = -1,
    /// Fill interface data width. Power of two; >= 8.
    parameter int unsigned FILL_DW = -1,
    /// Allow fetches to have priority over prefetches for L0 to L1
    parameter bit FETCH_PRIORITY = 1'b0,
    /// Merge L0-L1 fetches if requesting the same address
    parameter bit MERGE_FETCHES = 1'b0,
    /// Serialize the L1 lookup (parallel tag/data lookup by default)
    parameter bit SERIAL_LOOKUP = 0,
    /// Replace the L1 tag banks with latch-based SCM.
    parameter bit L1_TAG_SCM = 0,
    /// Number of pending response beats for the L1 cache.
    parameter int unsigned NUM_AXI_OUTSTANDING = 2,
    /// This reduces area impact at the cost of
    /// increased hassle of having latches in
    /// the design.
    /// i_snitch_icache/gen_prefetcher*i_snitch_icache_l0/data*/Q
    parameter bit EARLY_LATCH = 0,
    /// Tag width of the data determining logic, this can reduce the
    /// the critical path into the L0 cache when small. The trade-off
    /// is a higher miss-rate in case the smaller tag matches more
    /// tags. The tag must be smaller than the necessary L0 tag.
    /// If configured to `-1` the entire tag is used, effectively
    /// disabling this feature.
    parameter int L0_EARLY_TAG_WIDTH = -1,
    /// Operate L0 cache in slower clock-domain
    parameter bit ISO_CROSSING      = 1,
    /// Configuration input types for memory cuts used in implementation.
    parameter type sram_cfg_data_t  = logic,
    parameter type sram_cfg_tag_t   = logic,
    
    parameter type axi_req_t = logic,
    parameter type axi_rsp_t = logic
    ) (
    input  logic clk_i,
    input  logic clk_d2_i,
    input  logic rst_ni,
    
    input  logic                               enable_prefetching_i,
    output icache_l0_events_t [NR_FETCH_PORTS-1:0] icache_l0_events_o,
    output icache_l1_events_t                  icache_l1_events_o,
    
    input  logic [NR_FETCH_PORTS-1:0]               flush_valid_i,
    output logic [NR_FETCH_PORTS-1:0]               flush_ready_o,
    
    input  logic [NR_FETCH_PORTS-1:0][FETCH_AW-1:0] inst_addr_i,
    output logic [NR_FETCH_PORTS-1:0][FETCH_DW-1:0] inst_data_o,
    input  logic [NR_FETCH_PORTS-1:0]               inst_cacheable_i,
    input  logic [NR_FETCH_PORTS-1:0]               inst_valid_i,
    output logic [NR_FETCH_PORTS-1:0]               inst_ready_o,
    output logic [NR_FETCH_PORTS-1:0]               inst_error_o,
    
    input  sram_cfg_data_t  sram_cfg_data_i,
    input  sram_cfg_tag_t   sram_cfg_tag_i,
    
    output axi_req_t axi_req_o,
    input  axi_rsp_t axi_rsp_i
    );

The first difference is that SET_COUNT is renamed to WAY_COUNT.

Old:

    /// The set associativity of the cache. Power of two; >= 1.
    parameter int SET_COUNT = 1,

New:

    /// The set associativity of the cache. Power of two; >= 1.
    parameter int unsigned WAY_COUNT = 1,

This change is documented in https://github.com/KULeuven-MICAS/cluster_icache/blob/main/Changelog.md, which states: "Rename SET_COUNT to WAY_COUNT to correct terminology, as it reflects the number of ways in a set."

The following parameters are new:

    /// Allow fetches to have priority over prefetches for L0 to L1
    parameter bit FETCH_PRIORITY = 1'b0,
    /// Merge L0-L1 fetches if requesting the same address
    parameter bit MERGE_FETCHES = 1'b0,
    /// Serialize the L1 lookup (parallel tag/data lookup by default)
    parameter bit SERIAL_LOOKUP = 0,
    /// Replace the L1 tag banks with latch-based SCM.
    parameter bit L1_TAG_SCM = 0,
    /// Number of pending response beats for the L1 cache.
    parameter int unsigned NUM_AXI_OUTSTANDING = 2,

These parameters can safely be left at their default values.

Another change is a slight difference in snitch_icache_pkg.sv. The old one has:

    typedef struct packed {
      logic l0_miss;
      logic l0_hit;
      logic l0_prefetch;
      logic l0_double_hit;
      logic l0_stall;
    } icache_events_t;

The new one has:

    typedef struct packed {
      logic l0_miss;
      logic l0_hit;
      logic l0_prefetch;
      logic l0_double_hit;
      logic l0_stall;
    } icache_l0_events_t;


There are a few things to update in snax:
1.1 Update the Bender file to use the remote git repo
1.2 Update SET_COUNT to WAY_COUNT in snitch_hive.sv
1.3 Update icache_events_t to icache_l0_events_t

2. Update the Bender target for synthesis of the cluster only
I added a synthesis Bender target, "tc_sram_cluster_only", to distinguish synthesis of host + cluster from synthesis of the cluster only.
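
To make items 1.2 and 1.3 concrete, here is a minimal sketch of what an instantiation of the remote snitch_icache could look like after the update. The wrapper module `icache_wrap_sketch`, its parameters, and every numeric value are hypothetical and are not taken from snitch_hive.sv; only the snitch_icache parameter and port names come from the declarations quoted above. It assumes snitch_icache and snitch_icache_pkg from the remote repo are on the compile list.

```systemverilog
// Sketch only: wrapper name, wrapper parameters and numeric values are
// hypothetical. snitch_icache parameter/port names follow the quoted header.
module icache_wrap_sketch
  import snitch_icache_pkg::*;
#(
  parameter int unsigned NrPorts   = 2,
  parameter int unsigned AddrWidth = 48,
  parameter int unsigned InstWidth = 32,
  parameter int unsigned FillWidth = 64,
  parameter type         axi_req_t = logic,
  parameter type         axi_rsp_t = logic
) (
  input  logic                              clk_i,
  input  logic                              clk_d2_i,
  input  logic                              rst_ni,
  input  logic [NrPorts-1:0]                flush_valid_i,
  output logic [NrPorts-1:0]                flush_ready_o,
  input  logic [NrPorts-1:0][AddrWidth-1:0] inst_addr_i,
  output logic [NrPorts-1:0][InstWidth-1:0] inst_data_o,
  input  logic [NrPorts-1:0]                inst_cacheable_i,
  input  logic [NrPorts-1:0]                inst_valid_i,
  output logic [NrPorts-1:0]                inst_ready_o,
  output logic [NrPorts-1:0]                inst_error_o,
  // Item 1.3: the event type is now icache_l0_events_t (was icache_events_t).
  output icache_l0_events_t [NrPorts-1:0]   icache_l0_events_o,
  output axi_req_t                          axi_req_o,
  input  axi_rsp_t                          axi_rsp_i
);

  snitch_icache #(
    .NR_FETCH_PORTS (NrPorts),
    .L0_LINE_COUNT  (8),
    .LINE_WIDTH     (256),
    .LINE_COUNT     (128),
    .WAY_COUNT      (2),         // Item 1.2: this was .SET_COUNT before
    .FETCH_AW       (AddrWidth),
    .FETCH_DW       (InstWidth),
    .FILL_AW        (AddrWidth),
    .FILL_DW        (FillWidth),
    // FETCH_PRIORITY, MERGE_FETCHES, SERIAL_LOOKUP, L1_TAG_SCM and
    // NUM_AXI_OUTSTANDING are not overridden, so they keep their defaults.
    .axi_req_t      (axi_req_t),
    .axi_rsp_t      (axi_rsp_t)
  ) i_snitch_icache (
    .clk_i,
    .clk_d2_i,
    .rst_ni,
    .enable_prefetching_i (1'b1),  // tied high for this sketch
    .icache_l0_events_o,
    .icache_l1_events_o   (),      // new L1 event output, unused in this sketch
    .flush_valid_i,
    .flush_ready_o,
    .inst_addr_i,
    .inst_data_o,
    .inst_cacheable_i,
    .inst_valid_i,
    .inst_ready_o,
    .inst_error_o,
    .sram_cfg_data_i      ('0),    // default sram_cfg_*_t is logic
    .sram_cfg_tag_i       ('0),
    .axi_req_o,
    .axi_rsp_i
  );

endmodule
```

Note that, going by the quoted headers, the event output port itself is named icache_l0_events_o in the remote version (icache_events_o in the old one), so the connection in snitch_hive.sv would need the new port name as well as the new type.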

rgantonio commented 2 months ago

So after discussing with @Konste11ation, he'll first fork the repo into the KU Leuven group and add the necessary changes; then it should be okay!

rgantonio commented 2 months ago

@Konste11ation please ping us again before proceeding so we make sure things are still working as intended.

rgantonio commented 2 months ago

@Konste11ation test it once you've pushed:

https://github.com/KULeuven-MICAS/cluster_icache/pull/1

First 😄