KULeuven-MICAS / snax_cluster

A heterogeneous accelerator-centric compute cluster
Apache License 2.0

Add custom TCDM assignment feature #249

Closed rgantonio closed 2 months ago

rgantonio commented 2 months ago

This PR adds an advanced feature that lets you manually assign the narrow and wide TCDM ports. We discovered that we needed this because of @xiaoling-yi's GeMM-CONV core and the Hypercorex.

The old narrow and wide TCDM assignment always assumes that all wide ports are grouped together, starting from the LSB TCDM ports and extending to whatever size they need, and that all narrow TCDM ports come after. Moreover, all reads come first and all writes come after, which limits the TCDM assignment. Visually...

[ all narrow ports, all wide ports]

However, the streamers produce the following arrangement:

[narrow read ports, wide read ports, narrow write ports, wide write ports]

Our previous accelerators complied with this and there were no problems. For example, in the case of GeMM + Data reshuffle, our GeMM has 2 wide read ports and 4 wide write ports, and the data reshuffle can use 8 narrow read ports and 1 wide write port, which we can re-arrange as:

[8 narrow read ports of DR, 1 wide write port of DR, 2 wide read ports of GeMM, 4 wide write ports of GeMM]

However, more complex accelerators need a flexible port assignment. The old scheme becomes a limitation when narrow and wide ports are mixed. For example, Hypercorex needs this arrangement (in order):

[2 narrow read, 1 narrow write, 3 wide reads, 1 wide write]

This arrangement cannot be mapped with the current setup.

This PR fixes it.
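
To illustrate why mixed arrangements produce non-contiguous index ranges, here is a minimal Python sketch. The helper `assign_ranges` is hypothetical and not part of snax_cluster. It assumes each wide port occupies 8 consecutive TCDM ports, which this PR does not state, and it places the Hypercorex port counts in the streamer order quoted above (narrow read, wide read, narrow write, wide write).

```python
from itertools import groupby


def assign_ranges(groups, wide_factor=8):
    """Lay out port groups in order and return inclusive index ranges per kind.

    groups: ordered list of (kind, count), where kind is "narrow" or "wide".
    wide_factor: ASSUMED number of TCDM ports that one wide port occupies.
    """
    # Expand every group into one entry per TCDM port it occupies.
    kinds = []
    for kind, count in groups:
        width = wide_factor if kind == "wide" else 1
        kinds.extend([kind] * (count * width))

    # Collapse consecutive ports of the same kind into (start, end) ranges.
    ranges = {"narrow": [], "wide": []}
    idx = 0
    for kind, run in groupby(kinds):
        length = len(list(run))
        ranges[kind].append((idx, idx + length - 1))
        idx += length
    return ranges


# Hypercorex counts in streamer order: 2 narrow reads, 3 wide reads,
# 1 narrow write, 1 wide write.
streamer_order = [("narrow", 2), ("wide", 3), ("narrow", 1), ("wide", 1)]
ranges = assign_ranges(streamer_order)
print(ranges)
```

Under these assumptions both the narrow and the wide ports end up in two separate index ranges each, so a scheme that expects one contiguous block per kind cannot describe the layout.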

How do you configure it?

A sample of the configuration file is shown below:

// SNAX custom cluster TCDM assignment
snax_custom_tcdm_assign: {
    snax_enable_assign_tcdm_idx: true,
    snax_narrow_assign_start_idx: [0,26],
    snax_narrow_assign_end_idx: [1,26],
    snax_wide_assign_start_idx: [2,27],
    snax_wide_assign_end_idx: [25,34],
},

NOTE: You need to declare this outside of the accelerator file. This custom TCDM re-assignment is a cluster-level configuration, so any user who uses it needs to know the specific mapping to TCDMs.
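
As a rough sanity check for such a configuration, the Python sketch below expands the start/end lists from the sample into explicit port indices. The helper `expand_assignment` is hypothetical and not part of snax_cluster, and it assumes the end indices are inclusive (suggested by the `[0,26]` / `[1,26]` pair, but not confirmed in this PR).

```python
def expand_assignment(start_idx, end_idx):
    """Expand paired start/end lists into a flat list of TCDM port indices.

    ASSUMPTION: each end index is inclusive.
    """
    if len(start_idx) != len(end_idx):
        raise ValueError("start and end lists must have the same length")
    ports = []
    for start, end in zip(start_idx, end_idx):
        if end < start:
            raise ValueError(f"range {start}..{end} ends before it starts")
        ports.extend(range(start, end + 1))
    return ports


# Values copied from the sample configuration above.
cfg = {
    "snax_narrow_assign_start_idx": [0, 26],
    "snax_narrow_assign_end_idx": [1, 26],
    "snax_wide_assign_start_idx": [2, 27],
    "snax_wide_assign_end_idx": [25, 34],
}

narrow = expand_assignment(cfg["snax_narrow_assign_start_idx"],
                           cfg["snax_narrow_assign_end_idx"])
wide = expand_assignment(cfg["snax_wide_assign_start_idx"],
                         cfg["snax_wide_assign_end_idx"])

overlap = set(narrow) & set(wide)  # ports claimed by both kinds
covered = sorted(narrow + wide)    # every port that is assigned

print("narrow ports:", narrow)
print("wide ports:", wide)
print("overlap:", overlap)
```

With inclusive ends, the sample assigns ports 0, 1 and 26 as narrow and the remaining ports up to 34 as wide, with no port claimed twice. Whether the cluster generator performs a similar check is not stated here.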

Major TODO:

rgantonio commented 2 months ago

@xiaoling-yi this should fix everything! But the queue is long since we pushed many PRs in hehe...

rgantonio commented 2 months ago

Also, we define this to be an advanced feature, since it's a wee bit too complicated to describe. The fix is also dirty, and a refactoring may be needed later. At least the changes are minimal.