RRZE-HPC / OSACA

Open Source Architecture Code Analyzer
GNU Affero General Public License v3.0

No throughput/latency information for vbroadcast #100

Closed (IgorBaratta closed this issue 11 months ago)

IgorBaratta commented 11 months ago

I noticed that the OSACA database has no throughput or latency information for the vbroadcastsd instruction with a zmm destination (the memory-source form on line 4 of the report below). This data can be crucial for an accurate performance analysis of an outer-product kernel.

Can this be added or updated in a future version?

Combined Analysis Report
------------------------
                                                Port pressure in cycles                                                
     |  0   - 0DV  |  1   - 1DV  |  2   -  2D  |  3   -  3D  |  4   |  5   |  6   |  7   |  8   |  9   ||  CP  | LCD  |
-----------------------------------------------------------------------------------------------------------------------
   4 |             |             |             |             |      |      |      |      |      |      ||      |      | X vbroadcastsd    (%rdi,%rax), %zmm0
   5 |             |             | 0.50   0.50 | 0.50   0.50 |      |      |      |      |      |      ||  5.0 |      |   vmovupd (%rsi,%rax,8), %zmm1
   6 | 0.50        |             | 0.50   0.50 | 0.50   0.50 |      | 0.50 |      |      |      |      ||  4.0 |  4.0 |   vfmadd213pd     (%rdx,%rax,8), %zmm1, %zmm0 # zmm0 = (zmm1 * zmm0) + mem
   7 | 0.50        |             | 0.50   0.50 | 0.50   0.50 |      | 0.50 |      |      |      |      ||  4.0 |  4.0 |   vfmadd231pd     64(%rdi,%rax){1to8}, %zmm1, %zmm0 # zmm0 = (zmm1 * mem) + zmm0
   8 | 0.50        |             | 0.50   0.50 | 0.50   0.50 |      | 0.50 |      |      |      |      ||  4.0 |  4.0 |   vfmadd231pd     128(%rdi,%rax){1to8}, %zmm1, %zmm0 # zmm0 = (zmm1 * mem) + zmm0
   9 | 0.50        |             | 0.50   0.50 | 0.50   0.50 |      | 0.50 |      |      |      |      ||  4.0 |  4.0 |   vfmadd231pd     192(%rdi,%rax){1to8}, %zmm1, %zmm0 # zmm0 = (zmm1 * mem) + zmm0
  10 | 0.50        |             | 0.50   0.50 | 0.50   0.50 |      | 0.50 |      |      |      |      ||  4.0 |  4.0 |   vfmadd231pd     256(%rdi,%rax){1to8}, %zmm1, %zmm0 # zmm0 = (zmm1 * mem) + zmm0
  11 | 0.50        |             | 0.50   0.50 | 0.50   0.50 |      | 0.50 |      |      |      |      ||  4.0 |  4.0 |   vfmadd231pd     320(%rdi,%rax){1to8}, %zmm1, %zmm0 # zmm0 = (zmm1 * mem) + zmm0
  12 | 0.50        |             | 0.50   0.50 | 0.50   0.50 |      | 0.50 |      |      |      |      ||  4.0 |  4.0 |   vfmadd231pd     384(%rdi,%rax){1to8}, %zmm1, %zmm0 # zmm0 = (zmm1 * mem) + zmm0
  13 | 0.50        |             | 0.50   0.50 | 0.50   0.50 |      | 0.50 |      |      |      |      ||  4.0 |  4.0 |   vfmadd231pd     448(%rdi,%rax){1to8}, %zmm1, %zmm0 # zmm0 = (zmm1 * mem) + zmm0
  14 |             |             |             |             | 1.00 |      |      | 0.50 | 1.00 | 0.50 ||  0.0 |  0.0 |   vmovupd %zmm0, (%rdx,%rax,8)

------------------ WARNING: The performance data for 1 instructions is missing.------------------
                     No final analysis is given. If you want to ignore this
                     warning and run the analysis anyway, start osaca with
                                       --ignore-unknown flag.
-------------------------------------------------------------------------------------------------

JanLJL commented 11 months ago
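As a rough illustration of how the per-port columns in the report feed a throughput estimate, the sketch below re-enters the known rows by hand (omitting the vbroadcastsd line, which has no data), sums the pressure for each port, and takes the largest sum as the bound, assuming a model in which the most heavily loaded port limits the loop. It also sums the CP column as printed. The names `LOAD`, `FMA_MEM`, `STORE` and `rows` are hypothetical helpers, not part of OSACA.

```python
from collections import Counter

# Hypothetical transcription of the report above. The vbroadcastsd row is
# omitted because OSACA has no throughput/latency data for it.
LOAD = {"2": 0.5, "2D": 0.5, "3": 0.5, "3D": 0.5}  # load share on ports 2/3 and their data columns
FMA_MEM = {"0": 0.5, "5": 0.5, **LOAD}              # FMA with a memory operand
STORE = {"4": 1.0, "7": 0.5, "8": 1.0, "9": 0.5}    # vmovupd store

# (mnemonic, port pressure, CP cycles) for each known row of the report
rows = [
    ("vmovupd (load)", LOAD, 5.0),
    ("vfmadd213pd", FMA_MEM, 4.0),
    *[("vfmadd231pd", FMA_MEM, 4.0)] * 7,
    ("vmovupd (store)", STORE, 0.0),
]

# Add up the pressure column by column across all rows.
pressure = Counter()
for _, ports, _ in rows:
    pressure.update(ports)

# Assumed model: the port with the highest cumulative pressure bounds
# the cycles per iteration.
bound = max(pressure.values())
bottleneck = sorted(p for p, v in pressure.items() if v == bound)

# Sum of the CP column as printed (no latency for vbroadcastsd included).
cp_cycles = sum(cp for _, _, cp in rows)

print(f"port-pressure bound: {bound} cy on ports {bottleneck}")
# prints: port-pressure bound: 4.5 cy on ports ['2', '2D', '3', '3D']
print(f"CP column sum: {cp_cycles} cy")
# prints: CP column sum: 37.0 cy
```

Because adding an instruction can only increase the per-port sums, 4.5 cycles is a lower bound for the ports shown rather than a final prediction, and the 37-cycle CP figure likewise leaves out whatever latency the broadcast contributes.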

Hi @IgorBaratta,

I added them in commit 2331e4d for Ice Lake and Skylake; they will be available via pip and on Godbolt with the next release.

IgorBaratta commented 11 months ago

Thanks @JanLJL!