Closed IgorBaratta closed 11 months ago
I noticed that the OSACA database has no throughput or latency information for the vbroadcastsd instruction with zmm registers. This can be crucial for accurate performance analysis of an outer product kernel.
Can this be added or updated in a future version?
    Combined Analysis Report
    ------------------------
    Port pressure in cycles
       | 0 - 0DV | 1 - 1DV | 2 - 2D    | 3 - 3D    | 4    | 5    | 6    | 7    | 8    | 9    || CP  | LCD |
    -----------------------------------------------------------------------------------------------------------------------
     4 |         |         |           |           |      |      |      |      |      |      ||     |     | X vbroadcastsd (%rdi,%rax), %zmm0
     5 |         |         | 0.50 0.50 | 0.50 0.50 |      |      |      |      |      |      || 5.0 |     |   vmovupd (%rsi,%rax,8), %zmm1
     6 | 0.50    |         | 0.50 0.50 | 0.50 0.50 |      | 0.50 |      |      |      |      || 4.0 | 4.0 |   vfmadd213pd (%rdx,%rax,8), %zmm1, %zmm0 # zmm0 = (zmm1 * zmm0) + mem
     7 | 0.50    |         | 0.50 0.50 | 0.50 0.50 |      | 0.50 |      |      |      |      || 4.0 | 4.0 |   vfmadd231pd 64(%rdi,%rax){1to8}, %zmm1, %zmm0 # zmm0 = (zmm1 * mem) + zmm0
     8 | 0.50    |         | 0.50 0.50 | 0.50 0.50 |      | 0.50 |      |      |      |      || 4.0 | 4.0 |   vfmadd231pd 128(%rdi,%rax){1to8}, %zmm1, %zmm0 # zmm0 = (zmm1 * mem) + zmm0
     9 | 0.50    |         | 0.50 0.50 | 0.50 0.50 |      | 0.50 |      |      |      |      || 4.0 | 4.0 |   vfmadd231pd 192(%rdi,%rax){1to8}, %zmm1, %zmm0 # zmm0 = (zmm1 * mem) + zmm0
    10 | 0.50    |         | 0.50 0.50 | 0.50 0.50 |      | 0.50 |      |      |      |      || 4.0 | 4.0 |   vfmadd231pd 256(%rdi,%rax){1to8}, %zmm1, %zmm0 # zmm0 = (zmm1 * mem) + zmm0
    11 | 0.50    |         | 0.50 0.50 | 0.50 0.50 |      | 0.50 |      |      |      |      || 4.0 | 4.0 |   vfmadd231pd 320(%rdi,%rax){1to8}, %zmm1, %zmm0 # zmm0 = (zmm1 * mem) + zmm0
    12 | 0.50    |         | 0.50 0.50 | 0.50 0.50 |      | 0.50 |      |      |      |      || 4.0 | 4.0 |   vfmadd231pd 384(%rdi,%rax){1to8}, %zmm1, %zmm0 # zmm0 = (zmm1 * mem) + zmm0
    13 | 0.50    |         | 0.50 0.50 | 0.50 0.50 |      | 0.50 |      |      |      |      || 4.0 | 4.0 |   vfmadd231pd 448(%rdi,%rax){1to8}, %zmm1, %zmm0 # zmm0 = (zmm1 * mem) + zmm0
    14 |         |         |           |           | 1.00 |      |      | 0.50 | 1.00 | 0.50 || 0.0 | 0.0 |   vmovupd %zmm0, (%rdx,%rax,8)

    ------------------ WARNING: The performance data for 1 instructions is missing.------------------
    No final analysis is given. If you want to ignore this warning and run the analysis anyway,
    start osaca with --ignore-unknown flag.
    -------------------------------------------------------------------------------------------------
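For anyone who wants to spot such gaps automatically, here is a minimal Python sketch that lists the instructions flagged with `X` in a report like the one above. It is not part of OSACA: the function name `missing_instructions` and the regular expression are illustrative assumptions, and the row layout it expects (line number, port columns, `||`, CP, LCD, optional `X` flag, instruction) is inferred only from this excerpt.

```python
import re

# Assumed row layout, inferred from the report excerpt in this issue:
#   <line no> | <port columns> || <CP> | <LCD> | [X ]<instruction>
ROW = re.compile(r"^\s*(\d+)\s*\|(.*)\|\|([^|]*)\|([^|]*)\|\s*(.*)$")


def missing_instructions(report):
    """Return (line number, instruction) for every row flagged with 'X'."""
    missing = []
    for line in report.splitlines():
        match = ROW.match(line)
        if match is None:
            continue  # header, separator, or warning line
        rest = match.group(5)
        if rest.startswith("X "):
            missing.append((int(match.group(1)), rest[2:].strip()))
    return missing


if __name__ == "__main__":
    # Two rows copied from the report in this issue.
    sample = (
        "4 | | | | | | | | | | || | | X vbroadcastsd (%rdi,%rax), %zmm0\n"
        "5 | | | 0.50 0.50 | 0.50 0.50 | | | | | | || 5.0 | | vmovupd (%rsi,%rax,8), %zmm1\n"
    )
    for number, instruction in missing_instructions(sample):
        print(number, instruction)
```

Run on the two sample rows, this reports only line 4, the `vbroadcastsd` row. The output format may differ between OSACA versions, so treat the pattern as a starting point.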
Hi @IgorBaratta,
I added the missing throughput and latency data in commit 2331e4d for Ice Lake and Skylake. They will be available via pip and godbolt with the next release.
Thanks @JanLJL !