Peak FLOPs/s and Memory Bandwidth of Ascend 910A

Is it possible to programmatically obtain an Ascend device's peak floating point performance?

The peak floating-point performance changes with the voltage and configuration. To have a solution that works across Ascend devices, I would prefer not to use hardcoded values.

If there is a config or header file where these values are stored, I can parse it. I remember one header file that has cache sizes and similar values saved, but I need to check my notes for its name.

Since I plan to use a machine model to guide the hyperparameter search for tiling transformations, I need the bandwidths of the L1 buffers (A1, B1), the L2 buffers (A2, B2), and CO1 and CO2 (which also map to the L1 and L2 levels of the memory hierarchy), as well as the global memory bandwidth and the peak floating-point performance.
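
As a rough illustration of the kind of machine model I mean, here is a minimal roofline-style sketch. The function name `roofline_time` and all numbers in the usage example are hypothetical placeholders, not measured Ascend values; the real peak rate and per-level bandwidths are exactly what I am trying to obtain.

```python
def roofline_time(flops, bytes_per_level, peak_flops, bandwidth_per_level):
    """Lower-bound execution time (seconds) for one tiling candidate.

    flops:               floating-point operations performed by the kernel
    bytes_per_level:     dict mapping a memory level name to bytes moved through it
    peak_flops:          peak floating-point rate in FLOP/s
    bandwidth_per_level: dict mapping the same level names to bandwidth in bytes/s
    """
    compute_time = flops / peak_flops
    # Each memory level gives its own lower bound; the slowest one dominates.
    memory_time = max(
        bytes_per_level[level] / bandwidth_per_level[level]
        for level in bytes_per_level
    )
    return max(compute_time, memory_time)


if __name__ == "__main__":
    # Placeholder numbers only, chosen for illustration.
    t = roofline_time(
        flops=2e9,
        bytes_per_level={"global": 4e8, "L1": 8e8},
        peak_flops=1e12,
        bandwidth_per_level={"global": 1e11, "L1": 1e12},
    )
    print(f"estimated lower bound: {t:.6f} s")
```

A tiling search would evaluate this bound for each candidate tile shape and keep only the candidates with the smallest estimate, instead of running all of them.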

For example, for global memory bandwidth:

npu-smi info -i 0 -t memory
        NPU ID                         : 0
        Chip Count                     : 1

        DDR Capacity(MB)               : 13553
        DDR Clock Speed(MHz)           : 1200
        HBM Capacity(MB)               : 32768
        HBM Clock Speed(MHz)           : 1200
        HBM Temperature(C)             : 48
        Chip ID                        : 0

Could it be calculated as 1200 * 1e6 * atomic_size_of_transfer * number_of_ai_cores, i.e., the clock in Hz times the bytes per transfer times the number of AI cores?
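
To make the question concrete, here is a minimal sketch of that estimate. It assumes the `Key : Value` layout of the npu-smi output above and converts MHz to Hz with a factor of 1e6. The helper names, `bytes_per_transfer`, and `num_units` are hypothetical; I do not know the correct values for the 910A, and whether this formula is valid at all is what I am asking.

```python
import re

# Sample text copied from the npu-smi output above.
SAMPLE = """\
        NPU ID                         : 0
        Chip Count                     : 1

        DDR Capacity(MB)               : 13553
        DDR Clock Speed(MHz)           : 1200
        HBM Capacity(MB)               : 32768
        HBM Clock Speed(MHz)           : 1200
        HBM Temperature(C)             : 48
        Chip ID                        : 0
"""


def parse_npu_smi(text):
    """Parse 'Key : Value' lines into a dict of strings."""
    fields = {}
    for line in text.splitlines():
        match = re.match(r"\s*(.+?)\s*:\s*(\S+)\s*$", line)
        if match:
            fields[match.group(1)] = match.group(2)
    return fields


def estimate_bandwidth(clock_mhz, bytes_per_transfer, num_units):
    """Candidate estimate in bytes/s: clock (Hz) * bytes per transfer * units.

    bytes_per_transfer and num_units are assumptions, not known Ascend values.
    """
    return clock_mhz * 1e6 * bytes_per_transfer * num_units


if __name__ == "__main__":
    fields = parse_npu_smi(SAMPLE)
    clock = float(fields["HBM Clock Speed(MHz)"])
    # 32 bytes per transfer and 2 units are placeholders for illustration.
    print(estimate_bandwidth(clock, bytes_per_transfer=32, num_units=2))
```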

Are there any resources you can share with me on this?

This information is essential for avoiding a brute-force search over tiling parameters and, of course, for assessing the quality of the tiling transformations applied on Ascend.
