Is it possible to programmatically obtain an Ascend device's peak floating point performance?
The peak floating-point performance changes depending on the voltage/config. To have a solution that works across Ascend devices, I would prefer not to use hardcoded values.
If there is a config or header file where these values are stored, I can parse it. (I remember one header file that stores cache sizes and similar values; I need to check my notes for its name.)
Since I plan to use a machine model to guide the hyperparameter search for tiling transformations, I need to know the bandwidths of L1 (A1, B1), L2 (A2, B2), CO1, and CO2 (which also map to the L1 and L2 levels of the memory hierarchy), as well as the global memory bandwidth and the peak floating-point performance.
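To make the requirement concrete, here is a minimal sketch in Python of the kind of machine model I have in mind. All names (`MachineModel`, `estimate_tile_time`, the level keys) are hypothetical, and the numbers in the usage note are placeholders rather than real Ascend values. The point is that these fields would be queried or parsed per device instead of being hardcoded.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class MachineModel:
    """Hypothetical container for the per-device values a tiling search needs."""
    peak_flops: float  # peak floating-point rate in FLOP/s
    # Bandwidth in bytes/s, keyed by memory level name (e.g. "GM", "L1").
    bandwidth: Dict[str, float] = field(default_factory=dict)


def estimate_tile_time(model: MachineModel, flops: float,
                       bytes_moved: Dict[str, float]) -> float:
    """Roofline-style lower bound on the time for one tile, in seconds.

    The tile cannot finish faster than its slowest resource: either the
    compute units or any single level of the memory hierarchy.
    """
    compute_time = flops / model.peak_flops
    memory_times = [
        bytes_moved[level] / model.bandwidth[level] for level in bytes_moved
    ]
    return max([compute_time] + memory_times)
```

For example, with placeholder values `MachineModel(peak_flops=1e12, bandwidth={"GM": 1e11, "L1": 1e12})`, a tile with 2e9 FLOPs that moves 1e9 bytes through "GM" and 4e9 bytes through "L1" would be bound by global memory in this model. A tiling search could rank candidate tilings by this estimate instead of running each one.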
For example, could the global memory bandwidth be calculated as follows?
1200 * 1e3 * atomic_size_of_transfer * number_of_ai_cores
Are there any resources you can share with me on this?
This is fundamental to avoiding a brute-force search over tiling parameters and, of course, to assessing the quality of tiling transformations applied to Ascend.
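The global memory bandwidth estimate asked about above could be sketched as follows, in Python. This is only a restatement of my guess, not a confirmed formula: it assumes that every AI core can issue one transfer of `bytes_per_transfer` bytes per clock cycle, and that the `1200 * 1e3` factor stands for a clock rate. If the 1200 is in MHz, the conversion factor to Hz would be 1e6 rather than 1e3. The function and parameter names are hypothetical.

```python
def global_memory_bandwidth(clock_hz: float, bytes_per_transfer: int,
                            num_ai_cores: int) -> float:
    """Estimated peak global memory bandwidth in bytes/s.

    Assumption (unverified): each AI core issues one transfer of
    `bytes_per_transfer` bytes per clock cycle, with no contention.
    """
    if clock_hz <= 0 or bytes_per_transfer <= 0 or num_ai_cores <= 0:
        raise ValueError("all inputs must be positive")
    return clock_hz * bytes_per_transfer * num_ai_cores
```

With the clock rate, transfer size, and core count read from the device or its config at run time, the same function would apply across Ascend devices without hardcoded values.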