ARM-software / armnn

Arm NN ML Software. The code here is a read-only mirror of https://review.mlplatform.org/admin/repos/ml/armnn
https://developer.arm.com/products/processors/machine-learning/arm-nn
MIT License

ExecuteNetwork profiling #537

Closed: prabhakarlad closed this issue 3 years ago

prabhakarlad commented 3 years ago

Hello all,

I have been trying to run profiling with ExecuteNetwork (armnn version 21.02). When I run with the CPU backend, I do see the kernel timer values, for example:

            "NeonConvolution2dWorkload_Execute_#8": {
                "type": "Event",
                "NeonKernelTimer/0: NEIm2ColKernel_#8": {
                    "type": "Measurement",
                    "raw": [
                        1458.791000
                    ],
                    "unit": "us"
                },
                "NeonKernelTimer/1: NEGEMMAssemblyWrapperKernel/a64_smallK_hybrid_u8u32_dot_8x4_#8": {
                    "type": "Measurement",
                    "raw": [
                        7416.292000
                    ],
                    "unit": "us"
                },
                "Wall clock time_#8": {
                    "type": "Measurement",
                    "raw": [
                        9041.875000
                    ],
                    "unit": "us"
                },
                "Wall clock time (Start)_#8": {
                    "type": "Measurement",
                    "raw": [
                        134619045.564000
                    ],
                    "unit": "us"
                },
                "Wall clock time (Stop)_#8": {
                    "type": "Measurement",
                    "raw": [
                        134628087.439000
                    ],
                    "unit": "us"
                }
            },
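As a sanity check on the CPU numbers above, here is a minimal Python sketch that compares the summed kernel timers with the wall clock time for this event. The values are copied from the output above; the variable names are illustrative, and reading the difference as time spent outside the timed kernels is an assumption, not something the profiler states.

```python
# Values copied from the NeonConvolution2dWorkload_Execute_#8 event (microseconds).
kernel_us = [1458.791, 7416.292]   # the two NeonKernelTimer measurements
wall_us = 9041.875                 # "Wall clock time_#8"
start_us = 134619045.564           # "Wall clock time (Start)_#8"
stop_us = 134628087.439            # "Wall clock time (Stop)_#8"

# Total time attributed to the individual kernels.
kernel_total_us = sum(kernel_us)

# Assumption: whatever is left over was spent outside the timed kernels.
outside_kernels_us = wall_us - kernel_total_us

# The wall clock duration should match Stop minus Start.
span_us = stop_us - start_us

print(f"kernel total:    {kernel_total_us:.3f} us")
print(f"outside kernels: {outside_kernels_us:.3f} us")
print(f"stop - start:    {span_us:.3f} us")
```

If the kernel total is far below the wall clock time, or zero as in the GPU case, the kernel timers are probably not being recorded.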

For the GPU backend, the OpenCL kernel timers always report 0.000000, even though the wall clock time is non-zero:

            "ClConvolution2dWorkload_Execute_#11": {
                "type": "Event",
                "OpenClKernelTimer/0: gemmlowp_matrix_a_reduction GWS[112,112,1]_#11": {
                    "type": "Measurement",
                    "raw": [
                        0.000000
                    ],
                    "unit": "us"
                },
                "OpenClKernelTimer/1: gemmlowp_mm_reshaped_only_rhs_t_fused_output_stage_fixedpoint GWS[32,32,112]_#11": {
                    "type": "Measurement",
                    "raw": [
                        0.000000
                    ],
                    "unit": "us"
                },
                "Wall clock time_#11": {
                    "type": "Measurement",
                    "raw": [
                        411.333000
                    ],
                    "unit": "us"
                },
                "Wall clock time (Start)_#11": {
                    "type": "Measurement",
                    "raw": [
                        300657261.851000
                    ],
                    "unit": "us"
                },
                "Wall clock time (Stop)_#11": {
                    "type": "Measurement",
                    "raw": [
                        300657673.184000
                    ],
                    "unit": "us"
                }
            },
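To spot this symptom across a larger profile, a minimal Python sketch like the one below could walk the profiler output and list every kernel timer whose samples are all zero. It assumes only the nesting visible in the fragment above (events containing measurements with a "raw" list); the function name `find_zero_kernel_timers` and the wrapping braces are illustrative and not part of Arm NN.

```python
import json

# Fragment of the GPU event above, wrapped in braces so it parses as JSON.
GPU_EVENT = json.loads("""
{
  "ClConvolution2dWorkload_Execute_#11": {
    "type": "Event",
    "OpenClKernelTimer/0: gemmlowp_matrix_a_reduction GWS[112,112,1]_#11": {
      "type": "Measurement", "raw": [0.000000], "unit": "us"
    },
    "OpenClKernelTimer/1: gemmlowp_mm_reshaped_only_rhs_t_fused_output_stage_fixedpoint GWS[32,32,112]_#11": {
      "type": "Measurement", "raw": [0.000000], "unit": "us"
    },
    "Wall clock time_#11": {
      "type": "Measurement", "raw": [411.333000], "unit": "us"
    }
  }
}
""")


def find_zero_kernel_timers(node, path=()):
    """Yield the key path of each kernel-timer measurement whose raw samples are all zero."""
    if not isinstance(node, dict):
        return
    for name, child in node.items():
        if not isinstance(child, dict):
            continue  # skip scalar fields such as "type" and "unit"
        if child.get("type") == "Measurement" and "KernelTimer" in name:
            raw = child.get("raw", [])
            if raw and all(value == 0 for value in raw):
                yield path + (name,)
        else:
            # Recurse into events and any other nested objects.
            yield from find_zero_kernel_timers(child, path + (name,))


zero_timers = list(find_zero_kernel_timers(GPU_EVENT))
for timer_path in zero_timers:
    print(" / ".join(timer_path))
```

For a full profile saved to a file, `json.load` on the file object would replace the inline fragment.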

Are there additional flags required when building the DDK/armnn for the GPU backend?

Cheers, Prabhakar

matthewsloyanARM commented 3 years ago

Hi @prabhakarlad,

Thanks for reaching out. GPU profiling wasn't being enabled correctly in Arm NN version 21.02. A patch that fixes this is on the master branch. Would you be able to try it out and see whether it fixes your issue? Thank you!

Kind regards,

Matthew

prabhakarlad commented 3 years ago

Thank you @matthewsloyanARM, I am now able to get the timings. (The patch doesn't apply cleanly on the 21.02 release, so I had to apply the changes manually.)