ARM-software / ComputeLibrary

The Compute Library is a set of computer vision and machine learning functions optimised for both Arm CPUs and GPUs using SIMD technologies.
MIT License

Question: Extract per layer information from finalized graph. #1041

Closed: AndreasKaratzas closed this issue 1 year ago

AndreasKaratzas commented 1 year ago

Output of 'strings libarm_compute.so | grep arm_compute_version': arm_compute_version=v21.02 Build options: {'Werror': '0', 'debug': '0', 'asserts': '0', 'neon': '1', 'opencl': '1', 'openmp': '1', 'os': 'linux', 'arch': 'arm64-v8a', 'build': 'native', 'embed_kernels': '1'} Git hash=b'0cacff24861bf3201e35426424752b78d3608a39'

Platform: OrangePi 5 16GB

Operating System: Ubuntu 22.04.2 LTS aarch64

Problem description: Hello :) This is more of a question than an issue. I was wondering: is there any way to parse a graph after it has been finalized in order to extract information about its layers? For example, take the AlexNet example class in graph_alexnet.cpp under examples/. Is there any way to get a structure like the one below?

{
    "1": {
        "InputLayer": {
            "input_width": 227, 
            "input_height": 227,
            "input_channels": 3,
            "input_batch_size": 1,
            "input_dtype": "float32",
            "input_layout": "NCHW"
        }
    },
    "2": {
        "ConvolutionLayer": {
            "conv_width": 11,
            "conv_height": 11,
            "ofm": 96,
            "conv_info": [ 4, 4, 0, 0 ],
            "num_groups": 1
        }
    },
    "3": {
        "ActivationLayer": {
            "activation_type": "RELU"
        }
    },
    "4": {
        "NormalizationLayer": {
            "type": "CROSS_MAP",
            "norm_size": 5,
            "alpha": 0.0001,
            "beta": 0.75,
            "kappa": 1.0,
            "is_scaled": true
        }
    },
    "5": {
        "PoolingLayer": {
            "pool_type": "MAX",
            "pool_size": 3,
            "pad_stride_info": [ 2, 2, 0, 0 ]
        }
    },
    "6": {
        "ConvolutionLayer": {
            "conv_width": 5,
            "conv_height": 5,
            "ofm": 256,
            "conv_info": [ 1, 1, 2, 2 ],
            "num_groups": 2
        }
    },
    "7": {
        "ActivationLayer": {
            "activation_type": "RELU"
        }
    },
    "8": {
        "NormalizationLayer": {
            "type": "CROSS_MAP",
            "norm_size": 5,
            "alpha": 0.0001,
            "beta": 0.75,
            "kappa": 1.0,
            "is_scaled": true
        }
    },
    "9": {
        "PoolingLayer": {
            "pool_type": "MAX",
            "pool_size": 3,
            "pad_stride_info": [ 2, 2, 0, 0 ]
        }
    },
    "10": {
        "ConvolutionLayer": {
            "conv_width": 3,
            "conv_height": 3,
            "ofm": 384,
            "conv_info": [ 1, 1, 1, 1 ],
            "num_groups": 1
        }
    },
    "11": {
        "ActivationLayer": {
            "activation_type": "RELU"
        }
    },
    "12": {
        "ConvolutionLayer": {
            "conv_width": 3,
            "conv_height": 3,
            "ofm": 384,
            "conv_info": [ 1, 1, 1, 1 ],
            "num_groups": 2
        }
    },
    "13": {
        "ActivationLayer": {
            "activation_type": "RELU"
        }
    },
    "14": {
        "ConvolutionLayer": {
            "conv_width": 3,
            "conv_height": 3,
            "ofm": 256,
            "conv_info": [ 1, 1, 1, 1 ],
            "num_groups": 2
        }
    },
    "15": {
        "ActivationLayer": {
            "activation_type": "RELU"
        }
    },
    "16": {
        "PoolingLayer": {
            "pool_type": "MAX",
            "pool_size": 3,
            "pad_stride_info": [ 2, 2, 0, 0 ]
        }
    },
    "17": {
        "FullyConnectedLayer": {
            "num_outputs": 4096
        }
    },
    "18": {
        "ActivationLayer": {
            "activation_type": "RELU"
        }
    },
    "19": {
        "FullyConnectedLayer": {
            "num_outputs": 4096
        }
    },
    "20": {
        "ActivationLayer": {
            "activation_type": "RELU"
        }
    },
    "21": {
        "FullyConnectedLayer": {
            "num_outputs": 1000
        }
    },
    "22": {
        "SoftmaxLayer": {
            "beta": 1.0
        }
    }
}
morgolock commented 1 year ago

Hi @AndreasKaratzas

It's not possible without making changes to the source code of the Graph API.
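To give a rough idea of what such a change could look like, below is a minimal, untested sketch that walks the finalized graph from inside the example (for instance right after graph.finalize(...) in do_setup()) and prints basic per-node information. It assumes the public graph headers behave as in recent releases, i.e. that frontend::Stream exposes the underlying Graph via graph(), and that INode provides id(), name(), num_outputs() and output():

// Minimal, untested sketch (not a ready-made library feature): walk the
// finalized graph and print basic per-node information. Accessors used here
// (Graph::nodes(), INode::id()/name()/output(), TensorDescriptor::shape) are
// assumed to behave as in the public graph headers of recent releases.
#include "arm_compute/graph.h"

#include <iostream>
#include <memory>

void dump_graph_nodes(arm_compute::graph::frontend::Stream &stream)
{
    using namespace arm_compute::graph;

    for(const std::unique_ptr<INode> &node : stream.graph().nodes())
    {
        if(node == nullptr)
        {
            continue; // node slots can be empty after graph mutations
        }
        std::cout << "Node " << node->id() << " '" << node->name() << "'";
        if(node->num_outputs() > 0 && node->output(0) != nullptr)
        {
            const arm_compute::TensorShape &shape = node->output(0)->desc().shape;
            std::cout << " output shape:";
            for(size_t d = 0; d < shape.num_dimensions(); ++d)
            {
                std::cout << " " << shape[d];
            }
        }
        std::cout << "\n";
    }
}

Recovering the detailed per-layer parameters from your JSON example (kernel sizes, pad/stride info, normalization settings, and so on) would additionally require inspecting each concrete node class, which is where the changes to the Graph API code mentioned above come in.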

Functions and kernels will print their configuration at runtime if you enable logging when compiling the library; see, for example, https://github.com/ARM-software/ComputeLibrary/blob/main/src/cpu/operators/CpuGemmConv2d.cpp#L271

You just need to build the library with logging=1 (a scons build option) to enable this. Then, when you run graph_alexnet, filter out all the kernels and look only at the operators, as shown below:

[ComputeLibrary][27-03-2023 02:22:03][INFO]  arm_compute::NEConvolutionLayer::configure() : 
 input: ITensor->info(): Shape=3,227,227,DataLayout=NHWC,DataType=F32
 weights: ITensor->info(): Shape=3,11,11,96,DataLayout=NHWC,DataType=F32
 biases: ITensor->info(): Shape=96,DataLayout=NHWC,DataType=F32
 output: ITensor->info(): Shape=96,55,55,DataLayout=NHWC,DataType=F32
 conv_info: 4,4;0,0,0,0
 weights_info: 0;0;0,0
 dilation: 1x1
 act_info: RELU
 enable_fast_math: false
 num_groups: 1

If you do not need that much detail, you can simply pipe the output through | grep GRAPH to get a list of the high-level operators used in the graph, as shown below:

root@acl_hikey_9:~/tmp/user/acl_mt# LD_LIBRARY_PATH=./main_release+logging:$LD_LIBRARY_PATH ./graph_alexnet  | grep GRAPH
 [GRAPH][27-03-2023 02:02:05][INFO]  Running mutating pass : NodeFusionMutator
 [GRAPH][27-03-2023 02:02:05][INFO]  Running mutating pass : GroupedConvolutionMutator
 [GRAPH][27-03-2023 02:02:05][INFO]  Running mutating pass : InPlaceOperationMutator
 [GRAPH][27-03-2023 02:02:05][INFO]  Running mutating pass : DepthConcatSubTensorMutator
 [GRAPH][27-03-2023 02:02:05][INFO]  Running mutating pass : SplitLayerSubTensorMutator
 [GRAPH][27-03-2023 02:02:05][INFO]  Running mutating pass : NodeExecutionMethodMutator
 [GRAPH][27-03-2023 02:02:05][INFO]  Instantiated conv1 Type: GenericConvolutionLayer Target: Neon Data Type: F32 Groups: 1 Input shape: 3,227,227 Weights shape: 3,11,11,96 Output shape: 96,55,55 RELU
 [GRAPH][27-03-2023 02:02:05][INFO]  Instantiated norm1 Type: NormalizationLayer Target: Neon Data Type: F32 Input shape: 96,55,55 Output shape: 96,55,55 Normalization info: CROSS_MAP
 [GRAPH][27-03-2023 02:02:05][INFO]  Instantiated pool1 Type: PoolingLayer Target: Neon Data Type: F32 Input shape: 96,55,55 Output shape: 96,27,27 Pooling info: MAX
 [GRAPH][27-03-2023 02:02:05][INFO]  Instantiated conv2_g0 Type: GenericConvolutionLayer Target: Neon Data Type: F32 Groups: 1 Input shape: 48,27,27 Weights shape: 48,5,5,128 Output shape: 128,27,27 RELU
 [GRAPH][27-03-2023 02:02:05][INFO]  Instantiated conv2_g1 Type: GenericConvolutionLayer Target: Neon Data Type: F32 Groups: 1 Input shape: 48,27,27 Weights shape: 48,5,5,128 Output shape: 128,27,27 RELU
 [GRAPH][27-03-2023 02:02:05][INFO]  Instantiated conv2 Type: ConcatenateLayer Target: Neon Data Type: F32 Shape: 256,27,27 Num Inputs: 2 Axis: 0
 [GRAPH][27-03-2023 02:02:05][INFO]  Instantiated norm2 Type: NormalizationLayer Target: Neon Data Type: F32 Input shape: 256,27,27 Output shape: 256,27,27 Normalization info: CROSS_MAP
 [GRAPH][27-03-2023 02:02:05][INFO]  Instantiated pool2 Type: PoolingLayer Target: Neon Data Type: F32 Input shape: 256,27,27 Output shape: 256,13,13 Pooling info: MAX
 [GRAPH][27-03-2023 02:02:05][INFO]  Instantiated conv3 Type: GenericConvolutionLayer Target: Neon Data Type: F32 Groups: 1 Input shape: 256,13,13 Weights shape: 256,3,3,384 Output shape: 384,13,13 RELU
 [GRAPH][27-03-2023 02:02:05][INFO]  Instantiated conv4_g0 Type: GenericConvolutionLayer Target: Neon Data Type: F32 Groups: 1 Input shape: 192,13,13 Weights shape: 192,3,3,192 Output shape: 192,13,13 RELU
 [GRAPH][27-03-2023 02:02:05][INFO]  Instantiated conv4_g1 Type: GenericConvolutionLayer Target: Neon Data Type: F32 Groups: 1 Input shape: 192,13,13 Weights shape: 192,3,3,192 Output shape: 192,13,13 RELU
 [GRAPH][27-03-2023 02:02:05][INFO]  Instantiated conv4 Type: ConcatenateLayer Target: Neon Data Type: F32 Shape: 384,13,13 Num Inputs: 2 Axis: 0
 [GRAPH][27-03-2023 02:02:05][INFO]  Instantiated conv5_g0 Type: GenericConvolutionLayer Target: Neon Data Type: F32 Groups: 1 Input shape: 192,13,13 Weights shape: 192,3,3,128 Output shape: 128,13,13 RELU
 [GRAPH][27-03-2023 02:02:05][INFO]  Instantiated conv5_g1 Type: GenericConvolutionLayer Target: Neon Data Type: F32 Groups: 1 Input shape: 192,13,13 Weights shape: 192,3,3,128 Output shape: 128,13,13 RELU
 [GRAPH][27-03-2023 02:02:05][INFO]  Instantiated conv5 Type: ConcatenateLayer Target: Neon Data Type: F32 Shape: 256,13,13 Num Inputs: 2 Axis: 0
 [GRAPH][27-03-2023 02:02:05][INFO]  Instantiated pool5 Type: PoolingLayer Target: Neon Data Type: F32 Input shape: 256,13,13 Output shape: 256,6,6 Pooling info: MAX
 [GRAPH][27-03-2023 02:02:05][INFO]  Instantiated fc6 Type: FullyConnectedLayer Target: Neon Data Type: F32 Input shape: 256,6,6 Weights shape: 9216,4096 Output shape: 4096
 [GRAPH][27-03-2023 02:02:05][INFO]  Instantiated fc7 Type: FullyConnectedLayer Target: Neon Data Type: F32 Input shape: 4096 Weights shape: 4096,4096 Output shape: 4096
 [GRAPH][27-03-2023 02:02:05][INFO]  Instantiated fc8 Type: FullyConnectedLayer Target: Neon Data Type: F32 Input shape: 4096 Weights shape: 4096,1000 Output shape: 1000
 [GRAPH][27-03-2023 02:02:05][INFO]  Instantiated prob Type: SoftmaxLayer Target: Neon Data Type: F32 Input shape: 1000 Output shape: 1000

Hope this helps.