
Machine learning inference library for ARC EM and HS Processors

embARC Machine Learning Inference Library

This repository contains the source code of the embARC Machine Learning Inference Library (embARC MLI Library), its documentation, and examples. The primary purpose of this library is to enable developers to efficiently implement and/or port data processing algorithms based on machine learning principles for DSP-enhanced ARC Processors.

Release Notes

  1. Version 2.0

  2. This release supports the following functional primitives:

    • 2D Convolution
    • 2D Depthwise Convolution
    • 2D Transpose Convolution
    • 2D Group Convolution
    • Fully Connected layer
    • Max and average pooling
    • LSTM and GRU recurrent cells
    • RNN Dense layer
    • Elementwise (add, sub, mul, min, max)
    • Permute
    • Argmax
    • Data manipulation (concatenation, permute, 2D padding)
    • ReLU, Leaky ReLU, Parametric ReLU, ReLU1, ReLU6
    • Softmax, Sigmoid, TanH, L2 Normalization
    • Helper functions to copy (partial) tensors (mli_mov*)
  3. Supported data layout:

    • Data layout HWC (Height-Width-Channel)
  4. Supported data format:

    • Fixed point 8bit and 16bit (fx8 and fx16)
    • Signed asymmetric 8bit quantization (sa8)
    • Signed asymmetric datatype supports per-tensor or per channel quantization with 16bit scale factors.
  5. Slicing support: creation of sub-tensors and support for non-contiguous tensor data.

  6. Supported platforms:

    • VPX
    • x86 emulation
  7. Toolchains support:

Documentation

embARC MLI library API documentation for version 2.0 is available online starting from the release date.

Its sources are available in the /doc directory and can be built as described in the related readme file.

Package Structure

./bin - directory for embARC MLI library binary archive created during build
./obj - directory for all intermediate artifacts created during build
./cmake - contains CMake settings file
./make - contains common GNU make rules and settings
./doc - contains the API documentation of the embARC MLI library
./include - include files with API prototypes and types
./lib/src - source code of embARC MLI Library
./lib/gen - auxiliary generation scripts for LUT tables
./lib/make - makefiles for library and package
./user_tests - set of basic tests for library primitives
./examples - source code of examples (see the Examples and Tests section)
./examples/auxiliary - source code of helper functions used for the examples
./hw - contains HW templates (*.tcf files). See related readme file

Quick Start

A dedicated quick start guide is not yet available. If you don't want to read the whole readme, you can follow these steps to first build the MLI library for x86 emulation:

  1. Make sure your environment satisfies the requirements from the General Build Process and x86 Host Emulation sections.
  2. Go to the Build Command Examples For x86 section, read it, and choose one of the proposed options for build.
  3. Go to the Examples And Tests section, read it, and choose one of the listed examples for the next steps on running something with embARC MLI Library.

As the next step, you can repeat this recipe for ARC processors:

  1. Make sure your environment satisfies the requirements from the General Build Process and ARC Processors sections.
  2. Go to the Build Command Examples For ARC Processors section, read it, and choose one of the proposed options for build.
  3. Go to the Examples And Tests section, read it, and choose one of the listed examples for the next steps on running something with embARC MLI Library.

Afterward you can continue with familiarizing yourself with the documentation, which contains all the necessary info and references.

Note that it is highly recommended to use DBG_MODE_DEBUG configuration option (see MLI_DEBUG_MODE) for early development of applications based on embARC MLI Library because it provides additional diagnostic output which can help you quickly track down misuse of the API.

Building The Package

The embARC MLI Library uses CMake as a backend for the platform independent project generation and GNU Make as a front end to invoke CMake and to run tests. Alternatively, after CMake configures the project for the desired platform, you can work with its output stored in obj folder as you may be used to.

General Build Process

Basic build requirements are the following:

A compatible version of gmake is also delivered with the MetaWare Development Tools 2020.12 and higher. All command examples in the repo readmes use gmake, but you can replace it with any compatible GNU Make available in your environment.

The embARC MLI front end Make infrastructure provides targets for easy configuration of the project for a desired platform and toolchain. It also can build from sources, run tests and examples.

General template of the build command looks like:

gmake <target> <options> 

Available <targets>:

Note that tests and example applications have separate makefiles with additional targets which aren't explained here. See Examples and Tests section for info and references

<options> are described in the Build Configuration Options section below. Here is a list of links for available options:

Note that tests and example applications are part of the same generated CMake build system as the library itself, whose behavior depends on the build configuration options. Tests and examples may also be adjusted using their own configuration options not listed above. See the Examples and Tests section for info and references.

The embARC MLI Library can be built for the following platforms:

Build for these platforms creates separate projects in obj directory and separate binaries in bin directory. Build process for supported platforms is defined below.

x86 Host Emulation

The embARC MLI Library can be built for host platform and used in compatible applications to ease early development or verification. x86 and x64 architectures are supported, but for simplicity only x86 will be mentioned within documentation. No optimization is applied for this platform. Depending on the MLI build configuration, calculation results on x86 platform can be bit exact with desired ARC processor within defined behavior of MLI Functions.

The x86 Host emulation of the library has been tested with the following toolchains:

To build the embARC MLI library:

  1. Open command line and change working directory to the root of the repo
  2. Start building using the following command template which you need to adjust for your needs.
gmake build ROUND_MODE=[UP|CONVERGENT] <Additional options>

ROUND_MODE is a mandatory option for x86 host emulation target. TCF_FILE option must not be used (only empty value is allowed).

As a result of configuration and build you will find bin/native folder with the library binary file and obj/native directory with generated project for the default toolchain and IDE within the environment.

<Additional options> which are applicable for this mode are JOBS, VERBOSE, FULL_ACCU, MLI_DEBUG_MODE, RECONFIGURE, GEN_EXAMPLES.

<Additional options> which have no effect or do not make sense in this mode are BUILDLIB_DIR, MLI_BUILD_REFERENCE, OPTMODE, DEBUG_BUILD.
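The x86 constraints described above (ROUND_MODE is mandatory, TCF_FILE must stay empty) can be checked before gmake is invoked. The following is a minimal sketch of such a check. The function name mli_x86_build_cmd is hypothetical and not part of the MLI package, and the function only prints the command line rather than running it.

```shell
# Hypothetical helper (not part of the MLI package): validate the x86
# constraints and print the resulting gmake command without running it.
# Usage: mli_x86_build_cmd <ROUND_MODE> [NAME=VALUE ...]
mli_x86_build_cmd() {
    round_mode=${1:-}
    if [ "$#" -gt 0 ]; then shift; fi

    # ROUND_MODE is mandatory for x86 and accepts only these two values
    case "$round_mode" in
        UP|CONVERGENT) ;;
        *) echo "ROUND_MODE must be UP or CONVERGENT" >&2; return 1 ;;
    esac

    # TCF_FILE must not carry a value for x86 host emulation
    for opt in "$@"; do
        case "$opt" in
            TCF_FILE=?*) echo "TCF_FILE must be empty for x86 emulation" >&2; return 1 ;;
        esac
    done

    # Append the remaining options only if there are any
    echo "gmake build ROUND_MODE=$round_mode${*:+ $*}"
}
```

Calling mli_x86_build_cmd UP FULL_ACCU=OFF JOBS=4 prints a command line that can be reviewed and then run from the root of the repo.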

Build Command Examples for x86

The first step is to open a command line and change working directory to the root of the embARC MLI repo. Afterward, you can use one of the following commands.

  1. Build project to emulate ARC VPX platform:

    gmake build ROUND_MODE=UP FULL_ACCU=OFF 
  2. Build project to emulate ARC VPX platform with full debug checking of parameters and assertions in runtime. Use multithreaded build process (4 threads):

    gmake build ROUND_MODE=UP FULL_ACCU=OFF JOBS=4 MLI_DEBUG_MODE=DBG_MODE_FULL 

ARC Processors

Main target platforms for embARC MLI Library are ARC processors. The specific processor family is determined by *.tcf file provided for library configuration. It is highly recommended to use embARC MLI 2.0 for VPX processor only. EM/HS targets are not properly tested and optimized. You can use embARC MLI 1.1 instead.

Building the embARC MLI Library for ARC processors requires MetaWare Development Tools (MWDT) version 2021.03 or higher.

To build the embARC MLI library:

  1. Open command line and change working directory to the root of the repo
  2. Start building using the following command template which you need to adjust for your needs.
gmake build TCF_FILE=<path_to_tcf> [BUILDLIB_DIR=<path_to_target_rt_libs>] <Additional options>

TCF_FILE is a mandatory option for ARC target.

In case you are going to compile and run tests or examples it is better to provide the path to a runtime library using the BUILDLIB_DIR option.

As a result of configuration and build you will find bin/arc folder with the MLI library and obj/arc directory with generated Makefile project configured to use MWDT toolchain. If you use MLI_BUILD_REFERENCE option, then artifacts will be created in bin/arc_ref and obj/arc_ref directories correspondingly.

<Additional options> which are applicable for this mode are JOBS, VERBOSE, MLI_BUILD_REFERENCE, MLI_DEBUG_MODE, DEBUG_BUILD, RECONFIGURE, GEN_EXAMPLES, OPTMODE.

<Additional options> which have no or limited effect in this mode are FULL_ACCU, ROUND_MODE. ROUND_MODE option is applicable only for ARC EMxD family.
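The ARC rules above (TCF_FILE is mandatory, BUILDLIB_DIR is optional but advisable when tests or examples will be run) can be captured in a small wrapper. This is only a sketch: the function name mli_arc_build_cmd and its positional arguments are assumptions made for illustration, and the function prints the command line instead of executing it.

```shell
# Hypothetical helper (not part of the MLI package): compose the ARC build
# command from a TCF path and an optional runtime library path or name.
# Usage: mli_arc_build_cmd <path_to_tcf> [<path_to_target_rt_libs>]
mli_arc_build_cmd() {
    tcf=${1:-}
    buildlib=${2:-}

    # TCF_FILE is mandatory for ARC targets
    if [ -z "$tcf" ]; then
        echo "TCF_FILE is mandatory for ARC targets" >&2
        return 1
    fi

    cmd="gmake build TCF_FILE=$tcf"
    if [ -n "$buildlib" ]; then
        cmd="$cmd BUILDLIB_DIR=$buildlib"
    else
        # Not an error: the library itself builds without runtime libraries
        echo "note: BUILDLIB_DIR not set; tests and examples will use the compiler's default runtime library" >&2
    fi
    echo "$cmd"
}
```

Other options such as JOBS or OPTMODE can be appended to the printed command by hand.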

Build Command Examples for ARC Processors

The following commands assume usage of the recommended VPX configuration. You need to generate the TCF for this configuration using the tcfgen tool delivered with MetaWare Development Tools, to ensure sufficient target memory to run all of the examples. The first step is to open a command line and change the working directory to the root of the embARC MLI repo. Then use the following command to generate the recommended TCF file, taking the default vpx5_integer_full configuration as a basis:

tcfgen -o ./hw/vpx5_integer_full.tcf -tcf=vpx5_integer_full -iccm_size=0x80000 -dccm_size=0x40000

Afterward, you can use one of the following commands to configure and build the package:

  1. Build project for recommended ARC VPX evaluation target. BUILDLIB_DIR is mandatory for this, but default "vpx5_integer_full" pack delivered with MWDT tools can be used. Use multithreaded build process (4 threads):

    gmake TCF_FILE=./hw/vpx5_integer_full.tcf BUILDLIB_DIR=vpx5_integer_full JOBS=4 build 
  2. Build project for recommended ARC VPX evaluation target optimized for code size and with full debug checking of parameters and assertions in runtime. Use multithreaded build process (4 jobs):

    gmake build TCF_FILE=./hw/vpx5_integer_full.tcf BUILDLIB_DIR=vpx5_integer_full \
    OPTMODE=size MLI_DEBUG_MODE=DBG_MODE_FULL JOBS=4
  3. Build project for recommended ARC VPX evaluation target using reference code. It is unoptimized, straightforward code and is expected to be bit-exact with the optimized one. Use multithreaded build process (4 jobs); artifacts are stored in bin/arc_ref and obj/arc_ref:

    gmake build TCF_FILE=./hw/vpx5_integer_full.tcf BUILDLIB_DIR=vpx5_integer_full \
    MLI_BUILD_REFERENCE=ON JOBS=4

Build Configuration Options

ROUND_MODE

Description: Rounding mode for low level math on casting and shifting values.

Syntax: ROUND_MODE=[UP|CONVERGENT]
Values:

Default:
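To illustrate what a rounding mode changes when values are shifted right, here is a small sketch using shell integer arithmetic. It assumes that UP means "round half up" and CONVERGENT means "round half to even", which is the usual meaning of these terms in fixed-point DSP code; the exact library behavior is defined in the API documentation. The function names are hypothetical, and the sketch is only exercised for non-negative inputs.

```shell
# Right shift of x by s bits (s >= 1), rounding halfway cases upward.
round_up() {
    x=$1; s=$2
    echo $(( (x + (1 << (s - 1))) >> s ))
}

# Right shift of x by s bits (s >= 1), rounding halfway cases to the
# nearest even result (convergent rounding).
round_convergent() {
    x=$1; s=$2
    half=$(( 1 << (s - 1) ))      # value of the discarded bits at exactly 0.5
    mask=$(( (1 << s) - 1 ))      # selects the bits that are shifted out
    r=$(( (x + half) >> s ))      # start from the round-half-up result
    # Exactly halfway and the result is odd: step back to the even neighbor
    if [ $(( x & mask )) -eq "$half" ] && [ $(( r & 1 )) -eq 1 ]; then
        r=$(( r - 1 ))
    fi
    echo "$r"
}
```

For the value 5 shifted by 1 bit (2.5), round_up prints 3 and round_convergent prints 2. For 7 shifted by 1 bit (3.5), both print 4. The two modes differ only on exact halfway cases, which is why results are bit exact between platforms only when the same mode is used on both.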

FULL_ACCU

Description: Usage of full or reduced accumulator bit depth during accumulation. This option is provided to emulate VPX specific low level optimization and can be used only for x86 platform, or together with MLI_BUILD_REFERENCE=ON option.

Syntax: FULL_ACCU=[ON|OFF]
Values:

Default: ON

TCF_FILE

Description: Path to the tool configuration file (TCF).

The TCF file defines the ARC target processor hardware configuration. You can supply your own TCF that aligns with your hardware configuration. If you don't have a specific TCF file, you can use the tcfgen utility delivered with MetaWare Development Tools. The basic vpx5_integer_full template delivered with MetaWare Development Tools might be a good starting point. The tcfgen tool is documented in the Linker and Utils guide, which is also delivered with MetaWare Development Tools. This option is mandatory for the ARC platform and must not be set for x86 host emulation.

Syntax: TCF_FILE=<tcf-file>
Values: one of two options:

Default: No default value.

BUILDLIB_DIR

Description: Path to runtime libraries for the ARC platform to link applications with.

Runtime libraries are required for tests and example applications delivered with the MLI Library package, but are not needed for the library build. While for some targets not setting this option is acceptable (EMxD), it's highly recommended to build libraries specifically for your target. It can be done using the buildlib utility delivered with MetaWare Development Tools. The buildlib tool is documented in the Linker and Utils guide, which is also delivered with MetaWare Development Tools. Alternatively, you can also pass the name of a runtime library delivered with MetaWare Development Tools if it is compatible with your hardware configuration. For instance, together with TCF_FILE=hw\vpx5_integer_full.tcf you can use BUILDLIB_DIR=vpx5_integer_full to use compatible pre-built libraries within the MetaWare distribution.

This option has no effect on x86 host emulation build.

Syntax: BUILDLIB_DIR=<target_rt_libs>
Values: one of two options:

Default: No default value. The MetaWare compiler chooses a default library for the ARC platform, which could result in incompatibilities with the TCF file you specified. In case you are going to compile and run tests or examples for the ARC platform, it's better to provide a runtime library path.

MLI_BUILD_REFERENCE

Description: Switch embARC MLI Library implementation between platform independent reference code and platform default code.

Reference code is a configurable, straightforward, unoptimized implementation. Its goal is to emulate the desired ARC processor on the bit exact level within defined behavior of MLI Functions. If this switch is turned on, artifacts will be generated into directories with the _ref postfix (bin/arc_ref and obj/arc_ref for instance).

Syntax: MLI_BUILD_REFERENCE=[ON|OFF]
Values:

Default: OFF

JOBS

Description: Number of jobs (threads) used on workstation to build the MLI package. Increasing number of jobs can reduce build time.
Syntax: JOBS=<number of jobs>
Values: It is recommended to use value within [1; number of host logical cores] range.
Default: no default value (single thread)
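One way to stay inside the recommended range is to query the number of logical cores on the host and pass it as JOBS. The sketch below assumes a POSIX-like shell in which getconf reports _NPROCESSORS_ONLN (as it does on common Linux and macOS systems). The function name mli_jobs is hypothetical, and the function falls back to a single job when the query fails.

```shell
# Hypothetical helper: print a JOBS value based on the number of online
# logical cores, falling back to 1 if it cannot be determined.
mli_jobs() {
    n=$(getconf _NPROCESSORS_ONLN 2>/dev/null) || n=1
    # Reject empty or non-numeric output
    case "$n" in
        ''|*[!0-9]*) n=1 ;;
    esac
    # Guard against a reported value of zero
    if [ "$n" -lt 1 ]; then n=1; fi
    echo "$n"
}
```

The result can then be passed on the command line, for example as JOBS="$(mli_jobs)" in any of the build commands shown earlier.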

VERBOSE

Description: Activates verbose output from CMake and build tools during build of the project.

Syntax: VERBOSE=[ON|OFF]
Values:

Default: OFF

OPTMODE

Description: Define optimization mode of embARC MLI Library and all delivered examples and tests. This option has no effect on x86 host emulation build.

Syntax: OPTMODE=[speed|size]
Values:

Default: speed

MLI_DEBUG_MODE

Description: Additional debug features mode.

To ease application debugging, additional debug features can be turned-on during build which includes:

Syntax: MLI_DEBUG_MODE=DBG_MODE_[RELEASE|RET_CODES|ASSERT|DEBUG|FULL]
Values:

Default: DBG_MODE_RELEASE

DEBUG_BUILD

Description: Include debug information into binaries during build (-g flag). This option has no effect on x86 host emulation build.

Syntax: DEBUG_BUILD=[ON|OFF]
Values:

Default: ON

RECONFIGURE

Description: Always executes the CMake configure step, even if a project has already been configured. It may cause unwanted rebuilds.

Syntax: RECONFIGURE=[ON|OFF]
Values:

Default: OFF

GEN_EXAMPLES

Description: Include MLI Examples (./examples) into the generation and build process together with the library and tests.

Syntax: GEN_EXAMPLES=[1|0]
Values:

Default: 1

Examples And Tests

There are tests and several examples supplied with the embARC MLI Library. For information on how to build and run each example, please go to the related directory and examine the local README.

User Tests

These are basic API level test applications to check that all the functions available at the API level work fine.

Hello World

This example is a first step into API functions and data usage.

CIFAR-10

This example is a simple image classifier built on convolution, pooling and dense layers. It is based on standard Caffe tutorial for CIFAR-10 dataset.

Human Activity Recognition

LSTM Based Human Activity Recognition example. The model is intended to differentiate human activity between 6 classes based on inputs from embedded inertial sensors from waist-mounted smartphone.

Face Detection

More advanced but still compact face detection example. It shows how the slicing and data movement can be organised to efficiently use limited fast CCM memory.

EMNIST TFLM Tutorial

This example shows how to convert an EMNIST TensorFlow model into TensorFlow Lite Micro format and use it in an application.

Known Issues

  1. embARC MLI 2.0 is partially optimized for ARC EMxD and ARC HSxD targets. Currently we recommend only building for VPX and x86 emulation targets. You can use MLI 1.1 for EM/HS targets.

Frequently Asked Questions

Q: I cannot build and run the example application for my Synopsys board (EMSK, IoTDK, etc.). What should I do?
A: It isn't supported at the moment. Currently we recommend only building for VPX and x86 emulation targets. You can use MLI 1.1 for EM/HS targets.

Q: Can I use ARC GNU tools to build embARC MLI library?
A: embARC MLI Library must be built by MetaWare Development Tools only.

Q: Can I use MetaWare Development Tools Lite to pre-build embARC MLI library and ARC GNU to build example application?
A: embARC MLI Library must be built by full version of MetaWare Development Tools. Binaries built with MWDT Lite are not compatible with ARC GNU Tools and full MetaWare Development Tools. Read the MWDT Lite documentation for details.